Hi all, I work on a project that uses the Arrow streaming format to transfer data between Java processes. We're also following the progress on Java support for Plasma, and may decide to use Plasma as well.
We typically use a pattern like this to fill Arrow vectors from Java arrays:
----
int[] inputValues = ...;
boolean[] nullInputValues = ...;
org.apache.arrow.vector.IntVector vector = ...;
for (int i = 0; i < inputValues.length; i++) {
    if (nullInputValues[i]) {
        vector.setNull(i);
    } else {
        vector.set(i, inputValues[i]);
    }
}
----
As far as I can tell, the JIT won't be able to vectorize this loop. Does anyone know if there is another way to achieve this which would be vectorized?

Here is a pseudo-code mockup of what I was thinking about. Is this approach worth pursuing? The idea is to convert the input into Arrow format in a vectorized loop, and then use sun.misc.Unsafe to copy the converted on-heap input to the off-heap valueBuffer. I'll ignore the details of the validityBuffer here, since it would follow along the same lines:
----
int[] inputValues = ...;
org.apache.arrow.vector.IntVector vector = ...;
for (int i = 0; i < inputValues.length; i++) {
    // convert inputValues[i] to little-endian
    // can this conversion be SIMD vectorized?
}
UNSAFE.copyMemory(
    inputValues,                              // source: on-heap array
    sun.misc.Unsafe.ARRAY_INT_BASE_OFFSET,    // offset of element 0 within the array object
    null,                                     // destination: absolute off-heap address
    vector.getDataBuffer().memoryAddress(),
    (long) Integer.BYTES * inputValues.length // number of bytes to copy
);
----
Thanks for any feedback about details I may be misunderstanding that would make this approach infeasible.
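P.S. To make the target layout concrete, here is a minimal sketch in plain Java with no Arrow dependency. The class and method names (ArrowLayoutSketch, toValueBuffer, toValidityBitmap) are made up for illustration. It uses a direct little-endian ByteBuffer as a stand-in for the vector's off-heap value buffer, and a byte[] as a stand-in for the validity buffer, assuming the Arrow convention that bit i is set when element i is non-null, with the least-significant bit first. I have not measured whether HotSpot vectorizes either step.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical helper class, not part of Arrow.
final class ArrowLayoutSketch {

    // Bulk-copies the values into a direct (off-heap) little-endian buffer.
    static ByteBuffer toValueBuffer(int[] values) {
        ByteBuffer buf = ByteBuffer.allocateDirect(values.length * Integer.BYTES)
                                   .order(ByteOrder.LITTLE_ENDIAN);
        // The int view inherits the byte order; put(int[]) is a bulk transfer.
        buf.asIntBuffer().put(values);
        return buf;
    }

    // Packs the null mask into a bitmap: bit i is 1 when element i is NOT null.
    static byte[] toValidityBitmap(boolean[] isNull) {
        byte[] bitmap = new byte[(isNull.length + 7) / 8];
        for (int i = 0; i < isNull.length; i++) {
            if (!isNull[i]) {
                bitmap[i >> 3] |= (byte) (1 << (i & 7));
            }
        }
        return bitmap;
    }

    public static void main(String[] args) {
        int[] values = {1, 2, 3};
        boolean[] isNull = {false, true, false};
        ByteBuffer data = toValueBuffer(values);
        byte[] validity = toValidityBitmap(isNull);
        System.out.println("second value: " + data.getInt(4));
        System.out.println("validity byte: " + validity[0]);
    }
}
```

The null slot still gets its input value written to the value buffer here. My understanding is that this is fine because readers consult the validity bit first, but please correct me if that is wrong.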