Hi all,
  I work on a project that uses the Arrow streaming format to transfer data
between Java processes.
We're also following the progress on Java support for Plasma, and may
decide to use Plasma as well.

We typically use a pattern like this to fill Arrow vectors from Java
arrays:
----
int[] inputValues = ...;
boolean[] nullInputValues = ...;

org.apache.arrow.vector.IntVector vector = ...;
for (int i = 0; i < inputValues.length; i++) {
  if(nullInputValues[i]) {
    vector.setNull(i);
  } else {
    vector.set(i, inputValues[i]);
  }
}
----

Obviously the JIT won't be able to vectorize this loop. Does anyone know of
another way to achieve this that would be vectorized?

Here is a pseudo-code mockup of what I was thinking about. Is this approach
worth pursuing?

The idea is to try to convert input into Arrow format in a vectorized loop,
and then use sun.misc.Unsafe to copy the
converted on-heap input to an off-heap valueBuffer.

I'll ignore the details of the validityBuffer here, since it would follow
along the same lines:

----
int[] inputValues = ...;
org.apache.arrow.vector.IntVector vector = ...;

for (int i = 0; i < inputValues.length; i++) {
  // convert inputValues[i] to little-endian
  // (can this conversion be SIMD-vectorized?)
}
UNSAFE.copyMemory(
  inputValues,
  Unsafe.ARRAY_INT_BASE_OFFSET,  // offset of element 0 within the array object
  null,
  vector.getDataBuffer().memoryAddress(),
  (long) Integer.BYTES * inputValues.length  // byte count
);
----
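
To make the question more concrete, here is a minimal, JDK-only sketch of
the two pieces (the values and the validity bits). It is only an
illustration under some assumptions: the class and method names are made up,
it uses a direct java.nio.ByteBuffer in place of sun.misc.Unsafe and an Arrow
buffer, and it assumes the validity layout from the Arrow columnar format
(one bit per element, 1 meaning non-null, least-significant bit first). I
have not checked whether the JIT actually vectorizes either loop or copy.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical helper class, not part of Arrow.
final class ArrowFillSketch {

    // Pack null flags into an Arrow-style validity bitmap:
    // bit i is set when element i is NOT null, least-significant bit first.
    public static byte[] packValidity(boolean[] isNull) {
        byte[] bitmap = new byte[(isNull.length + 7) / 8];
        for (int i = 0; i < isNull.length; i++) {
            if (!isNull[i]) {
                bitmap[i >> 3] |= (byte) (1 << (i & 7));
            }
        }
        return bitmap;
    }

    // Bulk-copy an int[] into an off-heap buffer in little-endian order.
    // The byte order is set on the buffer, so no per-element conversion
    // loop is written by hand.
    public static ByteBuffer copyLittleEndian(int[] values) {
        ByteBuffer buf = ByteBuffer
            .allocateDirect(values.length * Integer.BYTES)
            .order(ByteOrder.LITTLE_ENDIAN);
        buf.asIntBuffer().put(values);  // the view inherits the byte order
        return buf;
    }

    public static void main(String[] args) {
        int[] inputValues = {1, 2, 3};
        boolean[] nullInputValues = {false, true, false};

        ByteBuffer data = copyLittleEndian(inputValues);
        byte[] validity = packValidity(nullInputValues);

        System.out.println("data bytes: " + data.capacity());
        System.out.println("validity bytes: " + validity.length);
    }
}
```

With this split, the values of null slots are still copied into the data
buffer; as far as I understand, readers are expected to ignore them
because the validity bit is 0.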

Thanks for any feedback on details I may be misunderstanding that
would make this approach infeasible.
