Eric, you can take a look here
<https://github.com/dremio/dremio-oss/blob/master/sabot/kernel/src/main/java/com/dremio/sabot/op/common/ht2/Pivots.java#L176>
how we try to optimize the copy (validity and data) in/out of vectors. We
try to start with word-wise copy (64 column values and thus 64 validity
bits) and then accordingly branch. Similar to this there are other examples
of manipulating off heap buffers through PlatformDependent APIs -- which I
think is same as using sun.misc.UNSAFE as the former eventually uses the
latter underneath.

In my opinion, we should take a look at  vector APIs and see where can we
possibly eliminate branches. I did some of it earlier -- as an example see
this
<https://github.com/apache/arrow/blob/master/java/vector/src/main/java/org/apache/arrow/vector/IntVector.java#L144>
for branch free cell-to-cell copy between two columns. The idea is to copy
junk data disregarding the validity bit. As long as the validity bit is
copied correctly, we are good.

Couple of other things have been there on my todo list but haven't yet
gotten to them. Like for your example, we should remove the branch here
<https://github.com/apache/arrow/blob/master/java/vector/src/main/java/org/apache/arrow/vector/IntVector.java#L277>
The caller is already telling us that validity bit is set or not (1 or 0 in
parameter isSet). So just following is enough to set the value (Null or non
null) at any cell and I think this will speed up the tight for loop in your
application. There is no need to check whether isSet is 1 or 0. We should
be simply setting it.

PS:

I have long been a proponent of implementing native SIMD acceleration
support into Arrow libraries and have plugged this shamelessly in emails
every now and then (like this one). Use cases like yours and several others
can then be natively supported by Arrow for most optimal execution.

On several occasions we have seen topic of SIMD acceleration support with
Arrow coming up on mailing list and I think it's high time we should do
something about it and build kernels  which can do simple tight loop
operations like sum (vector1, vector2, num_values) and several others
extremely efficiently.


On Mon, Jul 23, 2018 at 4:44 PM, Wes McKinney <wesmck...@gmail.com> wrote:

> hi Eric,
>
> Antoine recently did some work on faster bitsetting in C++ by
> unrolling the main loop to set one byte at a time
>
> https://github.com/apache/arrow/blob/27b869ae5df31f3be61e76e99999d9
> 6ea7d9b557/cpp/src/arrow/util/bit-util.h#L598
>
> This yielded major speedups when setting a lot of bits. A similar
> strategy should be possible in Java for your use case. We speculated
> that it could be made even faster by eliminating the branch in the
> bit-setting assignments (the g() | left_branch : right_branch
> statements). If you dig around in the Dremio codebase you can find
> plenty of low level off-heap memory manipulation that may be helpful
> (others may be able to comment).
>
> If some utilities could be developed here in the Arrow Java codebase
> for common benefit, that would be great.
>
> Otherwise copying the values data without branching is an obvious
> optimization. Others may have ideas
>
> - Wes
>
> On Mon, Jul 23, 2018 at 5:50 PM, Eric Wohlstadter <wohls...@gmail.com>
> wrote:
> > Hi all,
> >   I work on a project that uses Arrow streaming format to transfer data
> > between Java processes.
> > We're also following the progress on Java support for Plasma, and may
> > decide use Plasma also.
> >
> > We typically uses a pattern like this to fill Arrow vectors from Java
> > arrays:
> > ----
> > int[] inputValues = ...;
> > boolean[] nullInputValues = ...;
> >
> > org.apache.arrow.vector.IntVector vector = ...;
> > for(int i = 0; i < inputValues.size; i++) {
> >   if(nullInputValues[i]) {
> >     vector.setNull(i);
> >   } else {
> >     vector.set(i, inputValues[i]);
> >   }
> > }
> > ----
> >
> > Obviously the JIT won't be able to vectorize this loop. Does anyone know
> if
> > there is another way to achieve this which
> > would be vectorized?
> >
> > Here is a pseudo-code mockup of what I was thinking about, is this
> approach
> > worth pursuing?
> >
> > The idea is to try to convert input into Arrow format in a vectorized
> loop,
> > and then use sun.misc.Unsafe to copy the
> > converted on-heap input to an off-heap valueBuffer.
> >
> > I'll ignore the details of the validityBuffer here, since it would follow
> > along the same lines:
> >
> > ----
> > int[] inputValues = ...;
> > org.apache.arrow.vector.IntVector vector = ...;
> >
> > for(int i = 0; i < inputValues.size; i++) {
> >   //convert inputValues[i] to little-endian
> >   //this conversion can be SIMD vectorized?
> > }
> > UNSAFE.copyMemory(
> >   inputValues,
> >   0,
> >   null,
> >   vector.getDataBuffer().memoryAddress(),
> >   sizeof(Integer.class) * inputValues.size
> > );
> > ----
> >
> > Thanks for any feedback about details I may be misunderstanding, which
> > would make this approach infeasible.
>

Reply via email to