Yes, Panama is an OpenJDK project; Intel is the prime contributor. A team at our company has tried this feature, and I have a plan to try it in Drill. As @aman points out, it will take real work to validate its stability on the newer JDK versions.
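To make that concrete, here is a minimal sketch of a SUM(x) written against the Panama vector API. A caveat: the names below (jdk.incubator.vector, IntVector, SPECIES_PREFERRED) are from the later incubator builds of the API, so they are an assumption on my part; the early-access Panama builds may spell things differently, and the module must be enabled with --add-modules jdk.incubator.vector:

    import jdk.incubator.vector.IntVector;
    import jdk.incubator.vector.VectorOperators;
    import jdk.incubator.vector.VectorSpecies;

    public class VectorSum {
        // Picks the widest SIMD shape the CPU supports (e.g. 256-bit AVX2).
        static final VectorSpecies<Integer> SPECIES = IntVector.SPECIES_PREFERRED;

        // SUM(x) over an int column: each iteration loads and adds one
        // SIMD register's worth of values instead of one value at a time.
        // (The int accumulator can overflow; good enough for a sketch.)
        static long sum(int[] column) {
            IntVector acc = IntVector.zero(SPECIES);
            int i = 0;
            int upper = SPECIES.loopBound(column.length);
            for (; i < upper; i += SPECIES.length()) {
                acc = acc.add(IntVector.fromArray(SPECIES, column, i));
            }
            long total = acc.reduceLanes(VectorOperators.ADD);
            for (; i < column.length; i++) {   // scalar tail
                total += column[i];
            }
            return total;
        }
    }

The point is that the SIMD shape is explicit in the source, so we no longer depend on the JIT auto-vectorizing a plain loop.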
Gandiva brings a great idea: use LLVM to generate code dynamically. But a JVM project cannot reach that C++ code without crossing JNI, so Gandiva is designed to operate on a batch of data to amortize the JNI invocation cost. That design probably only suits physical operators like Project and Filter, which can do one JNI invocation per RecordBatch against off-heap data. For HashAggregate and HashJoin, which operate on grouped rows, the JNI invocations would be too frequent and the overhead would hurt performance. Panama's intrinsic vector API has no JNI cost and gives us more flexibility in how we write code. I don't know whether we have a plan to move to Arrow; if not, we could borrow Gandiva's idea and implement some operators, or parts of their logic, in C++.

I agree with Paul's description of Drill's current execution. What we have today is columnar data in memory, not actual vectorized execution. Even a plain for-loop over a column's data is not guaranteed to be compiled to SIMD code by the JVM JIT. Of course, this still performs well compared to alternatives, since it is CPU-pipeline friendly and has good data locality. I have studied papers such as [1] looking for a way to make operators like HashAggregate and HashJoin execute in a vectorized fashion, but Java lacks the ability to generate vectorized code directly, which makes me worry that a rewritten operator would show no notable performance improvement. So I look forward to Panama.

[1] http://www.cs.columbia.edu/~orestis/sigmod15.pdf

On Sat, Jun 30, 2018 at 4:08 AM Paul Rogers <[email protected]> wrote:

> Hi Weijie,
>
> As it turns out, vectorized processing in Drill is more aspirational
> than operational at this point in time.
>
> The code used in Drill is not actually vector-based even though the data
> itself is columnar. Drill generally does row-wise operations because
> row-wise operations fit the SQL semantics better than column-wise
> operations. There is generally a "loop over all rows" block of code that
> calls into a "do something for column a" block, followed by a "do
> something for column b" block, etc.
>
> For vectorized processing, the loops have to be inverted: "do all column
> a" followed by "do all column b". That is not always possible, however.
>
> Further, many of Drill's readers produce Nullable types. In this case,
> every value carries a null/not-null flag which must be checked for each
> data value. It is unlikely that CPU instructions exist for this case.
>
> So, a first step is to research how various operators could be
> vectorized. For example, how would we handle a "WHERE x = 10" case in a
> way that would benefit from vectorization? How about a "SUM(x)" case?
>
> Once that is sorted out (there are likely research papers that explain
> how others have done it), we can move on to changing the generated code
> (the loop-over-all-rows code) to use the newer design.
>
> Thanks,
> - Paul
>
>
> On Friday, June 29, 2018, 10:30:04 AM PDT, Aman Sinha
> <[email protected]> wrote:
>
> Hi Weijie, the Panama project is an OpenJDK initiative, right [1]? It is
> not Intel specific. It would be quite a bit of work to test and certify
> with Intel's JVM, which may still be in an experimental stage. Also, you
> may have seen the Gandiva project for Apache Arrow, which aims to
> improve vectorization for operations on Arrow buffers (this requires
> integration with Arrow).
>
> I assume the test program or workload you were running was already
> written to exploit vectorization.
> Have you also looked into Drill's code-gen to see which operators are
> amenable to vectorization? We could start with some small use case and
> expand.
>
> [1]
> http://www.oracle.com/technetwork/java/jvmls2016-ajila-vidstedt-3125545.pdf
>
> On Fri, Jun 29, 2018 at 3:23 AM weijie tong <[email protected]>
> wrote:
>
> > Hi all:
> >
> > I have inspected the JIT assembly of some vector-friendly Java code
> > with the JITWatch tool and found that the JVM did not generate the
> > expected AVX code. According to conclusions from JVM experts, the JVM
> > only recognizes a few restricted usage patterns when generating AVX
> > code.
> >
> > I found that Intel has started a project called Panama, which supplies
> > an intrinsic vector API to actually execute AVX code. Here is the
> > reference:
> > https://software.intel.com/en-us/articles/vector-api-developer-program-for-java
> > It also supports off-heap calculation. According to our JVM team, the
> > vector API will be released in JDK 11.
> >
> > So I wonder whether we could distribute Intel's current JVM as a
> > supplied default JVM for users (much as Spark ships with a default
> > Scala) and, as an option, rewrite parts of our operator code against
> > this new vector API.
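To make Paul's loop-inversion point above concrete: for "WHERE x = 10", a column-at-a-time filter scans the whole x column once and emits a selection vector of matching row ids, instead of testing x inside a per-row loop that touches every column. A hypothetical sketch, not Drill's actual generated code; the branch-free form follows the approach in [1]:

    // Hypothetical column-at-a-time filter for WHERE x = target.
    // selectionOut must be at least xColumn.length long.
    static int filterEqual(int[] xColumn, int target, int[] selectionOut) {
        int matches = 0;
        for (int row = 0; row < xColumn.length; row++) {
            // Branch-free: always write the row id, but only advance the
            // output cursor on a match. A data-dependent branch here
            // usually defeats SIMD; this form at least gives the JIT (or
            // a Panama rewrite) a chance.
            selectionOut[matches] = row;
            matches += (xColumn[row] == target) ? 1 : 0;
        }
        return matches;   // number of selected rows
    }

Downstream operators then run only over the selected row ids, which is exactly the "do all of column a, then all of column b" inversion Paul describes.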

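On Paul's Nullable point: if Panama's masked operations land as promised, the per-value null check might fold into a SIMD mask instead of a per-row branch. A speculative sketch, again assuming the later incubator names and one boolean per value for validity (Drill actually bit-packs its validity vectors, so a real version would need a conversion step):

    import jdk.incubator.vector.IntVector;
    import jdk.incubator.vector.VectorMask;
    import jdk.incubator.vector.VectorOperators;
    import jdk.incubator.vector.VectorSpecies;

    public class NullableSum {
        // SUM(x) over a nullable int column: null lanes are masked off,
        // so they contribute the ADD identity (0) to the reduction.
        static long sumNullable(int[] values, boolean[] valid) {
            VectorSpecies<Integer> species = IntVector.SPECIES_PREFERRED;
            long total = 0;
            int i = 0;
            int upper = species.loopBound(values.length);
            for (; i < upper; i += species.length()) {
                VectorMask<Integer> mask = VectorMask.fromArray(species, valid, i);
                IntVector v = IntVector.fromArray(species, values, i);
                total += v.reduceLanes(VectorOperators.ADD, mask);
            }
            for (; i < values.length; i++) {   // scalar tail
                if (valid[i]) total += values[i];
            }
            return total;
        }
    }

Whether the hardware actually executes this as masked SIMD is exactly the kind of thing we would need to verify with JITWatch before rewriting any operator.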