Yes, Panama is an OpenJDK project; Intel is the prime contributor. A team at our company has tried this feature, and I have a plan to try it in Drill. As @aman points out, it will take real work to validate its stability on the newer JDK versions.
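To make that concrete, here is a minimal sketch of a SUM(x) written against the Panama vector API. A caveat: the names below (jdk.incubator.vector, IntVector, SPECIES_PREFERRED) are from the later incubator builds of the API, so they are an assumption on my part; the early-access Panama builds may spell things differently, and the module must be enabled with --add-modules jdk.incubator.vector:

    import jdk.incubator.vector.IntVector;
    import jdk.incubator.vector.VectorOperators;
    import jdk.incubator.vector.VectorSpecies;

    public class VectorSum {
        // Picks the widest SIMD shape the CPU supports (e.g. 256-bit AVX2).
        static final VectorSpecies<Integer> SPECIES = IntVector.SPECIES_PREFERRED;

        // SUM(x) over an int column: each iteration loads and adds one
        // SIMD register's worth of values instead of one value at a time.
        // (The int accumulator can overflow; good enough for a sketch.)
        static long sum(int[] column) {
            IntVector acc = IntVector.zero(SPECIES);
            int i = 0;
            int upper = SPECIES.loopBound(column.length);
            for (; i < upper; i += SPECIES.length()) {
                acc = acc.add(IntVector.fromArray(SPECIES, column, i));
            }
            long total = acc.reduceLanes(VectorOperators.ADD);
            for (; i < column.length; i++) {   // scalar tail
                total += column[i];
            }
            return total;
        }
    }

The point is that the SIMD shape is explicit in the source, so we no longer depend on the JIT auto-vectorizing a plain loop.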
Gandiva brings a great idea: use LLVM to generate code dynamically. But a JVM project cannot reach that C++ code without crossing JNI, so Gandiva is designed to operate on a batch of data to amortize the JNI invocation cost. That design probably only suits physical operators like Project and Filter, which can do one JNI invocation per RecordBatch against off-heap data. For HashAggregate and HashJoin, which operate on grouped rows, the JNI invocations would be too frequent and the overhead would hurt performance. Panama's intrinsic vector API has no JNI cost and gives us more flexibility in how we write code. I don't know whether we have a plan to move to Arrow; if not, we could borrow Gandiva's idea and implement some operators, or parts of their logic, in C++.

I agree with Paul's description of Drill's current execution. What we have today is columnar data in memory, not actual vectorized execution. Even a plain for-loop over a column's data is not guaranteed to be compiled to SIMD code by the JVM JIT. Of course, this still performs well compared to alternatives, since it is CPU-pipeline friendly and has good data locality. I have studied papers such as [1] looking for a way to make operators like HashAggregate and HashJoin execute in a vectorized fashion, but Java lacks the ability to generate vectorized code directly, which makes me worry that a rewritten operator would show no notable performance improvement. So I look forward to Panama.

[1] http://www.cs.columbia.edu/~orestis/sigmod15.pdf

On Sat, Jun 30, 2018 at 4:08 AM Paul Rogers <[email protected]> wrote:

> Hi Weijie,
>
> As it turns out, vectorized processing in Drill is more aspirational
> than operational at this point in time.
>
> The code used in Drill is not actually vector-based even though the data
> itself is columnar. Drill generally does row-wise operations because
> row-wise operations fit the SQL semantics better than column-wise
> operations. There is generally a "loop over all rows" block of code that
> calls into a "do something for column a" block, followed by a "do
> something for column b" block, etc.
>
> For vectorized processing, the loops have to be inverted: "do all column
> a" followed by "do all column b". That is not always possible, however.
>
> Further, many of Drill's readers produce Nullable types. In this case,
> every value carries a null/not-null flag which must be checked for each
> data value. It is unlikely that CPU instructions exist for this case.
>
> So, a first step is to research how various operators could be
> vectorized. For example, how would we handle a "WHERE x = 10" case in a
> way that would benefit from vectorization? How about a "SUM(x)" case?
>
> Once that is sorted out (there are likely research papers that explain
> how others have done it), we can move on to changing the generated code
> (the loop-over-all-rows code) to use the newer design.
>
> Thanks,
> - Paul
>
>
> On Friday, June 29, 2018, 10:30:04 AM PDT, Aman Sinha
> <[email protected]> wrote:
>
> Hi Weijie, the Panama project is an OpenJDK initiative, right [1]? It is
> not Intel specific. It would be quite a bit of work to test and certify
> with Intel's JVM, which may still be in an experimental stage. Also, you
> may have seen the Gandiva project for Apache Arrow, which aims to
> improve vectorization for operations on Arrow buffers (this requires
> integration with Arrow).
>
> I assume the test program or workload you were running was already
> written to exploit vectorization.
> Have you also looked into Drill's code-gen to see which operators are
> amenable to vectorization? We could start with some small use case and
> expand.
>
> [1]
> http://www.oracle.com/technetwork/java/jvmls2016-ajila-vidstedt-3125545.pdf
>
> On Fri, Jun 29, 2018 at 3:23 AM weijie tong <[email protected]>
> wrote:
>
> > Hi all:
> >
> > I have inspected the JIT assembly of some vector-friendly Java code
> > with the JITWatch tool and found that the JVM did not generate the
> > expected AVX code. According to conclusions from JVM experts, the JVM
> > only recognizes a few restricted usage patterns when generating AVX
> > code.
> >
> > I found that Intel has started a project called Panama, which supplies
> > an intrinsic vector API to actually execute AVX code. Here is the
> > reference:
> > https://software.intel.com/en-us/articles/vector-api-developer-program-for-java
> > It also supports off-heap calculation. According to our JVM team, the
> > vector API will be released in JDK 11.
> >
> > So I wonder whether we could distribute Intel's current JVM as a
> > supplied default JVM for users (much as Spark ships with a default
> > Scala) and, as an option, rewrite parts of our operator code against
> > this new vector API.
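To make Paul's loop-inversion point above concrete: for "WHERE x = 10", a column-at-a-time filter scans the whole x column once and emits a selection vector of matching row ids, instead of testing x inside a per-row loop that touches every column. A hypothetical sketch, not Drill's actual generated code; the branch-free form follows the approach in [1]:

    // Hypothetical column-at-a-time filter for WHERE x = target.
    // selectionOut must be at least xColumn.length long.
    static int filterEqual(int[] xColumn, int target, int[] selectionOut) {
        int matches = 0;
        for (int row = 0; row < xColumn.length; row++) {
            // Branch-free: always write the row id, but only advance the
            // output cursor on a match. A data-dependent branch here
            // usually defeats SIMD; this form at least gives the JIT (or
            // a Panama rewrite) a chance.
            selectionOut[matches] = row;
            matches += (xColumn[row] == target) ? 1 : 0;
        }
        return matches;   // number of selected rows
    }

Downstream operators then run only over the selected row ids, which is exactly the "do all of column a, then all of column b" inversion Paul describes.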

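On Paul's Nullable point: if Panama's masked operations land as promised, the per-value null check might fold into a SIMD mask instead of a per-row branch. A speculative sketch, again assuming the later incubator names and one boolean per value for validity (Drill actually bit-packs its validity vectors, so a real version would need a conversion step):

    import jdk.incubator.vector.IntVector;
    import jdk.incubator.vector.VectorMask;
    import jdk.incubator.vector.VectorOperators;
    import jdk.incubator.vector.VectorSpecies;

    public class NullableSum {
        // SUM(x) over a nullable int column: null lanes are masked off,
        // so they contribute the ADD identity (0) to the reduction.
        static long sumNullable(int[] values, boolean[] valid) {
            VectorSpecies<Integer> species = IntVector.SPECIES_PREFERRED;
            long total = 0;
            int i = 0;
            int upper = species.loopBound(values.length);
            for (; i < upper; i += species.length()) {
                VectorMask<Integer> mask = VectorMask.fromArray(species, valid, i);
                IntVector v = IntVector.fromArray(species, values, i);
                total += v.reduceLanes(VectorOperators.ADD, mask);
            }
            for (; i < values.length; i++) {   // scalar tail
                if (valid[i]) total += values[i];
            }
            return total;
        }
    }

Whether the hardware actually executes this as masked SIMD is exactly the kind of thing we would need to verify with JITWatch before rewriting any operator.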