Re: Actual vectorization execution

Paul Rogers Fri, 29 Jun 2018 13:08:55 -0700

Hi Weijie,

As it turns out, vectorized processing in Drill is more aspirational than 
operational at this point in time.

The code used in Drill is not actually vector-based even though the data itself 
is columnar. Drill generally does row-wise operations because row-wise 
operations fit the SQL semantics better than column-wise operations. There is 
generally a "loop over all rows" block of code that calls into a "do something 
for column a" block, followed by a "do something for column b" block, etc.

For vectorized processing, the loops have to be inverted: "do all column a" 
followed by "do all column b". That is not always possible, however.
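
To make the inversion concrete, here is a minimal sketch in plain Java. It is not Drill's generated code: the class name, the method names, and the "+ 1" and "* 2" operations are made up for illustration, and plain int arrays stand in for value vectors.

```java
// Hypothetical illustration only; int[] arrays stand in for column vectors.
final class LoopInversion {

    // Row-wise shape: one loop over all rows, touching each column per row.
    static void rowWise(int[] a, int[] b, int[] outA, int[] outB) {
        for (int row = 0; row < a.length; row++) {
            outA[row] = a[row] + 1;  // "do something for column a"
            outB[row] = b[row] * 2;  // "do something for column b"
        }
    }

    // Inverted shape: one tight loop per column, "do all column a"
    // followed by "do all column b".
    static void columnWise(int[] a, int[] b, int[] outA, int[] outB) {
        for (int i = 0; i < a.length; i++) {
            outA[i] = a[i] + 1;
        }
        for (int i = 0; i < b.length; i++) {
            outB[i] = b[i] * 2;
        }
    }
}
```

Both methods produce the same output. The second form gives the JIT simple single-array loops, which is the shape auto-vectorizers usually look for, though nothing guarantees that SIMD code is actually emitted.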

Further, many of Drill's readers produce Nullable types. In this case, every 
value carries a null/not-null flag which must be checked for each data value. 
It is unlikely that CPU instructions exist for this case.
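
As a rough sketch of what the per-value check looks like, and of one possible way to fold the flag into the arithmetic, consider the following. The names are hypothetical, and it assumes the flag is stored as a byte that is exactly 0 (null) or 1 (set); it is not Drill's actual nullable vector code.

```java
// Hypothetical illustration only; assumes isSet[i] is exactly 0 or 1.
final class NullableSum {

    // The null/not-null flag is tested for every data value.
    static long sumWithBranch(int[] values, byte[] isSet) {
        long sum = 0;
        for (int i = 0; i < values.length; i++) {
            if (isSet[i] != 0) {
                sum += values[i];
            }
        }
        return sum;
    }

    // Branch-free variant: multiplying by the flag zeroes out null slots,
    // so the loop body contains no conditional.
    static long sumBranchFree(int[] values, byte[] isSet) {
        long sum = 0;
        for (int i = 0; i < values.length; i++) {
            sum += (long) values[i] * isSet[i];
        }
        return sum;
    }
}
```

Whether the JIT turns the second loop into SIMD instructions is something that would have to be measured; the sketch only shows that the check can be expressed without a branch.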

So, a first step is to research how various operators could be vectorized. For 
example, how would we handle a "WHERE x = 10" case in a way that would benefit 
from vectorization? How about a "SUM(x)" case?
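
One possible column-wise shape for those two cases is sketched below. This is an assumption about how such operators might look, not an existing Drill design: the class and method names are invented, and the selection array is a plain int[] that the caller must size to at least x.length.

```java
// Hypothetical illustration only; not an existing Drill operator.
final class FilterAndSum {

    // "WHERE x = target": writes the indexes of matching rows into
    // selection and returns how many matched. The index is stored
    // unconditionally and the count advances only on a match, which
    // keeps the loop body free of branches.
    static int selectEquals(int[] x, int target, int[] selection) {
        int count = 0;
        for (int i = 0; i < x.length; i++) {
            selection[count] = i;
            count += (x[i] == target) ? 1 : 0;
        }
        return count;
    }

    // "SUM(x)": a simple reduction over a single column.
    static long sum(int[] x) {
        long total = 0;
        for (int i = 0; i < x.length; i++) {
            total += x[i];
        }
        return total;
    }
}
```

Only the first `count` entries of the selection array are meaningful after the call; later entries may hold leftover indexes.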

Once that is sorted out (there are likely research papers that explain how 
others have done it), we can move on to changing the generated code (the 
loop-over-all-rows code) to use the newer design.

Thanks,
- Paul

    On Friday, June 29, 2018, 10:30:04 AM PDT, Aman Sinha <amansi...@gmail.com> 
wrote:  

Hi Weijie, the Panama project is an OpenJDK initiative, right [1]? Not
Intel specific.
It would be quite a bit of work to test and certify with Intel's JVM which
may be still in the experimental stage.
Also, you may have seen the Gandiva project for Apache Arrow which aims to
improve vectorization for operations
on Arrow buffers (this requires integration with Arrow).

I assume the test program or workload you were running was already written
to exploit vectorization.  Have you also looked into
Drill's code-gen to see which parts are amenable to vectorization?  We
could start with some small use case and expand.

[1]
http://www.oracle.com/technetwork/java/jvmls2016-ajila-vidstedt-3125545.pdf

On Fri, Jun 29, 2018 at 3:23 AM weijie tong <tongweijie...@gmail.com> wrote:

> Hi all:
>
>  I have investigated the JIT assembly code of some vector-friendly Java code
> with the JITWatch tool. I found that the JVM did not generate the expected AVX
> code. According to some JVM experts, the JVM generates AVX code only in a
> restricted set of cases.
>
>    I found that Intel has started a project called Panama, which supplies an
> intrinsic vector API to actually execute AVX code. Here is the reference:
>
> https://software.intel.com/en-us/articles/vector-api-developer-program-for-java
>
> It also supports off-heap calculation.  According to our JVM team, the
> vector API will be released in JDK 11.
>
>    So I wonder whether we can distribute Intel's current JVM as the default
> JVM supplied to users (like a Spark distribution with a default Scala) and,
> as an option, rewrite parts of our operator code according to this new
> vector API.
>
