Got it. Thanks to all of you!

On Sat, Apr 6, 2019 at 4:24 AM Karthikeyan Manivannan <kmanivan...@mapr.com> wrote:
> Hi Weijie,
>
> You are right. Before DRILL-6340 the purpose of the hasRemainder() logic
> was not clear. projector.projectRecords() always took in the
> incomingRowCount as the argument and returned the same value in
> non-exceptional paths. So I think the whole hasRemainder() path was dead
> code then. I did not investigate it further because I knew that under
> DRILL-6340 that code would definitely become necessary.
>
> Karthik
>
> On Fri, Apr 5, 2019 at 9:27 AM Sorabh Hamirwasia <sohami.apa...@gmail.com> wrote:
>
> > Hi Weijie,
> > I think the only case in which that line will be executed is if there is
> > a UDF, such as a flatten operation, that produces multiple rows for each
> > input row. Even though Flatten is currently a separate operator in Drill,
> > I think that code is there to handle such cases.
> >
> > Thanks,
> > Sorabh
> >
> > On Fri, Apr 5, 2019 at 6:08 AM weijie tong <tongweijie...@gmail.com> wrote:
> >
> > > The first appearance of the comparison code is in DRILL-620:
> > > https://github.com/apache/drill/commit/a2355d42dbff51b858fc28540915cf793f1c0fac#diff-e87beb3f2aa0fbc06b07b1d55c3d3536
> > > Before DRILL-6340, judging from the ProjectorTemplate's projectRecords
> > > method and its actual input parameter values, I think line 234 of
> > > ProjectRecordBatch could never be executed. Only since DRILL-6340,
> > > where we control the output batch memory size, has that part of the
> > > code finally come into use.
> > >
> > > If I am wrong, please let me know.
> > >
> > > On Fri, Apr 5, 2019 at 12:15 AM weijie tong <tongweijie...@gmail.com> wrote:
> > >
> > > > Thanks for the reply, but it seems the code was there even before
> > > > DRILL-6340.
> > > >
> > > > On Thu, Apr 4, 2019 at 10:45 PM Vova Vysotskyi <vvo...@gmail.com> wrote:
> > > >
> > > > > Hi Weijie,
> > > > >
> > > > > It is possible if maxOutputRecordCount (received from
> > > > > memoryManager.getOutputRowCount()) is less than incomingRecordCount.
> > > > > For more details please see DRILL-6340
> > > > > <https://issues.apache.org/jira/browse/DRILL-6340> and the design
> > > > > document
> > > > > <https://docs.google.com/document/d/1h0WsQsen6xqqAyyYSrtiAniQpVZGmQNQqC1I2DJaxAA/edit?usp=sharing>
> > > > > attached to that Jira.
> > > > >
> > > > > Kind regards,
> > > > > Volodymyr Vysotskyi
> > > > >
> > > > > On Thu, Apr 4, 2019 at 5:17 PM weijie tong <tongweijie...@gmail.com> wrote:
> > > > >
> > > > > > I have a doubt about the ProjectRecordBatch implementation and hope
> > > > > > someone could explain it. At line 234 of ProjectRecordBatch, in
> > > > > > what case is the projector's output row count less than the input
> > > > > > row count?
> > > > > >
> > > > > > On Thu, Apr 4, 2019 at 5:11 PM weijie tong <tongweijie...@gmail.com> wrote:
> > > > > >
> > > > > > > Hi Igor:
> > > > > > > That's a good idea! It could resolve that issue, so the basic
> > > > > > > question is solved. To use the official Arrow, there are still
> > > > > > > two issues that need to be contributed to Arrow, which I will do:
> > > > > > > 1. Make the gcc lib statically linked into the JNI dynamic lib.
> > > > > > >    Without this, the platform must have the right gcc version
> > > > > > >    installed.
> > > > > > > 2. Add a convertToNull function to Gandiva.
> > > > > > >    This would let project expressions containing the
> > > > > > >    convertToNull function be executed by Gandiva.
> > > > > > >
> > > > > > > Of course, even without these two issues solved, I could still
> > > > > > > provide an integration implementation.
> > > > > > >
> > > > > > > BTW, once the integration is done, how do we supply the Gandiva
> > > > > > > JNI lib? Leave it to users to build it, or supply distributions
> > > > > > > for different platforms?
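The behavior discussed above — projectRecords() capped by the memory manager's output row count, with a remainder carried over into the next output batch — can be sketched roughly as follows. This is a simplified, hypothetical illustration; the class and method names here are invented and are not the actual ProjectRecordBatch code.

```java
// Hypothetical sketch of the hasRemainder pattern: after DRILL-6340 the
// projector may emit fewer rows than it receives, because the memory
// manager caps the output batch size. All names below are invented.
public class RemainderSketch {

    // Pretend "projection": emits at most maxOutputRows rows per call,
    // starting at startIndex. Returns the number of rows projected.
    static int projectRecords(int startIndex, int incomingCount, int maxOutputRows) {
        return Math.min(incomingCount - startIndex, maxOutputRows);
    }

    // Counts how many output batches are needed to drain one incoming batch.
    static int batchesNeeded(int incomingCount, int maxOutputRows) {
        int index = 0;
        int batches = 0;
        while (index < incomingCount) {
            int projected = projectRecords(index, incomingCount, maxOutputRows);
            // hasRemainder would be true here whenever index + projected
            // is still less than incomingCount.
            index += projected;
            batches++;
        }
        return batches;
    }

    public static void main(String[] args) {
        // 10 incoming rows, memory manager allows 4 rows per output batch:
        System.out.println(batchesNeeded(10, 4)); // prints 3
    }
}
```

Before DRILL-6340 the cap always equaled the incoming row count, so the loop above would run exactly once and the remainder branch was unreachable — which matches the "dead code" observation in the thread.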
> > > > > > >
> > > > > > > On Thu, Apr 4, 2019 at 3:53 PM Igor Guzenko <ihor.huzenko....@gmail.com> wrote:
> > > > > > >
> > > > > > > > Hello Weijie,
> > > > > > > >
> > > > > > > > Did you try to create the same package as in Arrow, but in
> > > > > > > > Drill, and use a wrapper class around the target to publish
> > > > > > > > the desired package-access methods?
> > > > > > > >
> > > > > > > > Thanks, Igor
> > > > > > > >
> > > > > > > > On Thu, Apr 4, 2019 at 9:51 AM weijie tong <tongweijie...@gmail.com> wrote:
> > > > > > > >
> > > > > > > > > Hi:
> > > > > > > > >
> > > > > > > > > Gandiva is a subproject of Arrow. Using LLVM codegen and
> > > > > > > > > SIMD techniques, Gandiva can achieve better query
> > > > > > > > > performance. Arrow and Drill have similar columnar memory
> > > > > > > > > formats; the main difference now is the null
> > > > > > > > > representation. Arrow has also made great changes to the
> > > > > > > > > ValueVector. Adopting Arrow to replace Drill's ValueVector
> > > > > > > > > has been discussed before; that would be a big job. But
> > > > > > > > > leveraging Gandiva by working at the physical memory
> > > > > > > > > address level should be relatively little work.
> > > > > > > > >
> > > > > > > > > I have now done the integration work on our own branch by
> > > > > > > > > making some changes to the Arrow branch, and filed
> > > > > > > > > DRILL-7087 and ARROW-4819. The main change in ARROW-4819 is
> > > > > > > > > to make some package-level methods public, but the Arrow
> > > > > > > > > community does not seem to plan to accept this change.
> > > > > > > > > Their advice is to keep an Arrow branch.
> > > > > > > > >
> > > > > > > > > So what do you think?
> > > > > > > > >
> > > > > > > > > 1. Keep our own branch of Arrow.
> > > > > > > > > 2. Wait for the Arrow integration to be completed.
> > > > > > > > > Or some other ideas?
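The null-representation difference mentioned in the thread is that Drill's nullable vectors carry a separate "bits" vector with one byte per value (1 = set, 0 = null), while Arrow uses a validity bitmap with one bit per value (1 = valid, LSB-first). A rough conversion between the two can be sketched as below; the helper is hypothetical and is not actual Drill or Arrow API.

```java
// Hypothetical sketch of converting Drill-style per-value null bytes
// into an Arrow-style validity bitmap. Illustration only.
public class NullBitsSketch {

    // drillBits: one byte per value, 1 = not null, 0 = null.
    // Returns an Arrow-style bitmap: one bit per value, LSB-first.
    static byte[] drillBitsToArrowValidity(byte[] drillBits) {
        byte[] validity = new byte[(drillBits.length + 7) / 8];
        for (int i = 0; i < drillBits.length; i++) {
            if (drillBits[i] != 0) {
                validity[i / 8] |= (1 << (i % 8)); // set bit i, LSB-first
            }
        }
        return validity;
    }

    public static void main(String[] args) {
        // Values 0, 2, and 3 are valid; value 1 is null:
        byte[] validity = drillBitsToArrowValidity(new byte[]{1, 0, 1, 1});
        System.out.println(validity[0]); // prints 13 (binary 1101)
    }
}
```

Bridging this representation gap (per-value bytes vs. packed bitmaps) is one reason the Gandiva integration can work at the physical memory address level without a full ValueVector replacement.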