Re: About integration of drill and arrow

Paul Rogers Mon, 09 Dec 2019 10:54:06 -0800

Hi All,

Would be good to do some design brainstorming around this.

Integration with other tools depends on the APIs (the first two items I 
mentioned). Last time I checked (more than a year ago), the memory layout of 
Arrow was close to that of Drill, so conversion is mostly a matter of 
"packaging" and metadata, which can be encapsulated in an API.
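
To make the "packaging and metadata" point concrete, here is a minimal sketch, assuming the data buffers really are compatible. It is written in Python only for brevity (Drill itself is Java), and every name in it (DrillStyleColumn, ArrowStyleColumn, TYPE_MAP, to_arrow_style) is hypothetical; none of them is a real Drill or Arrow class. The conversion translates the descriptive fields and shares the bytes instead of copying them.

```python
import struct
from dataclasses import dataclass

# Hypothetical stand-ins: neither class is a real Drill or Arrow type.
@dataclass
class DrillStyleColumn:
    name: str
    minor_type: str      # Drill-flavored type label, e.g. "INT"
    data: memoryview     # packed little-endian 32-bit values
    value_count: int

@dataclass
class ArrowStyleColumn:
    field_name: str
    logical_type: str    # Arrow-flavored type label, e.g. "int32"
    data: memoryview     # the same bytes, not a copy
    length: int

# Assumed type-name mapping, for illustration only.
TYPE_MAP = {"INT": "int32", "BIGINT": "int64"}

def to_arrow_style(col: DrillStyleColumn) -> ArrowStyleColumn:
    """Re-wrap the column: translate the metadata, share the data buffer."""
    return ArrowStyleColumn(
        field_name=col.name,
        logical_type=TYPE_MAP[col.minor_type],
        data=col.data,
        length=col.value_count,
    )

raw = bytearray(struct.pack("<3i", 10, 20, 30))
drill_col = DrillStyleColumn("a", "INT", memoryview(raw), 3)
arrow_col = to_arrow_style(drill_col)

# A write through the original buffer is visible through the converted
# column, which shows that no data was copied.
struct.pack_into("<i", raw, 0, 99)
print(struct.unpack("<3i", arrow_col.data))
```

Wherever the two layouts actually differ, the conversion would need a real copy or transform for that column type; the sketch covers only the case where the bytes can be shared as they are.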

Converting internals is a major undertaking. We have large amounts of complex, 
critical code that works directly with the details of value vectors. My thought 
was to first convert code to use the column readers/writers we've developed. 
Then, once all internal code uses that abstraction, we can replace the 
underlying vector implementation with Arrow. This lets us work in small stages, 
each of which is deliverable by itself.
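
A rough sketch of that staging, again in Python for brevity and with purely hypothetical names (IntColumnWriter, ListBackedWriter, PackedBackedWriter, double_column are illustrations, not Drill's actual column reader/writer classes): operator code is written only against the accessor interface, so the backing store can later be swapped without touching the operator.

```python
from abc import ABC, abstractmethod
from array import array

class IntColumnWriter(ABC):
    """Hypothetical accessor interface: the only thing operator code sees."""

    @abstractmethod
    def set_int(self, value: int) -> None: ...

    @abstractmethod
    def to_list(self) -> list: ...

class ListBackedWriter(IntColumnWriter):
    """Stand-in for a writer over today's value vectors."""

    def __init__(self) -> None:
        self._values = []

    def set_int(self, value: int) -> None:
        self._values.append(value)

    def to_list(self) -> list:
        return list(self._values)

class PackedBackedWriter(IntColumnWriter):
    """Stand-in for a writer over a replacement (Arrow-style) buffer."""

    def __init__(self) -> None:
        self._buf = array("i")  # packed signed C ints

    def set_int(self, value: int) -> None:
        self._buf.append(value)

    def to_list(self) -> list:
        return self._buf.tolist()

def double_column(values, writer: IntColumnWriter) -> list:
    """'Operator' code: it never touches the underlying storage directly."""
    for v in values:
        writer.set_int(v * 2)
    return writer.to_list()

# The same operator code runs unchanged over either backing store.
before = double_column([1, 2, 3], ListBackedWriter())
after = double_column([1, 2, 3], PackedBackedWriter())
print(before == after)
```

Each stage is small: first move one operator onto the interface, verify it against the existing vectors, and only at the end change what sits behind the interface.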

The other approach is to change all code that works directly with Drill vectors 
to instead work with Arrow. Because that code is so detailed and fragile, that 
is a huge, risky project.

There are other approaches as well. Would be good to explore them before we 
dive into a major project.

Thanks,
- Paul

    On Monday, December 9, 2019, 07:07:31 AM PST, Charles Givre 
<cgi...@gmail.com> wrote:  

 Hi Igor, 
That would be really great if you could see that through to completion.  IMHO, 
the value from this is not so much performance related but rather the ability 
to use Drill to gather and prep data and seamlessly "hand it off" to other 
platforms for machine learning.  
-- C

> On Dec 9, 2019, at 5:48 AM, Igor Guzenko <ihor.huzenko....@gmail.com> wrote:
> 
> Hello Nai and Paul,
> 
> I would like to contribute full Apache Arrow integration.
> 
> Thanks,
> Igor
> 
> On Mon, Dec 9, 2019 at 8:56 AM Paul Rogers <par0...@yahoo.com.invalid>
> wrote:
> 
>> Hi Nai Yan,
>> 
>> Integration is still in the discussion stages. Work has been progressing
>> on some foundations which would help that integration.
>> 
>> At the Developer's Day we talked about several ways to integrate. These
>> include:
>> 
>> 1. A storage plugin to read Arrow buffers from some source so that you
>> could use Arrow data in a Drill query.
>> 
>> 2. A new Drill client API that produces Arrow buffers from a Drill query
>> so that an Arrow-based tool can consume Arrow data from Drill.
>> 
>> 3. Replacement of the Drill value vectors internally with Arrow buffers.
>> 
>> The first two are relatively straightforward; they just need someone to
>> contribute an implementation. The third is a major long-term project
>> because of the way Drill value vectors and Arrow vectors have diverged.
>> 
>> 
>> I wonder, which of these use cases is of interest to you? How might you
>> use that integration in your project?
>> 
>> 
>> Thanks,
>> - Paul
>> 
>> 
>> 
>>    On Sunday, December 8, 2019, 10:33:23 PM PST, Nai Yan. <
>> zhaon...@gmail.com> wrote:
>> 
>> Greetings,
>>      As mentioned at Drill Developer Day 2018, there's a plan for Drill to
>> integrate Arrow (Gandiva from Dremio). I was wondering how it is going.
>> 
>>      Thanks in advance.
>> 
>> 
>> 
>> Nai Yan
>>
