Hi Micah -- I think having support for this in some way in the IPC
protocol makes sense (it seems slightly less important for the C API
but worth thinking about). It's helpful to know that Dremio (a big
Arrow user) already employs various filters / selection vectors.

The question is how this would work mechanically: would it be some
extra buffers at the start or end of the record batch body? They would
probably have to go at the end of the body for forward compatibility
reasons.
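
To make the dense vs. sparse distinction from Jacques's list (quoted
below) concrete, here is a rough sketch in plain Python. It is only an
illustration, not Dremio's or Arrow's implementation: the function names
dense_to_sparse and contract are made up for this example, and the
LSB-first bit order is an assumption borrowed from Arrow's validity
bitmaps.

```python
from array import array


def dense_to_sparse(bitmap, num_records):
    """Turn a dense selection bitmap (one bit per record) into a sparse
    selection vector of 2-byte record indices.

    Assumption: bits are numbered LSB-first within each byte, as in
    Arrow validity bitmaps.
    """
    if num_records > 1 << 16:
        raise ValueError("2-byte indices only address up to 2^16 records")
    selection = array("H")  # unsigned 16-bit integers on common platforms
    for i in range(num_records):
        if bitmap[i >> 3] & (1 << (i & 7)):
            selection.append(i)
    return selection


def contract(column, selection):
    """Materialize only the selected records, in selection order."""
    return [column[i] for i in selection]


column = ["a", "b", "c", "d", "e"]

# Dense: records 1, 2 and 4 are selected; original order is implied.
dense = bytes([0b00010110])
kept = contract(column, dense_to_sparse(dense, len(column)))

# Sparse: the index list can also express an ordering (4 before 1).
reordered = contract(column, array("H", [4, 1]))
```

The contract step is roughly what "contracting batches before dropping
them on the wire" would amount to: the receiver sees only the selected
records and needs no selection vector at all.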

On Sun, Jan 26, 2020 at 1:16 PM Jacques Nadeau <jacq...@apache.org> wrote:
>
> At Dremio, we use four main types of selection vector/bitmaps:
>
> Dense Format (record valid or not, no ordering)
> - single bit (bitmap)
>
> Sparse formats (identifies valid records as well as their order)
> - 2 byte (for record batches up to 2^16 records)
> - 4 byte (for 2^16 batches of 2^16 records)
> - 6 byte (for 2^32 batches of 2^16 records)
>
> We've considered introducing a couple more. I imagine for other use cases,
> where people use much larger batches of records, different requirements
> would be necessary. My reason for sharing is it seems like this may be
> use-case specific. I'd also note that at the IPC level, you'd generally
> want to contract batches before dropping them on the wire (or at least that
> is what we typically do).
>
> On Fri, Jan 24, 2020 at 11:23 PM Micah Kornfield <emkornfi...@gmail.com>
> wrote:
>
> > I was thinking selection vector/bitmap (possibly with different encodings),
> > but really nothing for now.  Ordinarily, I'd lean towards YAGNI but there
> > isn't a good way to add this in easily in a forward compatible way unless
> > we add a placeholder enum/table for 1.0 (the default option would be no
> > filter and wouldn't change the packaged data at all).
> >
> > On Fri, Jan 24, 2020 at 4:55 AM Francois Saint-Jacques <
> > fsaintjacq...@gmail.com> wrote:
> >
> > > By filter, you mean a filter expression, or a selection vector/bitmap?
> > >
> > > On Thu, Jan 23, 2020 at 11:38 PM Micah Kornfield <emkornfi...@gmail.com>
> > > wrote:
> > > >
> > > > One of the things that I think got overlooked in the conversation on
> > > > having a slice offset in the C API was a suggestion from Jacques of
> > > > perhaps generalizing the concept to an arbitrary "filter" for
> > > > arrays/record batches.
> > > >
> > > > I believe this point was also discussed in the past as well.  I'm not
> > > > advocating for adding it now but I'm curious if people feel we should
> > > > add something to Schema.fbs for forward compatibility, in case we wish
> > > > to support this use-case in the future.
> > > >
> > > > Thanks,
> > > > Micah
> > >
> >
