>
> > 1) contribute the missing support ourselves
> I actually think we might need to proceed with this option.


I agree. I am willing to help with this and to try different
approaches. I would start by looking into the JNI approach, either
contributing back to lz4-java or adding this to Arrow.

Best,
Benjamin


On Wed, Mar 17, 2021 at 5:51 PM Micah Kornfield <emkornfi...@gmail.com>
wrote:

> >
> > 1) contribute the missing support ourselves
>
>
> I actually think we might need to proceed with this option.  Even more
> unfortunately, I think the best place at the moment for the contribution
> to live is within Arrow.  Fortunately, I think a port of the existing
> Apache Commons library for off-heap use should be relatively easy.  We can
> reach out to Apache Commons to see if they would be interested in this
> contribution, but I would guess not, since I don't think there is a lot of
> off-heap logic in the library in general (but my knowledge is stale here).
>
> 2) use another LZ4 library for Java
>
>
> We are using the only library I could find that seems to have full support
> for LZ4 Frame data.  Unfortunately, it is purely on-heap, which I believe is
> the source of the performance problems.
>
> On Wed, Mar 17, 2021 at 7:15 AM Antoine Pitrou <anto...@python.org> wrote:
>
> >
> > If you look at
> > https://github.com/lz4/lz4-java/graphs/contributors?from=2019-12-28&to=2021-03-17&type=c,
> > lz4-java seems to be receiving very little maintenance.  So I think
> > there are two possible avenues:
> >
> > 1) contribute the missing support ourselves
> > 2) use another LZ4 library for Java
> >
> > Solution #2 seems more reasonable to me.
> >
> > Regards
> >
> > Antoine.
> >
> >
> > On 11/03/2021 at 21:05, Micah Kornfield wrote:
> > > FYI, I opened up https://github.com/lz4/lz4-java/issues/176 to discuss
> > > support for dependent frames.
> > >
> > > On Thu, Mar 11, 2021 at 11:59 AM David Li <lidav...@apache.org> wrote:
> > >
> > >> At least for Flight, I don't think we'd use that. Right now the way
> > >> compression is supported is the same way as with Feather, i.e. the body
> > >> buffers in each individual record batch sent on the wire are compressed,
> > >> but not the stream as a whole. (And so far we haven't found a compelling
> > >> benefit for compression in Flight in general.)
> > >>
> > >> Best,
> > >> David
> > >>
> > >> On Thu, Mar 11, 2021, at 14:34, Antoine Pitrou wrote:
> > >>>
> > >>> On 11/03/2021 at 19:54, Micah Kornfield wrote:
> > >>>>>
> > >>>>> Indeed, I don't think it was discussed publicly.  The LZ4 frame format
> > >>>>> has several things going for it:
> > >>>>> - it allows streaming compression and decompression (meaning you can
> > >>>>> avoid loading a huge compressed buffer at once)
> > >>>>
> > >>>> Is this something we make use of or intend to make use of?
> > >>>
> > >>> Good question.  Currently we don't.  Perhaps David Li wants to answer
> > >>> this, since he's been working a lot on Flight.
> > >>>
> > >>>>> - it embeds the decompressed size, allowing exact allocation of the
> > >>>>> decompressed buffer
> > >>>>
> > >>>> IIUC, We already do this in the IPC specification (the first 8 bytes of the
> > >>>> compressed buffer are used for this).
> > >>>
> > >>> Ah, you're right.  It doesn't matter then.
> > >>>
> > >>>> - it has an optional checksum
> > >>>>
> > >>>> This seems like a good thing, so probably worth keeping (although it would
> > >>>> be the only place where we do checksums today).
> > >>>
> > >>> (or of course we could add an optional higher-level checksum in the IPC
> > >>> format)
> > >>>
> > >>> Regards
> > >>>
> > >>> Antoine.
> > >>>
> > >>
> > >
> >
>
