> > > 1) contribute the missing support ourselves
> I actually think we might need to proceed with this option.

I agree. I am willing to help with this and to explore and try different
approaches. I would start by looking into the JNI approach, either
contributing back to lz4-java or adding this to Arrow.

Best,
Benjamin

On Wed, Mar 17, 2021 at 5:51 PM Micah Kornfield <emkornfi...@gmail.com> wrote:

> > 1) contribute the missing support ourselves
>
> I actually think we might need to proceed with this option. Even more
> unfortunate is that I think the best place at the moment for the
> contribution to live is within Arrow. Fortunately, I think a port of the
> existing Apache Commons library for off-heap use should be relatively easy.
> We can reach out to Apache Commons to see if they would be interested in
> this contribution, but I would guess not, since I don't think there is a
> lot of off-heap logic in the library in general (but my knowledge is stale
> here).
>
> > 2) use another LZ4 library for Java
>
> We are using the only library I could find that seems to have full support
> for LZ4 Frame data. Unfortunately it is purely on-heap, which I believe is
> the source of the performance problems.
>
> On Wed, Mar 17, 2021 at 7:15 AM Antoine Pitrou <anto...@python.org> wrote:
>
> > If you look at
> > https://github.com/lz4/lz4-java/graphs/contributors?from=2019-12-28&to=2021-03-17&type=c,
> > lz4-java seems to be receiving very little maintenance. So I think
> > there are two possible avenues:
> >
> > 1) contribute the missing support ourselves
> > 2) use another LZ4 library for Java
> >
> > Solution #2 seems more reasonable to me.
> >
> > Regards
> >
> > Antoine.
> >
> > On 11/03/2021 at 21:05, Micah Kornfield wrote:
> > > FYI, I opened up https://github.com/lz4/lz4-java/issues/176 to discuss
> > > support for dependent frames.
> > >
> > > On Thu, Mar 11, 2021 at 11:59 AM David Li <lidav...@apache.org> wrote:
> > >
> > >> At least for Flight, I don't think we'd use that. Right now the way
> > >> compression is supported is the same way as with Feather, i.e. the
> > >> body buffers in each individual record batch sent on the wire are
> > >> compressed, but not the stream as a whole. (And so far we haven't
> > >> found a compelling benefit for compression in Flight in general.)
> > >>
> > >> Best,
> > >> David
> > >>
> > >> On Thu, Mar 11, 2021, at 14:34, Antoine Pitrou wrote:
> > >>>
> > >>> On 11/03/2021 at 19:54, Micah Kornfield wrote:
> > >>>>>
> > >>>>> Indeed, I don't think it was discussed publicly. The LZ4 frame
> > >>>>> format has several things going for it:
> > >>>>> - it allows streaming compression and decompression (meaning you
> > >>>>> can avoid loading a huge compressed buffer at once)
> > >>>>
> > >>>> Is this something we make use of or intend to make use of?
> > >>>
> > >>> Good question. Currently we don't. Perhaps David Li wants to answer
> > >>> this, since he's been working a lot on Flight.
> > >>>
> > >>>>> - it embeds the decompressed size, allowing exact allocation of the
> > >>>>> decompressed buffer
> > >>>>
> > >>>> IIUC, we already do this in the IPC specification (the first 8 bytes
> > >>>> of the compressed buffer are used for this).
> > >>>
> > >>> Ah, you're right. It doesn't matter then.
> > >>>
> > >>>>> - it has an optional checksum
> > >>>>
> > >>>> This seems like a good thing, so probably worth keeping (although it
> > >>>> would be the only place where we do checksums today).
> > >>>
> > >>> (or of course we could add an optional higher-level checksum in the
> > >>> IPC format)
> > >>>
> > >>> Regards
> > >>>
> > >>> Antoine.
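
For readers following the point about the IPC specification using the first 8 bytes of a compressed buffer for the decompressed size, here is a minimal Java sketch of reading that prefix from an off-heap buffer. The class and method names (CompressedBufferPrefix, uncompressedLength, payload) are hypothetical and not part of Arrow's API, and the sketch assumes the prefix is a 64-bit little-endian signed integer; check the IPC specification before relying on that.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/** Hypothetical helper for illustration only; not part of Arrow's API. */
final class CompressedBufferPrefix {
    /** Size of the length prefix mentioned in the thread. */
    static final int PREFIX_BYTES = 8;

    private CompressedBufferPrefix() {}

    /**
     * Reads the length prefix at the buffer's current position without
     * changing the caller's position or byte order.
     * Assumption: the prefix is a little-endian signed 64-bit integer.
     */
    static long uncompressedLength(ByteBuffer buffer) {
        if (buffer.remaining() < PREFIX_BYTES) {
            throw new IllegalArgumentException("buffer too short for length prefix");
        }
        ByteBuffer view = buffer.duplicate().order(ByteOrder.LITTLE_ENDIAN);
        return view.getLong(buffer.position());
    }

    /** Returns a view of the bytes that follow the prefix (the compressed payload). */
    static ByteBuffer payload(ByteBuffer buffer) {
        if (buffer.remaining() < PREFIX_BYTES) {
            throw new IllegalArgumentException("buffer too short for length prefix");
        }
        ByteBuffer view = buffer.duplicate();
        view.position(buffer.position() + PREFIX_BYTES);
        return view.slice();
    }

    public static void main(String[] args) {
        // Direct (off-heap) buffer: an 8-byte prefix followed by a 4-byte dummy payload.
        ByteBuffer buf = ByteBuffer.allocateDirect(PREFIX_BYTES + 4)
                .order(ByteOrder.LITTLE_ENDIAN);
        buf.putLong(0, 1024L);

        long size = uncompressedLength(buf);
        // Knowing the size up front allows an exact allocation of the output buffer.
        ByteBuffer out = ByteBuffer.allocateDirect((int) size);
        System.out.println("uncompressed length: " + size);
        System.out.println("payload bytes: " + payload(buf).remaining());
        System.out.println("output capacity: " + out.capacity());
    }
}
```

Because the size is available before decompression, the destination buffer can be allocated exactly once, which is why the size embedded in the LZ4 frame header adds nothing here.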