I recommend that you direct these questions to u...@arrow.apache.org
(https://mail-archives.apache.org/mod_mbox/arrow-user/).


On Fri, Jan 29, 2021 at 7:07 AM Joris Peeters
<joris.mg.peet...@gmail.com> wrote:
>
> Hello,
>
> I'm writing an HTTP server in Java that provides Arrow data to users. For
> performance, I keep the most-recently-used Arrow batches in an in-memory
> cache. A batch is wrapped in a "DataBatch" Java object containing the
> schema and field vectors.
>
> I'm looking for a good memory management strategy here, given that:
> - batches can be evicted from the in-memory cache, and the underlying memory
> should be freed as quickly as possible, *if nothing else is using them*;
> - data retrieved from the cache goes through a zero-copy path with filters etc.
> (which are views on the underlying data) before being sent out. Because
> multiple threads serve requests simultaneously, a batch can still be in use
> when it gets evicted from the cache.
>
> I'm used to C++, where this scenario would seem relatively unchallenging,
> as we'd keep std::shared_ptrs and just clean up everything in the
> destructor.
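One way to get std::shared_ptr-like semantics in Java is explicit reference
counting behind AutoCloseable, so that try-with-resources releases a reference
even when an exception is thrown. The sketch below is pure JDK and purely
illustrative: RefCountedBatch, Lease and freeAction are hypothetical names, and
freeAction stands in for whatever actually closes the batch's FieldVectors. The
cache holds one reference, each request acquires a lease, and eviction only
drops the cache's reference; the memory is freed when the last lease closes.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical ref-counted holder, roughly analogous to std::shared_ptr. */
final class RefCountedBatch {
    // Starts at 1: the cache's own reference.
    private final AtomicInteger refCount = new AtomicInteger(1);
    // Assumption: in real code this would close the batch's FieldVectors.
    private final Runnable freeAction;

    RefCountedBatch(Runnable freeAction) {
        this.freeAction = freeAction;
    }

    /** Takes an extra reference; refuses to resurrect a batch that was already freed. */
    Lease acquire() {
        while (true) {
            int current = refCount.get();
            if (current == 0) {
                throw new IllegalStateException("batch already freed");
            }
            if (refCount.compareAndSet(current, current + 1)) {
                return new Lease();
            }
        }
    }

    /** Drops one reference; whoever drops the last one runs freeAction. */
    void release() {
        int remaining = refCount.decrementAndGet();
        if (remaining == 0) {
            freeAction.run();
        } else if (remaining < 0) {
            throw new IllegalStateException("released too many times");
        }
    }

    int refCount() {
        return refCount.get();
    }

    /** A reader's reference; closing it more than once is a no-op. */
    final class Lease implements AutoCloseable {
        private final AtomicBoolean closed = new AtomicBoolean();

        @Override
        public void close() {
            if (closed.compareAndSet(false, true)) {
                release();
            }
        }
    }

    public static void main(String[] args) {
        AtomicInteger frees = new AtomicInteger();
        RefCountedBatch batch = new RefCountedBatch(frees::incrementAndGet);

        try (Lease lease = batch.acquire()) { // a request thread takes a lease
            batch.release();                  // the cache evicts: drops its own reference
            System.out.println("frees while reader active: " + frees.get());
        }                                     // lease released here, even on exception
        System.out.println("frees after reader done: " + frees.get());
    }
}
```

This prints 0 and then 1. The compare-and-set loop in acquire() means a thread
that looked the batch up just before eviction freed it gets an
IllegalStateException, which it can treat as a cache miss.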
>
> In Java, however, it seems that:
> - Object#finalize is deprecated, and not very reliable anyway;
> - GC might only happen when there is pressure on the Java heap, but the
> Arrow data is allocated off-heap in Netty buffers, so it doesn't add to that
> pressure.
>
> I wonder if people have encountered this scenario before, and what approach
> was favoured. Some ideas:
> - Manually maintain a ref-count and free when it goes to zero. This seems
> brittle: errors and other unexpected code paths could easily lead to leaks.
> - Use the PhantomReference mechanism. This would appear to suffer from the
> same potential GC delay, though, since my Java object is just a small holder
> for the underlying FieldVectors. Perhaps there's a way of saying that
> these DataBatch objects should be GC'd often?
> - Make a copy of the data when it gets retrieved from the cache, so that an
> eviction from the cache means it can always be safely freed. This seems very
> wasteful, and not very scalable if there are other reuse paths.
> - Allocate the buffers in a way that counts towards heap memory pressure.
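For the PhantomReference idea, java.lang.ref.Cleaner (Java 9+, which the
Object#finalize deprecation notes point to) wraps that mechanism. Because it
still depends on GC timing, it seems best used as a leak safety net behind an
explicit close(), not as the primary release path. A minimal sketch, assuming a
hypothetical TrackedLease whose releaseAction would, for example, drop one
reference on a ref-counted batch:

```java
import java.lang.ref.Cleaner;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical lease: close() is the normal path; the Cleaner only catches leaks. */
final class TrackedLease implements AutoCloseable {
    private static final Cleaner CLEANER = Cleaner.create();

    // The cleaning action must not reference the TrackedLease itself,
    // otherwise the lease never becomes phantom-reachable and is never cleaned.
    private static final class State implements Runnable {
        private final Runnable releaseAction;
        private volatile boolean closedExplicitly;

        State(Runnable releaseAction) {
            this.releaseAction = releaseAction;
        }

        @Override
        public void run() { // runs at most once: via close() or via the Cleaner thread
            if (!closedExplicitly) {
                System.err.println("WARN: lease was garbage-collected without close()");
            }
            releaseAction.run();
        }
    }

    private final State state;
    private final Cleaner.Cleanable cleanable;

    TrackedLease(Runnable releaseAction) {
        this.state = new State(releaseAction);
        this.cleanable = CLEANER.register(this, state);
    }

    @Override
    public void close() {
        state.closedExplicitly = true;
        cleanable.clean(); // deregisters the action and runs it, at most once overall
    }

    public static void main(String[] args) {
        AtomicInteger releases = new AtomicInteger();
        try (TrackedLease lease = new TrackedLease(releases::incrementAndGet)) {
            // ... filter / serialize the batch here ...
        }
        System.out.println("releases: " + releases.get());
    }
}
```

This prints "releases: 1". If some code path forgets close(), the memory is
still released eventually, and the warning points at the leak, but how soon
depends entirely on when the GC notices the lease is unreachable.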
>
> Any thoughts are appreciated! I'm not a Java expert at all, so I may be
> missing obvious things, or thinking about it non-idiomatically.
>
> Best,
> -J
