Hi Everyone,
I have committed the initial work of vectored IO in the feature branch.
https://github.com/apache/hadoop/commits/feature-vectored-io
Created an epic to track the future work
https://issues.apache.org/jira/browse/HADOOP-18103.

Please try this out and review and let me know what you think. Especially
if someone can change the parquet library to use the new vectored API and
run some spark jobs to see the improvements.
Paralllely, I am working with Rajesh to do a poc in orc and running hive
queries.


Thanks,
Mukund




On Thu, Jan 13, 2022 at 12:27 AM Steve Loughran <ste...@cloudera.com.invalid>
wrote:

> DirectBufferPool does the weak refs effectively, but it only recycles
> buffers of the same size; the elastic one will return the first largest
> buffer.
>
> I think we may want to have a variant of the ElasticBufferPool which uses
> weak refs..we can adopt that and leave the (old) one alone. It is marked as
> public and stable after all
>
> I would also like a new method ByteBufferPool.release() which, when
> implementations support it, does a cleanup -essentially
> resetting the maps. interface "default" methods make it safer to add these.
>
> Handling buffet recycling really matters for the new vectored IO work which
> is very ready to go in.
> I want to create a stabilisation branch in the ASF repo with the code as
> is, then mukund can add a new PR with the buffer pooling
> improvements...that will make it easier for people to cherry pick if they
> want.
>
> On Wed, 12 Jan 2022 at 18:23, Chris Nauroth <cnaur...@apache.org> wrote:
>
> > Thanks for discussing this, Mukund.
> >
> > Another difference between the 2 classes is that ElasticByteBufferPool
> > supports a caller's preference for either on-heap or off-heap buffers and
> > internally calls either allocate or allocateDirect. Do you intend to
> > preserve that functionality too in a single merged class?
> >
> > Chris Nauroth
> >
> >
> > On Wed, Jan 12, 2022 at 1:25 AM Mukund Madhav Thakur
> > <mtha...@cloudera.com.invalid> wrote:
> >
> > > Hello Everyone,
> > > I was just browsing through the code while doing my Vectored IO stuff.
> It
> > > seems like ElasticByteBufferPool is an ever growing pool and memory is
> > not
> > > getting released as there is no WeakReference being maintained in the
> > pool.
> > > This can cause memory leaks in the production environment.  This is
> > widely
> > > used in places like StripedReconstructor,
> > > DFSStripedInputStream, BlockBlobAppendStream etc.
> > >
> > > I would suggest we use DirectBufferPool class for direct buffer pooling
> > as
> > > it already is keeping WeakReference for the buffers. Although, we will
> > have
> > > to make this implement the ByteBufferPool interface and implement the
> > > corresponding methods. Happy to make the API changes once finalized.
> > >
> > > Thanks,
> > > Mukund
> > >
> >
>

Reply via email to