Hi Jason, I vote for the name *BatchIterator*, which instantly conveys the
notion of an iterator over underlying data. Having a next() method on an
iterator also makes perfect sense, and in general everyone is already
familiar with this naming convention.
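
To make the idea concrete, here is a minimal sketch of what the name suggests
to me. It is not Drill's actual RecordBatch API: BatchIterator, Container, and
ListBatchIterator are hypothetical names, and a plain list of integers stands
in for value vectors. It only illustrates the pattern described in the quoted
mail, where the consumer keeps one reference to a shared container and each
call to next() swaps new data into it.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class BatchIteratorSketch {

    /** Hypothetical stand-in for a vector container: holds only the current batch. */
    static final class Container {
        private final List<Integer> values = new ArrayList<>();

        List<Integer> values() {
            return values;
        }
    }

    /** Hypothetical interface; an assumption for illustration, not Drill's API. */
    interface BatchIterator {
        /** Loads the next batch into the shared container; false when exhausted. */
        boolean next();

        /** The container consumers hold on to; its contents change on each next(). */
        Container container();
    }

    /** Serves a fixed list of rows in batches of at most batchSize rows. */
    static final class ListBatchIterator implements BatchIterator {
        private final List<Integer> rows;
        private final int batchSize;
        private final Container container = new Container();
        private int position = 0;

        ListBatchIterator(List<Integer> rows, int batchSize) {
            if (batchSize <= 0) {
                throw new IllegalArgumentException("batchSize must be positive");
            }
            this.rows = rows;
            this.batchSize = batchSize;
        }

        @Override
        public boolean next() {
            // Drop the previous batch, then refill the same container in place.
            container.values().clear();
            if (position >= rows.size()) {
                return false;
            }
            int end = Math.min(position + batchSize, rows.size());
            container.values().addAll(rows.subList(position, end));
            position = end;
            return true;
        }

        @Override
        public Container container() {
            return container;
        }
    }

    public static void main(String[] args) {
        BatchIterator it = new ListBatchIterator(Arrays.asList(1, 2, 3, 4, 5), 2);
        Container shared = it.container(); // obtained once, reused for every batch
        while (it.next()) {
            System.out.println(shared.values());
        }
    }
}
```

The consumer never receives a new object per batch; it re-reads the same
container after each next(), which is the behaviour that makes "RecordBatch"
read like a data structure when it is really an iterator.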

Yash

On Sat, Dec 20, 2014 at 7:05 AM, Jason Altekruse <[email protected]>
wrote:

> bump
>
> On Fri, Dec 5, 2014 at 5:33 PM, Jason Altekruse <[email protected]>
> wrote:
>
> > Hello Drillers,
> >
> > I am currently working on trying to write documentation to describe our
> > current interface and implementation patterns used in RecordBatch and its
> > subclasses. These classes currently contain the implementations of all of
> > our physical operators; subclasses include FilterRecordBatch, HashAggBatch,
> > etc.
> >
> > This naming convention has been a point of confusion for many developers
> > as they get up to speed on Drill and begin to piece together the control
> > flow of a query. The name "RecordBatch" implies that the class is
> > logically a data structure that holds a batch of records.
> >
> > During execution, each downstream operator (which implements the
> > RecordBatch interface) will be able to access all of the data in the
> > current batches (the actual data structure) from the operator(s)
> > immediately preceding it. In this sense, calling this class a RecordBatch
> > is not entirely inaccurate, as it provides a reference into the current
> > data.
> >
> > Where it gets confusing is that it does not just hold data. Each
> > RecordBatch has a next() method, which is used to retrieve the next
> > batch of records (the data structure). This is possible because the
> > data is shared with consumers of the interface in the form of a vector
> > container object, which wraps value vectors. A call to next() will swap
> > out the data in the vector containers with new data.
> >
> > I was talking with a few members of the dev team about this problem and
> > we were all in agreement that the interface and its implementations
> > should be renamed. We tried to talk further about the overall model and
> > decided that some refactoring/encapsulation may come along with this
> > renaming as we clarify these concepts.
> >
> > I would like to start this discussion with our candidates for new names
> > of the interface. The three that stood out for us were BatchIterator,
> > BatchStream, and BatchCursor. These all represent a logical wrapper
> > around data that will be accessed by a consumer over time, and will be
> > accessed in discrete chunks at some level. Each has existing conventions
> > that define it, and some might be more appropriate than others for the
> > current implementation used by Drill.
> >
> > Please share your thoughts on the best possible new name for RecordBatch.
> >
> > Thanks,
> > Jason
> >
> >
>
