+1 to keeping the result set loader.
Also, IMO, the parquet effort should move to using the result set loader (I
believe Salim has a plan to do so).

On Sun, Jul 15, 2018 at 6:50 PM, Paul Rogers <par0...@yahoo.com.invalid>
wrote:

> Hi All,
>
> Over the last six months I've been slowly trying to get the "result set
> loader" work committed to Drill. As a recap, this was supposed to provide a
> uniform way to optimally pack a record batch up to a prescribed memory
> limit. This technique is particularly useful in readers which do not have
> much information about incoming data sizes.
>
> In the meantime, the team has done a great job using the "sizer" approach
> to get a good-enough solution for all internal operators. The sizer simply
> uses statistics about incoming batches to predict outgoing batch size.
>
> At the same time, work has been done to create a one-off solution for
> Parquet. Since Parquet is, by far, Drill's most important data source, this
> means the reader problem is solved for the most critical use case.
>
> As time goes on, I get less and less time to maintain the result set loader
> code. My knowledge of team priorities and of Drill code drifts out of date.
>
> So, the question for the group is, is the result set loader work still
> needed? If not, we can wait to do the remaining commits until a compelling
> need presents itself. If it is needed, it would be good to know how the
> team plans to use it so that we stay in sync.
>
> Thanks,
>
> - Paul
>
>
