This seems like a form of sideband communication. I think we should have a
general framework for this type of thing rather than a one-off for this
particular need. Other forms of sideband communication might include Bloom
filter generation for small tables with pushdown into HBase, separate file
assignment/partitioning providers that balance or generate scanner workloads,
statistics generation for adaptive execution, etc.
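
As a rough illustration of the idea only, here is a minimal sketch in Python
of a generic sideband channel, with the skipped-record count as one message
kind among several. Every name in it (SidebandMessage, SidebandChannel,
SkippedRecordTotal, the "skipped_records" kind) is hypothetical and is not
part of Drill or of the RecordBatch protocol.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class SidebandMessage:
    """A small out-of-band value sent alongside, not inside, record batches."""
    kind: str          # hypothetical message kind, e.g. "skipped_records"
    fragment_id: int   # minor fragment that produced the value
    value: int


class SidebandChannel:
    """Routes sideband messages to the handlers registered for their kind."""

    def __init__(self) -> None:
        self._handlers: Dict[str, List[Callable[[SidebandMessage], None]]] = {}

    def subscribe(self, kind: str, handler: Callable[[SidebandMessage], None]) -> None:
        self._handlers.setdefault(kind, []).append(handler)

    def publish(self, message: SidebandMessage) -> None:
        # Messages of a kind nobody subscribed to are silently dropped.
        for handler in self._handlers.get(message.kind, []):
            handler(message)


class SkippedRecordTotal:
    """Downstream consumer: sums the counts reported by upstream fragments."""

    def __init__(self, threshold: int) -> None:
        self.threshold = threshold
        self.total = 0

    def on_message(self, message: SidebandMessage) -> None:
        self.total += message.value

    def exceeded(self) -> bool:
        return self.total > self.threshold


channel = SidebandChannel()
skipped = SkippedRecordTotal(threshold=10)
channel.subscribe("skipped_records", skipped.on_message)

# Three upstream minor fragments each report how many records they skipped.
for fragment_id, count in enumerate([3, 0, 5]):
    channel.publish(SidebandMessage("skipped_records", fragment_id, count))
```

Under the same assumptions, a Bloom filter or statistics consumer would
subscribe to its own message kind on the same channel, which is the point of
having a framework rather than a one-off.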

--
Jacques Nadeau
CTO and Co-Founder, Dremio

On Tue, Dec 1, 2015 at 11:35 AM, Hsuan Yi Chu <hyi...@maprtech.com> wrote:

> I am trying to deal with the following scenario:
>
> A bunch of minor fragments are working in parallel, and each of them could
> skip some records. The downstream minor fragment needs to know the sum of
> the upstream skipped-record counts (either just to display it or to check
> whether it exceeds the threshold), so each upstream minor fragment needs
> to pass this scalar along with its RecordBatch.
>
> Since this seems to impact the RecordBatch protocol, I am looking for
> some advice here.
>
> Thanks.
>
