This seems like a form of sideband communication. I think we should have a general framework for this type of thing rather than a one-off for this particular need. Other forms of sideband might be small-table Bloom filter generation and pushdown into HBase, separate file assignment/partitioning providers balancing/generating scanner workloads, statistics generation for adaptive execution, etc.
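
To make the idea concrete, here is a minimal sketch of how a general sideband channel might carry the skipped-record count separately from the record batches. It is only an illustration under stated assumptions: SidebandMessage, SidebandAggregator, and the "skippedRecords" channel name are hypothetical and are not existing Drill classes or APIs, and each message is assumed to carry a delta (records skipped since that fragment's last report) rather than a running total.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sideband message: one named scalar reported by one upstream minor fragment. */
final class SidebandMessage {
    final String channel;        // hypothetical channel name, e.g. "skippedRecords"
    final int minorFragmentId;   // which upstream minor fragment sent the value
    final long value;            // assumed to be a delta since the last report

    SidebandMessage(String channel, int minorFragmentId, long value) {
        this.channel = channel;
        this.minorFragmentId = minorFragmentId;
        this.value = value;
    }
}

/** Hypothetical downstream collector that sums reported scalars per channel. */
final class SidebandAggregator {
    private final Map<String, Long> totals = new HashMap<>();

    /** Adds the reported value to the running sum for the message's channel. */
    void accept(SidebandMessage msg) {
        totals.merge(msg.channel, msg.value, Long::sum);
    }

    /** Returns the sum seen so far for a channel, or 0 if nothing was reported. */
    long total(String channel) {
        return totals.getOrDefault(channel, 0L);
    }

    /** True once the summed value for the channel is strictly above the threshold. */
    boolean exceeds(String channel, long threshold) {
        return total(channel) > threshold;
    }
}
```

Because values are keyed by channel name, the same collector could in principle hold other sideband scalars (for example, statistics for adaptive execution) without touching the RecordBatch protocol. How such messages would actually be transported between fragments is left open here.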
--
Jacques Nadeau
CTO and Co-Founder, Dremio

On Tue, Dec 1, 2015 at 11:35 AM, Hsuan Yi Chu <hyi...@maprtech.com> wrote:

> I am trying to deal with the following scenario:
>
> A bunch of minor fragments are doing things in parallel. Each of them could
> skip some records. Since the downstream minor fragment needs to know the
> sum of skipped-record-counts (in order to just display or see if the number
> exceeds the threshold) in the upstreams, each upstream minor fragment needs
> to pass this scalar with RecordBatch.
>
> Since this seems impacting the protocol of RecordBatch, I am looking for
> some advice here.
>
> Thanks.
>