+1 for a sideband mechanism. Sideband can also allow correlated restart of sub-queries.
In the sideband use cases you described, the messages ran in the opposite direction to the data. Would the sideband also run in the same direction as the data? If so, it could carry warnings, rejected rows, progress indications, and (for online aggregation[1]) notifications that a better approximate query result is available.

Julian

[1] https://en.wikipedia.org/wiki/Online_aggregation

> On Dec 1, 2015, at 1:51 PM, Jacques Nadeau wrote:
>
> This seems like a form of sideband communication. I think we should have a
> framework for this type of thing in general rather than a one-off for this
> particular need. Other forms of sideband might be small-table Bloom filter
> generation and pushdown into HBase, separate file assignment/partitioning
> providers balancing/generating scanner workloads, statistics generation for
> adaptive execution, etc.
>
> --
> Jacques Nadeau
> CTO and Co-Founder, Dremio
>
> On Tue, Dec 1, 2015 at 11:35 AM, Hsuan Yi Chu wrote:
>
>> I am trying to deal with the following scenario:
>>
>> A bunch of minor fragments are doing things in parallel. Each of them could
>> skip some records. Since the downstream minor fragment needs to know the
>> sum of the skipped-record counts from its upstreams (either to display it
>> or to check whether it exceeds a threshold), each upstream minor fragment
>> needs to pass this scalar along with its RecordBatch.
>>
>> Since this seems to impact the RecordBatch protocol, I am looking for
>> some advice here.
>>
>> Thanks.
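To make the "same direction as the data" case concrete for the skipped-record scenario above, here is a minimal sketch, assuming each upstream minor fragment reports its running total of skipped records on a channel that is separate from RecordBatch, so the batch protocol itself stays unchanged. Everything here is hypothetical and illustrative only: `SkippedRecords`, `SkipCounter`, and the in-process `SimpleQueue` standing in for the wire channel are not Drill APIs.

```python
from dataclasses import dataclass
from queue import SimpleQueue


@dataclass(frozen=True)
class SkippedRecords:
    """Hypothetical sideband message: one minor fragment's running skip total."""
    minor_fragment_id: int
    cumulative_count: int


class SkipCounter:
    """Hypothetical downstream consumer that aggregates skips from all upstreams.

    Only the latest running total per fragment is kept, so a repeated or
    re-sent report never double-counts.
    """

    def __init__(self, threshold: int) -> None:
        self.threshold = threshold
        self._latest: dict[int, int] = {}

    def accept(self, msg: SkippedRecords) -> None:
        # Overwrite rather than add: messages carry running totals, not deltas.
        self._latest[msg.minor_fragment_id] = msg.cumulative_count

    @property
    def total(self) -> int:
        return sum(self._latest.values())

    @property
    def exceeded(self) -> bool:
        return self.total > self.threshold


# Stand-in for the sideband channel; it flows downstream alongside the data
# but is not part of any RecordBatch.
sideband: SimpleQueue = SimpleQueue()
sideband.put(SkippedRecords(0, 3))
sideband.put(SkippedRecords(1, 0))
sideband.put(SkippedRecords(0, 4))  # fragment 0 updates its running total
sideband.put(SkippedRecords(2, 7))

counter = SkipCounter(threshold=10)
while not sideband.empty():
    counter.accept(sideband.get())

print(counter.total, counter.exceeded)  # 11 True
```

Sending running totals rather than deltas is a design choice, not a requirement: it makes each message idempotent, which matters if the sideband is ever lossy or retried. The same channel shape could carry the other downstream payloads mentioned (warnings, rejected rows, progress).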
