Hi John and Wes,

A few thoughts:
One issue we didn't get into in prior discussions is that the proposal
essentially changes the unit of exchange from a RecordBatch to a segment
of a RecordBatch.

I think I brought this up in earlier discussions: there is an interesting
idea that Trill [1], a columnar streaming engine, illustrates.  If, over the
time horizon of the desired latency, you aren't receiving enough messages to
take advantage of columnar analytics, the system probably has enough time to
compact batches after the fact for later analysis.  Conversely, if you are
receiving many events, you naturally get reasonable batch sizes without
having to do further work.


> I'm objecting to RecordBatch.length being inconsistent with the
> constituent field lengths, that's where the danger lies. If all of the
> lengths are consistent, no code changes are necessary.

John, is it a viable solution to keep all lengths in sync for the use case
you are imagining?

A solution I like less, but which might be viable: formally specify a
negative constant that signifies the length should be inherited from the
RecordBatch length (this could only be used on top-level fields).
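
To make that concrete, here is a rough sketch of how a reader might resolve
such a sentinel. Everything in it is hypothetical: the constant
INHERIT_LENGTH, its value of -1, and the ToyField/ToyBatch classes are
stand-ins for illustration, not anything in the Arrow format or an existing
implementation.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical sentinel: not part of the Arrow format today.
INHERIT_LENGTH = -1


@dataclass
class ToyField:
    name: str
    length: int  # declared length, or INHERIT_LENGTH


@dataclass
class ToyBatch:
    length: int
    fields: List[ToyField]  # top-level fields only


def resolve_lengths(batch: ToyBatch) -> List[int]:
    """Return the effective length of each top-level field."""
    resolved = []
    for field in batch.fields:
        if field.length == INHERIT_LENGTH:
            # Sentinel: take the length from the enclosing batch.
            resolved.append(batch.length)
        elif field.length < 0:
            # Any other negative value is treated as malformed metadata.
            raise ValueError(
                f"invalid length {field.length} for field {field.name!r}"
            )
        else:
            resolved.append(field.length)
    return resolved


batch = ToyBatch(length=3, fields=[ToyField("a", INHERIT_LENGTH), ToyField("b", 3)])
print(resolve_lengths(batch))  # [3, 3]
```

Restricting the sentinel to top-level fields keeps the lookup trivial, since
only those fields have the batch as their direct parent.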

> I contend that it can only be useful and will never be harmful.  What are
> the counter-examples of concrete harm?


I'm not sure there is anything obviously wrong; however, changes to
semantics are always dangerous.  One blemish on the current proposal is that
one can't easily determine whether a mismatch in row length is a programming
error or intentional.
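
A small illustration of that ambiguity, as a sketch only: the helper
find_length_mismatches and both scenarios are made up for this example and
do not correspond to any Arrow API. A validator that sees only the lengths
gets identical input in both cases.

```python
from typing import List, Tuple


def find_length_mismatches(
    batch_length: int, field_lengths: List[int]
) -> List[Tuple[int, int]]:
    """Return (field index, field length) for each field that disagrees with the batch."""
    return [(i, n) for i, n in enumerate(field_lengths) if n != batch_length]


# Hypothetical case 1: a writer deliberately reserves 5 slots but publishes 2 rows.
intentional = find_length_mismatches(2, [5, 5])

# Hypothetical case 2: a buggy writer forgot to update the batch length.
accidental = find_length_mismatches(2, [5, 5])

# Both produce the same report; the intent is not recoverable from the lengths.
print(intentional == accidental)  # True
```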

[1]
https://www.microsoft.com/en-us/research/wp-content/uploads/2016/02/trill-vldb2015.pdf

On Wed, Oct 16, 2019 at 4:41 PM John Muehlhausen <j...@jgm.org> wrote:

> "that's where the danger lies"
>
> What danger?  I have no idea what the specific danger is, assuming that all
> reference implementations have test cases that hedge around this.
>
> I contend that it can only be useful and will never be harmful.  What are
> the counter-examples of concrete harm?
>
