Hey Bryan,

Thanks for looking into this issue. I would vote that we should
validate each batch independently, so we can catch issues related to
the structure of the data and not just the content. C++ doesn't do any
detection of empty batches per se, but on both ends it reads all the
data into a table, which would eliminate any empty batches.

It also wouldn't be reasonable to stop sending batches that are empty,
because Flight lets you attach metadata to batches, and so an empty
batch might still have metadata that the client or server wants.

Best,
David

On 2/24/20, Bryan Cutler <cutl...@gmail.com> wrote:
> While looking into Null type testing for ARROW-7899, a couple small issues
> came up regarding Flight integration testing with empty batches (row count
> == 0) that could be worked out with a quick discussion. It seems there is a
> small difference between the C++ and Java Flight servers when there are
> empty record batches at the end of a stream, more details in PR
> https://github.com/apache/arrow/pull/6476.
>
> The Java server sends all record batches, even the empty ones, and the test
> client verifies each of these batches matches the batches read from a JSON
> file. The C++ servers seems to recognize if the end of the stream is only
> empty batches (please correct me if I'm wrong) and will not serve them.
> This seems reasonable, as there is no more actual data left in the stream.
> The C++ test client reads all batches into a table, does the same for the
> JSON file, and compares final Tables. I also noticed that empty batches in
> the middle of the stream will be served.  My questions are:
>
> 1) What is the expected behavior of a Flight server for empty record
> batches, can they be ignored and not sent to the Client?
>
> 2) Is it good enough to test against a final concatenation of all batches
> in the stream or should each batch be verified individually to ensure the
> server is sending out correctly batched data?
>
> Thanks,
> Bryan
>

Reply via email to