Re: [DISCUSS] Flight testing inconsistency for empty batches

2020-02-28 Thread Bryan Cutler
Thanks all, I agree with validating each record batch independently. I made
https://issues.apache.org/jira/browse/ARROW-7966 to ensure this, and that
will hopefully iron out any kinks in the different implementations.

Thanks,
Bryan

On Wed, Feb 26, 2020 at 3:13 PM Wes McKinney  wrote:

> I agree with independent validation.
>
> On Tue, Feb 25, 2020 at 2:55 PM David Li  wrote:
> >
> > Hey Bryan,
> >
> > Thanks for looking into this issue. I would vote that we should
> > validate each batch independently, so we can catch issues related to
> > the structure of the data and not just the content. C++ doesn't do any
> > detection of empty batches per se, but on both ends it reads all the
> > data into a table, which would eliminate any empty batches.
> >
> > It also wouldn't be reasonable to stop sending batches that are empty,
> > because Flight lets you attach metadata to batches, and so an empty
> > batch might still have metadata that the client or server wants.
> >
> > Best,
> > David
> >
> > On 2/24/20, Bryan Cutler  wrote:
> > > While looking into Null type testing for ARROW-7899, a couple small
> issues
> > > came up regarding Flight integration testing with empty batches (row
> count
> > > == 0) that could be worked out with a quick discussion. It seems there
> is a
> > > small difference between the C++ and Java Flight servers when there are
> > > empty record batches at the end of a stream, more details in PR
> > > https://github.com/apache/arrow/pull/6476.
> > >
> > > The Java server sends all record batches, even the empty ones, and the
> test
> > > client verifies each of these batches matches the batches read from a
> JSON
> > > file. The C++ servers seems to recognize if the end of the stream is
> only
> > > empty batches (please correct me if I'm wrong) and will not serve them.
> > > This seems reasonable, as there is no more actual data left in the
> stream.
> > > The C++ test client reads all batches into a table, does the same for
> the
> > > JSON file, and compares final Tables. I also noticed that empty
> batches in
> > > the middle of the stream will be served.  My questions are:
> > >
> > > 1) What is the expected behavior of a Flight server for empty record
> > > batches, can they be ignored and not sent to the Client?
> > >
> > > 2) Is it good enough to test against a final concatenation of all
> batches
> > > in the stream or should each batch be verified individually to ensure
> the
> > > server is sending out correctly batched data?
> > >
> > > Thanks,
> > > Bryan
> > >
>


Re: [DISCUSS] Flight testing inconsistency for empty batches

2020-02-26 Thread Wes McKinney
I agree with independent validation.

On Tue, Feb 25, 2020 at 2:55 PM David Li  wrote:
>
> Hey Bryan,
>
> Thanks for looking into this issue. I would vote that we should
> validate each batch independently, so we can catch issues related to
> the structure of the data and not just the content. C++ doesn't do any
> detection of empty batches per se, but on both ends it reads all the
> data into a table, which would eliminate any empty batches.
>
> It also wouldn't be reasonable to stop sending batches that are empty,
> because Flight lets you attach metadata to batches, and so an empty
> batch might still have metadata that the client or server wants.
>
> Best,
> David
>
> On 2/24/20, Bryan Cutler  wrote:
> > While looking into Null type testing for ARROW-7899, a couple small issues
> > came up regarding Flight integration testing with empty batches (row count
> > == 0) that could be worked out with a quick discussion. It seems there is a
> > small difference between the C++ and Java Flight servers when there are
> > empty record batches at the end of a stream, more details in PR
> > https://github.com/apache/arrow/pull/6476.
> >
> > The Java server sends all record batches, even the empty ones, and the test
> > client verifies each of these batches matches the batches read from a JSON
> > file. The C++ servers seems to recognize if the end of the stream is only
> > empty batches (please correct me if I'm wrong) and will not serve them.
> > This seems reasonable, as there is no more actual data left in the stream.
> > The C++ test client reads all batches into a table, does the same for the
> > JSON file, and compares final Tables. I also noticed that empty batches in
> > the middle of the stream will be served.  My questions are:
> >
> > 1) What is the expected behavior of a Flight server for empty record
> > batches, can they be ignored and not sent to the Client?
> >
> > 2) Is it good enough to test against a final concatenation of all batches
> > in the stream or should each batch be verified individually to ensure the
> > server is sending out correctly batched data?
> >
> > Thanks,
> > Bryan
> >


Re: [DISCUSS] Flight testing inconsistency for empty batches

2020-02-25 Thread David Li
Hey Bryan,

Thanks for looking into this issue. I would vote that we should
validate each batch independently, so we can catch issues related to
the structure of the data and not just the content. C++ doesn't do any
detection of empty batches per se, but on both ends it reads all the
data into a table, which would eliminate any empty batches.

It also wouldn't be reasonable to stop sending batches that are empty,
because Flight lets you attach metadata to batches, and so an empty
batch might still have metadata that the client or server wants.

Best,
David

On 2/24/20, Bryan Cutler  wrote:
> While looking into Null type testing for ARROW-7899, a couple small issues
> came up regarding Flight integration testing with empty batches (row count
> == 0) that could be worked out with a quick discussion. It seems there is a
> small difference between the C++ and Java Flight servers when there are
> empty record batches at the end of a stream, more details in PR
> https://github.com/apache/arrow/pull/6476.
>
> The Java server sends all record batches, even the empty ones, and the test
> client verifies each of these batches matches the batches read from a JSON
> file. The C++ servers seems to recognize if the end of the stream is only
> empty batches (please correct me if I'm wrong) and will not serve them.
> This seems reasonable, as there is no more actual data left in the stream.
> The C++ test client reads all batches into a table, does the same for the
> JSON file, and compares final Tables. I also noticed that empty batches in
> the middle of the stream will be served.  My questions are:
>
> 1) What is the expected behavior of a Flight server for empty record
> batches, can they be ignored and not sent to the Client?
>
> 2) Is it good enough to test against a final concatenation of all batches
> in the stream or should each batch be verified individually to ensure the
> server is sending out correctly batched data?
>
> Thanks,
> Bryan
>


[DISCUSS] Flight testing inconsistency for empty batches

2020-02-24 Thread Bryan Cutler
While looking into Null type testing for ARROW-7899, a couple small issues
came up regarding Flight integration testing with empty batches (row count
== 0) that could be worked out with a quick discussion. It seems there is a
small difference between the C++ and Java Flight servers when there are
empty record batches at the end of a stream, more details in PR
https://github.com/apache/arrow/pull/6476.

The Java server sends all record batches, even the empty ones, and the test
client verifies each of these batches matches the batches read from a JSON
file. The C++ servers seems to recognize if the end of the stream is only
empty batches (please correct me if I'm wrong) and will not serve them.
This seems reasonable, as there is no more actual data left in the stream.
The C++ test client reads all batches into a table, does the same for the
JSON file, and compares final Tables. I also noticed that empty batches in
the middle of the stream will be served.  My questions are:

1) What is the expected behavior of a Flight server for empty record
batches, can they be ignored and not sent to the Client?

2) Is it good enough to test against a final concatenation of all batches
in the stream or should each batch be verified individually to ensure the
server is sending out correctly batched data?

Thanks,
Bryan