IIRC, Flight itself doesn't require equal-size batches; this seems to be an
issue in Dremio's implementation.
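For what it's worth, nothing in the Flight protocol pins the batch size: each batch a server sends can carry a different row count, so a server is free to size batches against a byte budget instead of a fixed row count. A minimal sketch of that sizing logic, re-estimating the average record size as the stream progresses (all names here are illustrative, not Dremio's or Arrow's API):

```java
// Sketch: pick a per-batch row count from a memory budget and a running
// per-record size estimate. Illustrative only; not Dremio's implementation.
final class BatchSizer {
    private final long targetBatchBytes; // byte budget per batch
    private long bytesSeen = 0;
    private long rowsSeen = 0;

    BatchSizer(long targetBatchBytes) {
        this.targetBatchBytes = targetBatchBytes;
    }

    // Record what was actually sent so the estimate tracks the real data.
    void observe(long rows, long bytes) {
        rowsSeen += rows;
        bytesSeen += bytes;
    }

    // Rows for the next batch: budget divided by the running average
    // record size, clamped to at least one row. Falls back to a default
    // row count before any data has been observed.
    long nextBatchRows(long fallbackRows) {
        if (rowsSeen == 0) return fallbackRows;
        long avgRecordBytes = Math.max(1, bytesSeen / rowsSeen);
        return Math.max(1, targetBatchBytes / avgRecordBytes);
    }
}
```

With this, small records get large batches and large records get small ones, so the direct-memory footprint per batch stays roughly constant.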

On Wed, Oct 18, 2023 at 8:25 AM Gunther Rademacher <g...@gmx.de> wrote:

> When my Java Flight client receives batches of records (as in the sample
> code shown at the bottom), Dremio sends them in batches with a fixed
> number of records (apparently 3968 by default). My problem is that both
> batch size and record size determine the memory requirement. So with a
> fixed batch size, available memory may remain unused for small records,
> while it may overflow for large records. My goal is to be able to stream
> records of arbitrary size within some limited amount of direct memory.
>
> The batch size calculation in Dremio seems to be done here:
>
> https://github.com/dremio/dremio-oss/blob/be47367c523e4eded35df4e8fff725f53160558e/sabot/kernel/src/main/java/com/dremio/exec/planner/physical/PhysicalPlanCreator.java#L185-L202
>
> Obviously it also considers TARGET_BATCH_SIZE_BYTES. But setting the
> corresponding support property did not achieve the desired result in my
> tests, possibly because the "estimatedRecordSize" isn't calculated
> adequately.
>
> My question is, does the number of records in a batch have to be fixed
> at all? Couldn't a server adapt it dynamically to the actual data, while
> sending the stream?
>
> My client code is this (all batches except for the last come with the
> same number of "rowsInBatch"):
>
>     final VectorLoader vectorLoader = new VectorLoader(vectorSchemaRoot);
>     // Loop until the stream ends or no root is available.
>     while (stream.next() && stream.hasRoot()) {
>         try (ArrowRecordBatch currentRecordBatch =
>                new VectorUnloader(stream.getRoot()).getRecordBatch()) {
>             final int rowsInBatch = writeRows(stream.getRoot());
>             rowCount += rowsInBatch;
>             vectorLoader.load(currentRecordBatch);
>         }
>     }
>
