When my Java Flight client reads records (sample code at the bottom),
Dremio sends them in batches with a fixed number of records (apparently
3968 by default). My problem is that the memory requirement is
determined by both the batch size and the record size. So with a fixed
record count per batch, available memory may go unused for small
records, while it may overflow for large records. My goal is to stream
records of arbitrary size within a limited amount of direct memory.
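
On the client side I can at least make that budget explicit: Arrow's
RootAllocator accepts a hard limit, so an oversized batch fails fast in
the allocator instead of silently exhausting direct memory. A minimal
sketch (the 128 MB limit and the localhost endpoint are just
placeholders, and the fragment assumes an enclosing method that declares
"throws Exception"):

    import org.apache.arrow.flight.FlightClient;
    import org.apache.arrow.flight.Location;
    import org.apache.arrow.memory.BufferAllocator;
    import org.apache.arrow.memory.RootAllocator;

    // Cap the client's direct memory: allocations beyond the limit throw
    // an OutOfMemoryException from the allocator rather than growing the
    // process's direct memory unboundedly.
    try (BufferAllocator allocator = new RootAllocator(128L * 1024 * 1024);
         FlightClient client = FlightClient.builder(
                 allocator, Location.forGrpcInsecure("localhost", 32010)).build()) {
        // getStream(...) and the read loop shown at the bottom go here
    }

This doesn't fix the server-side batch sizing, of course; it only turns
a potential overflow into a defined failure.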

The batch size calculation in Dremio seems to be done here:
https://github.com/dremio/dremio-oss/blob/be47367c523e4eded35df4e8fff725f53160558e/sabot/kernel/src/main/java/com/dremio/exec/planner/physical/PhysicalPlanCreator.java#L185-L202

Obviously it also takes TARGET_BATCH_SIZE_BYTES into account. But
setting the corresponding support property did not have the desired
effect in my tests, possibly because "estimatedRecordSize" isn't
calculated accurately.
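
To make the effect concrete, here is my reading of that code,
paraphrased (the names below are mine, not Dremio's): the row count per
batch is derived once at planning time from a byte budget and an
estimated row width, then clamped to configured min/max row counts.

    static long plannedBatchRecords(long targetBatchSizeBytes,
                                    long estimatedRecordSize,
                                    long minRecords, long maxRecords) {
        // Rows that fit into the byte budget, given the *estimated* row width.
        long byBytes = targetBatchSizeBytes / Math.max(1, estimatedRecordSize);
        // Clamp to the configured bounds; the result is fixed for the whole stream.
        return Math.max(minRecords, Math.min(maxRecords, byBytes));
    }

If estimatedRecordSize is far off the real row width, the result pins to
one of the clamp bounds, and changing TARGET_BATCH_SIZE_BYTES has no
visible effect, which would match what I observed.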

My question is: does the number of records in a batch have to be fixed
at all? Couldn't a server adapt it dynamically to the actual data while
sending the stream?
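
As a thought experiment, something like the following seems possible on
the Arrow Flight Java API. This is only a sketch of the idea, not
Dremio's implementation; hasMoreRows and writeRow are hypothetical
stand-ins for the data source. The server fills the root row by row and
flushes whenever the buffered bytes reach a byte budget, so the record
count per batch follows the actual data:

    import java.util.List;
    import org.apache.arrow.flight.FlightProducer.ServerStreamListener;
    import org.apache.arrow.vector.FieldVector;
    import org.apache.arrow.vector.VectorSchemaRoot;

    static final long TARGET_BATCH_BYTES = 1024 * 1024;  // example budget

    static void streamAdaptively(ServerStreamListener listener, VectorSchemaRoot root) {
        listener.start(root);
        root.allocateNew();
        int row = 0;
        while (hasMoreRows()) {         // hypothetical data source
            writeRow(root, row++);      // hypothetical: fill the vectors at index row
            root.setRowCount(row);      // keep value counts current so getBufferSize() is accurate
            if (bufferedBytes(root) >= TARGET_BATCH_BYTES) {
                listener.putNext();     // batch size now tracks the data, not a fixed count
                root.allocateNew();
                row = 0;
            }
        }
        if (row > 0) {
            listener.putNext();         // flush the final, possibly smaller batch
        }
        listener.completed();
    }

    static long bufferedBytes(VectorSchemaRoot root) {
        long bytes = 0;
        for (FieldVector v : root.getFieldVectors()) {
            bytes += v.getBufferSize();
        }
        return bytes;
    }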

My client code is this (every batch except the last arrives with the
same "rowsInBatch"):

    final VectorLoader vectorLoader = new VectorLoader(vectorSchemaRoot);
    // Fetch batches until the Flight stream is exhausted.
    while (stream.next()) {
        if (!stream.hasRoot())
            break;
        // Unload the freshly received batch so it can be re-loaded into my
        // own vectorSchemaRoot; the try-with-resources releases its buffers.
        try (ArrowRecordBatch currentRecordBatch =
               new VectorUnloader(stream.getRoot()).getRecordBatch()) {

            final int rowsInBatch = writeRows(stream.getRoot());
            rowCount += rowsInBatch;
            vectorLoader.load(currentRecordBatch);
        }
    }
