IIRC, Flight doesn't require equal-size batches; this seems to be an issue in Dremio's implementation.
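Since the Flight protocol itself places no constraint on per-batch row counts, a server could size each batch from a direct-memory budget and the row width it actually observes while streaming. Here's a minimal sketch of that calculation; all names (`DynamicBatchSizer`, `memoryBudgetBytes`, etc.) are made up for illustration and are not Dremio's or Arrow's API:

```java
// Illustrative only: choose a per-batch row count from a memory budget
// and a running estimate of the average serialized row size. This is a
// sketch of the idea, not Dremio's actual batch-size logic.
public final class DynamicBatchSizer {
    private final long memoryBudgetBytes; // target bytes per batch
    private final int minRows;            // never shrink below this
    private final int maxRows;            // never grow beyond this
    private long bytesSeen = 0;
    private long rowsSeen = 0;

    public DynamicBatchSizer(long memoryBudgetBytes, int minRows, int maxRows) {
        this.memoryBudgetBytes = memoryBudgetBytes;
        this.minRows = minRows;
        this.maxRows = maxRows;
    }

    /** Record the actual size of a batch just sent, refining the estimate. */
    public void observe(long batchBytes, int batchRows) {
        bytesSeen += batchBytes;
        rowsSeen += batchRows;
    }

    /** Rows to put in the next batch, given the data seen so far. */
    public int nextBatchRows() {
        if (rowsSeen == 0) {
            return minRows; // no data yet: start conservatively
        }
        long avgRowBytes = Math.max(1, bytesSeen / rowsSeen);
        long rows = memoryBudgetBytes / avgRowBytes;
        return (int) Math.min(maxRows, Math.max(minRows, rows));
    }
}
```

A server producing the stream could then fill its `VectorSchemaRoot` with up to `nextBatchRows()` rows before each `putNext()`, so batches stay near the memory budget whether rows are small or large.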
On Wed, Oct 18, 2023 at 8:25 AM Gunther Rademacher <g...@gmx.de> wrote:
> When my Java Flight client receives batches of records (as in the sample
> code shown at the bottom), Dremio sends them in batches with a fixed
> number of records (apparently 3968 by default). My problem is that both
> batch size and record size determine the memory requirement. So with a
> fixed batch size, available memory may remain unused for small records,
> while it may overflow for large records. My goal is to be able to stream
> records of arbitrary size within some limited amount of direct memory.
>
> The batch size calculation in Dremio seems to be done here:
>
> https://github.com/dremio/dremio-oss/blob/be47367c523e4eded35df4e8fff725f53160558e/sabot/kernel/src/main/java/com/dremio/exec/planner/physical/PhysicalPlanCreator.java#L185-L202
>
> Obviously it also considers TARGET_BATCH_SIZE_BYTES. But setting the
> corresponding support property did not achieve the desired result in my
> tests, possibly because the "estimatedRecordSize" isn't calculated
> adequately.
>
> My question is, does the number of records in a batch have to be fixed
> at all? Couldn't a server adapt it dynamically to the actual data, while
> sending the stream?
>
> My client code is this (all batches except for the last come with the
> same number of "rowsInBatch"):
>
> final VectorLoader vectorLoader = new VectorLoader(vectorSchemaRoot);
> for (;;) {
>   if (!stream.next())
>     break;
>   if (!stream.hasRoot())
>     break;
>   try (ArrowRecordBatch currentRecordBatch =
>       new VectorUnloader(stream.getRoot()).getRecordBatch()) {
>     final int rowsInBatch = writeRows(stream.getRoot());
>     rowCount += rowsInBatch;
>     vectorLoader.load(currentRecordBatch);
>   }
> }