When my Java Flight client receives records (as in the sample code at the bottom), Dremio sends them in batches with a fixed number of records per batch (apparently 3968 by default). My problem is that the memory requirement is determined by both the batch size and the record size. So with a fixed record count per batch, available memory may go unused for small records, while it may be exceeded for large records. My goal is to stream records of arbitrary size within some limited amount of direct memory.
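To illustrate the mismatch, here is a back-of-the-envelope sketch (not Dremio code; `batchBytes` is a hypothetical helper) showing how per-batch memory scales with record width when the row count per batch is fixed:

```java
// Illustration only: with a fixed number of rows per batch, the memory
// needed to buffer one batch grows linearly with the record size.
public class BatchMemoryEstimate {
    static final int BATCH_ROWS = 3968; // the default batch size observed above

    // hypothetical helper: bytes needed to buffer one batch of uniform records
    static long batchBytes(long bytesPerRecord) {
        return BATCH_ROWS * bytesPerRecord;
    }

    public static void main(String[] args) {
        System.out.println(batchBytes(100));       // narrow records: ~0.4 MB per batch
        System.out.println(batchBytes(1_000_000)); // wide records: ~4 GB per batch
    }
}
```

With 100-byte records a batch needs well under a megabyte, while 1 MB records would require roughly 4 GB of direct memory for a single batch.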
The batch size calculation in Dremio seems to be done here: https://github.com/dremio/dremio-oss/blob/be47367c523e4eded35df4e8fff725f53160558e/sabot/kernel/src/main/java/com/dremio/exec/planner/physical/PhysicalPlanCreator.java#L185-L202 It evidently also considers TARGET_BATCH_SIZE_BYTES, but setting the corresponding support property did not achieve the desired result in my tests, possibly because the "estimatedRecordSize" isn't calculated adequately.

My question is: does the number of records in a batch have to be fixed at all? Couldn't the server adapt it dynamically to the actual data while sending the stream?

My client code is the following (all batches except for the last arrive with the same number of `rowsInBatch`):

```java
final VectorLoader vectorLoader = new VectorLoader(vectorSchemaRoot);
for (;;) {
    if (!stream.next()) break;
    if (!stream.hasRoot()) break;
    try (ArrowRecordBatch currentRecordBatch =
             new VectorUnloader(stream.getRoot()).getRecordBatch()) {
        final int rowsInBatch = writeRows(stream.getRoot());
        rowCount += rowsInBatch;
        vectorLoader.load(currentRecordBatch);
    }
}
```
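For what it's worth, the dynamic behavior I have in mind could look like the following sketch. This is not Dremio's implementation; all names are hypothetical, and `targetBatchBytes` would play the role of TARGET_BATCH_SIZE_BYTES. The idea is simply to flush a batch when its accumulated byte size would exceed a budget, rather than after a fixed row count:

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of byte-bounded batching: records are grouped so that each batch
// stays within a byte budget, so the row count per batch varies with the data.
public class ByteBoundedBatcher {
    static List<List<byte[]>> batchByBytes(List<byte[]> records, long targetBatchBytes) {
        List<List<byte[]>> batches = new ArrayList<>();
        List<byte[]> current = new ArrayList<>();
        long currentBytes = 0;
        for (byte[] record : records) {
            // flush the current batch before it would exceed the byte budget
            if (!current.isEmpty() && currentBytes + record.length > targetBatchBytes) {
                batches.add(current);
                current = new ArrayList<>();
                currentBytes = 0;
            }
            current.add(record);
            currentBytes += record.length;
        }
        if (!current.isEmpty()) batches.add(current);
        return batches;
    }

    public static void main(String[] args) {
        List<byte[]> records = new ArrayList<>();
        for (int i = 0; i < 10; i++) records.add(new byte[300]); // ten 300-byte records
        // 1000-byte budget -> 3 records per batch -> batches of 3, 3, 3, 1
        System.out.println(batchByBytes(records, 1000).size()); // prints 4
    }
}
```

A server producing an Arrow stream this way could keep direct-memory use bounded regardless of record width, which is exactly what a fixed row count cannot guarantee.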