paul-rogers commented on a change in pull request #2000: DRILL-7607: support
dynamic credit based flow control
URL: https://github.com/apache/drill/pull/2000#discussion_r385998974
##########
File path:
exec/java-exec/src/main/java/org/apache/drill/exec/work/batch/UnlimitedRawBatchBuffer.java
##########
@@ -90,14 +100,39 @@ public boolean isEmpty() {
@Override
public void add(RawFragmentBatch batch) {
+ int recordCount = batch.getHeader().getDef().getRecordCount();
+ long bathByteSize = batch.getByteCount();
+ if (recordCount != 0) {
+ //skip first header batch
+ totalBatchSize += bathByteSize;
+ sampleTimes++;
+ }
+ if (sampleTimes == maxSampleTimes) {
+ long averageBathSize = totalBatchSize / sampleTimes;
+ //make a decision
+ long limit = context.getAllocator().getLimit();
Review comment:
Another issue is the question of how many of these receivers exist per
Drillbit. I don't know the answer. If I have 5 minor fragments on this
Drillbit, will all 5 have their own flow control calcs? Will I have 5 fragments
each trying to use 50% of 10GB for a total of 25GB of buffering? Will this be a
problem?
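
To make the concern concrete, here is a rough sketch of the arithmetic. The class and method names are hypothetical and not part of Drill. It assumes every receiver sees the same allocator limit and claims the same fraction of it, which is exactly the part I am unsure about.

```java
/** Hypothetical helper, not part of Drill: per-Drillbit buffering arithmetic. */
public class ReceiverBudgetSketch {

  /** Worst case if every receiver independently claims fraction * limit. */
  static long worstCaseTotal(long allocatorLimit, double fraction, int receivers) {
    long perReceiver = (long) (allocatorLimit * fraction);
    return perReceiver * receivers;
  }

  /** One possible alternative: split a single budget across all receivers. */
  static long sharedBudgetPerReceiver(long allocatorLimit, double fraction, int receivers) {
    long budget = (long) (allocatorLimit * fraction);
    return budget / Math.max(1, receivers);
  }

  public static void main(String[] args) {
    long tenGiB = 10L << 30;
    // 5 receivers, each claiming 50% of a 10 GiB limit.
    System.out.println(worstCaseTotal(tenGiB, 0.5, 5));
    System.out.println(sharedBudgetPerReceiver(tenGiB, 0.5, 5));
  }
}
```

If the limit turns out to be per fragment rather than shared, the worst case is bounded differently and the split above may not be needed.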
Also, how can this algorithm go wrong? Suppose I have a set of files
organized by time, and I run a time-range query. The first few batches might
have very few rows because the filter picks up just a few early arrivals. We
see, say, three batches of a few dozen rows each, because the filter passed
almost nothing, and then decide we can hold many batches.
Later, the scan hits the bulk of my time range and far fewer rows are
filtered out. Suddenly, we need far more memory for these much larger
batches.
Do we need a safety valve that says that we will back off if we suddenly see
large batches?
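
One way such a safety valve might look, as a minimal sketch only. The class, its fields, and the "credit" calculation (memory budget divided by batch size) are my assumptions for illustration, not the code in this PR. It samples the first few non-empty batches as the diff does, then lowers the credit whenever a later batch is larger than anything the current credit was sized for.

```java
/** Hypothetical sketch, not Drill code: credit sizing with a back-off. */
public class AdaptiveCreditSketch {
  private final long memoryBudgetBytes; // assumed budget for this receiver
  private final int sampleCount;        // batches to average before deciding
  private final int minCredit;          // never advertise less than this
  private long sampledBytes;
  private int samples;
  private long baselineBatchBytes = -1; // -1 until initial sampling completes
  private int credit;

  public AdaptiveCreditSketch(long memoryBudgetBytes, int sampleCount, int minCredit) {
    this.memoryBudgetBytes = memoryBudgetBytes;
    this.sampleCount = sampleCount;
    this.minCredit = minCredit;
    this.credit = minCredit;
  }

  /** Records one incoming batch and returns the credit to advertise. */
  public int onBatch(int recordCount, long batchBytes) {
    if (recordCount == 0) {
      return credit; // skip header-only batches, as the diff does
    }
    if (baselineBatchBytes < 0) {
      // Initial sampling phase: average the first sampleCount batches.
      sampledBytes += batchBytes;
      samples++;
      if (samples == sampleCount) {
        baselineBatchBytes = Math.max(1, sampledBytes / samples);
        credit = creditFor(baselineBatchBytes);
      }
      return credit;
    }
    // Safety valve: a batch larger than the size we planned for shrinks the credit.
    if (batchBytes > baselineBatchBytes) {
      baselineBatchBytes = batchBytes;
      credit = Math.min(credit, creditFor(batchBytes));
    }
    return credit;
  }

  public int credit() {
    return credit;
  }

  private int creditFor(long batchBytes) {
    long fit = memoryBudgetBytes / batchBytes;
    return (int) Math.max(minCredit, Math.min(fit, Integer.MAX_VALUE));
  }
}
```

This version only ever backs off. Whether the credit should grow again when batches shrink, and how the sender learns about a reduced credit, are separate questions the sketch does not address.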