paul-rogers commented on a change in pull request #2000: DRILL-7607: support
dynamic credit based flow control
URL: https://github.com/apache/drill/pull/2000#discussion_r385997586
##########
File path:
exec/java-exec/src/main/java/org/apache/drill/exec/work/batch/UnlimitedRawBatchBuffer.java
##########
@@ -90,14 +100,39 @@ public boolean isEmpty() {
@Override
public void add(RawFragmentBatch batch) {
+ int recordCount = batch.getHeader().getDef().getRecordCount();
+ long bathByteSize = batch.getByteCount();
+ if (recordCount != 0) {
+ //skip first header batch
+ totalBatchSize += bathByteSize;
+ sampleTimes++;
+ }
+ if (sampleTimes == maxSampleTimes) {
+ long averageBathSize = totalBatchSize / sampleTimes;
+ //make a decision
+ long limit = context.getAllocator().getLimit();
Review comment:
Unless this has been adjusted, the limit is generally meaningless; there is
no code that sets this to any reasonable value (unless something has changed
recently.) It generally defaults to 10 GB, which is far too large.
If we use the default, will each receiver try to use the 10 GB of memory?
More than 1 or 2 queries will cause OOM on many installations.
To provide a bit more background. Unless things changed recently, if the
query queue is off, there is no memory budget per query. But, if queuing is on,
there is code that allocates memory to the buffering operators (sort, join,
etc.) but not (last I looked) to exchanges. (It clearly should do such an
allocation.) So, if we use the full 10GB (or whatever value) here, the query
may far exceed the budget and the queuing mechanism will still cause the system
to exhaust memory before we hit the query limit.
In short, this memory needs to be allocated somehow as part of memory
planning. I don't see any code in this PR which will do so.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
With regards,
Apache Git Services