paul-rogers commented on a change in pull request #2000: DRILL-7607: support 
dynamic credit based flow control
URL: https://github.com/apache/drill/pull/2000#discussion_r385997586
 
 

 ##########
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/work/batch/UnlimitedRawBatchBuffer.java
 ##########
 @@ -90,14 +100,39 @@ public boolean isEmpty() {
 
     @Override
     public void add(RawFragmentBatch batch) {
+      int recordCount = batch.getHeader().getDef().getRecordCount();
+      long bathByteSize = batch.getByteCount();
+      if (recordCount != 0) {
+        //skip first header batch
+        totalBatchSize += bathByteSize;
+        sampleTimes++;
+      }
+      if (sampleTimes == maxSampleTimes) {
+        long averageBathSize = totalBatchSize / sampleTimes;
+        //make a decision
+        long limit = context.getAllocator().getLimit();
 
 Review comment:
   Unless this has been adjusted, the limit is generally meaningless; there is 
no code that sets this to any reasonable value (unless something has changed 
recently.) It generally defaults to 10 GB, which is far too large.
   
   If we use the default, will each receiver try to use the 10 GB of memory? 
More than 1 or 2 queries will cause OOM on many installations.
   
   To provide a bit more background. Unless things changed recently, if the 
query queue is off, there is no memory budget per query. But, if queuing is on, 
there is code that allocates memory to the buffering operators (sort, join, 
etc.) but not (last I looked) to exchanges. (It clearly should do such an 
allocation.) So, if we use the full 10GB (or whatever value) here, the query 
may far exceed the budget and the queuing mechanism will still cause the system 
to exhaust memory before we hit the query limit.
   
   In short, this memory needs to be allocated somehow as part of memory 
planning. I don't see any code in this PR which will do so.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

Reply via email to