Github user paul-rogers commented on a diff in the pull request:

    https://github.com/apache/drill/pull/750#discussion_r102650409
  
    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/store/easy/text/compliant/CompliantTextRecordReader.java ---
    @@ -118,12 +118,21 @@ public boolean apply(@Nullable SchemaPath path) {
        * @param outputMutator  Used to create the schema in the output record batch
        * @throws ExecutionSetupException
        */
    +  @SuppressWarnings("resource")
       @Override
       public void setup(OperatorContext context, OutputMutator outputMutator) throws ExecutionSetupException {
     
         oContext = context;
    -    readBuffer = context.getManagedBuffer(READ_BUFFER);
    -    whitespaceBuffer = context.getManagedBuffer(WHITE_SPACE_BUFFER);
    +    // Note: DO NOT use managed buffers here. They remain in existence
    +    // until the fragment is shut down. The buffers here are large.
    --- End diff --
    
    The reason is a bit different. The original call allocates a managed 
buffer, which is freed only when the fragment context shuts down at the end of 
query execution. If we read many files (5000 in one test case), we leave 5000 
buffers in existence for the whole query.
    
    Instead, we want to control the buffer lifetime ourselves: we allocate a 
regular (not managed) buffer, then release it when this reader closes.
    
    That way, instead of accumulating 5000 buffers of 1 MB each, we have only 
one 1 MB buffer in existence at any one time.
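    
    To make the difference concrete, here is a minimal sketch that simulates 
both lifetimes. It does not use Drill's allocator: `CountingAllocator` and both 
methods are hypothetical names invented for illustration, and the buffers are 
plain placeholder objects so that only the count of live buffers is modelled.

```java
import java.util.ArrayList;
import java.util.List;

public class BufferLifetimeSketch {

    /** Hypothetical stand-in for an allocator; it only counts live buffers. */
    static class CountingAllocator {
        int live;
        int peak;

        Object allocate() {
            live++;
            peak = Math.max(peak, live);
            return new Object(); // placeholder for a 1 MB buffer
        }

        void release(Object buffer) {
            live--;
        }
    }

    /** Managed style: each reader allocates, nothing is freed until shutdown. */
    static int peakManaged(int fileCount) {
        CountingAllocator alloc = new CountingAllocator();
        List<Object> heldUntilShutdown = new ArrayList<>();
        for (int i = 0; i < fileCount; i++) {
            heldUntilShutdown.add(alloc.allocate()); // reader setup
            // reader close: the buffer stays alive
        }
        for (Object buffer : heldUntilShutdown) {
            alloc.release(buffer); // fragment shutdown
        }
        return alloc.peak;
    }

    /** Reader-owned style: each reader releases its buffer when it closes. */
    static int peakReaderOwned(int fileCount) {
        CountingAllocator alloc = new CountingAllocator();
        for (int i = 0; i < fileCount; i++) {
            Object buffer = alloc.allocate(); // reader setup
            try {
                // read the file using the buffer
            } finally {
                alloc.release(buffer); // reader close
            }
        }
        return alloc.peak;
    }

    public static void main(String[] args) {
        System.out.println("managed peak:      " + peakManaged(5000));
        System.out.println("reader-owned peak: " + peakReaderOwned(5000));
    }
}
```

    The two methods perform the same number of allocations; they differ only 
in how many buffers are alive at once.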
    
    Of course, a further refinement would be to allocate the buffer on the 
ScanBatch and have all 5000 readers share that same buffer sequentially. But I 
was not sure that any performance benefit was worth the cost in extra code 
complexity...
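
    For comparison, the shared-buffer refinement might look roughly like the 
sketch below. This is not what the patch does, and none of these names exist in 
Drill: `CountingAllocator`, `scanWithSharedBuffer`, and `readOneFile` are 
hypothetical. The scan allocates once and hands the same buffer to each reader 
in turn.

```java
public class SharedScanBufferSketch {

    /** Hypothetical stand-in for an allocator that records its activity. */
    static class CountingAllocator {
        int live;
        int peak;
        int totalAllocations;

        byte[] allocate(int size) {
            live++;
            totalAllocations++;
            peak = Math.max(peak, live);
            return new byte[size];
        }

        void release(byte[] buffer) {
            live--;
        }
    }

    /** Placeholder for a reader that borrows the buffer it is given. */
    static void readOneFile(byte[] sharedBuffer) {
        sharedBuffer[0] = 0; // pretend to fill the buffer from a file
    }

    /** Returns {peak live buffers, total allocations, live after the scan}. */
    static int[] scanWithSharedBuffer(int fileCount, int bufferSize) {
        CountingAllocator alloc = new CountingAllocator();
        byte[] shared = alloc.allocate(bufferSize); // once per scan
        try {
            for (int i = 0; i < fileCount; i++) {
                readOneFile(shared); // readers run one after another
            }
        } finally {
            alloc.release(shared); // the scan owns the buffer, so it frees it
        }
        return new int[] {alloc.peak, alloc.totalAllocations, alloc.live};
    }

    public static void main(String[] args) {
        int[] stats = scanWithSharedBuffer(5000, 1024 * 1024);
        System.out.println("peak=" + stats[0]
            + " allocations=" + stats[1]
            + " liveAfter=" + stats[2]);
    }
}
```

    The peak is the same as with reader-owned buffers; what changes is the 
number of allocations, at the price of the scan and the readers having to agree 
on who owns the buffer.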

