Github user paul-rogers commented on a diff in the pull request:

    https://github.com/apache/drill/pull/750#discussion_r102650409

    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/store/easy/text/compliant/CompliantTextRecordReader.java ---
    @@ -118,12 +118,21 @@ public boolean apply(@Nullable SchemaPath path) {
        * @param outputMutator Used to create the schema in the output record batch
        * @throws ExecutionSetupException
        */
    +  @SuppressWarnings("resource")
       @Override
       public void setup(OperatorContext context, OutputMutator outputMutator) throws ExecutionSetupException {

         oContext = context;
    -    readBuffer = context.getManagedBuffer(READ_BUFFER);
    -    whitespaceBuffer = context.getManagedBuffer(WHITE_SPACE_BUFFER);
    +    // Note: DO NOT use managed buffers here. They remain in existence
    +    // until the fragment is shut down. The buffers here are large.
    --- End diff --

    The reason is a bit different. The original call allocates a managed buffer: it is freed only when the fragment context shuts down at the end of query execution. But, if we read many files (5000 in one test case), then we leave 5000 buffers in existence for the whole query.

    Instead, we want to take control over buffer lifetime. We allocate a regular (not managed) buffer ourselves, and then release it when this reader closes. That way, instead of accumulating 5000 buffers of 1 MB each, we have only one 1 MB buffer in existence at any one time.

    Of course, a further refinement would be to allocate the buffer on the ScanBatch and have all 5000 readers sequentially share that same buffer. But, I was not sure that any performance benefit was worth the cost in extra code complexity...
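To make the lifetime argument concrete, here is a minimal sketch in plain Java. It does not use Drill's allocator or `OperatorContext`; `ToyAllocator`, `peakWithManagedBuffers`, and `peakWithReaderOwnedBuffers` are hypothetical names invented for illustration, and the allocator only counts live buffers rather than reserving memory. It compares the peak number of live buffers when every reader's buffer is held until fragment shutdown against the case where each reader releases its own buffer on close.

```java
public class BufferLifetimeSketch {

    /** Toy stand-in for an allocator: it only tracks how many buffers are live. */
    static final class ToyAllocator {
        private int live;
        private int peak;

        void allocate() {
            live++;
            peak = Math.max(peak, live);
        }

        void release() {
            live--;
        }

        int live() { return live; }
        int peak() { return peak; }
    }

    /** Managed style: buffers outlive their reader and are freed at fragment shutdown. */
    static int peakWithManagedBuffers(int fileCount) {
        ToyAllocator allocator = new ToyAllocator();
        for (int i = 0; i < fileCount; i++) {
            allocator.allocate();   // reader setup requests a buffer
            // the reader closes here, but the fragment still owns the buffer
        }
        // fragment shutdown: everything is released at once
        while (allocator.live() > 0) {
            allocator.release();
        }
        return allocator.peak();
    }

    /** Reader-owned style: each reader releases its buffer when it closes. */
    static int peakWithReaderOwnedBuffers(int fileCount) {
        ToyAllocator allocator = new ToyAllocator();
        for (int i = 0; i < fileCount; i++) {
            allocator.allocate();   // reader setup
            try {
                // read the file (omitted in this sketch)
            } finally {
                allocator.release(); // reader close frees its own buffer
            }
        }
        return allocator.peak();
    }

    public static void main(String[] args) {
        int files = 5000; // file count taken from the test case mentioned above
        System.out.println("managed buffers, peak live:      " + peakWithManagedBuffers(files));
        System.out.println("reader-owned buffers, peak live: " + peakWithReaderOwnedBuffers(files));
    }
}
```

With readers running one after another, the managed variant peaks at one live buffer per file, while the reader-owned variant never holds more than one. Sharing a single buffer across readers at the scan level would give the same peak as the reader-owned variant while avoiding repeated allocation.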