Github user paul-rogers commented on a diff in the pull request:
https://github.com/apache/drill/pull/838#discussion_r117536637
--- Diff:
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/ScanBatch.java
---
@@ -173,9 +174,8 @@ public IterOutcome next() {
currentReader.allocate(mutator.fieldVectorMap());
} catch (OutOfMemoryException e) {
- logger.debug("Caught Out of Memory Exception", e);
clearFieldVectorMap();
- return IterOutcome.OUT_OF_MEMORY;
+ throw UserException.memoryError(e).build(logger);
--- End diff ---
As it turns out, the idea of the OUT_OF_MEMORY return code works better in
theory than in practice: no reader correctly handles this case. Say we have
three columns (a, b, c), and column c needs to double its vector but hits OOM.
No reader has the internal state needed to hold onto the value for c, unwind
the call stack, and then, on the next next() call, rewind back to the point of
writing c into the in-flight row.
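To make the problem concrete, here is a hypothetical sketch (the class and
method names are illustrative, not Drill's actual reader API) of why a reader
cannot resume mid-row after the stack unwinds:

```java
// Illustrative only: a reader writing one row across three column vectors.
// The values for a and b, plus the in-flight value for c, exist only in
// local state while writeRow() runs. If c's vector hits OOM and we return
// an OUT_OF_MEMORY-style status, that partial-row state is gone -- there
// is nothing to rewind to when next() is called again.
final class SketchReader {
    private int completedRows = 0;

    /**
     * Attempts to write one row. cVectorFull simulates column c's vector
     * failing to double its allocation. Returns false on the simulated OOM,
     * stranding the already-written values for a and b.
     */
    boolean writeRow(int a, int b, int c, boolean cVectorFull) {
        // a and b written into their vectors here (not shown)...
        if (cVectorFull) {
            // ...but c cannot be written. The row is half-materialized and
            // this method has no saved state to resume at column c later.
            return false;
        }
        completedRows++;
        return true;
    }

    int completedRows() { return completedRows; }
}
```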
Moving forward, we want to take a broader approach to memory: budget enough
memory for readers to do their work, and modify the mutators to enforce batch
size limits so that each reader operates within its budget.
As we move to that approach, the OUT_OF_MEMORY status will be retired.
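A minimal sketch of that direction (hypothetical names, not Drill's actual
Mutator API): the mutator tracks usage against a per-batch budget and reports
"batch full" at a row boundary, so the reader sends the batch downstream
instead of ever hitting an allocation-time OOM mid-row:

```java
// Hypothetical sketch of a budget-enforcing mutator. The reader asks to
// start each row; once the budget would be exceeded, the answer is "no"
// at a clean row boundary rather than an OOM in the middle of a row.
final class BudgetedMutator {
    private final long batchBudgetBytes;
    private long bytesUsed = 0;
    private int rowCount = 0;

    BudgetedMutator(long batchBudgetBytes) {
        this.batchBudgetBytes = batchBudgetBytes;
    }

    /**
     * Returns true if a row of the estimated width fits in the remaining
     * budget; false means the batch is full and should be sent downstream.
     */
    boolean startRow(long estimatedRowBytes) {
        if (bytesUsed + estimatedRowBytes > batchBudgetBytes) {
            return false; // batch full: flush first, then start a new batch
        }
        bytesUsed += estimatedRowBytes;
        rowCount++;
        return true;
    }

    int rowCount() { return rowCount; }
}
```

The key design point is that the limit check happens before any column is
written, so the reader never holds a half-materialized row.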
The JIRA mentions another JIRA that holds a spec for all of this; we
discussed it six months ago but did not have time to implement it then.
This all merits a full discussion; perhaps we can review the overall
approach in that other JIRA.