[jira] [Commented] (DRILL-5512) Standardize error handling in ScanBatch

2017-06-02 Thread Jinfeng Ni (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16035817#comment-16035817
 ] 

Jinfeng Ni commented on DRILL-5512:
---

Committer is [~sudheeshkatkam] (lost the trace after rebasing).  

> Standardize error handling in ScanBatch
> ---
>
> Key: DRILL-5512
> URL: https://issues.apache.org/jira/browse/DRILL-5512
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.10.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Minor
>  Labels: ready-to-commit
> Fix For: 1.11.0
>
>
> ScanBatch is the Drill operator executor that handles most readers. Like most 
> Drill operators, it uses an ad-hoc set of error detection and reporting 
> methods that evolved over Drill development.
> This ticket asks to standardize on error handling as outlined in DRILL-5083. 
> This basically means reporting all errors as a {{UserException}} rather than 
> using the {{IterOutcome.STOP}} return status or using the 
> {{FragmentContext.fail()}} method.
> This work requires the new error codes introduced in DRILL-5511, and is a 
> step toward making readers aware of vector size limits.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (DRILL-5512) Standardize error handling in ScanBatch

2017-06-02 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16035803#comment-16035803
 ] 

ASF GitHub Bot commented on DRILL-5512:
---

Github user asfgit closed the pull request at:

https://github.com/apache/drill/pull/838


> Standardize error handling in ScanBatch
> ---
>
> Key: DRILL-5512
> URL: https://issues.apache.org/jira/browse/DRILL-5512
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.10.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Minor
>  Labels: ready-to-commit
>
> ScanBatch is the Drill operator executor that handles most readers. Like most 
> Drill operators, it uses an ad-hoc set of error detection and reporting 
> methods that evolved over Drill development.
> This ticket asks to standardize on error handling as outlined in DRILL-5083. 
> This basically means reporting all errors as a {{UserException}} rather than 
> using the {{IterOutcome.STOP}} return status or using the 
> {{FragmentContext.fail()}} method.
> This work requires the new error codes introduced in DRILL-5511, and is a 
> step toward making readers aware of vector size limits.





[jira] [Commented] (DRILL-5512) Standardize error handling in ScanBatch

2017-05-23 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16021460#comment-16021460
 ] 

ASF GitHub Bot commented on DRILL-5512:
---

Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/838#discussion_r118045758
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/ScanBatch.java 
---
@@ -173,9 +174,8 @@ public IterOutcome next() {
 
 currentReader.allocate(mutator.fieldVectorMap());
   } catch (OutOfMemoryException e) {
-logger.debug("Caught Out of Memory Exception", e);
 clearFieldVectorMap();
-return IterOutcome.OUT_OF_MEMORY;
+throw UserException.memoryError(e).build(logger);
--- End diff --

As it turns out, in addition to fixing ScanBatch, there is a need for a 
parallel implementation to support the size-aware vector "writer". That new 
version was modified to allow extensive unit testing of all error conditions. 
We can't do that as easily with ScanBatch itself because it has messy references 
to other parts of Drill, which have references, which have references... Pretty 
soon all of Drill is needed to run the tests, which means using the current 
injection framework (which we could do).
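
For illustration, here is a minimal JUnit-style sketch of the kind of isolated 
error-condition test described above. The `FailingWriter` class is a 
hypothetical stand-in, not the actual size-aware vector writer; only the 
`UserException` builder pattern (assuming `memoryError` accepts a generic 
`Throwable` cause) is taken from this patch.

```java
import static org.junit.Assert.assertTrue;
import static org.junit.Assert.fail;

import org.apache.drill.common.exceptions.UserException;
import org.junit.Test;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class IsolatedErrorTestSketch {
  private static final Logger logger = LoggerFactory.getLogger(IsolatedErrorTestSketch.class);

  // Stand-in for a component small enough to test without the rest of Drill.
  static class FailingWriter {
    void writeRow() {
      throw UserException.memoryError(new RuntimeException("simulated OOM"))
          .message("Writer ran out of memory")
          .build(logger);
    }
  }

  @Test
  public void failureSurfacesAsUserException() {
    try {
      new FailingWriter().writeRow();
      fail("expected a UserException");
    } catch (UserException e) {
      assertTrue(e.getMessage().contains("Writer ran out of memory"));
    }
  }
}
```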


> Standardize error handling in ScanBatch
> ---
>
> Key: DRILL-5512
> URL: https://issues.apache.org/jira/browse/DRILL-5512
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.10.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Minor
>  Labels: ready-to-commit
>
> ScanBatch is the Drill operator executor that handles most readers. Like most 
> Drill operators, it uses an ad-hoc set of error detection and reporting 
> methods that evolved over Drill development.
> This ticket asks to standardize on error handling as outlined in DRILL-5083. 
> This basically means reporting all errors as a {{UserException}} rather than 
> using the {{IterOutcome.STOP}} return status or using the 
> {{FragmentContext.fail()}} method.
> This work requires the new error codes introduced in DRILL-5511, and is a 
> step toward making readers aware of vector size limits.





[jira] [Commented] (DRILL-5512) Standardize error handling in ScanBatch

2017-05-23 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16021378#comment-16021378
 ] 

ASF GitHub Bot commented on DRILL-5512:
---

Github user sudheeshkatkam commented on the issue:

https://github.com/apache/drill/pull/838
  
+1


> Standardize error handling in ScanBatch
> ---
>
> Key: DRILL-5512
> URL: https://issues.apache.org/jira/browse/DRILL-5512
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.10.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Minor
>  Labels: ready-to-commit
>
> ScanBatch is the Drill operator executor that handles most readers. Like most 
> Drill operators, it uses an ad-hoc set of error detection and reporting 
> methods that evolved over Drill development.
> This ticket asks to standardize on error handling as outlined in DRILL-5083. 
> This basically means reporting all errors as a {{UserException}} rather than 
> using the {{IterOutcome.STOP}} return status or using the 
> {{FragmentContext.fail()}} method.
> This work requires the new error codes introduced in DRILL-5511, and is a 
> step toward making readers aware of vector size limits.





[jira] [Commented] (DRILL-5512) Standardize error handling in ScanBatch

2017-05-23 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16021377#comment-16021377
 ] 

ASF GitHub Bot commented on DRILL-5512:
---

Github user sudheeshkatkam commented on a diff in the pull request:

https://github.com/apache/drill/pull/838#discussion_r118032231
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/ScanBatch.java 
---
@@ -173,9 +174,8 @@ public IterOutcome next() {
 
 currentReader.allocate(mutator.fieldVectorMap());
   } catch (OutOfMemoryException e) {
-logger.debug("Caught Out of Memory Exception", e);
 clearFieldVectorMap();
-return IterOutcome.OUT_OF_MEMORY;
+throw UserException.memoryError(e).build(logger);
--- End diff --

I agree. I was hoping there are tests to show current handling of 
OUT_OF_MEMORY as well.


> Standardize error handling in ScanBatch
> ---
>
> Key: DRILL-5512
> URL: https://issues.apache.org/jira/browse/DRILL-5512
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.10.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Minor
>  Labels: ready-to-commit
>
> ScanBatch is the Drill operator executor that handles most readers. Like most 
> Drill operators, it uses an ad-hoc set of error detection and reporting 
> methods that evolved over Drill development.
> This ticket asks to standardize on error handling as outlined in DRILL-5083. 
> This basically means reporting all errors as a {{UserException}} rather than 
> using the {{IterOutcome.STOP}} return status or using the 
> {{FragmentContext.fail()}} method.
> This work requires the new error codes introduced in DRILL-5511, and is a 
> step toward making readers aware of vector size limits.





[jira] [Commented] (DRILL-5512) Standardize error handling in ScanBatch

2017-05-20 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16018684#comment-16018684
 ] 

ASF GitHub Bot commented on DRILL-5512:
---

Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/838#discussion_r117622601
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/ScanBatch.java 
---
@@ -173,9 +174,8 @@ public IterOutcome next() {
 
 currentReader.allocate(mutator.fieldVectorMap());
   } catch (OutOfMemoryException e) {
-logger.debug("Caught Out of Memory Exception", e);
 clearFieldVectorMap();
-return IterOutcome.OUT_OF_MEMORY;
+throw UserException.memoryError(e).build(logger);
--- End diff --

You raise two good points.

The first is that Drill tends to use unchecked exceptions because, well, 
they are convenient, and because Drill must handle unchecked exceptions such 
as NPE, illegal state, illegal argument and so on in any case.

But normal Java practice is to declare exceptions so that callers know 
what they should handle. DRILL-5386 asks to rationalize exception use. 
Comments in that ticket from project veterans show some degree of resistance 
to the idea of checked exceptions. So, yes, we must expect any unchecked 
exception from any method. This is why operators should handle all exceptions, 
and why we need code to sort out exceptions based on type.
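
To make the "sort out exceptions based on type" idea concrete, here is a 
hedged sketch of a classifier at an operator boundary. It assumes 
`UserException.systemError(Throwable)` and the format-style `message(...)` 
builder method; the class itself is illustrative, not part of this patch.

```java
import org.apache.drill.common.exceptions.UserException;
import org.slf4j.Logger;

public final class ExceptionClassifierSketch {

  private ExceptionClassifierSketch() { }

  // Convert whatever leaked out of a reader or allocator into a UserException.
  // A memory failure would use UserException.memoryError(...), as in this patch;
  // this sketch only shows the generic "sort by type" shape.
  public static UserException classify(Throwable t, Logger logger) {
    if (t instanceof UserException) {
      return (UserException) t;          // already classified; pass it through
    }
    // NPE, IllegalStateException, IllegalArgumentException and other unchecked
    // exceptions usually indicate a code error rather than a user mistake.
    return UserException.systemError(t)
        .message("Unexpected failure in scan operator: %s", t.getClass().getSimpleName())
        .build(logger);
  }
}
```

A catch-all block in an operator would then simply do 
`throw ExceptionClassifierSketch.classify(e, logger);`.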

The analysis of OOM is correct, but omits context. It is seldom (never?) 
that a sort sits directly above a scan; most often there is a filter or 
project between them. If the scan hits OOM, it is not likely because it has 
exhausted the memory limit on the scan operator: each operator defaults to a 
10 GB limit. Instead, it is likely that overall system memory is exhausted. 
So, what is likely to happen? Each operator between the scan and the sort must 
handle the OUT_OF_MEMORY status by bubbling it up the call stack. Let's assume 
that works.

The sort now wants to spill. Spilling is an activity that requires memory 
to perform. Spilling requires a merge phase to combine the records from 
buffered batches in sort order so that the spilled run is sorted. That is, the 
sort must allocate a batch, often many MB in size. (Spilled runs must be 
sizable so we can limit the number of spilled runs merged in the final merge 
phase.)

So, the sort tries to allocate vectors for the merge batch and... The 
allocation fails. Why? Because we are out of system memory -- that's why the 
scanner triggered an OOM.

I can find no code that sets up this out-of-system-memory condition to 
verify that existing code works. I think we were taking it on faith that this 
behavior actually works.

Moving forward, we are working towards a workable solution. Assign the scan 
some amount of memory, and limit batches to fit within that memory. Give the 
sort a certain amount of memory, and have it manage within that memory so that 
when a spill occurs, the sort has sufficient memory to create the required 
merge batches as part of the spill.

Finally, let's consider the case you outlined: the scan fails with OOM on 
the initial allocation. The initial allocation is often small; the scan goes 
through many vector doublings to read the full complement of rows. (At least, 
that's what I've observed empirically; perhaps the original intent was 
different.) Let's say we tried to recover from an initial allocation failure.

Say we have a row with five columns. We allocate three, but fail on the 
fourth. Say the fourth is a Varchar, which has two vectors: offset and data. 
The current logic releases the partially-allocated vectors, which is good. 
OUT_OF_MEMORY is returned, and the vectors are reallocated if memory can be 
released. Sounds good.

But, most queries run in multiple threads. If one hits OOM, then the others 
probably will as well. The actual system memory is a shared resource, but there 
is no coordination. A scan might release its partially-allocated vectors so the 
sort can, in theory, spill. But, that memory might immediately be grabbed by 
some other thread, resulting in a sort spill OOM. In practice, however, the 
initial vector allocations are much smaller than the merge batch, so it is only 
slightly useful to free up the initial allocation. That initial allocation, 
plus luck that some other thread has freed enough memory, might allow us to 
spill. But it is a crap-shoot.

In short, even if this logic might work in some scenarios in a 
single-threaded query, it is too chaotic to work in general with many threads. 
And, of course, no tests exist for either case, so we are just guessing.

All-in-all, the above argues strongly that the path forward is to:

1. Rationalize error handling: OOM errors cause 

[jira] [Commented] (DRILL-5512) Standardize error handling in ScanBatch

2017-05-19 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16018144#comment-16018144
 ] 

ASF GitHub Bot commented on DRILL-5512:
---

Github user sudheeshkatkam commented on a diff in the pull request:

https://github.com/apache/drill/pull/838#discussion_r117590009
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/ScanBatch.java 
---
@@ -173,9 +174,8 @@ public IterOutcome next() {
 
 currentReader.allocate(mutator.fieldVectorMap());
   } catch (OutOfMemoryException e) {
-logger.debug("Caught Out of Memory Exception", e);
 clearFieldVectorMap();
-return IterOutcome.OUT_OF_MEMORY;
+throw UserException.memoryError(e).build(logger);
--- End diff --

I am not sure if this specific line change is required, so please correct 
me if I am wrong. Thinking out loud...

There are three places in ScanBatch where OutOfMemoryException is handled. 
Since OutOfMemoryException is an unchecked exception, I could not quickly find 
all the calls that trigger the exception in this method.

The [first 
case](https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/ScanBatch.java#L175)
 and [second 
case](https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/ScanBatch.java#L215)
 are similar in that `reader.allocate(...)` fails. So although there is no 
unwind logic, it seems to me this case is handled correctly: no records have 
been read, so there is no need to unwind. Say this triggers spilling in the 
sort; then the query could complete successfully if allocate succeeds next 
time (and so on). Am I following this logic correctly?

But this does not seem to be the case, as 
[TestOutOfMemoryOutcome](https://github.com/apache/drill/blob/master/exec/java-exec/src/test/java/org/apache/drill/TestOutOfMemoryOutcome.java#L65)
 triggers an OutOfMemoryException during ["next" 
allocation](https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/ScanBatch.java#L172),
 and all tests are expected to fail.

And then there is the [third 
case](https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/ScanBatch.java#L247),
 which is a general catch (e.g. `reader.next()` throws OutOfMemoryException). 
And, as you mentioned, readers cannot unwind, so that correctly fails the 
fragment.


> Standardize error handling in ScanBatch
> ---
>
> Key: DRILL-5512
> URL: https://issues.apache.org/jira/browse/DRILL-5512
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.10.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Minor
>  Labels: ready-to-commit
> Fix For: 1.10.0
>
>
> ScanBatch is the Drill operator executor that handles most readers. Like most 
> Drill operators, it uses an ad-hoc set of error detection and reporting 
> methods that evolved over Drill development.
> This ticket asks to standardize on error handling as outlined in DRILL-5083. 
> This basically means reporting all errors as a {{UserException}} rather than 
> using the {{IterOutcome.STOP}} return status or using the 
> {{FragmentContext.fail()}} method.
> This work requires the new error codes introduced in DRILL-5511, and is a 
> step toward making readers aware of vector size limits.





[jira] [Commented] (DRILL-5512) Standardize error handling in ScanBatch

2017-05-19 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16018061#comment-16018061
 ] 

ASF GitHub Bot commented on DRILL-5512:
---

Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/838#discussion_r117580289
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/ScanBatch.java 
---
@@ -173,9 +174,8 @@ public IterOutcome next() {
 
 currentReader.allocate(mutator.fieldVectorMap());
   } catch (OutOfMemoryException e) {
-logger.debug("Caught Out of Memory Exception", e);
 clearFieldVectorMap();
-return IterOutcome.OUT_OF_MEMORY;
+throw UserException.memoryError(e).build(logger);
--- End diff --

Good question. Yes, since the `FragmentExecutor` already handles errors and 
unwinds, we just exploit that (existing, working) path instead of the (also 
existing, but harder to keep working) path of `fail`/`STOP`.

The (managed) external sort, for example, reports all its errors via 
exceptions; it does not use `fail`/`STOP`. The fragment executor recovers just 
fine.
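
For context, a purely illustrative sketch of that single error path: a driver 
that runs an operator tree and converts any thrown exception into a query 
failure. This is not the real `FragmentExecutor`; the `Operator` interface and 
method names are hypothetical, and `UserException.systemError(...)` is assumed.

```java
import org.apache.drill.common.exceptions.UserException;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class FragmentDriverSketch {
  private static final Logger logger = LoggerFactory.getLogger(FragmentDriverSketch.class);

  interface Operator extends AutoCloseable {
    boolean next();              // returns false when the operator is exhausted
    @Override void close();      // releases vectors and other resources
  }

  public void run(Operator root) {
    try {
      while (root.next()) {
        // drive the fragment until the root operator is exhausted
      }
    } catch (UserException e) {
      reportFailure(e);          // already carries category, message and context
    } catch (Exception e) {
      // Any other unchecked exception from any operator ends up here as well.
      reportFailure(UserException.systemError(e).build(logger));
    } finally {
      root.close();              // cascade close() down the operator tree
    }
  }

  private void reportFailure(UserException e) {
    logger.error("Fragment failed: {}", e.getMessage());
    // In Drill this would notify the Foreman so it can cancel other fragments.
  }
}
```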


> Standardize error handling in ScanBatch
> ---
>
> Key: DRILL-5512
> URL: https://issues.apache.org/jira/browse/DRILL-5512
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.10.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Minor
>  Labels: ready-to-commit
> Fix For: 1.10.0
>
>
> ScanBatch is the Drill operator executor that handles most readers. Like most 
> Drill operators, it uses an ad-hoc set of error detection and reporting 
> methods that evolved over Drill development.
> This ticket asks to standardize on error handling as outlined in DRILL-5083. 
> This basically means reporting all errors as a {{UserException}} rather than 
> using the {{IterOutcome.STOP}} return status or using the 
> {{FragmentContext.fail()}} method.
> This work requires the new error codes introduced in DRILL-5511, and is a 
> step toward making readers aware of vector size limits.





[jira] [Commented] (DRILL-5512) Standardize error handling in ScanBatch

2017-05-19 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16018016#comment-16018016
 ] 

ASF GitHub Bot commented on DRILL-5512:
---

Github user sudheeshkatkam commented on a diff in the pull request:

https://github.com/apache/drill/pull/838#discussion_r117576158
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/ScanBatch.java 
---
@@ -173,9 +174,8 @@ public IterOutcome next() {
 
 currentReader.allocate(mutator.fieldVectorMap());
   } catch (OutOfMemoryException e) {
-logger.debug("Caught Out of Memory Exception", e);
 clearFieldVectorMap();
-return IterOutcome.OUT_OF_MEMORY;
+throw UserException.memoryError(e).build(logger);
--- End diff --

Makes sense.

Asking the question a different way: to avoid regressions, should the 
prerequisite changes 
([DRILL-5211](https://issues.apache.org/jira/browse/DRILL-5211) and pertinent 
tasks) be committed before this patch? Or, since _the readers do not correctly 
handle the case_ anyway, will there be no difference?


> Standardize error handling in ScanBatch
> ---
>
> Key: DRILL-5512
> URL: https://issues.apache.org/jira/browse/DRILL-5512
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.10.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Minor
>  Labels: ready-to-commit
> Fix For: 1.10.0
>
>
> ScanBatch is the Drill operator executor that handles most readers. Like most 
> Drill operators, it uses an ad-hoc set of error detection and reporting 
> methods that evolved over Drill development.
> This ticket asks to standardize on error handling as outlined in DRILL-5083. 
> This basically means reporting all errors as a {{UserException}} rather than 
> using the {{IterOutcome.STOP}} return status or using the 
> {{FragmentContext.fail()}} method.
> This work requires the new error codes introduced in DRILL-5511, and is a 
> step toward making readers aware of vector size limits.





[jira] [Commented] (DRILL-5512) Standardize error handling in ScanBatch

2017-05-19 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16018014#comment-16018014
 ] 

ASF GitHub Bot commented on DRILL-5512:
---

Github user sudheeshkatkam commented on a diff in the pull request:

https://github.com/apache/drill/pull/838#discussion_r117576060
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/ScanBatch.java 
---
@@ -213,17 +213,16 @@ public IterOutcome next() {
   try {
 currentReader.allocate(mutator.fieldVectorMap());
   } catch (OutOfMemoryException e) {
-logger.debug("Caught OutOfMemoryException");
 clearFieldVectorMap();
-return IterOutcome.OUT_OF_MEMORY;
+throw UserException.memoryError(e).build(logger);
   }
   addImplicitVectors();
 } catch (ExecutionSetupException e) {
-  this.context.fail(e);
--- End diff --

Sounds good.


> Standardize error handling in ScanBatch
> ---
>
> Key: DRILL-5512
> URL: https://issues.apache.org/jira/browse/DRILL-5512
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.10.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Minor
>  Labels: ready-to-commit
> Fix For: 1.10.0
>
>
> ScanBatch is the Drill operator executor that handles most readers. Like most 
> Drill operators, it uses an ad-hoc set of error detection and reporting 
> methods that evolved over Drill development.
> This ticket asks to standardize on error handling as outlined in DRILL-5083. 
> This basically means reporting all errors as a {{UserException}} rather than 
> using the {{IterOutcome.STOP}} return status or using the 
> {{FragmentContext.fail()}} method.
> This work requires the new error codes introduced in DRILL-5511, and is a 
> step toward making readers aware of vector size limits.





[jira] [Commented] (DRILL-5512) Standardize error handling in ScanBatch

2017-05-19 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16017741#comment-16017741
 ] 

ASF GitHub Bot commented on DRILL-5512:
---

Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/838#discussion_r117536637
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/ScanBatch.java 
---
@@ -173,9 +174,8 @@ public IterOutcome next() {
 
 currentReader.allocate(mutator.fieldVectorMap());
   } catch (OutOfMemoryException e) {
-logger.debug("Caught Out of Memory Exception", e);
 clearFieldVectorMap();
-return IterOutcome.OUT_OF_MEMORY;
+throw UserException.memoryError(e).build(logger);
--- End diff --

As it turns out, the idea of the OUT_OF_MEMORY return code works better in 
theory than in practice. No reader correctly handles this case. Let's say we 
have three columns (a, b, c), and that column c needs to double its vector but 
hits OOM. No reader has the internal state needed to hold onto the value for 
c, unwind the call stack, then, on the next next() call, rewind to the point 
of writing c into the in-flight row.

Moving forward, we want to take a broader approach to memory: budget 
sufficient memory for readers to do their work, and modify the mutators to 
enforce batch size limits so that each reader operates within its budget.

As we move to that approach, the OUT_OF_MEMORY status will be retired.
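
To sketch what "mutators that enforce batch size limits" might look like, a 
rough, hypothetical outline follows; none of these names are existing Drill 
APIs, just an illustration of the budgeting idea.

```java
// Hypothetical sketch: the batch tracks its own memory budget so the reader
// stops and hands off the batch before an allocation can fail.
public class BudgetedBatchSketch {

  private final long batchBudgetBytes;
  private long usedBytes;

  public BudgetedBatchSketch(long batchBudgetBytes) {
    this.batchBudgetBytes = batchBudgetBytes;
  }

  /** Called by the reader before writing another row; false means "harvest the batch". */
  public boolean hasRoomFor(long estimatedRowBytes) {
    return usedBytes + estimatedRowBytes <= batchBudgetBytes;
  }

  /** Called after a row is written so the running total stays accurate. */
  public void recordRow(long actualRowBytes) {
    usedBytes += actualRowBytes;
  }

  /** Called when the current batch is sent downstream. */
  public void startNewBatch() {
    usedBytes = 0;
  }
}
```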

The JIRA mentions another JIRA that holds a spec for all this stuff; 
something we discussed six months ago, but did not have time to implement then.

This all merits a complete discussion; maybe we can discuss the overall 
approach in that other JIRA.


> Standardize error handling in ScanBatch
> ---
>
> Key: DRILL-5512
> URL: https://issues.apache.org/jira/browse/DRILL-5512
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.10.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Minor
>  Labels: ready-to-commit
> Fix For: 1.10.0
>
>
> ScanBatch is the Drill operator executor that handles most readers. Like most 
> Drill operators, it uses an ad-hoc set of error detection and reporting 
> methods that evolved over Drill development.
> This ticket asks to standardize on error handling as outlined in DRILL-5083. 
> This basically means reporting all errors as a {{UserException}} rather than 
> using the {{IterOutcome.STOP}} return status or using the 
> {{FragmentContext.fail()}} method.
> This work requires the new error codes introduced in DRILL-5511, and is a 
> step toward making readers aware of vector size limits.





[jira] [Commented] (DRILL-5512) Standardize error handling in ScanBatch

2017-05-19 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16017737#comment-16017737
 ] 

ASF GitHub Bot commented on DRILL-5512:
---

Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/838#discussion_r117535728
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/ScanBatch.java 
---
@@ -213,17 +213,16 @@ public IterOutcome next() {
   try {
 currentReader.allocate(mutator.fieldVectorMap());
   } catch (OutOfMemoryException e) {
-logger.debug("Caught OutOfMemoryException");
 clearFieldVectorMap();
-return IterOutcome.OUT_OF_MEMORY;
+throw UserException.memoryError(e).build(logger);
   }
   addImplicitVectors();
 } catch (ExecutionSetupException e) {
-  this.context.fail(e);
--- End diff --

Throwing an exception, it turns out, does exactly the same thing: it cancels 
the query and causes the fragment executor to cascade close() calls to all the 
operators (record batches) in the fragment tree. Some code kills the query by 
throwing an exception, while other code calls the fail() method and bubbles up 
STOP. But since the proper way to handle STOP is to unwind the stack, STOP is 
equivalent to throwing an exception.

The idea is, rather than have two ways to clean up, let's standardize on 
one. Since we must handle unchecked exceptions in any case, the exception-based 
solution is the logical choice for standardization.
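
As a purely illustrative contrast of the two cleanup paths (not Drill's actual 
`IterOutcome` handling; the enum and methods below are hypothetical):

```java
public class TwoErrorPathsSketch {

  enum Outcome { OK, NONE, STOP }

  // Status-code style: every caller must remember to check for STOP and pass it
  // along; forget the check anywhere in the chain and the error is swallowed.
  Outcome nextWithStatus(Outcome upstream) {
    if (upstream == Outcome.STOP) {
      return Outcome.STOP;
    }
    return Outcome.OK;
  }

  // Exception style: the error unwinds on its own; callers add only try/finally
  // (or rely on the executor's close() cascade) to release resources.
  void nextWithException(Runnable upstream) {
    upstream.run();
  }
}
```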


> Standardize error handling in ScanBatch
> ---
>
> Key: DRILL-5512
> URL: https://issues.apache.org/jira/browse/DRILL-5512
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.10.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Minor
>  Labels: ready-to-commit
> Fix For: 1.10.0
>
>
> ScanBatch is the Drill operator executor that handles most readers. Like most 
> Drill operators, it uses an ad-hoc set of error detection and reporting 
> methods that evolved over Drill development.
> This ticket asks to standardize on error handling as outlined in DRILL-5083. 
> This basically means reporting all errors as a {{UserException}} rather than 
> using the {{IterOutcome.STOP}} return status or using the 
> {{FragmentContext.fail()}} method.
> This work requires the new error codes introduced in DRILL-5511, and is a 
> step toward making readers aware of vector size limits.





[jira] [Commented] (DRILL-5512) Standardize error handling in ScanBatch

2017-05-19 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16017707#comment-16017707
 ] 

ASF GitHub Bot commented on DRILL-5512:
---

Github user sudheeshkatkam commented on a diff in the pull request:

https://github.com/apache/drill/pull/838#discussion_r117532070
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/ScanBatch.java 
---
@@ -213,17 +213,16 @@ public IterOutcome next() {
   try {
 currentReader.allocate(mutator.fieldVectorMap());
   } catch (OutOfMemoryException e) {
-logger.debug("Caught OutOfMemoryException");
 clearFieldVectorMap();
-return IterOutcome.OUT_OF_MEMORY;
+throw UserException.memoryError(e).build(logger);
   }
   addImplicitVectors();
 } catch (ExecutionSetupException e) {
-  this.context.fail(e);
--- End diff --

This call triggers query failure (stopping the fragment, notifying the 
Foreman, and cancelling other fragments, etc.). What is the flow after this 
change?

Similar changes below.


> Standardize error handling in ScanBatch
> ---
>
> Key: DRILL-5512
> URL: https://issues.apache.org/jira/browse/DRILL-5512
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.10.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Minor
>  Labels: ready-to-commit
> Fix For: 1.10.0
>
>
> ScanBatch is the Drill operator executor that handles most readers. Like most 
> Drill operators, it uses an ad-hoc set of error detection and reporting 
> methods that evolved over Drill development.
> This ticket asks to standardize on error handling as outlined in DRILL-5083. 
> This basically means reporting all errors as a {{UserException}} rather than 
> using the {{IterOutcome.STOP}} return status or using the 
> {{FragmentContext.fail()}} method.
> This work requires the new error codes introduced in DRILL-5511, and is a 
> step toward making readers aware of vector size limits.





[jira] [Commented] (DRILL-5512) Standardize error handling in ScanBatch

2017-05-19 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16017708#comment-16017708
 ] 

ASF GitHub Bot commented on DRILL-5512:
---

Github user sudheeshkatkam commented on a diff in the pull request:

https://github.com/apache/drill/pull/838#discussion_r117531676
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/ScanBatch.java 
---
@@ -173,9 +174,8 @@ public IterOutcome next() {
 
 currentReader.allocate(mutator.fieldVectorMap());
   } catch (OutOfMemoryException e) {
-logger.debug("Caught Out of Memory Exception", e);
 clearFieldVectorMap();
-return IterOutcome.OUT_OF_MEMORY;
+throw UserException.memoryError(e).build(logger);
--- End diff --

The non-managed external sort spills to disk when it receives this outcome. 
I do not know if there are other operators that handle this outcome. Are all 
the prerequisite changes (to handle this change) already committed?


> Standardize error handling in ScanBatch
> ---
>
> Key: DRILL-5512
> URL: https://issues.apache.org/jira/browse/DRILL-5512
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.10.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Minor
>  Labels: ready-to-commit
> Fix For: 1.10.0
>
>
> ScanBatch is the Drill operator executor that handles most readers. Like most 
> Drill operators, it uses an ad-hoc set of error detection and reporting 
> methods that evolved over Drill development.
> This ticket asks to standardize on error handling as outlined in DRILL-5083. 
> This basically means reporting all errors as a {{UserException}} rather than 
> using the {{IterOutcome.STOP}} return status or using the 
> {{FragmentContext.fail()}} method.
> This work requires the new error codes introduced in DRILL-5511, and is a 
> step toward making readers aware of vector size limits.





[jira] [Commented] (DRILL-5512) Standardize error handling in ScanBatch

2017-05-18 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1601#comment-1601
 ] 

ASF GitHub Bot commented on DRILL-5512:
---

Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/838#discussion_r117375699
  
--- Diff: protocol/src/main/protobuf/UserBitShared.proto ---
@@ -74,16 +74,22 @@ message DrillPBError{
  */
 RESOURCE = 7;
 /* equivalent to SQLNonTransientException.
+ * - out of memory
--- End diff --

Fixed.


> Standardize error handling in ScanBatch
> ---
>
> Key: DRILL-5512
> URL: https://issues.apache.org/jira/browse/DRILL-5512
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.10.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Minor
> Fix For: 1.10.0
>
>
> ScanBatch is the Drill operator executor that handles most readers. Like most 
> Drill operators, it uses an ad-hoc set of error detection and reporting 
> methods that evolved over Drill development.
> This ticket asks to standardize on error handling as outlined in DRILL-5083. 
> This basically means reporting all errors as a {{UserException}} rather than 
> using the {{IterOutcome.STOP}} return status or using the 
> {{FragmentContext.fail()}} method.
> This work requires the new error codes introduced in DRILL-5511, and is a 
> step toward making readers aware of vector size limits.





[jira] [Commented] (DRILL-5512) Standardize error handling in ScanBatch

2017-05-17 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16015232#comment-16015232
 ] 

ASF GitHub Bot commented on DRILL-5512:
---

Github user sohami commented on the issue:

https://github.com/apache/drill/pull/838
  
+1 (pending minor comment).


> Standardize error handling in ScanBatch
> ---
>
> Key: DRILL-5512
> URL: https://issues.apache.org/jira/browse/DRILL-5512
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.10.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Minor
> Fix For: 1.10.0
>
>
> ScanBatch is the Drill operator executor that handles most readers. Like most 
> Drill operators, it uses an ad-hoc set of error detection and reporting 
> methods that evolved over Drill development.
> This ticket asks to standardize on error handling as outlined in DRILL-5083. 
> This basically means reporting all errors as a {{UserException}} rather than 
> using the {{IterOutcome.STOP}} return status or using the 
> {{FragmentContext.fail()}} method.
> This work requires the new error codes introduced in DRILL-5511, and is a 
> step toward making readers aware of vector size limits.





[jira] [Commented] (DRILL-5512) Standardize error handling in ScanBatch

2017-05-17 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16015230#comment-16015230
 ] 

ASF GitHub Bot commented on DRILL-5512:
---

Github user sohami commented on a diff in the pull request:

https://github.com/apache/drill/pull/838#discussion_r117161913
  
--- Diff: 
protocol/src/main/java/org/apache/drill/exec/proto/UserBitShared.java ---
@@ -2178,6 +2178,8 @@ public DrillPBError parsePartialFrom(
*
* 
* equivalent to SQLNonTransientException.
+   * - out of memory
--- End diff --

Same as below.


> Standardize error handling in ScanBatch
> ---
>
> Key: DRILL-5512
> URL: https://issues.apache.org/jira/browse/DRILL-5512
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.10.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Minor
> Fix For: 1.10.0
>
>
> ScanBatch is the Drill operator executor that handles most readers. Like most 
> Drill operators, it uses an ad-hoc set of error detection and reporting 
> methods that evolved over Drill development.
> This ticket asks to standardize on error handling as outlined in DRILL-5083. 
> This basically means reporting all errors as a {{UserException}} rather than 
> using the {{IterOutcome.STOP}} return status or using the 
> {{FragmentContext.fail()}} method.
> This work requires the new error codes introduced in DRILL-5511, and is a 
> step toward making readers aware of vector size limits.





[jira] [Commented] (DRILL-5512) Standardize error handling in ScanBatch

2017-05-17 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16015231#comment-16015231
 ] 

ASF GitHub Bot commented on DRILL-5512:
---

Github user sohami commented on a diff in the pull request:

https://github.com/apache/drill/pull/838#discussion_r117161761
  
--- Diff: protocol/src/main/protobuf/UserBitShared.proto ---
@@ -74,16 +74,22 @@ message DrillPBError{
  */
 RESOURCE = 7;
 /* equivalent to SQLNonTransientException.
+ * - out of memory
--- End diff --

For out of memory we throw "memoryError", which internally sets the errorType 
to RESOURCE, not SYSTEM.
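
A tiny standalone illustration of that categorization, assuming 
`UserException.memoryError(Throwable)` and the builder's `message(...)` method 
as used in the ScanBatch patch; the class here is hypothetical.

```java
import org.apache.drill.common.exceptions.UserException;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class MemoryErrorCategorySketch {
  private static final Logger logger = LoggerFactory.getLogger(MemoryErrorCategorySketch.class);

  public static void main(String[] args) {
    RuntimeException cause = new RuntimeException("simulated allocation failure");
    // memoryError(...) tags the resulting UserException as a RESOURCE error, so
    // the proto comment about out-of-memory belongs under RESOURCE, not SYSTEM.
    UserException e = UserException.memoryError(cause)
        .message("One or more nodes ran out of memory while executing the query.")
        .build(logger);
    System.out.println(e.getMessage());
  }
}
```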


> Standardize error handling in ScanBatch
> ---
>
> Key: DRILL-5512
> URL: https://issues.apache.org/jira/browse/DRILL-5512
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.10.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Minor
> Fix For: 1.10.0
>
>
> ScanBatch is the Drill operator executor that handles most readers. Like most 
> Drill operators, it uses an ad-hoc set of error detection and reporting 
> methods that evolved over Drill development.
> This ticket asks to standardize on error handling as outlined in DRILL-5083. 
> This basically means reporting all errors as a {{UserException}} rather than 
> using the {{IterOutcome.STOP}} return status or using the 
> {{FragmentContext.fail()}} method.
> This work requires the new error codes introduced in DRILL-5511, and is a 
> step toward making readers aware of vector size limits.





[jira] [Commented] (DRILL-5512) Standardize error handling in ScanBatch

2017-05-15 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16011586#comment-16011586
 ] 

ASF GitHub Bot commented on DRILL-5512:
---

GitHub user paul-rogers opened a pull request:

https://github.com/apache/drill/pull/838

DRILL-5512: Standardize error handling in ScanBatch

Standardizes error handling to throw a UserException. Prior code threw
various exceptions, called the fail() method, or returned a variety of
status codes.

Note: this PR depends upon DRILL-5511, PR #836. For this PR, please review 
only the changes introduced in DRILL-5512. Commit this after committing 
DRILL-5511.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/paul-rogers/drill DRILL-5512

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/drill/pull/838.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #838


commit b31d40df314d76ae6c458891e65d423b4b9e61cb
Author: Paul Rogers 
Date:   2017-05-15T21:23:10Z

DRILL-5511: Additional UserException categories

Adds three kinds of internal errors:
1. General execution exception: something failed due to issues outside
Drill.
2. Internal error: something failed due to a likely code error.
3. Unspecified: something failed, but we don’t know why.

commit b8a3f15526d83369ce124e777e9d6e7afa84d7e0
Author: Paul Rogers 
Date:   2017-05-15T22:00:21Z

DRILL-5512: Standardize error handling in ScanBatch

Standardizes error handling to throw a UserException. Prior code threw
various exceptions, called the fail() method, or returned a variety of
status codes.
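
For reviewers, a minimal sketch of the before/after pattern this change 
standardizes on. The class and helper names below are hypothetical (the 
stand-in exception replaces Drill's OutOfMemoryException just to keep the 
snippet self-contained); only the `UserException.memoryError(...).build(logger)` 
builder usage comes from the actual patch, and it is assumed to accept a 
generic `Throwable` cause.

```java
import org.apache.drill.common.exceptions.UserException;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class ScanErrorHandlingSketch {
  private static final Logger logger = LoggerFactory.getLogger(ScanErrorHandlingSketch.class);

  // Stand-in for Drill's OutOfMemoryException, to keep the sketch self-contained.
  static class AllocationFailedException extends RuntimeException {
    AllocationFailedException(String msg) { super(msg); }
  }

  // Old style (roughly): context.fail(e); return IterOutcome.STOP / OUT_OF_MEMORY,
  // and hope every operator above correctly notices and propagates the status.
  //
  // New style: wrap the failure in a UserException and let it unwind the stack;
  // the fragment executor catches it, fails the query, and closes the operators.
  public void allocateBatch() {
    try {
      allocateVectors();                    // stands in for reader.allocate(...)
    } catch (AllocationFailedException e) {
      throw UserException.memoryError(e)    // same builder used in the ScanBatch patch
          .message("Scan failed while allocating vectors for a new batch")
          .build(logger);
    }
  }

  private void allocateVectors() {
    throw new AllocationFailedException("simulated allocation failure");
  }
}
```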




> Standardize error handling in ScanBatch
> ---
>
> Key: DRILL-5512
> URL: https://issues.apache.org/jira/browse/DRILL-5512
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.10.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Minor
> Fix For: 1.10.0
>
>
> ScanBatch is the Drill operator executor that handles most readers. Like most 
> Drill operators, it uses an ad-hoc set of error detection and reporting 
> methods that evolved over Drill development.
> This ticket asks to standardize on error handling as outlined in DRILL-5083. 
> This basically means reporting all errors as a {{UserException}} rather than 
> using the {{IterOutcome.STOP}} return status or using the 
> {{FragmentContext.fail()}} method.
> This work requires the new error codes introduced in DRILL-5511, and is a 
> step toward making readers aware of vector size limits.


