[GitHub] drill pull request #585: DRILL-3898 : Sort spill was modified to catch all e...
Github user Ben-Zvi closed the pull request at:

    https://github.com/apache/drill/pull/585

---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---
[GitHub] drill pull request #585: DRILL-3898 : Sort spill was modified to catch all e...
Github user Ben-Zvi commented on a diff in the pull request:

    https://github.com/apache/drill/pull/585#discussion_r79267883

--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/xsort/ExternalSortBatch.java ---
@@ -592,11 +592,14 @@ public BatchGroup mergeAndSpill(LinkedList batchGroups) throws Schem
       }
       injector.injectChecked(context.getExecutionControls(), INTERRUPTION_WHILE_SPILLING, IOException.class);
       newGroup.closeOutputStream();
-    } catch (Exception e) {
+    } catch (Throwable e) {
       // we only need to cleanup newGroup if spill failed
-      AutoCloseables.close(e, newGroup);
+      try {
+        AutoCloseables.close(e, newGroup);
+      } catch (Throwable t) { /* close() may hit the same IO issue; just ignore */ }
--- End diff --

The root cause of the whole bug is in Hadoop's RawLocalFileSystem.java:

    package org.apache.hadoop.fs;
    .
    public void write(byte[] b, int off, int len) throws IOException {
      try {
        fos.write(b, off, len);
      } catch (IOException e) {   // unexpected exception
        throw new FSError(e);     // assume native fs error
      }
    }

And FSError is not a subclass of IOException!

    java.lang.Object
      java.lang.Throwable
        java.lang.Error
          org.apache.hadoop.fs.FSError

So the only common ancestor of FSError and IOException (besides Object) is
Throwable, and any part of the Drill code that catches only IOException will
not catch an FSError.
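The class hierarchy described in this comment can be reproduced without Hadoop. The following is a minimal sketch, not Drill or Hadoop code: `DemoFsError` is a hypothetical stand-in for `org.apache.hadoop.fs.FSError`, and `write()` only imitates the wrapper quoted above. It shows an `Error` subclass escaping a `catch (Exception e)` clause while a `catch (Throwable t)` clause handles it.

```java
import java.io.IOException;

// Hypothetical stand-in for org.apache.hadoop.fs.FSError:
// it extends Error, so it is neither an IOException nor an Exception.
class DemoFsError extends Error {
    DemoFsError(Throwable cause) {
        super(cause);
    }
}

class CatchScopeDemo {
    // Imitates the quoted write(): an IOException is rethrown as an Error.
    static void write() {
        try {
            throw new IOException("No space left on device");
        } catch (IOException e) {
            throw new DemoFsError(e);
        }
    }

    // Old behaviour: catch (Exception) does not match an Error,
    // so DemoFsError propagates to the caller.
    static String spillCatchingException() {
        try {
            write();
            return "ok";
        } catch (Exception e) {
            return "caught by catch (Exception)";
        }
    }

    // New behaviour: catch (Throwable) matches both Exception and Error.
    static String spillCatchingThrowable() {
        try {
            write();
            return "ok";
        } catch (Throwable t) {
            return "caught by catch (Throwable): " + t.getCause().getMessage();
        }
    }

    public static void main(String[] args) {
        try {
            spillCatchingException();
        } catch (DemoFsError escaped) {
            System.out.println("escaped catch (Exception): " + escaped.getCause().getMessage());
        }
        System.out.println(spillCatchingThrowable());
    }
}
```

Catching `Throwable` also catches errors such as `OutOfMemoryError`, so a broad catch like this is usually kept to a narrow scope where cleanup is required, as in the spill path discussed here.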
[GitHub] drill pull request #585: DRILL-3898 : Sort spill was modified to catch all e...
Github user Ben-Zvi commented on a diff in the pull request:

    https://github.com/apache/drill/pull/585#discussion_r79255636

--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/xsort/ExternalSortBatch.java ---
@@ -592,11 +592,14 @@ public BatchGroup mergeAndSpill(LinkedList batchGroups) throws Schem
       }
       injector.injectChecked(context.getExecutionControls(), INTERRUPTION_WHILE_SPILLING, IOException.class);
       newGroup.closeOutputStream();
-    } catch (Exception e) {
+    } catch (Throwable e) {
       // we only need to cleanup newGroup if spill failed
-      AutoCloseables.close(e, newGroup);
+      try {
+        AutoCloseables.close(e, newGroup);
+      } catch (Throwable t) { /* close() may hit the same IO issue; just ignore */ }
--- End diff --

When there is no disk space to spill, close() tries to clean up by calling
flushBuffer(), which eventually throws the same exception because there is
still no space:

    at java.io.FileOutputStream.write(FileOutputStream.java:326)
    at org.apache.hadoop.fs.RawLocalFileSystem$LocalFSFileOutputStream.write(RawLocalFileSystem.java:246)
    at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82)
    at java.io.BufferedOutputStream.write(BufferedOutputStream.java:126)
    - locked <0x24e5> (a java.io.BufferedOutputStream)
    at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:58)
    at java.io.DataOutputStream.write(DataOutputStream.java:107)
    - locked <0x24e7> (a org.apache.hadoop.fs.FSDataOutputStream)
    at org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSOutputSummer.writeChunk(ChecksumFileSystem.java:419)
    at org.apache.hadoop.fs.FSOutputSummer.writeChecksumChunks(FSOutputSummer.java:206)
    at org.apache.hadoop.fs.FSOutputSummer.flushBuffer(FSOutputSummer.java:163)
    - locked <0x24e8> (a org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSOutputSummer)
    at org.apache.hadoop.fs.FSOutputSummer.flushBuffer(FSOutputSummer.java:144)
    at org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSOutputSummer.close(ChecksumFileSystem.java:407)
    at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:72)
    at org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:106)
    at org.apache.drill.exec.physical.impl.xsort.BatchGroup.close(BatchGroup.java:169)
    at org.apache.drill.common.AutoCloseables.close(AutoCloseables.java:76)
    at org.apache.drill.common.AutoCloseables.close(AutoCloseables.java:53)
    at org.apache.drill.common.AutoCloseables.close(AutoCloseables.java:43)
    at org.apache.drill.exec.physical.impl.xsort.ExternalSortBatch.mergeAndSpill(ExternalSortBatch.java:598)
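The failure mode in this stack trace, where close() flushes buffered data and hits the same full disk, can be sketched in isolation. The example below is only an illustration under stated assumptions: `FullDiskStream` and `CleanupDemo` are hypothetical names, not Drill or Hadoop classes. Unlike the patch, which ignores the second failure, this variation records it on the original error with `Throwable.addSuppressed`, so the first "no space" error stays the one that is reported.

```java
import java.io.IOException;

// Hypothetical stream on a full disk: both write() and close() fail,
// because close() has to flush whatever is still buffered.
class FullDiskStream implements AutoCloseable {
    void write() throws IOException {
        throw new IOException("No space left on device (write)");
    }

    @Override
    public void close() throws IOException {
        throw new IOException("No space left on device (flush on close)");
    }
}

class CleanupDemo {
    // Returns the primary failure, or null if the spill succeeded.
    static Throwable spillAndCleanUp() {
        FullDiskStream out = new FullDiskStream();
        try {
            out.write();
            return null;
        } catch (Throwable e) {
            // Cleanup is only needed when the spill failed.
            try {
                out.close();
            } catch (Throwable t) {
                // close() may hit the same IO issue; keep it as a
                // suppressed error instead of letting it replace e.
                if (t != e) {
                    e.addSuppressed(t);
                }
            }
            return e;
        }
    }

    public static void main(String[] args) {
        Throwable primary = spillAndCleanUp();
        System.out.println("primary: " + primary.getMessage());
        for (Throwable s : primary.getSuppressed()) {
            System.out.println("suppressed: " + s.getMessage());
        }
    }
}
```

Whether to drop the secondary error, as the patch does, or keep it as suppressed is a design choice; either way the inner try/catch prevents the cleanup failure from masking the original one.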
[GitHub] drill pull request #585: DRILL-3898 : Sort spill was modified to catch all e...
Github user amansinha100 commented on a diff in the pull request:

    https://github.com/apache/drill/pull/585#discussion_r79096107

--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/xsort/ExternalSortBatch.java ---
@@ -592,11 +592,14 @@ public BatchGroup mergeAndSpill(LinkedList batchGroups) throws Schem
       }
       injector.injectChecked(context.getExecutionControls(), INTERRUPTION_WHILE_SPILLING, IOException.class);
       newGroup.closeOutputStream();
-    } catch (Exception e) {
+    } catch (Throwable e) {
       // we only need to cleanup newGroup if spill failed
-      AutoCloseables.close(e, newGroup);
+      try {
+        AutoCloseables.close(e, newGroup);
+      } catch (Throwable t) { /* close() may hit the same IO issue; just ignore */ }
--- End diff --

It looks like close(Throwable t, AutoCloseable) suppresses the exception; did
you get an exception during testing? Otherwise, you could remove this second
try-catch.
[GitHub] drill pull request #585: DRILL-3898 : Sort spill was modified to catch all e...
GitHub user Ben-Zvi opened a pull request:

    https://github.com/apache/drill/pull/585

    DRILL-3898 : Sort spill was modified to catch all errors, ignore repeated errors while closing the new group and issue a more detailed error message.

    It seems that the spilling IO can run into various kinds of errors (no
    space, failure to create a file, ...) which are thrown as different
    exception classes. Hence the catch() statement was changed to catch the
    more general Throwable, and to add the exception's message for more
    detail (e.g., no disk space). Before the change the "no disk space"
    Throwable was not caught, and thus execution continued. Also, closing
    the newGroup could hit some IO errors (e.g., when flushing), so a
    try/catch was added to ignore those.

    Note that this change should also fix DRILL-4542 ("if external sort
    fails to spill to disk, memory is leaked and wrong error message is
    displayed").

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/Ben-Zvi/drill DRILL-3898

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/drill/pull/585.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #585

----
commit e988f1644be1d9fde24a489d94c7dbc54f8e82d8
Author: Boaz Ben-Zvi
Date:   2016-09-09T23:36:03Z

    DRILL-3898 : Sort spill was modified to catch all errors, ignore repeated errors while closing the new group and issue a more detailed error message.