[GitHub] spark issue #17364: [SPARK-20038] [SQL]: FileFormatWriter.ExecuteWriteTask.r...
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/17364 thanks. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #17364: [SPARK-20038] [SQL]: FileFormatWriter.ExecuteWriteTask.r...
Github user squito commented on the issue: https://github.com/apache/spark/pull/17364 merged to master sorry I forgot to take look at this for a while @steveloughran, thanks for the reminder --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #17364: [SPARK-20038] [SQL]: FileFormatWriter.ExecuteWriteTask.r...
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/17364 @squito Is this ready to go in? Like I warned, I'm not going to add tests for this, not on its own --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #17364: [SPARK-20038] [SQL]: FileFormatWriter.ExecuteWriteTask.r...
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/17364 I don't have a time/plans to do the test here, as it's a fairly complex piece of test setup for what a review should show isn't doing anything other than guarantee the outcome pf `currentWriter==null` out of all sequential execution paths --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #17364: [SPARK-20038] [SQL]: FileFormatWriter.ExecuteWriteTask.r...
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/17364 looking some more, yes, as `tryWithSafeFinallyAndFailureCallbacks` wraps task commit, it guarantees that the original cause doesn't get lost. The abortJob code isn't so well guarded, and looks like a failure there my hide a previous one (like a commitJob failure). --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #17364: [SPARK-20038] [SQL]: FileFormatWriter.ExecuteWriteTask.r...
Github user squito commented on the issue: https://github.com/apache/spark/pull/17364 > Note that as the exception handler tries to close resources before calling committer.abortTask(taskAttemptContext), without this patch a failing releaseResources() means that abortTask() isn't invoked, though hopefully abortJob will handle tasks too it doesn't look that way to me -- the `abortTask` is in a `finally`, so it should get called no matter what. You could add a very targeted regression test -- create the `ExecuteWriteTask` with mock OutputFormats, have those mocks throw an exception on `close`. Then make sure two calls to `releaseResources` work correctly. Your change is obviously correct, but would help avoid future regressions. in any case, lgtm --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #17364: [SPARK-20038] [SQL]: FileFormatWriter.ExecuteWriteTask.r...
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/17364 Created [SPARK-20045](https://issues.apache.org/jira/browse/SPARK-20045). I think there's room to improve resilience in the abort code, primarily to ensure that the underlying failure cause doesn't get lost. The codepath there is fairly complex and I'm not going to point at a snippet and say "here". Some faulting mock committer would probably be the actual first step: show the problems, then fix. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #17364: [SPARK-20038] [SQL]: FileFormatWriter.ExecuteWriteTask.r...
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/17364 I haven't reviewed that bit of code: make it a separate JIRA and assign to me. This one I came across in the HADOOP-2.8.0 RC3 testing; the underlying fix there is going in, but the spark code still should be made more resilient to failure here. It's always those failure modes which get you âand working with S3, that close() can be where the PUT is initiated, so P(fail) > 0. Part of [HADOOP-13786] (https://issues.apache.org/jira/browse/HADOOP-13786) is MAPREDUCE-6823: making it straighforward to define a different implementation of the FileOutputFormat committer. That should make it easier to do some fault injection in the commit processes, especially all those bits that violate the state machine entirely. I'll see what I can do about breaking things :) --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #17364: [SPARK-20038] [SQL]: FileFormatWriter.ExecuteWriteTask.r...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/17364 @steveloughran, maybe this is strictly not related with the problem specified in JIRA but do you know if we should do the same thing to `SparkHadoopMapReduceWriter`? I remember I had to fix the resource related problem identically with `FileFormatWriter` before later. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #17364: [SPARK-20038] [SQL]: FileFormatWriter.ExecuteWriteTask.r...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17364 Merged build finished. Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #17364: [SPARK-20038] [SQL]: FileFormatWriter.ExecuteWriteTask.r...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17364 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/74903/ Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #17364: [SPARK-20038] [SQL]: FileFormatWriter.ExecuteWriteTask.r...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17364 **[Test build #74903 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74903/testReport)** for PR 17364 at commit [`725740b`](https://github.com/apache/spark/commit/725740b49a2b37392699092b1b0e08c63a6152ff). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #17364: [SPARK-20038] [SQL]: FileFormatWriter.ExecuteWriteTask.r...
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/17364 Note that as [the exception handler](https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileFormatWriter.scala#L244) tries to close resources before calling `committer.abortTask(taskAttemptContext)`, without this patch a failing `releaseResources()` means that ` abortTask()` isn't invoked, though hopefully abortJob will handle tasks too --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org