[jira] [Commented] (SPARK-11066) Flaky test o.a.scheduler.DAGSchedulerSuite.misbehavedResultHandler occasionally fails due to j.l.UnsupportedOperationException concerning a finished JobWaiter

2015-10-12 Thread Apache Spark (JIRA)

[ https://issues.apache.org/jira/browse/SPARK-11066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14953241#comment-14953241 ]

Apache Spark commented on SPARK-11066:
--

User 'shellberg' has created a pull request for this issue:
https://github.com/apache/spark/pull/9076

> Flaky test o.a.scheduler.DAGSchedulerSuite.misbehavedResultHandler 
> occasionally fails due to j.l.UnsupportedOperationException concerning a 
> finished JobWaiter
> --
>
> Key: SPARK-11066
> URL: https://issues.apache.org/jira/browse/SPARK-11066
> Project: Spark
>  Issue Type: Bug
>  Components: Scheduler, Spark Core
>Affects Versions: 1.4.0, 1.4.1, 1.5.0, 1.5.1
> Environment: Multiple OS and platform types.
> (Also observed by others, e.g. see External URL)
>Reporter: Dr Stephen A Hellberg
>Priority: Minor
> Fix For: 1.5.1, 1.6.0
>
>
> The DAGSchedulerSuite test for the "misbehaved ResultHandler" has an inherent 
> problem: it creates a job for the DAGScheduler comprising two tasks.  The job 
> will fail and a SparkDriverExecutionException will be returned, but a race 
> condition determines which exception is set as its cause: either the 
> DAGSchedulerSuiteDummyException deliberately thrown when the first task's 
> result is handled (the setup of the misbehaving test), or the DAGScheduler's 
> legitimate UnsupportedOperationException (a subclass of RuntimeException) 
> raised for the second (and any subsequent) task, which completes only to find 
> the job already failed.  This race is probably governed by the vagaries of 
> processing quanta and the expense of throwing two exceptions (under 
> interpreted execution) per thread of control; it is usually 'won' by the 
> first task throwing the DAGSchedulerSuiteDummyException, as desired and 
> expected... but not always.
> The problem for the test case is that the first assertion largely concerns 
> the test setup, and does not (and perhaps cannot; sorry, still not a 
> ScalaTest expert) capture all the causes of SparkDriverExecutionException 
> that can legitimately arise from a correctly working (not crashed) 
> DAGScheduler.  Arguably this assertion tests something of the DAGScheduler, 
> but not all the possible outcomes of a working DAGScheduler.  Consequently 
> the test - when the job comprises multiple tasks - will report a failure 
> even though the DAGScheduler is working as designed (and has not crashed).  
> Furthermore, the test has already failed before it tries to use the 
> SparkContext a second time (for an arbitrary processing task), which I think 
> is the real subject of the test.
> The solution, I submit, is to ensure that the job is composed of just one 
> task, so that the single task triggers the call to the compromised 
> ResultHandler, causing the test's deliberate exception to be thrown and 
> exercising the relevant DAGScheduler code paths.  Since the number of tasks 
> is determined by the number of partitions of an RDD, this could be achieved 
> with a single-partition RDD (indeed, doing so would also exercise the 
> TaskScheduler's default-parallelism support); the pull request offered, 
> however, makes the minimal change of running the job over just one partition 
> of the two-(or more-)partition parallelized RDD.  This schedules a job of 
> exactly one task; that single successful task calls the user-supplied 
> compromised ResultHandler function, which fails the job and unambiguously 
> wraps our DAGSchedulerSuiteDummyException inside a 
> SparkDriverExecutionException.  There are no other tasks that, on completing 
> successfully, would find the job already failed and cause the 'undesired' 
> UnsupportedOperationException to be thrown instead.  This satisfies the 
> test's setup assertion.
> I have tested this hypothesis by parameterising the number of partitions, N, 
> used by the "misbehaved ResultHandler" job, and have observed the single 
> DAGSchedulerSuiteDummyException first, followed by the legitimate N-1 
> UnsupportedOperationExceptions.  Which exception propagates back from the 
> job is simply the result of the race between task threads, which accounts 
> for the intermittent failures observed.
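
For illustration, here is a minimal sketch of the kind of change described 
above, written against the test shape in DAGSchedulerSuite. It is an assumed 
form only, not a copy of the actual pull request; the exact test name, 
partition list, and surrounding assertions in the submitted patch may differ. 
The idea is simply to restrict the runJob call to a single partition so that 
exactly one task runs and the ResultHandler's deliberate exception is 
unambiguously the wrapped cause.

    // Sketch only: assumes the enclosing DAGSchedulerSuite provides `sc` (a local
    // SparkContext), ScalaTest's `intercept`, and the test-local
    // DAGSchedulerSuiteDummyException class.
    test("misbehaved ResultHandler") {
      val e = intercept[SparkDriverExecutionException] {
        // A two-partition RDD would normally schedule two tasks and reintroduce the race...
        val rdd = sc.parallelize(1 to 10, 2)
        sc.runJob[Int, Int](
          rdd,
          (context: TaskContext, iter: Iterator[Int]) => iter.size,
          // ...so run the job over a single partition (id 0): exactly one task,
          // hence only the compromised ResultHandler below can fail the job.
          Seq(0),
          (part: Int, result: Int) => throw new DAGSchedulerSuiteDummyException)
      }
      // With one task there is no ambiguity about the wrapped cause.
      assert(e.getCause.isInstanceOf[DAGSchedulerSuiteDummyException])

      // The real subject of the test: the SparkContext must remain usable afterwards.
      assert(sc.parallelize(1 to 10, 2).count() === 10)
    }

Keeping the two-partition parallelize but passing Seq(0) is the minimal 
change: the RDD setup stays the same while the job itself is guaranteed to 
contain exactly one task.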






[jira] [Commented] (SPARK-11066) Flaky test o.a.scheduler.DAGSchedulerSuite.misbehavedResultHandler occasionally fails due to j.l.UnsupportedOperationException concerning a finished JobWaiter

2015-10-13 Thread Dr Stephen A Hellberg (JIRA)

[ https://issues.apache.org/jira/browse/SPARK-11066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14954736#comment-14954736 ]

Dr Stephen A Hellberg commented on SPARK-11066:
---

Sean: apologies re: Fix Version and Target Version.  I was led astray in 
interpreting their purpose, given they were present on the Create Issue 
template.  Fix Version makes complete sense: until the fix is integrated, it's 
not fixed.  Target Version I had interpreted as where I'd hope to see the fix 
released, or where it is suitable for being applied.  I know this issue arises 
in the 1.4.x releases (and probably before), but I'm mostly interested in 
seeing it addressed in current and future releases; my fix is likely equally 
sufficient in prior releases, so what criteria are used to decide how far back 
a committer would backport into prior releases (given only Affects Versions)?


[jira] [Commented] (SPARK-11066) Flaky test o.a.scheduler.DAGSchedulerSuite.misbehavedResultHandler occasionally fails due to j.l.UnsupportedOperationException concerning a finished JobWaiter

2015-10-13 Thread Sean Owen (JIRA)

[ https://issues.apache.org/jira/browse/SPARK-11066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14954757#comment-14954757 ]

Sean Owen commented on SPARK-11066:
---

Yes, the problem is that anyone who submits a JIRA presumably wants to see it 
addressed, and soon.  Few are actually actionable, valid, and something that 
the submitter follows through on.  Hence Target Version ought to be set only 
by someone who is willing and able to drive it to a resolution.  Then the view 
of JIRAs targeted at a release is a somewhat reliable picture of what could 
happen in that release.  It's still used unevenly, but that's the reason.

If it's likely to be resolved rapidly, like this one, I usually don't even 
bother; but it would be valid to target 1.6 / 1.5.2 after seeing that it's 
probably a fine change that passes tests, etc. (there are still some style 
failures).


[jira] [Commented] (SPARK-11066) Flaky test o.a.scheduler.DAGSchedulerSuite.misbehavedResultHandler occasionally fails due to j.l.UnsupportedOperationException concerning a finished JobWaiter

2015-10-13 Thread Dr Stephen A Hellberg (JIRA)

[ https://issues.apache.org/jira/browse/SPARK-11066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14955085#comment-14955085 ]

Dr Stephen A Hellberg commented on SPARK-11066:
---

Thanks for the clarification, Sean.  And I've given my patch's comments a bit 
of a haircut... sorry, I probably err on the side of verbosity.
(Ahem, some would likely consider that a stylistic failure ;-) ).

I've also had a go at getting to grips with the dev/lint-scala tool applied to 
the codebase with my proposed (revised) patch, which now passes.





[jira] [Commented] (SPARK-11066) Flaky test o.a.scheduler.DAGSchedulerSuite.misbehavedResultHandler occasionally fails due to j.l.UnsupportedOperationException concerning a finished JobWaiter

2015-10-13 Thread Dr Stephen A Hellberg (JIRA)

[ https://issues.apache.org/jira/browse/SPARK-11066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14955337#comment-14955337 ]

Dr Stephen A Hellberg commented on SPARK-11066:
---

Added some indentation to those comments for alignment... hopefully in line 
with Spark practice!



