[jira] [Commented] (SPARK-4514) SparkContext localProperties does not inherit property updates across thread reuse
[ https://issues.apache.org/jira/browse/SPARK-4514?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15021185#comment-15021185 ]

Richard W. Eggert II commented on SPARK-4514:
---------------------------------------------

The unit test attached to this issue fails in master, but passes in https://github.com/apache/spark/pull/9264

> SparkContext localProperties does not inherit property updates across thread reuse
> ----------------------------------------------------------------------------------
>
>                 Key: SPARK-4514
>                 URL: https://issues.apache.org/jira/browse/SPARK-4514
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 1.1.0, 1.1.1, 1.2.0
>            Reporter: Erik Erlandson
>            Assignee: Josh Rosen
>            Priority: Critical
>
> The current job group id of a Spark context is stored in the {{localProperties}} member value. This data structure is designed to be thread local, and its settings are not preserved when {{ComplexFutureAction}} instantiates a new {{Future}}.
>
> One consequence of this is that {{takeAsync()}} does not behave in the same way as other async actions, e.g. {{countAsync()}}. For example, this test (if copied into StatusTrackerSuite.scala), will fail, because {{"my-job-group2"}} is not propagated to the Future which actually instantiates the job:
>
> {code:java}
> test("getJobIdsForGroup() with takeAsync()") {
>   sc = new SparkContext("local", "test", new SparkConf(false))
>   sc.setJobGroup("my-job-group2", "description")
>   sc.statusTracker.getJobIdsForGroup("my-job-group2") should be (Seq.empty)
>   val firstJobFuture = sc.parallelize(1 to 1000, 1).takeAsync(1)
>   val firstJobId = eventually(timeout(10 seconds)) {
>     firstJobFuture.jobIds.head
>   }
>   eventually(timeout(10 seconds)) {
>     sc.statusTracker.getJobIdsForGroup("my-job-group2") should be (Seq(firstJobId))
>   }
> }
> {code}
>
> It also impacts current PR for SPARK-1021, which involves additional uses of {{ComplexFutureAction}}.
-- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
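The pitfall in the issue description above can be reproduced without Spark. The following is a minimal sketch, not Spark's actual code: {{ThreadReuseDemo}}, {{localProperties}}, {{setLocalProperty}} and the {{job.group}} key are hypothetical stand-ins, and it assumes the properties live in a plain {{InheritableThreadLocal}}. A pooled worker thread inherits the value once, when the thread is created, and never sees later updates made by the submitting thread.

```scala
import java.util.concurrent.{Callable, Executors}

object ThreadReuseDemo {
  // Hypothetical, simplified stand-in for SparkContext.localProperties:
  // a child thread inherits the parent's value at thread creation time only.
  val localProperties = new InheritableThreadLocal[Map[String, String]] {
    override protected def initialValue(): Map[String, String] = Map.empty
  }

  // Updates the calling thread's properties (hypothetical helper).
  def setLocalProperty(key: String, value: String): Unit =
    localProperties.set(localProperties.get() + (key -> value))

  // Returns the job group observed by the pooled worker on two consecutive submissions.
  def run(): (String, String) = {
    val pool = Executors.newFixedThreadPool(1)
    try {
      val readGroup = new Callable[String] {
        override def call(): String =
          localProperties.get().getOrElse("job.group", "<unset>")
      }

      setLocalProperty("job.group", "group-1")
      // The single worker thread is created by this submit and inherits "group-1".
      val first = pool.submit(readGroup).get()

      setLocalProperty("job.group", "group-2")
      // The same worker thread is reused; the update above is not propagated to it.
      val second = pool.submit(readGroup).get()

      (first, second)
    } finally {
      pool.shutdown()
    }
  }
}
```

In this sketch both submissions observe {{group-1}}, even though the submitting thread had switched to {{group-2}} before the second one.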
[jira] [Commented] (SPARK-4514) SparkContext localProperties does not inherit property updates across thread reuse
[ https://issues.apache.org/jira/browse/SPARK-4514?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15021187#comment-15021187 ]

Richard W. Eggert II commented on SPARK-4514:
---------------------------------------------

This test, however, still fails:

{code}
test("getJobIdsForGroup() with takeAsync() across multiple partitions") {
  sc = new SparkContext("local", "test", new SparkConf(false))
  sc.setJobGroup("my-job-group2", "description")
  sc.statusTracker.getJobIdsForGroup("my-job-group2") shouldBe empty
  val firstJobFuture = sc.parallelize(1 to 1000, 2).takeAsync(999)
  val firstJobId = eventually(timeout(10 seconds)) {
    firstJobFuture.jobIds.head
  }
  eventually(timeout(10 seconds)) {
    sc.statusTracker.getJobIdsForGroup("my-job-group2") should have size 2
  }
}
{code}
[jira] [Commented] (SPARK-4514) SparkContext localProperties does not inherit property updates across thread reuse
[ https://issues.apache.org/jira/browse/SPARK-4514?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15021194#comment-15021194 ]

Richard W. Eggert II commented on SPARK-4514:
---------------------------------------------

I implemented a two-line fix that causes this test to now pass in that PR.
[jira] [Commented] (SPARK-4514) SparkContext localProperties does not inherit property updates across thread reuse
[ https://issues.apache.org/jira/browse/SPARK-4514?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15021199#comment-15021199 ]

Apache Spark commented on SPARK-4514:
-------------------------------------

User 'reggert' has created a pull request for this issue:
https://github.com/apache/spark/pull/9264
[jira] [Commented] (SPARK-4514) SparkContext localProperties does not inherit property updates across thread reuse
[ https://issues.apache.org/jira/browse/SPARK-4514?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14511537#comment-14511537 ]

Ilya Ganelin commented on SPARK-4514:
-------------------------------------

[~joshrosen] - given your work on SPARK-6629, is this still relevant? I saw that there was a comment there stating that this issue may not be a problem. I can knock this one out if it's still necessary.
[jira] [Commented] (SPARK-4514) SparkContext localProperties does not inherit property updates across thread reuse
[ https://issues.apache.org/jira/browse/SPARK-4514?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14388095#comment-14388095 ]

Josh Rosen commented on SPARK-4514:
-----------------------------------

I don't know that there's a good way to fix this for all arbitrary ways in which users might create or re-use threads. This inheritance behavior is slightly more understandable in cases where users explicitly create child threads. Although our documentation doesn't seem to explicitly promise that properties will be inherited, I think that users might have come to rely on this behavior so I don't think that we can remove it at this point. We can certainly fix it for the AsyncRDDActions case, though, because we can manually thread the properties in the constructor.

This pain could have probably been avoided if the original design used something like Scala's {{DynamicVariable}} where you're forced to explicitly consider the scope / lifecycle of the thread-local property.

I'm going to try to fix this for the AsyncRDDActions case and will try to improve the documentation to warn about this pitfall for the more general cases involving arbitrary user code. Let me know if you can spot another solution which won't break existing user code that relies on property inheritance in the non-thread-reuse cases.
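The idea of manually threading the properties through at construction time, combined with the explicit scoping that {{DynamicVariable}} forces, can be sketched as follows. This is an illustration under stated assumptions and not the code from the pull request: {{PropertyPropagationSketch}}, {{jobGroup}} and {{withCapturedGroup}} are hypothetical names. The wrapper reads the caller's value when the task object is constructed and re-installs it, for the duration of the task only, on whichever pooled thread runs it.

```scala
import java.util.concurrent.{Callable, Executors}
import scala.util.DynamicVariable

object PropertyPropagationSketch {
  // Hypothetical stand-in for the thread-local job group property.
  val jobGroup = new DynamicVariable[Option[String]](None)

  // Captures the caller's current value when the wrapper is constructed and
  // re-installs it around `body` on the thread that eventually runs the task.
  def withCapturedGroup[T](body: => T): Callable[T] = {
    val captured = jobGroup.value // read on the submitting thread
    new Callable[T] {
      // withValue restores the worker thread's previous value when body finishes.
      override def call(): T = jobGroup.withValue(captured)(body)
    }
  }

  // Submits one task per group to a single reused worker thread and returns
  // the group each task observed.
  def run(): Seq[Option[String]] = {
    val pool = Executors.newFixedThreadPool(1)
    try {
      Seq("group-1", "group-2").map { group =>
        jobGroup.withValue(Some(group)) {
          // `jobGroup.value` is passed by name, so it is evaluated on the worker thread.
          val task: Callable[Option[String]] = withCapturedGroup(jobGroup.value)
          pool.submit(task).get()
        }
      }
    } finally {
      pool.shutdown()
    }
  }
}
```

Because the value travels with the task rather than with the thread, the second task observes {{group-2}} even though it runs on the worker thread that was created while {{group-1}} was in scope.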
[jira] [Commented] (SPARK-4514) SparkContext localProperties does not inherit property updates across thread reuse
[ https://issues.apache.org/jira/browse/SPARK-4514?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14388231#comment-14388231 ]

Josh Rosen commented on SPARK-4514:
-----------------------------------

I've filed SPARK-6629 to fix a related issue where inherited job groups did not play nicely with cancellation.