[GitHub] spark pull request: [SPARK-8404][Streaming][Tests] Use thread-safe...
Github user zsxwing commented on the pull request: https://github.com/apache/spark/pull/6852#issuecomment-113063763 @srowen Sorry. The issue I mentioned in https://github.com/apache/spark/pull/6852#issuecomment-112722654 won't happen. I put ~~strikethrough~~ there. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-8404][Streaming][Tests] Use thread-safe...
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/6852#issuecomment-113059389 @tdas +1 yep https://github.com/apache/spark/pull/6852#issuecomment-112707759 is the good catch here; was just questioning the other motivation --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-8404][Streaming][Tests] Use thread-safe...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/6852 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-8404][Streaming][Tests] Use thread-safe...
Github user tdas commented on the pull request: https://github.com/apache/spark/pull/6852#issuecomment-112962475 Merging this in Spark 1.4 and master --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-8404][Streaming][Tests] Use thread-safe...
Github user tdas commented on the pull request: https://github.com/apache/spark/pull/6852#issuecomment-112962440 @srowen We have been seeing some flakiness in this test in our daily master builds in Jenkins. So these fixes LGTM. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-8404][Streaming][Tests] Use thread-safe...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/6852#issuecomment-112939012 [Test build #35050 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/35050/console) for PR 6852 at commit [`d464211`](https://github.com/apache/spark/commit/d464211afe6805117cd4d56957a6f773bc8ae3a8). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-8404][Streaming][Tests] Use thread-safe...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/6852#issuecomment-112939057 Merged build finished. Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-8404][Streaming][Tests] Use thread-safe...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/6852#issuecomment-112909451 [Test build #35050 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/35050/consoleFull) for PR 6852 at commit [`d464211`](https://github.com/apache/spark/commit/d464211afe6805117cd4d56957a6f773bc8ae3a8). --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-8404][Streaming][Tests] Use thread-safe...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/6852#issuecomment-112909043 Merged build started. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-8404][Streaming][Tests] Use thread-safe...
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/6852#issuecomment-112908908 Jenkins, retest this please --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-8404][Streaming][Tests] Use thread-safe...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/6852#issuecomment-112908986 Merged build triggered. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-8404][Streaming][Tests] Use thread-safe...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/6852#issuecomment-112772435 Merged build finished. Test FAILed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-8404][Streaming][Tests] Use thread-safe...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/6852#issuecomment-112772353 **[Test build #35032 timed out](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/35032/console)** for PR 6852 at commit [`d464211`](https://github.com/apache/spark/commit/d464211afe6805117cd4d56957a6f773bc8ae3a8) after a configured wait of `175m`. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-8404][Streaming][Tests] Use thread-safe...
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/6852#issuecomment-112752168 OK, that's fine, but the problems you mentioned above aren't a symptom of that. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-8404][Streaming][Tests] Use thread-safe...
Github user zsxwing commented on the pull request: https://github.com/apache/spark/pull/6852#issuecomment-112751779 Writes being visible is what I'm concerned. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-8404][Streaming][Tests] Use thread-safe...
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/6852#issuecomment-112739900 Are you worried about writes being visible or a correctness issue? Yes I see the potential correctness issue -- it's not `foreachRDD`s happening at the same time but the polling read afterwards. I think you'd find there are definitely memory barriers in, for example, merely submitting a task. So I don't think there's a possibility that the writes never turn up in the reading thread. However that point is moot since I think the change is needed for the reason above anyway. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-8404][Streaming][Tests] Use thread-safe...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/6852#issuecomment-112728270 [Test build #35032 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/35032/consoleFull) for PR 6852 at commit [`d464211`](https://github.com/apache/spark/commit/d464211afe6805117cd4d56957a6f773bc8ae3a8). --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-8404][Streaming][Tests] Use thread-safe...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/6852#issuecomment-112727979 Merged build triggered. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-8404][Streaming][Tests] Use thread-safe...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/6852#issuecomment-112728004 Merged build started. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-8404][Streaming][Tests] Use thread-safe...
Github user zsxwing commented on the pull request: https://github.com/apache/spark/pull/6852#issuecomment-112727686 retest this please --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-8404][Streaming][Tests] Use thread-safe...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/6852#issuecomment-112727312 Merged build finished. Test FAILed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-8404][Streaming][Tests] Use thread-safe...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/6852#issuecomment-112727289 [Test build #35028 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/35028/console) for PR 6852 at commit [`d464211`](https://github.com/apache/spark/commit/d464211afe6805117cd4d56957a6f773bc8ae3a8). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-8404][Streaming][Tests] Use thread-safe...
Github user zsxwing commented on the pull request: https://github.com/apache/spark/pull/6852#issuecomment-112722654 > There are certainly some memory barriers in between the writes and reads without this. For these tests, there is no memory barrier because the checking codes are called after `ssc.start()` immediately. And one potential issue is that, e.g., writing and reading a java.util.HashMap at the same time may cause an infinite loop: http://bugs.java.com/bugdatabase/view_bug.do?bug_id=6423457 > If a HashMap is used in a concurrent setting with insufficient synchronization, it is possible for the data structure to get corrupted in such a way that infinite loops appear in the data structure and thus get() could loop forever. Of cause, I'm not sure if scala.collection.mutable.HashMap has a similar issue. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-8404][Streaming][Tests] Use thread-safe...
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/6852#issuecomment-112712207 Yes I understand that argument, though have you observed a failure as a result? There are certainly some memory barriers in between the writes and reads without this. If you go this route, there are 1000 other places this needs to change in Spark. I am not sure it helps to do it piecemeal, and not sure it fixes something? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-8404][Streaming][Tests] Use thread-safe...
Github user zsxwing commented on the pull request: https://github.com/apache/spark/pull/6852#issuecomment-112707759 > In all cases foreachRDD executes serially on the driver; what's the thread safety issue? The codes in foreachRDD run in `JobScheduler.jobExecutor`. But the checking codes run in the main thread. And there is no memory barrier to guarantee the memory visibility. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-8404][Streaming][Tests] Use thread-safe...
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/6852#issuecomment-112705568 In all cases `foreachRDD` executes serially on the driver; what's the thread safety issue? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-8404][Streaming][Tests] Use thread-safe...
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/6852#discussion_r32602651 --- Diff: external/kafka/src/test/java/org/apache/spark/streaming/kafka/JavaDirectKafkaStreamSuite.java --- @@ -116,7 +114,7 @@ public String call(MessageAndMetadata msgAndMd) throws Exception ); JavaDStream unifiedStream = stream1.union(stream2); -final HashSet result = new HashSet(); +final Set result = Collections.synchronizedSet(new HashSet()); --- End diff -- When would this be run concurrently though? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-8404][Streaming][Tests] Use thread-safe...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/6852#issuecomment-112703034 [Test build #35028 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/35028/consoleFull) for PR 6852 at commit [`d464211`](https://github.com/apache/spark/commit/d464211afe6805117cd4d56957a6f773bc8ae3a8). --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-8404][Streaming][Tests] Use thread-safe...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/6852#issuecomment-112701815 Merged build triggered. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-8404][Streaming][Tests] Use thread-safe...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/6852#issuecomment-112701962 Merged build started. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-8404][Streaming][Tests] Use thread-safe...
Github user zsxwing commented on the pull request: https://github.com/apache/spark/pull/6852#issuecomment-112700126 cc @tdas --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-8404][Streaming][Tests] Use thread-safe...
GitHub user zsxwing opened a pull request: https://github.com/apache/spark/pull/6852 [SPARK-8404][Streaming][Tests] Use thread-safe collections to make the tests more reliable KafkaStreamSuite, DirectKafkaStreamSuite, JavaKafkaStreamSuite and JavaDirectKafkaStreamSuite use non-thread-safe collections to collect data in one thread and check it in another thread. It may fail the tests. This PR changes them to thread-safe collections. Note: I cannot reproduce the test failures in my environment. But at least, this PR should make the tests more reliable. You can merge this pull request into a Git repository by running: $ git pull https://github.com/zsxwing/spark fix-KafkaStreamSuite Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/6852.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #6852 commit d464211afe6805117cd4d56957a6f773bc8ae3a8 Author: zsxwing Date: 2015-06-17T07:43:34Z Use thread-safe collections to make the tests more reliable --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org