[GitHub] spark pull request: [SPARK-4707][STREAMING] Reliable Kafka Receive...
Github user harishreedharan commented on the pull request: https://github.com/apache/spark/pull/3655#issuecomment-72959021 Agreed. I will file a jira. We should discuss the issue there. This one was more of a break-fix, but that would be a more elaborate fix. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-4707][STREAMING] Reliable Kafka Receive...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/3655 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-4707][STREAMING] Reliable Kafka Receive...
Github user tdas commented on the pull request: https://github.com/apache/spark/pull/3655#issuecomment-72949130 I can merge this for now and we can focus on that issue later. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-4707][STREAMING] Reliable Kafka Receive...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3655#issuecomment-72789875 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/26710/ Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-4707][STREAMING] Reliable Kafka Receive...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3655#issuecomment-72789869 [Test build #26710 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26710/consoleFull) for PR 3655 at commit [`5e2e7ad`](https://github.com/apache/spark/commit/5e2e7ad479d2739c4f1bd62fd1d48b216b2bdce0). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-4707][STREAMING] Reliable Kafka Receive...
Github user tdas commented on the pull request: https://github.com/apache/spark/pull/3655#issuecomment-72784728 @harishreedharan This begs a higher level questions of whether the write ahead log (which is the probably component to fail) should have its own retries independent of the receiver retrying. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-4707][STREAMING] Reliable Kafka Receive...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3655#issuecomment-72784691 [Test build #26710 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26710/consoleFull) for PR 3655 at commit [`5e2e7ad`](https://github.com/apache/spark/commit/5e2e7ad479d2739c4f1bd62fd1d48b216b2bdce0). * This patch merges cleanly. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-4707][STREAMING] Reliable Kafka Receive...
Github user tdas commented on the pull request: https://github.com/apache/spark/pull/3655#issuecomment-72784543 ok to test. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-4707][STREAMING] Reliable Kafka Receive...
Github user harishreedharan commented on the pull request: https://github.com/apache/spark/pull/3655#issuecomment-70015327 No, this does prevent data loss - basically if the store fails multiple times, we shutdown the receiver completely. So the new receiver which gets started starts from the last commit, so we are safe. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-4707][STREAMING] Reliable Kafka Receive...
Github user jerryshao commented on the pull request: https://github.com/apache/spark/pull/3655#issuecomment-69684946 So I think the aim of this patch is to fix the recoverable problems of data store with retries, not prevent data loss. That's my thought, sorry for my misunderstanding. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-4707][STREAMING] Reliable Kafka Receive...
Github user jerryshao commented on the pull request: https://github.com/apache/spark/pull/3655#issuecomment-69684611 Hi @harishreedharan , After carefully looking at the code, I think data will not be lost even in such failure situation. For example, if we meet exception in `onPushBlock` which is called in `BlockGenerator#keepPushingBlocks`, this function will catch the exception and call `reportError` to notify the exception, After that the thread of `blockPushingThread` is exited. So there's no chance to call `onPushBlock` again, in another words, `storeBlockAndCommitOffset` will not be called again, so offset will not be committed to ZK from the failure point. And the receiving thread will keep getting data and pushed into `blockForPushing` until it is full, after that, the whole receiver system is blocked. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-4707][STREAMING] Reliable Kafka Receive...
Github user harishreedharan commented on the pull request: https://github.com/apache/spark/pull/3655#issuecomment-69384366 @tdas Any comments on this one, or is this one ready to go in? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-4707][STREAMING] Reliable Kafka Receive...
Github user harishreedharan commented on a diff in the pull request: https://github.com/apache/spark/pull/3655#discussion_r21766800 --- Diff: external/kafka/src/main/scala/org/apache/spark/streaming/kafka/ReliableKafkaReceiver.scala --- @@ -201,12 +201,31 @@ class ReliableKafkaReceiver[ topicPartitionOffsetMap.clear() } - /** Store the ready-to-be-stored block and commit the related offsets to zookeeper. */ + /** + * Store the ready-to-be-stored block and commit the related offsets to zookeeper. This method + * will try a fixed number of times to push the block. If the push fails, the receiver is stopped. + */ private def storeBlockAndCommitOffset( blockId: StreamBlockId, arrayBuffer: mutable.ArrayBuffer[_]): Unit = { -store(arrayBuffer.asInstanceOf[mutable.ArrayBuffer[(K, V)]]) -Option(blockOffsetMap.get(blockId)).foreach(commitOffset) -blockOffsetMap.remove(blockId) +var count = 0 +var pushed = false +var exception: Exception = null +while (!pushed && count <= 3) { --- End diff -- In general, the failures probably are transient. Example, HDFS hflush fails because of GC or replication fails to the second BM because of timeouts or something. Such issues are likely to succeed on retries, but any major ones won't -- so limited retry followed by failures is probably a reasonable approach without making things too complex. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-4707][STREAMING] Reliable Kafka Receive...
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/3655#discussion_r21586197 --- Diff: external/kafka/src/main/scala/org/apache/spark/streaming/kafka/ReliableKafkaReceiver.scala --- @@ -201,12 +201,31 @@ class ReliableKafkaReceiver[ topicPartitionOffsetMap.clear() } - /** Store the ready-to-be-stored block and commit the related offsets to zookeeper. */ + /** + * Store the ready-to-be-stored block and commit the related offsets to zookeeper. This method + * will try a fixed number of times to push the block. If the push fails, the receiver is stopped. + */ private def storeBlockAndCommitOffset( blockId: StreamBlockId, arrayBuffer: mutable.ArrayBuffer[_]): Unit = { -store(arrayBuffer.asInstanceOf[mutable.ArrayBuffer[(K, V)]]) -Option(blockOffsetMap.get(blockId)).foreach(commitOffset) -blockOffsetMap.remove(blockId) +var count = 0 +var pushed = false +var exception: Exception = null +while (!pushed && count <= 3) { --- End diff -- General question - is it likely that a store fails, but then immediately succeeds? Just wondering at the likelihood that this does anything. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-4707][STREAMING] Reliable Kafka Receive...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3655#issuecomment-66399109 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24282/ Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-4707][STREAMING] Reliable Kafka Receive...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3655#issuecomment-66399105 [Test build #24282 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24282/consoleFull) for PR 3655 at commit [`5e2e7ad`](https://github.com/apache/spark/commit/5e2e7ad479d2739c4f1bd62fd1d48b216b2bdce0). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-4707][STREAMING] Reliable Kafka Receive...
Github user jerryshao commented on the pull request: https://github.com/apache/spark/pull/3655#issuecomment-66398284 Thanks Hari, seems this is a simple solution. BTW should we make `count = 3` as a configurable parameter? For others LGTM. Original thoughts of introducing pending queue probably will make the design much more complex because of synchronization. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-4707][STREAMING] Reliable Kafka Receive...
Github user harishreedharan commented on the pull request: https://github.com/apache/spark/pull/3655#issuecomment-66393421 I messed up the jira number in the commit. Please fix it when merging. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org