GitHub user harishreedharan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/3655#discussion_r21766800
  
    --- Diff: external/kafka/src/main/scala/org/apache/spark/streaming/kafka/ReliableKafkaReceiver.scala ---
    @@ -201,12 +201,31 @@ class ReliableKafkaReceiver[
         topicPartitionOffsetMap.clear()
       }
     
    -  /** Store the ready-to-be-stored block and commit the related offsets to zookeeper. */
    +  /**
    +   * Store the ready-to-be-stored block and commit the related offsets to zookeeper. This method
    +   * will try a fixed number of times to push the block. If the push fails, the receiver is stopped.
    +   */
       private def storeBlockAndCommitOffset(
           blockId: StreamBlockId, arrayBuffer: mutable.ArrayBuffer[_]): Unit = {
    -    store(arrayBuffer.asInstanceOf[mutable.ArrayBuffer[(K, V)]])
    -    Option(blockOffsetMap.get(blockId)).foreach(commitOffset)
    -    blockOffsetMap.remove(blockId)
    +    var count = 0
    +    var pushed = false
    +    var exception: Exception = null
    +    while (!pushed && count <= 3) {
    --- End diff --
    
    In general, the failures are probably transient. For example, an HDFS
    hflush may fail because of GC, or replication to the second BlockManager
    may fail because of a timeout. Such operations are likely to succeed on
    retry, while any major failure will not. A limited number of retries
    followed by failure is therefore probably a reasonable approach that
    avoids making things too complex.
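
To make the "limited retry followed by failure" idea concrete, here is a minimal, standalone sketch. It is not the code from this pull request: `BoundedRetry`, `retry`, and `maxAttempts` are hypothetical names used only for illustration, and the operation being retried is left to the caller. Note that if `count` in the quoted loop starts at 0 and is incremented once per iteration (the increment is not visible in the diff), the condition `count <= 3` would allow up to four attempts.

```scala
import scala.annotation.tailrec
import scala.util.{Failure, Success, Try}

// Hypothetical helper, not part of Spark: illustrates bounded retry only.
object BoundedRetry {
  /**
   * Evaluate `op` until it succeeds or `maxAttempts` evaluations have failed.
   * `op` is always evaluated at least once. `Try` captures only non-fatal
   * exceptions, so fatal errors (e.g. OutOfMemoryError) still propagate.
   */
  @tailrec
  def retry[T](maxAttempts: Int)(op: => T): Try[T] = {
    Try(op) match {
      case success @ Success(_) => success
      // Attempts remain: treat the failure as transient and try again.
      case Failure(_) if maxAttempts > 1 => retry(maxAttempts - 1)(op)
      // Out of attempts: hand the last failure back to the caller.
      case failure @ Failure(_) => failure
    }
  }
}
```

A caller would match on the returned `Try`: on `Success` it can go on to commit the offsets, and on `Failure` it can stop the receiver with the captured exception, which keeps the give-up policy in one place.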

