Github user harishreedharan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/6910#discussion_r32876281
  
    --- Diff: external/flume-sink/src/main/scala/org/apache/spark/streaming/flume/sink/SparkAvroCallbackHandler.scala ---
    @@ -53,7 +53,7 @@ private[flume] class SparkAvroCallbackHandler(val threads: Int, val channel: Cha
       // Since the new txn may not have the same sequence number we must guard against accidentally
       // committing a new transaction. To reduce the probability of that happening a random string is
       // prepended to the sequence number. Does not change for life of sink
    -  private val seqBase = RandomStringUtils.randomAlphanumeric(8)
    +  private val seqBase = UUID.randomUUID().toString.substring(0, 8)
    --- End diff --
    
    When we tried this last time, we ended up getting the same string multiple times. I am not entirely sure why, but that is why we used RandomStringUtils in the first place.
    
    The point of seqBase is to protect against collisions after a sink restart. I think this should be random enough; otherwise we can use the full UUID, though I feel that adds too much overhead without a whole lot of gain.
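
For reference, a rough sketch comparing the two ways of building an 8-character prefix discussed above. It is only an illustration: `SeqBaseSketch` and its members are hypothetical names, and `scala.util.Random.alphanumeric` is used here as a stand-in for `RandomStringUtils.randomAlphanumeric(8)` so the snippet needs no extra dependency. It does not explain the repeated strings mentioned above; it only shows how many distinct values each form can take.

```scala
import java.util.UUID
import scala.util.Random

// Hypothetical helper object, not part of the Spark code base.
object SeqBaseSketch {
  // First 8 characters of a random UUID: 8 lowercase hex digits.
  def uuidPrefix(): String = UUID.randomUUID().toString.substring(0, 8)

  // Assumption: stand-in for RandomStringUtils.randomAlphanumeric(8),
  // drawing 8 characters from A-Z, a-z and 0-9.
  def alphanumericPrefix(): String = Random.alphanumeric.take(8).mkString

  // Number of distinct 8-character strings each alphabet allows.
  val uuidSpace: BigInt = BigInt(16).pow(8)         // 16 hex digits per position
  val alphanumericSpace: BigInt = BigInt(62).pow(8) // 62 characters per position
}
```

With 8 characters, the hex prefix can take 16^8 (about 4.3 billion) values, while the alphanumeric string can take 62^8 (about 2.2 * 10^14), so for the same length the alphanumeric form leaves more room before a restart could produce a repeat.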

