GitHub user harishreedharan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/6910#discussion_r32876281

    --- Diff: external/flume-sink/src/main/scala/org/apache/spark/streaming/flume/sink/SparkAvroCallbackHandler.scala ---
    @@ -53,7 +53,7 @@ private[flume] class SparkAvroCallbackHandler(val threads: Int, val channel: Cha
       // Since the new txn may not have the same sequence number we must guard against accidentally
       // committing a new transaction. To reduce the probability of that happening a random string is
       // prepended to the sequence number. Does not change for life of sink
    -  private val seqBase = RandomStringUtils.randomAlphanumeric(8)
    +  private val seqBase = UUID.randomUUID().toString.substring(0, 8)
    --- End diff --

    Last time, we tried it and we ended up getting the same string multiple times. I am not entirely sure why, but that was the reason we used RandomStringUtils in the first place. The idea of using the seqBase is to protect against the case where a sink restart can cause collisions. I think this should be random enough; otherwise we can use the full UUID, which I feel adds too much overhead without a whole lot of gain.
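    For reference, here is a minimal sketch comparing the two ways of generating seqBase discussed above. It is not the sink's code: the object name SeqBaseSketch, the helper names, and the SecureRandom-based alphanumeric generator (a hypothetical stand-in for RandomStringUtils.randomAlphanumeric, used only to avoid a Commons Lang dependency) are all assumptions. It only shows how many distinct 8-character values each approach can produce; it does not explain the duplicate strings mentioned above.

```scala
import java.security.SecureRandom
import java.util.UUID

object SeqBaseSketch {
  // 62 symbols: a-z, A-Z, 0-9.
  private val Alphabet: String =
    (('a' to 'z') ++ ('A' to 'Z') ++ ('0' to '9')).mkString

  private val rng = new SecureRandom()

  // First 8 characters of a random UUID's string form: 8 lowercase hex digits.
  def uuidPrefix(): String = UUID.randomUUID().toString.substring(0, 8)

  // Hypothetical stand-in for RandomStringUtils.randomAlphanumeric(n).
  def alphanumeric(n: Int): String =
    Seq.fill(n)(Alphabet.charAt(rng.nextInt(Alphabet.length))).mkString

  // Number of distinct strings of the given length over an alphabet.
  def spaceSize(alphabetSize: Int, length: Int): BigInt =
    BigInt(alphabetSize).pow(length)

  def main(args: Array[String]): Unit = {
    println(s"UUID prefix:  ${uuidPrefix()} (possible values: ${spaceSize(16, 8)})")
    println(s"alphanumeric: ${alphanumeric(8)} (possible values: ${spaceSize(62, 8)})")
  }
}
```

    An 8-character hex prefix has 16^8 = 2^32 possible values, while 8 alphanumeric characters have 62^8, so at the same length the alphanumeric string draws from a much larger space.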