gaborgsomogyi commented on a change in pull request #25135: [SPARK-28367][SS] Use new KafkaConsumer.poll API in Kafka connector
URL: https://github.com/apache/spark/pull/25135#discussion_r303786627
 
 

 ##########
 File path: external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaOffsetReader.scala
 ##########
 @@ -419,6 +416,19 @@ private[kafka010] class KafkaOffsetReader(
     stopConsumer()
     _consumer = null  // will automatically get reinitialized again
   }
+
+  private def getPartitions(): ju.Set[TopicPartition] = {
+    var partitions = Set.empty[TopicPartition].asJava
+    val startTimeMs = System.currentTimeMillis()
+    while (partitions.isEmpty && System.currentTimeMillis() - startTimeMs < pollTimeoutMs) {
+      // Poll to get the latest assigned partitions
+      consumer.poll(jt.Duration.ofMillis(100))
 
 Review comment:
   If the assignment is available and the consumer is able to poll, then the
`consumer.poll(jt.Duration.ofMillis(100))` call returns immediately (not
counting the time the poll itself takes), so I don't yet understand why this
would be better.
   
   Based on your suggestion, I arrived at the following implementation
(correct me if I misunderstood):
   ```
   var firstPollDone = false
   ...
   if (!firstPollDone) {
     // First poll: return without blocking
     consumer.poll(jt.Duration.ZERO)
     firstPollDone = true
   } else {
     consumer.poll(jt.Duration.ofMillis(100))
   }
   ...
   ```
   Is it worth such a complication to save 100 ms?
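
For comparison, the same "zero timeout on the first poll only" behaviour can be sketched without a boolean flag by keeping the timeout in a variable. This is only an illustration under stated assumptions: `FirstPollSketch`, `waitForAssignment`, the `poll` function parameter and the `nowMs` clock are hypothetical stand-ins (not part of the PR or of the Kafka API) so that the snippet runs without a broker; in the real code `poll` would correspond to `consumer.poll(...)` followed by reading the assignment.

```scala
import java.{time => jt}

object FirstPollSketch {
  /**
   * Polls until a non-empty assignment is returned or pollTimeoutMs elapses.
   * `poll` is a hypothetical stand-in for consumer.poll plus reading the
   * current assignment; `nowMs` is injectable so the loop can be tested.
   */
  def waitForAssignment[A](
      pollTimeoutMs: Long,
      poll: jt.Duration => Set[A],
      nowMs: () => Long = () => System.currentTimeMillis()): Set[A] = {
    val startTimeMs = nowMs()
    // The first poll does not block; later polls wait up to 100 ms each.
    var timeout = jt.Duration.ZERO
    var assigned = Set.empty[A]
    while (assigned.isEmpty && nowMs() - startTimeMs < pollTimeoutMs) {
      assigned = poll(timeout)
      timeout = jt.Duration.ofMillis(100)
    }
    assigned
  }
}
```

Whether the saved 100 ms justifies even this smaller change is still the open question; the sketch only shows that the extra state can be reduced to one variable.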
   
