[GitHub] spark pull request #22042: [SPARK-25005][SS]Support non-consecutive offsets ...

tdas Tue, 21 Aug 2018 18:25:26 -0700

Github user tdas commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22042#discussion_r211802112
  
    --- Diff: 
external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaDataConsumer.scala
 ---
    @@ -250,33 +341,39 @@ private[kafka010] case class InternalKafkaConsumer(
           offset: Long,
           untilOffset: Long,
           pollTimeoutMs: Long,
    -      failOnDataLoss: Boolean): ConsumerRecord[Array[Byte], Array[Byte]] = 
{
    -    if (offset != nextOffsetInFetchedData || !fetchedData.hasNext()) {
    -      // This is the first fetch, or the last pre-fetched data has been 
drained.
    +      failOnDataLoss: Boolean): FetchedRecord = {
    +    if (offset != fetchedData.nextOffsetInFetchedData) {
    +      // This is the first fetch, or the fetched data has been reset.
           // Seek to the offset because we may call seekToBeginning or 
seekToEnd before this.
    -      seek(offset)
    -      poll(pollTimeoutMs)
    -    }
    -
    -    if (!fetchedData.hasNext()) {
    -      // We cannot fetch anything after `poll`. Two possible cases:
    -      // - `offset` is out of range so that Kafka returns nothing. Just 
throw
    -      // `OffsetOutOfRangeException` to let the caller handle it.
    -      // - Cannot fetch any data before timeout. TimeoutException will be 
thrown.
    -      val range = getAvailableOffsetRange()
    -      if (offset < range.earliest || offset >= range.latest) {
    -        throw new OffsetOutOfRangeException(
    -          Map(topicPartition -> java.lang.Long.valueOf(offset)).asJava)
    +      poll(offset, pollTimeoutMs)
    +    } else if (!fetchedData.hasNext) {
    +      // The last pre-fetched data has been drained.
    +      if (offset < fetchedData.offsetAfterPoll) {
    +        // Offsets in [offset, offsetAfterPoll) are missing. We should 
skip them.
    --- End diff --
    
    "skip them" is confusing. What does it mean to skip? Why are we still 
returning something.



---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #22042: [SPARK-25005][SS]Support non-consecutive offsets ...

Reply via email to