[GitHub] spark pull request #20836: SPARK-23685 : Fix for the Spark Structured Stream...
Github user sirishaSindri closed the pull request at: https://github.com/apache/spark/pull/20836
[GitHub] spark pull request #20836: SPARK-23685 : Fix for the Spark Structured Stream...
Github user sirishaSindri commented on a diff in the pull request: https://github.com/apache/spark/pull/20836#discussion_r177276642

--- Diff: external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaDataConsumer.scala ---
@@ -279,9 +279,8 @@ private[kafka010] case class InternalKafkaConsumer(
      if (record.offset > offset) {
        // This may happen when some records aged out but their offsets already got verified
        if (failOnDataLoss) {
-          reportDataLoss(true, s"Cannot fetch records in [$offset, ${record.offset})")
--- End diff --

@gaborgsomogyi Thank you for looking at it. Per the Kafka integration guide (https://spark.apache.org/docs/latest/structured-streaming-kafka-integration.html), a batch query currently always fails if it cannot read any data from the provided offsets because that data has been lost. With this change it won't fail; instead, it will return all the messages still available within the requested offset range.
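To make the scenario under discussion concrete, here is a minimal sketch of a Kafka batch query over an explicit offset range, written against the documented batch source options; the topic name, bootstrap server, and offsets are illustrative placeholders, not values from the PR:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("kafka-batch-range").getOrCreate()

// Batch read over an explicit offset range. If retention has aged out some
// offsets in [100, 200), the current behavior with failOnDataLoss=true is
// to fail the query; under the proposed change, the query would instead
// return whatever records are still available inside the requested range.
val df = spark.read
  .format("kafka")
  .option("kafka.bootstrap.servers", "broker:9092")
  .option("subscribe", "events")
  .option("startingOffsets", """{"events":{"0":100}}""")
  .option("endingOffsets", """{"events":{"0":200}}""")
  .load()

df.selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)").show()
```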
[GitHub] spark pull request #20836: SPARK-23685 : Fix for the Spark Structured Stream...
Github user gaborgsomogyi commented on a diff in the pull request: https://github.com/apache/spark/pull/20836#discussion_r176656905

--- Diff: external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaDataConsumer.scala ---
@@ -279,9 +279,8 @@ private[kafka010] case class InternalKafkaConsumer(
      if (record.offset > offset) {
        // This may happen when some records aged out but their offsets already got verified
        if (failOnDataLoss) {
-          reportDataLoss(true, s"Cannot fetch records in [$offset, ${record.offset})")
--- End diff --

This just breaks the whole concept: when `failOnDataLoss` is enabled, this exception means the query should fail.
[GitHub] spark pull request #20836: SPARK-23685 : Fix for the Spark Structured Stream...
GitHub user sirishaSindri opened a pull request: https://github.com/apache/spark/pull/20836

SPARK-23685 : Fix for the Spark Structured Streaming Kafka 0.10 Consumer Can't Handle Non-consecutive Offsets

## What changes were proposed in this pull request?

In fetchData, instead of throwing an exception when failOnDataLoss is enabled, return the record if its offset falls within the user-requested offset range.

## How was this patch tested?

Manually tested, added a unit test, and ran in a real deployment.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/sirishaSindri/spark SPARK-23685

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/20836.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

This closes #20836

commit 5ccfed840f9cf9cd1c28a309b934e1285332d04d
Author: z001k5c
Date: 2018-03-15T15:53:14Z
SPARK-23685 : Fix for the Spark Structured Streaming Kafka 0.10 Consumer Can't Handle Non-consecutive Offsets

commit 7e08cd9062683c062b7b0408ffe40ff726249909
Author: z001k5c
Date: 2018-03-15T17:06:06Z
SPARK-23685 : Fix for the Spark Structured Streaming Kafka 0.10 Consumer Can't Handle Non-consecutive Offsets
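For reference, a hedged sketch of the proposed fetchData behavior follows. It is a self-contained approximation, not the literal patch: the helper name `resolveFetchedRecord` and its parameter names are invented for illustration, with `untilOffset` standing in for the exclusive upper bound of the caller's requested range as used in KafkaDataConsumer.

```scala
import org.apache.kafka.clients.consumer.ConsumerRecord

// Sketch: decide what to do when the first fetched record's offset is ahead
// of the requested offset because the intermediate records aged out.
def resolveFetchedRecord(
    record: ConsumerRecord[Array[Byte], Array[Byte]],
    requestedOffset: Long,
    untilOffset: Long): Option[ConsumerRecord[Array[Byte], Array[Byte]]] = {
  if (record.offset == requestedOffset) {
    // Normal case: got exactly the offset that was asked for.
    Some(record)
  } else if (record.offset > requestedOffset && record.offset < untilOffset) {
    // Offsets in [requestedOffset, record.offset) were lost; rather than
    // reporting data loss, return the first record still inside the range.
    Some(record)
  } else {
    // Nothing left inside [requestedOffset, untilOffset).
    None
  }
}
```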