[spark] branch master updated: [SPARK-40844][SS] Flip the default value of Kafka offset fetching config

kabhwan Wed, 19 Oct 2022 13:11:24 -0700

This is an automated email from the ASF dual-hosted git repository.

kabhwan pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git



The following commit(s) were added to refs/heads/master by this push:
     new 706862cf99c [SPARK-40844][SS] Flip the default value of Kafka offset 
fetching config
706862cf99c is described below

commit 706862cf99c1122e646db17a490963393e2eae14
Author: Jungtaek Lim <kabhwan.opensou...@gmail.com>
AuthorDate: Thu Oct 20 05:10:41 2022 +0900

    [SPARK-40844][SS] Flip the default value of Kafka offset fetching config
    
    ### What changes were proposed in this pull request?
    
    This PR proposes to flip the default value of Kafka offset fetching config 
(`spark.sql.streaming.kafka.useDeprecatedOffsetFetching`) from `true` to 
`false`, which enables AdminClient based offset fetching by default.
    
    ### Why are the changes needed?
    
    We had been encountered several production issues with old offset fetching 
(e.g. hang, issue with Kafka consumer group rebalance) which could be mitigated 
with new offset fetching. Despite the breaking change on the ACL, there is no 
need for moderate users to suffer from the old way.
    
    The discussion went through the dev. mailing list: 
https://lists.apache.org/thread/spkco94gw33sj8355mhlxz1vl7gl1g5c
    
    ### Does this PR introduce _any_ user-facing change?
    
    Yes, especially users who relies on Kafka ACL based on consumer group. They 
need to either adjust the ACL to topic based one, or set the value to `true` 
for `spark.sql.streaming.kafka.useDeprecatedOffsetFetching` to use the old 
approach.
    
    ### How was this patch tested?
    
    Existing UTs.
    
    Closes #38306 from HeartSaVioR/SPARK-40844.
    
    Authored-by: Jungtaek Lim <kabhwan.opensou...@gmail.com>
    Signed-off-by: Jungtaek Lim <kabhwan.opensou...@gmail.com>
---
 docs/ss-migration-guide.md                                           | 2 ++
 docs/structured-streaming-kafka-integration.md                       | 5 +++--
 .../src/main/scala/org/apache/spark/sql/internal/SQLConf.scala       | 2 +-
 3 files changed, 6 insertions(+), 3 deletions(-)

diff --git a/docs/ss-migration-guide.md b/docs/ss-migration-guide.md
index 0ca5b00debc..57fe3a84e12 100644
--- a/docs/ss-migration-guide.md
+++ b/docs/ss-migration-guide.md
@@ -30,6 +30,8 @@ Please refer [Migration Guide: SQL, Datasets and 
DataFrame](sql-migration-guide.
 
 - Since Spark 3.4, `Trigger.Once` is deprecated, and users are encouraged to 
migrate from `Trigger.Once` to `Trigger.AvailableNow`. Please refer 
[SPARK-39805](https://issues.apache.org/jira/browse/SPARK-39805) for more 
details.
 
+- Since Spark 3.4, the default value of configuration for Kafka offset 
fetching (`spark.sql.streaming.kafka.useDeprecatedOffsetFetching`) is changed 
from `true` to `false`. The default no longer relies consumer group based 
scheduling, which affect the required ACL. For further details please see 
[Structured Streaming Kafka 
Integration](structured-streaming-kafka-integration.html#offset-fetching).
+
 ## Upgrading from Structured Streaming 3.2 to 3.3
 
 - Since Spark 3.3, all stateful operators require hash partitioning with exact 
grouping keys. In previous versions, all stateful operators except 
stream-stream join require loose partitioning criteria which opens the 
possibility on correctness issue. (See 
[SPARK-38204](https://issues.apache.org/jira/browse/SPARK-38204) for more 
details.) To ensure backward compatibility, we retain the old behavior with the 
checkpoint built from older versions.
diff --git a/docs/structured-streaming-kafka-integration.md 
b/docs/structured-streaming-kafka-integration.md
index 6121f19e806..d7b741a3fe6 100644
--- a/docs/structured-streaming-kafka-integration.md
+++ b/docs/structured-streaming-kafka-integration.md
@@ -568,8 +568,9 @@ Timestamp offset options require Kafka 0.10.1.0 or higher.
 ### Offset fetching
 
 In Spark 3.0 and before Spark uses <code>KafkaConsumer</code> for offset 
fetching which could cause infinite wait in the driver.
-In Spark 3.1 a new configuration option added 
<code>spark.sql.streaming.kafka.useDeprecatedOffsetFetching</code> (default: 
<code>true</code>)
-which could be set to `false` allowing Spark to use new offset fetching 
mechanism using <code>AdminClient</code>.
+In Spark 3.1 a new configuration option added 
<code>spark.sql.streaming.kafka.useDeprecatedOffsetFetching</code> (default: 
<code>false</code>)
+which allows Spark to use new offset fetching mechanism using 
<code>AdminClient</code>. (Set this to `true` to use old offset fetching with 
<code>KafkaConsumer</code>.)
+
 When the new mechanism used the following applies.
 
 First of all the new approach supports Kafka brokers `0.11.0.0+`.
diff --git 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
index a99a795018d..72eb420de37 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
@@ -1924,7 +1924,7 @@ object SQLConf {
         "Integration Guide.")
       .version("3.1.0")
       .booleanConf
-      .createWithDefault(true)
+      .createWithDefault(false)
 
   val STATEFUL_OPERATOR_CHECK_CORRECTNESS_ENABLED =
     buildConf("spark.sql.streaming.statefulOperator.checkCorrectness.enabled")


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org

[spark] branch master updated: [SPARK-40844][SS] Flip the default value of Kafka offset fetching config

Reply via email to