This is an automated email from the ASF dual-hosted git repository. kabhwan pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git
The following commit(s) were added to refs/heads/master by this push: new 706862cf99c [SPARK-40844][SS] Flip the default value of Kafka offset fetching config 706862cf99c is described below commit 706862cf99c1122e646db17a490963393e2eae14 Author: Jungtaek Lim <kabhwan.opensou...@gmail.com> AuthorDate: Thu Oct 20 05:10:41 2022 +0900 [SPARK-40844][SS] Flip the default value of Kafka offset fetching config ### What changes were proposed in this pull request? This PR proposes to flip the default value of Kafka offset fetching config (`spark.sql.streaming.kafka.useDeprecatedOffsetFetching`) from `true` to `false`, which enables AdminClient based offset fetching by default. ### Why are the changes needed? We had been encountered several production issues with old offset fetching (e.g. hang, issue with Kafka consumer group rebalance) which could be mitigated with new offset fetching. Despite the breaking change on the ACL, there is no need for moderate users to suffer from the old way. The discussion went through the dev. mailing list: https://lists.apache.org/thread/spkco94gw33sj8355mhlxz1vl7gl1g5c ### Does this PR introduce _any_ user-facing change? Yes, especially users who relies on Kafka ACL based on consumer group. They need to either adjust the ACL to topic based one, or set the value to `true` for `spark.sql.streaming.kafka.useDeprecatedOffsetFetching` to use the old approach. ### How was this patch tested? Existing UTs. Closes #38306 from HeartSaVioR/SPARK-40844. Authored-by: Jungtaek Lim <kabhwan.opensou...@gmail.com> Signed-off-by: Jungtaek Lim <kabhwan.opensou...@gmail.com> --- docs/ss-migration-guide.md | 2 ++ docs/structured-streaming-kafka-integration.md | 5 +++-- .../src/main/scala/org/apache/spark/sql/internal/SQLConf.scala | 2 +- 3 files changed, 6 insertions(+), 3 deletions(-) diff --git a/docs/ss-migration-guide.md b/docs/ss-migration-guide.md index 0ca5b00debc..57fe3a84e12 100644 --- a/docs/ss-migration-guide.md +++ b/docs/ss-migration-guide.md @@ -30,6 +30,8 @@ Please refer [Migration Guide: SQL, Datasets and DataFrame](sql-migration-guide. - Since Spark 3.4, `Trigger.Once` is deprecated, and users are encouraged to migrate from `Trigger.Once` to `Trigger.AvailableNow`. Please refer [SPARK-39805](https://issues.apache.org/jira/browse/SPARK-39805) for more details. +- Since Spark 3.4, the default value of configuration for Kafka offset fetching (`spark.sql.streaming.kafka.useDeprecatedOffsetFetching`) is changed from `true` to `false`. The default no longer relies consumer group based scheduling, which affect the required ACL. For further details please see [Structured Streaming Kafka Integration](structured-streaming-kafka-integration.html#offset-fetching). + ## Upgrading from Structured Streaming 3.2 to 3.3 - Since Spark 3.3, all stateful operators require hash partitioning with exact grouping keys. In previous versions, all stateful operators except stream-stream join require loose partitioning criteria which opens the possibility on correctness issue. (See [SPARK-38204](https://issues.apache.org/jira/browse/SPARK-38204) for more details.) To ensure backward compatibility, we retain the old behavior with the checkpoint built from older versions. diff --git a/docs/structured-streaming-kafka-integration.md b/docs/structured-streaming-kafka-integration.md index 6121f19e806..d7b741a3fe6 100644 --- a/docs/structured-streaming-kafka-integration.md +++ b/docs/structured-streaming-kafka-integration.md @@ -568,8 +568,9 @@ Timestamp offset options require Kafka 0.10.1.0 or higher. ### Offset fetching In Spark 3.0 and before Spark uses <code>KafkaConsumer</code> for offset fetching which could cause infinite wait in the driver. -In Spark 3.1 a new configuration option added <code>spark.sql.streaming.kafka.useDeprecatedOffsetFetching</code> (default: <code>true</code>) -which could be set to `false` allowing Spark to use new offset fetching mechanism using <code>AdminClient</code>. +In Spark 3.1 a new configuration option added <code>spark.sql.streaming.kafka.useDeprecatedOffsetFetching</code> (default: <code>false</code>) +which allows Spark to use new offset fetching mechanism using <code>AdminClient</code>. (Set this to `true` to use old offset fetching with <code>KafkaConsumer</code>.) + When the new mechanism used the following applies. First of all the new approach supports Kafka brokers `0.11.0.0+`. diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala index a99a795018d..72eb420de37 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala @@ -1924,7 +1924,7 @@ object SQLConf { "Integration Guide.") .version("3.1.0") .booleanConf - .createWithDefault(true) + .createWithDefault(false) val STATEFUL_OPERATOR_CHECK_CORRECTNESS_ENABLED = buildConf("spark.sql.streaming.statefulOperator.checkCorrectness.enabled") --------------------------------------------------------------------- To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org