Re: [PR] [SPARK-47847][CORE] Deprecate `spark.network.remoteReadNioBufferConversion` [spark]
dongjoon-hyun commented on PR #46047: URL: https://github.com/apache/spark/pull/46047#issuecomment-2103920027

Merged to master/3.5.

-- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-47847][CORE] Deprecate `spark.network.remoteReadNioBufferConversion` [spark]
dongjoon-hyun closed pull request #46047: [SPARK-47847][CORE] Deprecate `spark.network.remoteReadNioBufferConversion` URL: https://github.com/apache/spark/pull/46047
Re: [PR] [SPARK-47847][CORE] Deprecate `spark.network.remoteReadNioBufferConversion` [spark]
dongjoon-hyun commented on PR #46047: URL: https://github.com/apache/spark/pull/46047#issuecomment-2103919298

Since this is irrelevant to CI, I verified manually as follows.

```
$ bin/spark-shell -c spark.network.remoteReadNioBufferConversion=true
WARNING: Using incubator modules: jdk.incubator.vector
24/05/09 22:54:07 WARN SparkConf: The configuration key 'spark.network.remoteReadNioBufferConversion' has been deprecated as of Spark 3.5.2 and may be removed in the future. Please open a JIRA ticket to report it if you need to use this configuration.
24/05/09 22:54:07 WARN SparkConf: The configuration key 'spark.network.remoteReadNioBufferConversion' has been deprecated as of Spark 3.5.2 and may be removed in the future. Please open a JIRA ticket to report it if you need to use this configuration.
24/05/09 22:54:07 WARN SparkConf: The configuration key 'spark.network.remoteReadNioBufferConversion' has been deprecated as of Spark 3.5.2 and may be removed in the future. Please open a JIRA ticket to report it if you need to use this configuration.
24/05/09 22:54:07 INFO SignalUtils: Registering signal handler for INT
24/05/09 22:54:07 WARN SparkConf: The configuration key 'spark.network.remoteReadNioBufferConversion' has been deprecated as of Spark 3.5.2 and may be removed in the future. Please open a JIRA ticket to report it if you need to use this configuration.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 4.0.0-SNAPSHOT
      /_/

Using Scala version 2.13.13 (OpenJDK 64-Bit Server VM, Java 21.0.3)
Type in expressions to have them evaluated.
Type :help for more information.
24/05/09 22:54:09 WARN SparkConf: The configuration key 'spark.network.remoteReadNioBufferConversion' has been deprecated as of Spark 3.5.2 and may be removed in the future. Please open a JIRA ticket to report it if you need to use this configuration.
24/05/09 22:54:09 WARN SparkConf: The configuration key 'spark.network.remoteReadNioBufferConversion' has been deprecated as of Spark 3.5.2 and may be removed in the future. Please open a JIRA ticket to report it if you need to use this configuration.
```
[PR] [SPARK-48230][BUILD] Remove unused jodd-core [spark]
pan3793 opened a new pull request, #46520: URL: https://github.com/apache/spark/pull/46520

### What changes were proposed in this pull request?
Remove a jar that has a CVE: https://github.com/advisories/GHSA-jrg3-qq99-35g7

### Why are the changes needed?
Previously, `jodd-core` came in through Hive's transitive dependencies; https://github.com/apache/hive/pull/5151 (Hive 2.3.10) cut it out, so we can remove it from Spark now.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Pass GA.

### Was this patch authored or co-authored using generative AI tooling?
No.
Re: [PR] [SPARK-48219][CORE] StreamReader Charset fix with UTF8 [spark]
dongjoon-hyun commented on PR #46509: URL: https://github.com/apache/spark/pull/46509#issuecomment-2103908417

Sorry, but I'll leave this to the other reviewers, @xuzifu666 .
Re: [PR] [SPARK-48219][CORE] StreamReader Charset fix with UTF8 [spark]
xuzifu666 commented on PR #46509: URL: https://github.com/apache/spark/pull/46509#issuecomment-2103906360

@dongjoon-hyun Could you give a final review? Thanks
Re: [PR] [SPARK-48222][INFRA][DOCS] Sync Ruby Bundler to 2.4.22 and refresh Gem lock file [spark]
cloud-fan commented on PR #46512: URL: https://github.com/apache/spark/pull/46512#issuecomment-2103907125

I see, I'll install Python 3.9 on the release Docker image.
Re: [PR] [SPARK-47847][CORE] Deprecate spark.network.remoteReadNioBufferConversion [spark]
pan3793 commented on PR #46047: URL: https://github.com/apache/spark/pull/46047#issuecomment-2103900781

@dongjoon-hyun thanks for your suggestion; I updated the deprecation message. We can consider removing the configuration in 4.1.0 or later.
Re: [PR] [SPARK-48228][PYTHON][CONNECT] Implement the missing function validation in ApplyInXXX [spark]
zhengruifeng commented on PR #46519: URL: https://github.com/apache/spark/pull/46519#issuecomment-2103877686

@dongjoon-hyun and @HyukjinKwon, thanks for the reviews.
Re: [PR] [SPARK-48201][DOCS][PYTHON] Make some corrections in the docstring of pyspark DataStreamReader methods [spark]
dongjoon-hyun commented on PR #46416: URL: https://github.com/apache/spark/pull/46416#issuecomment-2103874418

Welcome to the Apache Spark community, @chloeh13q . I added you to the Apache Spark contributor group and assigned SPARK-48201 to you. Congratulations on your first commit!
Re: [PR] [SPARK-48201][DOCS][PYTHON] Make some corrections in the docstring of pyspark DataStreamReader methods [spark]
dongjoon-hyun closed pull request #46416: [SPARK-48201][DOCS][PYTHON] Make some corrections in the docstring of pyspark DataStreamReader methods URL: https://github.com/apache/spark/pull/46416
Re: [PR] Fix previous reader checks in Vectorized DELTA_BYTE_ARRAY decoder [spark]
dongjoon-hyun commented on code in PR #46485: URL: https://github.com/apache/spark/pull/46485#discussion_r1596258635

## sql/core/src/main/java/org/apache/spark/sql/execution/datasources/parquet/VectorizedColumnReader.java:
```diff
@@ -353,8 +353,9 @@ private void initDataReader(
       throw new IOException("could not read page in col " + descriptor, e);
     }
     // for PARQUET-246 (See VectorizedDeltaByteArrayReader.setPreviousValues)
-    if (CorruptDeltaByteArrays.requiresSequentialReads(writerVersion, dataEncoding) &&
-        previousReader instanceof RequiresPreviousReader) {
+    if (CorruptDeltaByteArrays.requiresSequentialReads(writerVersion, dataEncoding)
+        && previousReader != null
+        && dataColumn instanceof RequiresPreviousReader) {
```

Review Comment: Could you file a JIRA issue, @yutsareva ?
Re: [PR] Fix previous reader checks in Vectorized DELTA_BYTE_ARRAY decoder [spark]
dongjoon-hyun commented on PR #46485: URL: https://github.com/apache/spark/pull/46485#issuecomment-2103871560

cc @sunchao
Re: [PR] [SPARK-48228][PYTHON][CONNECT] Implement the missing function validation in ApplyInXXX [spark]
dongjoon-hyun commented on PR #46519: URL: https://github.com/apache/spark/pull/46519#issuecomment-2103868519

Merged to master for Apache Spark 4.0.0.
Re: [PR] [SPARK-47834][SQL][CONNECT] Mark deprecated functions with `@deprecated` in `SQLImplicits` [spark]
LuciferYang commented on PR #46029: URL: https://github.com/apache/spark/pull/46029#issuecomment-2103868266

Thanks @dongjoon-hyun
Re: [PR] [SPARK-48228][PYTHON][CONNECT] Implement the missing function validation in ApplyInXXX [spark]
dongjoon-hyun closed pull request #46519: [SPARK-48228][PYTHON][CONNECT] Implement the missing function validation in ApplyInXXX URL: https://github.com/apache/spark/pull/46519
Re: [PR] [SPARK-48210][DOC] Modify the description of whether dynamic partition… [spark]
dongjoon-hyun commented on PR #46496: URL: https://github.com/apache/spark/pull/46496#issuecomment-2103866076

cc @mridulm and @tgravescs
Re: [PR] [SPARK-48224][SQL] Disallow map keys from being of variant type [spark]
dongjoon-hyun closed pull request #46516: [SPARK-48224][SQL] Disallow map keys from being of variant type URL: https://github.com/apache/spark/pull/46516
Re: [PR] [SPARK-47018][BUILD][SQL] Bump built-in Hive to 2.3.10 [spark]
dongjoon-hyun commented on PR #46468: URL: https://github.com/apache/spark/pull/46468#issuecomment-2103844998

Also, cc @cloud-fan and @HyukjinKwon . This fixes not only the Hive dependency but also a long-standing `libthrift` library issue.
Re: [PR] [SPARK-47018][BUILD][SQL] Bump built-in Hive to 2.3.10 [spark]
dongjoon-hyun commented on PR #46468: URL: https://github.com/apache/spark/pull/46468#issuecomment-2103844347

Merged to master! Thank you so much, @pan3793 and @sunchao . From now on, many people will use Hive 2.3.10, so I believe we can build more confidence before the Apache Spark 4.0.0 release.
Re: [PR] [SPARK-47018][BUILD][SQL] Bump built-in Hive to 2.3.10 [spark]
dongjoon-hyun closed pull request #46468: [SPARK-47018][BUILD][SQL] Bump built-in Hive to 2.3.10 URL: https://github.com/apache/spark/pull/46468
Re: [PR] [DRAFT][BUILD] Test upgrading built-in Hive to 2.3.10 [spark]
dongjoon-hyun closed pull request #45372: [DRAFT][BUILD] Test upgrading built-in Hive to 2.3.10 URL: https://github.com/apache/spark/pull/45372
Re: [PR] [SPARK-47119][BUILD] Add `hive-jackson-provided` profile [spark]
dongjoon-hyun commented on PR #45201: URL: https://github.com/apache/spark/pull/45201#issuecomment-2103839853

It's supposed to stay here as a last resort until we release Apache Spark 4.0.0 successfully without reverting Hive 2.3.10, @pan3793 .
Re: [PR] [SPARK-48222][INFRA][DOCS] Sync Ruby Bundler to 2.4.22 and refresh Gem lock file [spark]
dongjoon-hyun commented on PR #46512: URL: https://github.com/apache/spark/pull/46512#issuecomment-2103835797

Ya, as @nchammas mentioned, it seems that we missed bumping Python to 3.9 in `spark-rm` in the following PR.
- #46228
Re: [PR] [SPARK-48222][INFRA][DOCS] Sync Ruby Bundler to 2.4.22 and refresh Gem lock file [spark]
nchammas commented on PR #46512: URL: https://github.com/apache/spark/pull/46512#issuecomment-2103835083

Yes, and we dropped support for Python 3.8 in #46228.
Re: [PR] [SPARK-48222][INFRA][DOCS] Sync Ruby Bundler to 2.4.22 and refresh Gem lock file [spark]
dongjoon-hyun commented on PR #46512: URL: https://github.com/apache/spark/pull/46512#issuecomment-2103834606

It seems that the `files` attribute was added in Python 3.9, but the running Python version is 3.8, @cloud-fan .
- https://importlib-resources.readthedocs.io/en/latest/using.html#migrating-from-legacy
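The version gap pointed out above can be reproduced outside the release image; a minimal stdlib-only sketch (not part of the Spark build) that distinguishes the 3.9+ `files()` API from the legacy 3.8 helpers:

```python
import sys
import importlib.resources

# importlib.resources.files() only exists on Python 3.9+; on 3.8 the same
# attribute access raises the AttributeError seen in the Sphinx traceback.
if hasattr(importlib.resources, "files"):
    # Python 3.9+: files() returns a Traversable rooted at the package.
    root = importlib.resources.files("email")
    print(root.joinpath("__init__.py").name)
else:
    # Python 3.8: only legacy helpers such as read_text() are available,
    # which is why pyspark/errors/error_classes.py fails on this image.
    print("files() unavailable on Python", sys.version.split()[0])
```

Run under the 3.8 release image this takes the fallback branch, while GitHub Actions (3.9+) takes the first.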
Re: [PR] [SPARK-47441][YARN] Do not add log link for unmanaged AM in Spark UI [spark]
dongjoon-hyun commented on PR #45565: URL: https://github.com/apache/spark/pull/45565#issuecomment-2103823764

Also, cc @mridulm
Re: [PR] [SPARK-47847][CORE] Deprecate spark.network.remoteReadNioBufferConversion [spark]
dongjoon-hyun commented on PR #46047: URL: https://github.com/apache/spark/pull/46047#issuecomment-2103818630

BTW, we need to give users enough time to report issues. So, we cannot delete this configuration in Apache Spark 4.0.0, because Apache Spark 3.5.2 is not released yet and we need to wait for one whole release after that.

> there are no negative reports,
Re: [PR] [SPARK-47847][CORE] Deprecate spark.network.remoteReadNioBufferConversion [spark]
dongjoon-hyun commented on PR #46047: URL: https://github.com/apache/spark/pull/46047#issuecomment-2103817471

To @pan3793 , I have rethought this.

> I fill the deprecated message with "Not used anymore", to be consistent with existing items
>
> ```
> DeprecatedConfig("spark.yarn.am.port", "2.0.0", "Not used anymore"),
> DeprecatedConfig("spark.executor.port", "2.0.0", "Not used anymore"),
> ...
> ```

Since `remoteReadNioBufferConversion` is used in the code, `Not used anymore` is technically wrong. Shall we ask the user to report it, like the following?
https://github.com/apache/spark/blob/1138b2a68b5408e6d079bdbce8026323694628e5/sql/core/src/main/scala/org/apache/spark/sql/execution/analysis/DetectAmbiguousSelfJoin.scala#L101

For example, we can use `Please open a JIRA ticket to report it if you need to use this configuration.`.
Re: [PR] [SPARK-48222][INFRA][DOCS] Sync Ruby Bundler to 2.4.22 and refresh Gem lock file [spark]
cloud-fan commented on PR #46512: URL: https://github.com/apache/spark/pull/46512#issuecomment-2103814645

cc @HyukjinKwon
Re: [PR] [SPARK-48222][INFRA][DOCS] Sync Ruby Bundler to 2.4.22 and refresh Gem lock file [spark]
cloud-fan commented on PR #46512: URL: https://github.com/apache/spark/pull/46512#issuecomment-2103814454

The Bundler issue is resolved, but I hit a new issue when generating the PySpark docs:

```
Configuration error:
There is a programmable error in your configuration file:

Traceback (most recent call last):
  File "/usr/local/lib/python3.8/dist-packages/sphinx/config.py", line 332, in eval_config_file
    exec(code, namespace)
  File "/opt/spark-rm/output/spark/python/docs/source/conf.py", line 27, in <module>
    from pyspark.pandas.supported_api_gen import generate_supported_api
  File "/opt/spark-rm/output/spark/python/pyspark/__init__.py", line 53, in <module>
    from pyspark.util import is_remote_only
  File "/opt/spark-rm/output/spark/python/pyspark/util.py", line 33, in <module>
    from pyspark.errors import PySparkRuntimeError
  File "/opt/spark-rm/output/spark/python/pyspark/errors/__init__.py", line 21, in <module>
    from pyspark.errors.exceptions.base import (  # noqa: F401
  File "/opt/spark-rm/output/spark/python/pyspark/errors/exceptions/base.py", line 21, in <module>
    from pyspark.errors.utils import ErrorClassesReader
  File "/opt/spark-rm/output/spark/python/pyspark/errors/utils.py", line 23, in <module>
    from pyspark.errors.error_classes import ERROR_CLASSES_MAP
  File "/opt/spark-rm/output/spark/python/pyspark/errors/error_classes.py", line 26, in <module>
    importlib.resources
AttributeError: module 'importlib.resources' has no attribute 'files'
```

Maybe there is some Python library version inconsistency between GitHub Actions and the release Docker image.
Re: [PR] [MINOR][BUILD] Remove duplicate configuration of maven-compiler-plugin [spark]
zml1206 commented on PR #46024: URL: https://github.com/apache/spark/pull/46024#issuecomment-2103810977

Thank you for the review, @dongjoon-hyun .
Re: [PR] [MINOR][BUILD] Remove duplicate configuration of maven-compiler-plugin [spark]
dongjoon-hyun closed pull request #46024: [MINOR][BUILD] Remove duplicate configuration of maven-compiler-plugin URL: https://github.com/apache/spark/pull/46024
Re: [PR] [MINOR][BUILD] Remove duplicate configuration of maven-compiler-plugin [spark]
dongjoon-hyun commented on PR #46024: URL: https://github.com/apache/spark/pull/46024#issuecomment-2103808514

Sorry for being late. I missed your ping here, @zml1206 .
Re: [PR] [SPARK-48228][PYTHON][CONNECT] Implement the missing function validation in ApplyInXXX [spark]
zhengruifeng commented on code in PR #46519: URL: https://github.com/apache/spark/pull/46519#discussion_r1596217131

## python/pyspark/sql/connect/group.py:
```diff
@@ -34,6 +34,7 @@
 from pyspark.util import PythonEvalType
 from pyspark.sql.group import GroupedData as PySparkGroupedData
 from pyspark.sql.pandas.group_ops import PandasCogroupedOps as PySparkPandasCogroupedOps
+from pyspark.sql.pandas.functions import _validate_pandas_udf  # type: ignore[attr-defined]
```

Review Comment: Spark Classic invokes `pandas_udf` in the Pandas Functions (ApplyInXXX), and `pandas_udf` includes the function validation. In Spark Connect, we cannot use `pandas_udf` because of differences in the underlying implementations: `pandas_udf` returns a wrapper, while Spark Connect requires a `UserDefinedFunction` object.
[PR] [SPARK-48228][PYTHON][CONNECT] Implement the missing function validation in ApplyInXXX [spark]
zhengruifeng opened a new pull request, #46519: URL: https://github.com/apache/spark/pull/46519

### What changes were proposed in this pull request?
Implement the missing function validation in ApplyInXXX.

https://github.com/apache/spark/pull/46397 fixed this issue for `Cogrouped.applyInPandas`; this PR fixes the remaining methods.

### Why are the changes needed?
For a better error message:

```python
In [12]: df1 = spark.range(11)

In [13]: df2 = df1.groupby("id").applyInPandas(lambda: 1, StructType([StructField("d", DoubleType())]))

In [14]: df2.show()
```

Before this PR, an invalid function causes weird execution errors:

```
24/05/10 11:37:36 ERROR Executor: Exception in task 0.0 in stage 10.0 (TID 36)
org.apache.spark.api.python.PythonException: Traceback (most recent call last):
  File "/Users/ruifeng.zheng/Dev/spark/python/lib/pyspark.zip/pyspark/worker.py", line 1834, in main
    process()
  File "/Users/ruifeng.zheng/Dev/spark/python/lib/pyspark.zip/pyspark/worker.py", line 1826, in process
    serializer.dump_stream(out_iter, outfile)
  File "/Users/ruifeng.zheng/Dev/spark/python/lib/pyspark.zip/pyspark/sql/pandas/serializers.py", line 531, in dump_stream
    return ArrowStreamSerializer.dump_stream(self, init_stream_yield_batches(), stream)
  File "/Users/ruifeng.zheng/Dev/spark/python/lib/pyspark.zip/pyspark/sql/pandas/serializers.py", line 104, in dump_stream
    for batch in iterator:
  File "/Users/ruifeng.zheng/Dev/spark/python/lib/pyspark.zip/pyspark/sql/pandas/serializers.py", line 524, in init_stream_yield_batches
    for series in iterator:
  File "/Users/ruifeng.zheng/Dev/spark/python/lib/pyspark.zip/pyspark/worker.py", line 1610, in mapper
    return f(keys, vals)
  File "/Users/ruifeng.zheng/Dev/spark/python/lib/pyspark.zip/pyspark/worker.py", line 488, in <lambda>
    return lambda k, v: [(wrapped(k, v), to_arrow_type(return_type))]
  File "/Users/ruifeng.zheng/Dev/spark/python/lib/pyspark.zip/pyspark/worker.py", line 483, in wrapped
    result, return_type, _assign_cols_by_name, truncate_return_schema=False
UnboundLocalError: cannot access local variable 'result' where it is not associated with a value

	at org.apache.spark.api.python.BasePythonRunner$ReaderIterator.handlePythonException(PythonRunner.scala:523)
	at org.apache.spark.sql.execution.python.PythonArrowOutput$$anon$1.read(PythonArrowOutput.scala:117)
	at org.apache.spark.api.python.BasePythonRunner$ReaderIterator.hasNext(PythonRunner.scala:479)
	at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37)
	at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:601)
	at scala.collection.Iterator$$anon$9.hasNext(Iterator.scala:583)
	at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage2.processNext(Unknown Source)
	at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
	at org.apache.spark.sql.execution.WholeStageCodegenEvaluatorFactory$WholeStageCodegenPartitionEvaluator$$anon$1.hasNext(WholeStageCodegenEvaluatorFactory.scala:50)
	at org.apache.spark.sql.execution.SparkPlan.$anonfun$getByteArrayRdd$1(SparkPlan.scala:388)
	at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2(RDD.scala:896)
	...
```

After this PR, the error happens before execution, which is consistent with Spark Classic and much clearer:

```
PySparkValueError: [INVALID_PANDAS_UDF] Invalid function: the function in groupby.applyInArrow must take either one argument (data) or two arguments (key, data).
```

### Does this PR introduce _any_ user-facing change?
Yes, error messages change.

### How was this patch tested?
Added tests.

### Was this patch authored or co-authored using generative AI tooling?
No.
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
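The check this PR adds is essentially an argument-count validation of the user function before any executor runs. A minimal sketch of such a check in plain Python, using `inspect.signature` — the helper name `validate_pandas_udf_arity` and the error text are illustrative, not PySpark's actual API:

```python
import inspect

def validate_pandas_udf_arity(func):
    """Reject functions that take neither one (data) nor two (key, data) arguments."""
    params = inspect.signature(func).parameters.values()
    # A *args parameter makes any positional arity acceptable.
    if any(p.kind == inspect.Parameter.VAR_POSITIONAL for p in params):
        return
    num_args = len([
        p for p in params
        if p.kind in (inspect.Parameter.POSITIONAL_ONLY,
                      inspect.Parameter.POSITIONAL_OR_KEYWORD)
    ])
    if num_args not in (1, 2):
        raise ValueError(
            "Invalid function: must take either one argument (data) "
            "or two arguments (key, data)."
        )

validate_pandas_udf_arity(lambda pdf: pdf)       # one argument: accepted
validate_pandas_udf_arity(lambda key, pdf: pdf)  # two arguments: accepted
try:
    validate_pandas_udf_arity(lambda: 1)         # the invalid function from the example above
except ValueError as e:
    print(e)                                     # fails fast, before execution
```

Running the check at `applyInPandas` call time is what turns the opaque worker-side `UnboundLocalError` into an upfront error, matching the Spark Classic behavior described above.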
Re: [PR] [SPARK-47834][SQL][CONNECT] Mark deprecated functions with `@deprecated` in `SQLImplicits` [spark]
dongjoon-hyun commented on PR #46029: URL: https://github.com/apache/spark/pull/46029#issuecomment-2103806851 Merged to master for Apache Spark 4.0.0.
Re: [PR] [SPARK-47834][SQL][CONNECT] Mark deprecated functions with `@deprecated` in `SQLImplicits` [spark]
dongjoon-hyun closed pull request #46029: [SPARK-47834][SQL][CONNECT] Mark deprecated functions with `@deprecated` in `SQLImplicits` URL: https://github.com/apache/spark/pull/46029
Re: [PR] [SPARK-47954][K8S] Support creating ingress entry for external UI access [spark]
dongjoon-hyun commented on PR #46184: URL: https://github.com/apache/spark/pull/46184#issuecomment-2103803949 Just FYI, please take your time. We can target this for Apache Spark 4.0.0.
Re: [PR] [SPARK-48144][SQL] Fix `canPlanAsBroadcastHashJoin` to respect shuffle join hints [spark]
dongjoon-hyun commented on PR #46401: URL: https://github.com/apache/spark/pull/46401#issuecomment-2103802765 Gentle ping, @fred-db.
Re: [PR] [SPARK-47995][PYTHON][INFRA][TESTS] Upgrade `pyarrow` to 16.0.0 in GitHub Action CI [spark]
dongjoon-hyun commented on PR #46232: URL: https://github.com/apache/spark/pull/46232#issuecomment-2103795700 This is still blocked by `mlflow 2.12.2`:

```
mlflow 2.12.2 requires pyarrow<16,>=4.0.0, but you have pyarrow 16.0.0 which is incompatible.
```
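The blocker above is an ordinary pip dependency-resolution failure: mlflow 2.12.2 pins `pyarrow<16,>=4.0.0`, which excludes the 16.0.0 upgrade this PR attempts. The pin can be checked mechanically with the `packaging` library (a sketch; the version range is taken from the pip error message above):

```python
from packaging.specifiers import SpecifierSet
from packaging.version import Version

# mlflow 2.12.2's pyarrow pin, as reported by pip in the error above
mlflow_pin = SpecifierSet(">=4.0.0,<16")

print(Version("15.0.2") in mlflow_pin)  # True: a pyarrow release that satisfies the pin
print(Version("16.0.0") in mlflow_pin)  # False: the version this PR upgrades to
```

So the CI upgrade stays blocked until an mlflow release relaxes the upper bound.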
Re: [PR] [SPARK-27900][CORE][K8s] Add `spark.driver.killOnOOMError` flag in cluster mode [spark]
dimon222 commented on PR #26161: URL: https://github.com/apache/spark/pull/26161#issuecomment-2103793011 Was this ever fixed?
Re: [PR] [SPARK-48225][BUILD] Upgrade `sbt` to 1.10.0 [spark]
dongjoon-hyun closed pull request #46517: [SPARK-48225][BUILD] Upgrade `sbt` to 1.10.0 URL: https://github.com/apache/spark/pull/46517
Re: [PR] [SPARK-48176][SQL] Adjust name of FIELD_ALREADY_EXISTS error condition [spark]
HyukjinKwon closed pull request #46510: [SPARK-48176][SQL] Adjust name of FIELD_ALREADY_EXISTS error condition URL: https://github.com/apache/spark/pull/46510
Re: [PR] [SPARK-48176][SQL] Adjust name of FIELD_ALREADY_EXISTS error condition [spark]
HyukjinKwon commented on PR #46510: URL: https://github.com/apache/spark/pull/46510#issuecomment-2103788152 Merged to master.
Re: [PR] [SPARK-48214][INFRA] Ban import `org.slf4j.Logger` & `org.slf4j.LoggerFactory` [spark]
LuciferYang commented on PR #46502: URL: https://github.com/apache/spark/pull/46502#issuecomment-2103783129 Or how about having these modules depend on the `common/utils` module? `common/utils` doesn't seem to be a heavyweight module.
Re: [PR] [SPARK-47018][BUILD][SQL] Bump built-in Hive to 2.3.10 [spark]
dongjoon-hyun commented on PR #46468: URL: https://github.com/apache/spark/pull/46468#issuecomment-2103781151 Thank you!
Re: [PR] [SPARK-48225][BUILD] Upgrade `sbt` to 1.10.0 [spark]
dongjoon-hyun commented on PR #46517: URL: https://github.com/apache/spark/pull/46517#issuecomment-2103780307 Thank you so much for sharing!
Re: [PR] [SPARK-48197][SQL][TESTS][FOLLOWUP][3.5] Regenerate golden files [spark]
cloud-fan commented on PR #46514: URL: https://github.com/apache/spark/pull/46514#issuecomment-2103779630 thanks for the fix!
Re: [PR] [SPARK-48209][CORE] Common (java side): Migrate `error/warn/info` with variables to structured logging framework [spark]
panbingkun commented on PR #46493: URL: https://github.com/apache/spark/pull/46493#issuecomment-2103774486 cc @gengliangwang
Re: [PR] [SPARK-48225][BUILD] Upgrade `sbt` to 1.10.0 [spark]
LuciferYang commented on PR #46517: URL: https://github.com/apache/spark/pull/46517#issuecomment-2103771513 @dongjoon-hyun I was discussing this issue with @panbingkun offline yesterday. From the responses of sbt and coursier, it seems difficult to solve this problem in the short term without some handling of `Resolver.mavenLocal`. I suggest that @panbingkun conduct some performance tests to determine whether this issue can be bypassed by disabling `Resolver.mavenLocal`:
- https://github.com/sbt/sbt/issues/7506#issuecomment-1972591578
- https://github.com/coursier/coursier/issues/2942
Re: [PR] [SPARK-48209][CORE] Common (java side): Migrate `error/warn/info` with variables to structured logging framework [spark]
panbingkun commented on code in PR #46493: URL: https://github.com/apache/spark/pull/46493#discussion_r1596192395

## common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/RemoteBlockPushResolver.java:

    @@ -1999,7 +2042,9 @@ private AppPathsInfo(
          this.subDirsPerLocalDir = subDirsPerLocalDir;
          if (logger.isInfoEnabled()) {
            logger.info("Updated active local dirs {} and sub dirs {} for application {}",
    -         Arrays.toString(activeLocalDirs), subDirsPerLocalDir, appId);
    +         MDC.of(LogKeys.LOCAL_DIRS$.MODULE$, Arrays.toString(activeLocalDirs)),
    +         MDC.of(LogKeys.NUM_SUB_DIRS$.MODULE$, subDirsPerLocalDir),

Review Comment: The `subDirsPerLocalDir` is actually a `number` (the number of sub directories), not a sub directory `path`.
Re: [PR] [SPARK-48226][BUILD] Add `spark-ganglia-lgpl` to `lint-java` & `spark-ganglia-lgpl` and `jvm-profiler` to `sbt-checkstyle` [spark]
LuciferYang commented on PR #46501: URL: https://github.com/apache/spark/pull/46501#issuecomment-2103760020 late LGTM
Re: [PR] [SPARK-48209][CORE] Common (java side): Migrate `error/warn/info` with variables to structured logging framework [spark]
panbingkun commented on code in PR #46493: URL: https://github.com/apache/spark/pull/46493#discussion_r1596186119

## common/network-yarn/src/main/java/org/apache/spark/network/yarn/YarnShuffleService.java:

    @@ -363,7 +367,8 @@ static MergedShuffleFileManager newMergedShuffleFileManagerInstance(
          return mergeManagerSubClazz.getConstructor(TransportConf.class, File.class)
            .newInstance(conf, mergeManagerFile);
        } catch (Exception e) {
    -     defaultLogger.error("Unable to create an instance of {}", mergeManagerImplClassName);
    +     defaultLogger.error("Unable to create an instance of {}",
    +       MDC.of(LogKeys.CLASS_NAME$.MODULE$, mergeManagerImplClassName));

Review Comment: Or call it `MERGED_SHUFFLE_FILE_MANAGER`?
Re: [PR] [SPARK-48031][SQL] Support view schema evolution [spark]
srielau commented on code in PR #46267: URL: https://github.com/apache/spark/pull/46267#discussion_r1596184905

## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala:

    @@ -945,54 +945,73 @@ class SessionCatalog(
            throw QueryCompilationErrors.invalidViewText(viewText, metadata.qualifiedName)
          }
        }
    -    val projectList = if (!isHiveCreatedView(metadata)) {
    -      val viewColumnNames = if (metadata.viewQueryColumnNames.isEmpty) {
    -        // For view created before Spark 2.2.0, the view text is already fully qualified, the plan
    -        // output is the same with the view output.
    -        metadata.schema.fieldNames.toImmutableArraySeq
    -      } else {
    -        assert(metadata.viewQueryColumnNames.length == metadata.schema.length)
    -        metadata.viewQueryColumnNames
    -      }
    +    val schemaMode = metadata.viewSchemaMode
    +    if (schemaMode == SchemaEvolution) {
    +      View(desc = metadata, isTempView = isTempView, child = parsedPlan)
    +    } else {
    +      val projectList = if (!isHiveCreatedView(metadata)) {
    +        val viewColumnNames = if (metadata.viewQueryColumnNames.isEmpty) {
    +          // For view created before Spark 2.2.0, the view text is already fully qualified, the plan
    +          // output is the same with the view output.
    +          metadata.schema.fieldNames.toImmutableArraySeq
    +        } else {
    +          assert(metadata.viewQueryColumnNames.length == metadata.schema.length)
    +          metadata.viewQueryColumnNames
    +        }
    -      // For view queries like `SELECT * FROM t`, the schema of the referenced table/view may
    -      // change after the view has been created. We need to add an extra SELECT to pick the columns
    -      // according to the recorded column names (to get the correct view column ordering and omit
    -      // the extra columns that we don't require), with UpCast (to make sure the type change is
    -      // safe) and Alias (to respect user-specified view column names) according to the view schema
    -      // in the catalog.
    -      // Note that, the column names may have duplication, e.g. `CREATE VIEW v(x, y) AS
    -      // SELECT 1 col, 2 col`. We need to make sure that the matching attributes have the same
    -      // number of duplications, and pick the corresponding attribute by ordinal.
    -      val viewConf = View.effectiveSQLConf(metadata.viewSQLConfigs, isTempView)
    -      val normalizeColName: String => String = if (viewConf.caseSensitiveAnalysis) {
    -        identity
    +        // For view queries like `SELECT * FROM t`, the schema of the referenced table/view may
    +        // change after the view has been created. We need to add an extra SELECT to pick the
    +        // columns according to the recorded column names (to get the correct view column ordering
    +        // and omit the extra columns that we don't require), with UpCast (to make sure the type
    +        // change is safe) and Alias (to respect user-specified view column names) according to the
    +        // view schema in the catalog.
    +        // Note that, the column names may have duplication, e.g. `CREATE VIEW v(x, y) AS
    +        // SELECT 1 col, 2 col`. We need to make sure that the matching attributes have the same
    +        // number of duplications, and pick the corresponding attribute by ordinal.
    +        val viewConf = View.effectiveSQLConf(metadata.viewSQLConfigs, isTempView)
    +        val normalizeColName: String => String = if (viewConf.caseSensitiveAnalysis) {
    +          identity
    +        } else {
    +          _.toLowerCase(Locale.ROOT)
    +        }
    +        val nameToCounts = viewColumnNames.groupBy(normalizeColName).transform((_, v) => v.length)
    +        val nameToCurrentOrdinal = scala.collection.mutable.HashMap.empty[String, Int]
    +        val viewDDL = buildViewDDL(metadata, isTempView)
    +
    +        viewColumnNames.zip(metadata.schema).map { case (name, field) =>
    +          val normalizedName = normalizeColName(name)
    +          val count = nameToCounts(normalizedName)
    +          val ordinal = nameToCurrentOrdinal.getOrElse(normalizedName, 0)
    +          nameToCurrentOrdinal(normalizedName) = ordinal + 1
    +          val col = GetViewColumnByNameAndOrdinal(
    +            metadata.identifier.toString, name, ordinal, count, viewDDL)
    +          val cast = schemaMode match {
    +            /*
    +            ** For schema binding, we cast the column to the expected type using safe cast only.
    +            ** For legacy behavior, we cast the column to the expected type using safe cast only.
    +            ** For schema compensation, we cast the column to the expected type using any cast
    +            *  in ansi mode.
    +            ** For schema (type) evolution, we take the column as is.
    +            */
    +            case SchemaBinding => UpCast(col, field.dataType)
    +            case SchemaUnsupported => UpCast(col, field.dataType)
    +            case SchemaCompensation => Cast(col, field.dataType, ansiEnabled = true)
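The duplicate-column-name bookkeeping in the quoted Scala (`nameToCounts` / `nameToCurrentOrdinal`) pairs each recorded view column with an ordinal among the columns sharing its normalized name, so `GetViewColumnByNameAndOrdinal` can pick the right attribute even when names collide. A rough Python rendering of that logic (names and normalization simplified for illustration):

```python
from collections import Counter, defaultdict

def assign_ordinals(view_column_names, case_sensitive=False):
    """Return (name, ordinal, count) per column, mirroring the inputs
    the Scala code passes to GetViewColumnByNameAndOrdinal."""
    normalize = (lambda s: s) if case_sensitive else str.lower
    # How many columns share each normalized name (nameToCounts)
    name_to_counts = Counter(normalize(n) for n in view_column_names)
    # Running ordinal per normalized name (nameToCurrentOrdinal)
    name_to_current_ordinal = defaultdict(int)
    out = []
    for name in view_column_names:
        key = normalize(name)
        ordinal = name_to_current_ordinal[key]
        name_to_current_ordinal[key] = ordinal + 1
        out.append((name, ordinal, name_to_counts[key]))
    return out

# CREATE VIEW v(x, y) AS SELECT 1 col, 2 col -- both query columns are named `col`
print(assign_ordinals(["col", "col"]))  # [('col', 0, 2), ('col', 1, 2)]
```

The (ordinal, count) pair is what lets the resolver match the i-th duplicate in the view schema to the i-th duplicate in the query output.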
Re: [PR] [SPARK-48031][SQL] Support view schema evolution [spark]
srielau commented on code in PR #46267: URL: https://github.com/apache/spark/pull/46267#discussion_r1596184739

## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala:

    @@ -945,54 +945,73 @@ class SessionCatalog(
            throw QueryCompilationErrors.invalidViewText(viewText, metadata.qualifiedName)
          }
        }
    -    val projectList = if (!isHiveCreatedView(metadata)) {
    -      val viewColumnNames = if (metadata.viewQueryColumnNames.isEmpty) {
    -        // For view created before Spark 2.2.0, the view text is already fully qualified, the plan
    -        // output is the same with the view output.
    -        metadata.schema.fieldNames.toImmutableArraySeq
    -      } else {
    -        assert(metadata.viewQueryColumnNames.length == metadata.schema.length)
    -        metadata.viewQueryColumnNames
    -      }
    +    val schemaMode = metadata.viewSchemaMode
    +    if (schemaMode == SchemaEvolution) {
    +      View(desc = metadata, isTempView = isTempView, child = parsedPlan)
    +    } else {
    +      val projectList = if (!isHiveCreatedView(metadata)) {
    +        val viewColumnNames = if (metadata.viewQueryColumnNames.isEmpty) {
    +          // For view created before Spark 2.2.0, the view text is already fully qualified, the plan
    +          // output is the same with the view output.
    +          metadata.schema.fieldNames.toImmutableArraySeq
    +        } else {
    +          assert(metadata.viewQueryColumnNames.length == metadata.schema.length)
    +          metadata.viewQueryColumnNames
    +        }
    -      // For view queries like `SELECT * FROM t`, the schema of the referenced table/view may
    -      // change after the view has been created. We need to add an extra SELECT to pick the columns
    -      // according to the recorded column names (to get the correct view column ordering and omit
    -      // the extra columns that we don't require), with UpCast (to make sure the type change is
    -      // safe) and Alias (to respect user-specified view column names) according to the view schema
    -      // in the catalog.
    -      // Note that, the column names may have duplication, e.g. `CREATE VIEW v(x, y) AS
    -      // SELECT 1 col, 2 col`. We need to make sure that the matching attributes have the same
    -      // number of duplications, and pick the corresponding attribute by ordinal.
    -      val viewConf = View.effectiveSQLConf(metadata.viewSQLConfigs, isTempView)
    -      val normalizeColName: String => String = if (viewConf.caseSensitiveAnalysis) {
    -        identity
    +        // For view queries like `SELECT * FROM t`, the schema of the referenced table/view may
    +        // change after the view has been created. We need to add an extra SELECT to pick the
    +        // columns according to the recorded column names (to get the correct view column ordering
    +        // and omit the extra columns that we don't require), with UpCast (to make sure the type
    +        // change is safe) and Alias (to respect user-specified view column names) according to the
    +        // view schema in the catalog.
    +        // Note that, the column names may have duplication, e.g. `CREATE VIEW v(x, y) AS
    +        // SELECT 1 col, 2 col`. We need to make sure that the matching attributes have the same
    +        // number of duplications, and pick the corresponding attribute by ordinal.
    +        val viewConf = View.effectiveSQLConf(metadata.viewSQLConfigs, isTempView)
    +        val normalizeColName: String => String = if (viewConf.caseSensitiveAnalysis) {
    +          identity
    +        } else {
    +          _.toLowerCase(Locale.ROOT)
    +        }
    +        val nameToCounts = viewColumnNames.groupBy(normalizeColName).transform((_, v) => v.length)
    +        val nameToCurrentOrdinal = scala.collection.mutable.HashMap.empty[String, Int]
    +        val viewDDL = buildViewDDL(metadata, isTempView)
    +
    +        viewColumnNames.zip(metadata.schema).map { case (name, field) =>
    +          val normalizedName = normalizeColName(name)
    +          val count = nameToCounts(normalizedName)
    +          val ordinal = nameToCurrentOrdinal.getOrElse(normalizedName, 0)
    +          nameToCurrentOrdinal(normalizedName) = ordinal + 1
    +          val col = GetViewColumnByNameAndOrdinal(
    +            metadata.identifier.toString, name, ordinal, count, viewDDL)
    +          val cast = schemaMode match {
    +            /*
    +            ** For schema binding, we cast the column to the expected type using safe cast only.
    +            ** For legacy behavior, we cast the column to the expected type using safe cast only.
    +            ** For schema compensation, we cast the column to the expected type using any cast
    +            *  in ansi mode.
    +            ** For schema (type) evolution, we take the column as is.
    +            */
    +            case SchemaBinding => UpCast(col, field.dataType)
    +            case SchemaUnsupported => UpCast(col, field.dataType)
    +            case SchemaCompensation => Cast(col, field.dataType, ansiEnabled = true)
Re: [PR] [SPARK-48031][SQL] Support view schema evolution [spark]
srielau commented on code in PR #46267: URL: https://github.com/apache/spark/pull/46267#discussion_r1596180917

## sql/core/src/main/scala/org/apache/spark/sql/internal/BaseSessionStateBuilder.scala:

    @@ -224,6 +224,7 @@ abstract class BaseSessionStateBuilder(
            TableCapabilityCheck +:
            CommandCheck +:
            CollationCheck +:
    +        SyncViewsCheck +:

Review Comment: "ViewSyncSchemaToMetaStore" coming
Re: [PR] [SPARK-48209][CORE] Common (java side): Migrate `error/warn/info` with variables to structured logging framework [spark]
panbingkun commented on code in PR #46493: URL: https://github.com/apache/spark/pull/46493#discussion_r1596176953

## common/network-common/src/main/java/org/apache/spark/network/server/TransportRequestHandler.java:

    @@ -298,7 +303,9 @@ public void onFailure(Throwable e) {
          });
        } catch (Exception e) {
          logger.error("Error while invoking receiveMergeBlockMetaReq() for appId {} shuffleId {} "
    -       + "reduceId {}", req.appId, req.shuffleId, req.appId, e);

Review Comment: fix typo `reduceId {req.appId}` -> `reduceId {req.reduceId}`
Re: [PR] [SPARK-46632][SQL] Fix subexpression elimination when equivalent ternary expressions have different children [spark]
zml1206 commented on PR #46135: URL: https://github.com/apache/spark/pull/46135#issuecomment-2103726355 @peter-toth Can this PR be merged?
Re: [PR] [SPARK-48209][CORE] Common (java side): Migrate `error/warn/info` with variables to structured logging framework [spark]
panbingkun commented on code in PR #46493: URL: https://github.com/apache/spark/pull/46493#discussion_r1596171652

## common/network-common/src/main/java/org/apache/spark/network/ssl/ReloadingX509TrustManager.java:

    @@ -211,13 +210,13 @@ public void run() {
              this.reloadCount += 1;
            } catch (Exception ex) {
              logger.warn(
    -           "Could not load truststore (keep using existing one) : " + ex.toString(),

Review Comment: Remove the redundant `ex.toString()`
Re: [PR] [SPARK-48031][SQL] Support view schema evolution [spark]
cloud-fan commented on code in PR #46267: URL: https://github.com/apache/spark/pull/46267#discussion_r1596169480

## sql/core/src/main/scala/org/apache/spark/sql/internal/BaseSessionStateBuilder.scala:

    @@ -224,6 +224,7 @@ abstract class BaseSessionStateBuilder(
            TableCapabilityCheck +:
            CommandCheck +:
            CollationCheck +:
    +        SyncViewsCheck +:

Review Comment: Unfortunately we don't have an analyzer extension point to run at the very end of the analysis phase. We can add one later, but for now `CheckAnalysis` is the best place to do it. Maybe we can rename this rule to indicate that it has side effects?
Re: [PR] [SPARK-47119][BUILD] Add `hive-jackson-provided` profile [spark]
pan3793 commented on PR #45201: URL: https://github.com/apache/spark/pull/45201#issuecomment-2103713964 @dongjoon-hyun Jackson 1.x can be removed after SPARK-47018 (bump Hive 2.3.10), what should we do for `hive-jackson-provided`?
Re: [PR] [SPARK-48209][CORE] Common (java side): Migrate `error/warn/info` with variables to structured logging framework [spark]
panbingkun commented on code in PR #46493: URL: https://github.com/apache/spark/pull/46493#discussion_r1596167842

## common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/ExternalShuffleBlockResolver.java:

    @@ -472,7 +487,8 @@ static ConcurrentMap reloadRegisteredExecutors(D
            break;
          }
          AppExecId id = parseDbAppExecKey(key);
    -     logger.info("Reloading registered executors: " + id.toString());
    +     logger.info("Reloading registered executors: {}",
    +       MDC.of(LogKeys.APP_EXECUTOR_ID$.MODULE$, id.toString()));

Review Comment: Remove the redundant `.toString()`
Re: [PR] [SPARK-48222][INFRA][DOCS] Sync Ruby Bundler to 2.4.22 and refresh Gem lock file [spark]
cloud-fan closed pull request #46512: [SPARK-48222][INFRA][DOCS] Sync Ruby Bundler to 2.4.22 and refresh Gem lock file URL: https://github.com/apache/spark/pull/46512
Re: [PR] [SPARK-48209][CORE] Common (java side): Migrate `error/warn/info` with variables to structured logging framework [spark]
panbingkun commented on code in PR #46493: URL: https://github.com/apache/spark/pull/46493#discussion_r1596166648 ## common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/ExternalShuffleBlockResolver.java: ## @@ -368,7 +382,8 @@ public int removeBlocks(String appId, String execId, String[] blockIds) { if (file.delete()) { numRemovedBlocks++; } else { -logger.warn("Failed to delete block: " + file.getAbsolutePath()); +logger.warn("Failed to delete block: {}", Review Comment: I'm not sure it's appropriate to call this `LogKeys.PATH$.MODULE$`?
Re: [PR] [SPARK-48222][INFRA][DOCS] Sync Ruby Bundler to 2.4.22 and refresh Gem lock file [spark]
cloud-fan commented on PR #46512: URL: https://github.com/apache/spark/pull/46512#issuecomment-2103711389 thanks, merging to master! (it's easier for me to test after merging it)
Re: [PR] [SPARK-47018][BUILD][SQL] Bump built-in Hive to 2.3.10 [spark]
pan3793 commented on PR #46468: URL: https://github.com/apache/spark/pull/46468#issuecomment-2103709988 Hive 2.3.10 jars should be available on Google Maven Central Mirror now, re-triggered CI
Re: [PR] [SPARK-48219][CORE] StreamReader Charset fix with UTF8 [spark]
xuzifu666 commented on PR #46509: URL: https://github.com/apache/spark/pull/46509#issuecomment-2103709532 > Do you think you can provide a test coverage to protect your contribution from potential future regression, @xuzifu666 ? Not needed, @dongjoon-hyun. Thanks for your attention. In my opinion this change does not need tests, since it is a convention for StreamReader usage: if the UTF-8 charset is not set explicitly, errors can occur when the system default charset cannot represent Chinese characters. You can see the same in other frameworks such as Calcite and Hive, which all set UTF-8 when this method is called.
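The failure mode being fixed is easy to reproduce: decoding bytes through an `InputStreamReader` that relies on the platform default charset mangles non-ASCII text on machines whose default cannot represent it. A minimal, self-contained sketch of the explicit-charset pattern (the `CharsetDemo` class name and helper are illustrative, not from the PR):

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

public class CharsetDemo {
    // Decode bytes with an explicit charset instead of the platform default
    // (file.encoding), so the result is the same on every machine.
    static String readAll(byte[] data) throws IOException {
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(new ByteArrayInputStream(data), StandardCharsets.UTF_8))) {
            StringBuilder sb = new StringBuilder();
            int ch;
            while ((ch = reader.read()) != -1) {
                sb.append((char) ch);
            }
            return sb.toString();
        }
    }

    public static void main(String[] args) throws IOException {
        String original = "分区"; // Chinese characters round-trip only with a capable charset
        String decoded = readAll(original.getBytes(StandardCharsets.UTF_8));
        System.out.println(original.equals(decoded)); // prints "true"
    }
}
```

With the single-argument `InputStreamReader(InputStream)` constructor, the same bytes could decode to replacement characters under a non-UTF-8 default charset.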
Re: [PR] [SPARK-48209][CORE] Common (java side): Migrate `error/warn/info` with variables to structured logging framework [spark]
panbingkun commented on code in PR #46493: URL: https://github.com/apache/spark/pull/46493#discussion_r1596163455 ## common/utils/src/main/java/org/apache/spark/internal/LoggerFactory.java: ## @@ -19,6 +19,11 @@ public class LoggerFactory { + public static Logger getLogger(String name) { Review Comment: `YarnShuffleService` will use it (screenshot: https://github.com/apache/spark/assets/15246973/8a4e79bd-43e2-4995-9e0a-57a047bd1e50)
Re: [PR] [SPARK-48209][CORE] Common (java side): Migrate `error/warn/info` with variables to structured logging framework [spark]
panbingkun commented on code in PR #46493: URL: https://github.com/apache/spark/pull/46493#discussion_r1596162272 ## core/src/main/scala/org/apache/spark/network/netty/NettyBlockTransferService.scala: ## @@ -30,7 +30,7 @@ import com.codahale.metrics.{Metric, MetricSet} import org.apache.spark.{SecurityManager, SparkConf} import org.apache.spark.ExecutorDeadException -import org.apache.spark.internal.config +import org.apache.spark.internal.{config, LogKeys, MDC} Review Comment: Because `NettyBlockTransferService` extends `BlockTransferService`, which is Java code, this change triggered it.
Re: [PR] [SPARK-47018][BUILD][SQL] Bump built-in Hive to 2.3.10 [spark]
pan3793 commented on code in PR #46468: URL: https://github.com/apache/spark/pull/46468#discussion_r1596161241 ## sql/hive/src/test/scala/org/apache/spark/sql/hive/HiveExternalCatalogVersionsSuite.scala: ## @@ -211,7 +211,7 @@ class HiveExternalCatalogVersionsSuite extends SparkSubmitTestUtils { tryDownloadSpark(version, sparkTestingDir.getCanonicalPath) } - // Extract major.minor for testing Spark 3.1.x and 3.0.x with metastore 2.3.9 and Java 11. + // Extract major.minor for testing Spark 3.1.x and 3.0.x with metastore 2.3.10 and Java 11. Review Comment: removed
Re: [PR] [SPARK-48209][CORE] Common (java side): Migrate `error/warn/info` with variables to structured logging framework [spark]
panbingkun commented on code in PR #46493: URL: https://github.com/apache/spark/pull/46493#discussion_r1596158815 ## connector/connect/common/src/main/scala/org/apache/spark/sql/connect/client/GrpcRetryHandler.scala: ## @@ -200,7 +200,7 @@ private[sql] object GrpcRetryHandler extends Logging { if (time.isDefined) { logWarning( log"Non-Fatal error during RPC execution: ${MDC(ERROR, lastException)}, " + - log"retrying (wait=${MDC(WAIT_TIME, time.get.toMillis)} ms, " + Review Comment: Unify `WAIT_TIME` into `RETRY_WAIT_TIME`
Re: [PR] [SPARK-48209][CORE] Common (java side): Migrate `error/warn/info` with variables to structured logging framework [spark]
panbingkun commented on code in PR #46493: URL: https://github.com/apache/spark/pull/46493#discussion_r1596157433 ## common/utils/src/main/java/org/apache/spark/internal/LoggerFactory.java: ## @@ -19,6 +19,11 @@ public class LoggerFactory { + public static Logger getLogger(String name) { Review Comment: `NettyLogger` uses it (screenshot: https://github.com/apache/spark/assets/15246973/e11bf41b-9dd8-4fcf-81a2-77986f15d8b1)
Re: [PR] [SPARK-48209][CORE] Common (java side): Migrate `error/warn/info` with variables to structured logging framework [spark]
panbingkun commented on code in PR #46493: URL: https://github.com/apache/spark/pull/46493#discussion_r1596151348 ## common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/RetryingBlockTransferor.java: ## @@ -177,10 +179,16 @@ private void transferAllOutstanding() { try { transferStarter.createAndStart(blockIdsToTransfer, myListener); } catch (Exception e) { - logger.error(String.format("Exception while beginning %s of %s outstanding blocks %s", -listener.getTransferType(), blockIdsToTransfer.length, -numRetries > 0 ? "(after " + numRetries + " retries)" : ""), e); - + if (numRetries > 0) { Review Comment: Print a different log message depending on the value of `numRetries`
Re: [PR] [SPARK-48209][CORE] Common (java side): Migrate `error/warn/info` with variables to structured logging framework [spark]
panbingkun commented on code in PR #46493: URL: https://github.com/apache/spark/pull/46493#discussion_r1596148922 ## common/network-common/src/main/java/org/apache/spark/network/ssl/SSLFactory.java: ## @@ -136,7 +135,7 @@ public void destroy() { try { manager.destroy(); } catch (InterruptedException ex) { -logger.info("Interrupted while destroying trust manager: " + ex.toString(), ex); Review Comment: Remove redundant `ex.toString()`.
Re: [PR] [SPARK-48031][SQL] Support view schema evolution [spark]
gengliangwang commented on code in PR #46267: URL: https://github.com/apache/spark/pull/46267#discussion_r1596122256 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala: ## @@ -945,54 +945,73 @@ class SessionCatalog( throw QueryCompilationErrors.invalidViewText(viewText, metadata.qualifiedName) } } -val projectList = if (!isHiveCreatedView(metadata)) { - val viewColumnNames = if (metadata.viewQueryColumnNames.isEmpty) { -// For view created before Spark 2.2.0, the view text is already fully qualified, the plan -// output is the same with the view output. -metadata.schema.fieldNames.toImmutableArraySeq - } else { -assert(metadata.viewQueryColumnNames.length == metadata.schema.length) -metadata.viewQueryColumnNames - } +val schemaMode = metadata.viewSchemaMode +if (schemaMode == SchemaEvolution) { + View(desc = metadata, isTempView = isTempView, child = parsedPlan) +} else { + val projectList = if (!isHiveCreatedView(metadata)) { +val viewColumnNames = if (metadata.viewQueryColumnNames.isEmpty) { + // For view created before Spark 2.2.0, the view text is already fully qualified, the plan + // output is the same with the view output. + metadata.schema.fieldNames.toImmutableArraySeq +} else { + assert(metadata.viewQueryColumnNames.length == metadata.schema.length) + metadata.viewQueryColumnNames +} - // For view queries like `SELECT * FROM t`, the schema of the referenced table/view may - // change after the view has been created. We need to add an extra SELECT to pick the columns - // according to the recorded column names (to get the correct view column ordering and omit - // the extra columns that we don't require), with UpCast (to make sure the type change is - // safe) and Alias (to respect user-specified view column names) according to the view schema - // in the catalog. - // Note that, the column names may have duplication, e.g. `CREATE VIEW v(x, y) AS - // SELECT 1 col, 2 col`. 
We need to make sure that the matching attributes have the same - // number of duplications, and pick the corresponding attribute by ordinal. - val viewConf = View.effectiveSQLConf(metadata.viewSQLConfigs, isTempView) - val normalizeColName: String => String = if (viewConf.caseSensitiveAnalysis) { -identity +// For view queries like `SELECT * FROM t`, the schema of the referenced table/view may +// change after the view has been created. We need to add an extra SELECT to pick the +// columns according to the recorded column names (to get the correct view column ordering +// and omit the extra columns that we don't require), with UpCast (to make sure the type +// change is safe) and Alias (to respect user-specified view column names) according to the +// view schema in the catalog. +// Note that, the column names may have duplication, e.g. `CREATE VIEW v(x, y) AS +// SELECT 1 col, 2 col`. We need to make sure that the matching attributes have the same +// number of duplications, and pick the corresponding attribute by ordinal. +val viewConf = View.effectiveSQLConf(metadata.viewSQLConfigs, isTempView) +val normalizeColName: String => String = if (viewConf.caseSensitiveAnalysis) { + identity +} else { + _.toLowerCase(Locale.ROOT) +} +val nameToCounts = viewColumnNames.groupBy(normalizeColName).transform((_, v) => v.length) +val nameToCurrentOrdinal = scala.collection.mutable.HashMap.empty[String, Int] +val viewDDL = buildViewDDL(metadata, isTempView) + +viewColumnNames.zip(metadata.schema).map { case (name, field) => + val normalizedName = normalizeColName(name) + val count = nameToCounts(normalizedName) + val ordinal = nameToCurrentOrdinal.getOrElse(normalizedName, 0) + nameToCurrentOrdinal(normalizedName) = ordinal + 1 + val col = GetViewColumnByNameAndOrdinal( +metadata.identifier.toString, name, ordinal, count, viewDDL) + val cast = schemaMode match { +/* +** For schema binding, we cast the column to the expected type using safe cast only. 
+** For legacy behavior, we cast the column to the expected type using safe cast only. +** For schema compensation, we cast the column to the expected type using any cast +* in ansi mode. +** For schema (type) evolution, we take the column as is. +*/ +case SchemaBinding => UpCast(col, field.dataType) +case SchemaUnsupported => UpCast(col, field.dataType) +case SchemaCompensation => Cast(col, field.dataType, ansiEnabled = true) +
Re: [PR] [SPARK-44609][K8S] Remove executor pod from PodsAllocator if it was removed from scheduler backend [spark]
github-actions[bot] closed pull request #42297: [SPARK-44609][K8S] Remove executor pod from PodsAllocator if it was removed from scheduler backend URL: https://github.com/apache/spark/pull/42297
Re: [PR] [SPARK-46885][SQL] Push down filters through `TypedFilter` [spark]
github-actions[bot] closed pull request #44911: [SPARK-46885][SQL] Push down filters through `TypedFilter` URL: https://github.com/apache/spark/pull/44911
Re: [PR] [SPARK-46108][SQL] keepInnerXmlAsRaw option for Built-in XML Data Source [spark]
github-actions[bot] commented on PR #44022: URL: https://github.com/apache/spark/pull/44022#issuecomment-2103637588 We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable. If you'd like to revive this PR, please reopen it and ask a committer to remove the Stale tag!
Re: [PR] [WIP] docs: restructure the docs index page [spark]
github-actions[bot] commented on PR #44812: URL: https://github.com/apache/spark/pull/44812#issuecomment-2103637571 We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable. If you'd like to revive this PR, please reopen it and ask a committer to remove the Stale tag!
Re: [PR] [SPARK-45708][BUILD] Retry mvn deploy [spark]
github-actions[bot] commented on PR #43559: URL: https://github.com/apache/spark/pull/43559#issuecomment-2103637610 We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable. If you'd like to revive this PR, please reopen it and ask a committer to remove the Stale tag!
Re: [PR] [SPARK-48214][INFRA] Ban import `org.slf4j.Logger` & `org.slf4j.LoggerFactory` [spark]
panbingkun commented on PR #46502: URL: https://github.com/apache/spark/pull/46502#issuecomment-2103636458 > I am +1 for the idea. However, I wonder if there will be suggestions about why the two imports are not allowed and how to fix the style error. If that's not feasible with `IllegalImport`, shall we use `RegexpSinglelineJava` and show proper suggestions instead? Well, that makes sense. `IllegalImport` does not provide a friendly way to show a suggestion message; let me update it later.
Re: [PR] [SPARK-48227][PYTHON][DOC] Document the requirement of seed in protos [spark]
zhengruifeng commented on PR #46518: URL: https://github.com/apache/spark/pull/46518#issuecomment-2103633099 thanks @HyukjinKwon and @dongjoon-hyun for reviews
Re: [PR] [SPARK-48214][INFRA] Ban import `org.slf4j.Logger` & `org.slf4j.LoggerFactory` [spark]
gengliangwang commented on PR #46502: URL: https://github.com/apache/spark/pull/46502#issuecomment-2103626513 I am +1 for the idea. However, I wonder if there will be suggestions about why the two imports are not allowed and how to fix the style error. If that's not feasible with `IllegalImport`, shall we use `RegexpSinglelineJava` and show proper suggestions instead?
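The advantage of `RegexpSinglelineJava` over `IllegalImport` is a custom violation message. A hedged sketch of what such a rule might look like in the checkstyle configuration (the module names and `format`/`message` properties are standard Checkstyle; the exact message wording and the location in Spark's config file are assumptions):

```xml
<!-- Illustrative sketch: flag direct SLF4J imports in Java sources and
     point developers at Spark's structured-logging wrapper instead. -->
<module name="RegexpSinglelineJava">
  <property name="format"
            value="import\s+org\.slf4j\.(Logger|LoggerFactory);"/>
  <property name="message"
            value="Import org.apache.spark.internal.Logger and org.apache.spark.internal.LoggerFactory (structured logging framework) instead of org.slf4j."/>
</module>
```

Unlike `IllegalImport`, which only reports "Illegal import", this message tells the contributor what to use instead and how to fix the failure.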
Re: [PR] [SPARK-48227][PYTHON][DOC] Document the requirement of seed in protos [spark]
dongjoon-hyun commented on PR #46518: URL: https://github.com/apache/spark/pull/46518#issuecomment-2103621276 Merged to master.
Re: [PR] [SPARK-48227][PYTHON][DOC] Document the requirement of seed in protos [spark]
dongjoon-hyun closed pull request #46518: [SPARK-48227][PYTHON][DOC] Document the requirement of seed in protos URL: https://github.com/apache/spark/pull/46518
Re: [PR] [SPARK-48214][INFRA] Ban import `org.slf4j.Logger` & `org.slf4j.LoggerFactory` [spark]
dongjoon-hyun commented on PR #46502: URL: https://github.com/apache/spark/pull/46502#issuecomment-2103617009 cc @gengliangwang
Re: [PR] [SPARK-48214][INFRA] Ban import `org.slf4j.Logger` & `org.slf4j.LoggerFactory` [spark]
panbingkun commented on PR #46502: URL: https://github.com/apache/spark/pull/46502#issuecomment-2103613069
- According to @gengliangwang's suggestion, we did not migrate `test` code to the structured logging framework, so we need to exclude it, e.g. (screenshot: https://github.com/apache/spark/assets/15246973/374ec683-5f13-439a-bafa-b7deafdb23dd)
- The other exclusion is `common/kvstore/src/main/java/org/apache/spark/util/kvstore/LevelDBIterator.java` (screenshot: https://github.com/apache/spark/assets/15246973/8068a294-a631-4ccc-a081-052f72aeb43a)
  A. The `common/kvstore` module does not depend on `utils` at compile time; if we want to migrate it, we need to add a dependency on `utils` in its `pom.xml`: https://github.com/apache/spark/blob/master/common/kvstore/pom.xml#L38-L66
  B. Only one place in this module uses slf4j, shown here: https://github.com/apache/spark/blob/e1fb1d7e063af7e8eb6e992c800902aff6e19e15/common/kvstore/src/main/java/org/apache/spark/util/kvstore/LevelDBIterator.java#L324 and that `error`-level log does not use any variables. So at present migration seems unnecessary, though it is of course still possible.
- After the last structured-logging migration PR on the Java side (https://github.com/apache/spark/pull/46493/files) is complete, this rule should be applied to the whole Spark code base (I have tested it in a local env)
Re: [PR] [SPARK-47793][TEST][FOLLOWUP] Fix flaky test for Python data source exactly once. [spark]
dongjoon-hyun commented on PR #46481: URL: https://github.com/apache/spark/pull/46481#issuecomment-2103611989 Could you do the final review and sign-off, please, @HyukjinKwon ?
Re: [PR] [SPARK-48180][SQL] Improve error when UDTF call with TABLE arg forgets parentheses around multiple PARTITION/ORDER BY exprs [spark]
HyukjinKwon closed pull request #46451: [SPARK-48180][SQL] Improve error when UDTF call with TABLE arg forgets parentheses around multiple PARTITION/ORDER BY exprs URL: https://github.com/apache/spark/pull/46451
Re: [PR] [SPARK-48180][SQL] Improve error when UDTF call with TABLE arg forgets parentheses around multiple PARTITION/ORDER BY exprs [spark]
HyukjinKwon commented on PR #46451: URL: https://github.com/apache/spark/pull/46451#issuecomment-2103611091 Merged to master.
Re: [PR] [SPARK-48226][BUILD] Add `spark-ganglia-lgpl` to `lint-java` & `spark-ganglia-lgpl` and `jvm-profiler` to `sbt-checkstyle` [spark]
dongjoon-hyun closed pull request #46501: [SPARK-48226][BUILD] Add `spark-ganglia-lgpl` to `lint-java` & `spark-ganglia-lgpl` and `jvm-profiler` to `sbt-checkstyle` URL: https://github.com/apache/spark/pull/46501
Re: [PR] [SPARK-48227][PYTHON][DOC] Document the requirement of seed in protos [spark]
dongjoon-hyun commented on code in PR #46518: URL: https://github.com/apache/spark/pull/46518#discussion_r1596102379 ## connector/connect/common/src/main/protobuf/spark/connect/relations.proto: ## @@ -467,7 +467,9 @@ message Sample { // (Optional) Whether to sample with replacement. optional bool with_replacement = 4; - // (Optional) The random seed. + // (Required) The random seed. + // This field is required to avoid generating mutable dataframes (see SPARK-48184 for details); + // however, we still keep it 'optional' here for backward compatibility. optional int64 seed = 5; Review Comment: Ya, this looks like inevitable.
Re: [PR] [SPARK-48148][CORE] JSON objects should not be modified when read as STRING [spark]
HyukjinKwon commented on PR #46408: URL: https://github.com/apache/spark/pull/46408#issuecomment-2103609498 btw you can trigger it on your own: https://github.com/eric-maynard/spark/runs/24789350525 (I can't trigger it :-)).
Re: [PR] [SPARK-48148][CORE] JSON objects should not be modified when read as STRING [spark]
HyukjinKwon closed pull request #46408: [SPARK-48148][CORE] JSON objects should not be modified when read as STRING URL: https://github.com/apache/spark/pull/46408
Re: [PR] [SPARK-48148][CORE] JSON objects should not be modified when read as STRING [spark]
HyukjinKwon commented on PR #46408: URL: https://github.com/apache/spark/pull/46408#issuecomment-2103609100 Merged to master.
Re: [PR] [SPARK-48089][SS][CONNECT] Fix 3.5 <> 4.0 StreamingQueryListener compatibility test [spark]
HyukjinKwon commented on PR #46513: URL: https://github.com/apache/spark/pull/46513#issuecomment-2103607909 Merged to branch-3.5.
Re: [PR] [SPARK-48089][SS][CONNECT] Fix 3.5 <> 4.0 StreamingQueryListener compatibility test [spark]
HyukjinKwon closed pull request #46513: [SPARK-48089][SS][CONNECT] Fix 3.5 <> 4.0 StreamingQueryListener compatibility test URL: https://github.com/apache/spark/pull/46513
[PR] [SPARK-48227][PYTHON][DOC] Document the requirement of seed in protos [spark]
zhengruifeng opened a new pull request, #46518: URL: https://github.com/apache/spark/pull/46518

### What changes were proposed in this pull request?
Document the requirement of seed in protos.

### Why are the changes needed?
The seed should be set at the client side.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
CI.

### Was this patch authored or co-authored using generative AI tooling?
No.