Re: [PR] [SPARK-47847][CORE] Deprecate `spark.network.remoteReadNioBufferConversion` [spark]

2024-05-09 Thread via GitHub


dongjoon-hyun commented on PR #46047:
URL: https://github.com/apache/spark/pull/46047#issuecomment-2103920027

   Merged to master/3.5.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-47847][CORE] Deprecate `spark.network.remoteReadNioBufferConversion` [spark]

2024-05-09 Thread via GitHub


dongjoon-hyun closed pull request #46047: [SPARK-47847][CORE] Deprecate 
`spark.network.remoteReadNioBufferConversion`
URL: https://github.com/apache/spark/pull/46047





Re: [PR] [SPARK-47847][CORE] Deprecate `spark.network.remoteReadNioBufferConversion` [spark]

2024-05-09 Thread via GitHub


dongjoon-hyun commented on PR #46047:
URL: https://github.com/apache/spark/pull/46047#issuecomment-2103919298

   Since this is irrelevant to CI, I verified it manually as follows.
   
   ```
   $ bin/spark-shell -c spark.network.remoteReadNioBufferConversion=true
   WARNING: Using incubator modules: jdk.incubator.vector
   24/05/09 22:54:07 WARN SparkConf: The configuration key 'spark.network.remoteReadNioBufferConversion' has been deprecated as of Spark 3.5.2 and may be removed in the future. Please open a JIRA ticket to report it if you need to use this configuration.
   24/05/09 22:54:07 WARN SparkConf: The configuration key 'spark.network.remoteReadNioBufferConversion' has been deprecated as of Spark 3.5.2 and may be removed in the future. Please open a JIRA ticket to report it if you need to use this configuration.
   24/05/09 22:54:07 WARN SparkConf: The configuration key 'spark.network.remoteReadNioBufferConversion' has been deprecated as of Spark 3.5.2 and may be removed in the future. Please open a JIRA ticket to report it if you need to use this configuration.
   24/05/09 22:54:07 INFO SignalUtils: Registering signal handler for INT
   24/05/09 22:54:07 WARN SparkConf: The configuration key 'spark.network.remoteReadNioBufferConversion' has been deprecated as of Spark 3.5.2 and may be removed in the future. Please open a JIRA ticket to report it if you need to use this configuration.
   Welcome to
         ____              __
        / __/__  ___ _____/ /__
       _\ \/ _ \/ _ `/ __/  '_/
      /___/ .__/\_,_/_/ /_/\_\   version 4.0.0-SNAPSHOT
         /_/

   Using Scala version 2.13.13 (OpenJDK 64-Bit Server VM, Java 21.0.3)
   Type in expressions to have them evaluated.
   Type :help for more information.
   24/05/09 22:54:09 WARN SparkConf: The configuration key 'spark.network.remoteReadNioBufferConversion' has been deprecated as of Spark 3.5.2 and may be removed in the future. Please open a JIRA ticket to report it if you need to use this configuration.
   24/05/09 22:54:09 WARN SparkConf: The configuration key 'spark.network.remoteReadNioBufferConversion' has been deprecated as of Spark 3.5.2 and may be removed in the future. Please open a JIRA ticket to report it if you need to use this configuration.
   ```





[PR] [SPARK-48230][BUILD] Remove unused jodd-core [spark]

2024-05-09 Thread via GitHub


pan3793 opened a new pull request, #46520:
URL: https://github.com/apache/spark/pull/46520

   
   
   ### What changes were proposed in this pull request?
   
   Remove a jar that has a known CVE: https://github.com/advisories/GHSA-jrg3-qq99-35g7
   
   ### Why are the changes needed?
   
   Previously, `jodd-core` came in as a transitive dependency of Hive; https://github.com/apache/hive/pull/5151 (Hive 2.3.10) removed it, so we can now drop it from Spark as well.
   
   ### Does this PR introduce _any_ user-facing change?
   
   No
   
   ### How was this patch tested?
   
   Pass GA.
   
   ### Was this patch authored or co-authored using generative AI tooling?
   
   No.





Re: [PR] [SPARK-48219][CORE] StreamReader Charset fix with UTF8 [spark]

2024-05-09 Thread via GitHub


dongjoon-hyun commented on PR #46509:
URL: https://github.com/apache/spark/pull/46509#issuecomment-2103908417

   Sorry but I'll leave this to the other reviewers, @xuzifu666 .





Re: [PR] [SPARK-48219][CORE] StreamReader Charset fix with UTF8 [spark]

2024-05-09 Thread via GitHub


xuzifu666 commented on PR #46509:
URL: https://github.com/apache/spark/pull/46509#issuecomment-2103906360

   @dongjoon-hyun Could you give a final review? Thanks





Re: [PR] [SPARK-48222][INFRA][DOCS] Sync Ruby Bundler to 2.4.22 and refresh Gem lock file [spark]

2024-05-09 Thread via GitHub


cloud-fan commented on PR #46512:
URL: https://github.com/apache/spark/pull/46512#issuecomment-2103907125

   I see, I'll install Python 3.9 on the release Docker image.





Re: [PR] [SPARK-47847][CORE] Deprecate spark.network.remoteReadNioBufferConversion [spark]

2024-05-09 Thread via GitHub


pan3793 commented on PR #46047:
URL: https://github.com/apache/spark/pull/46047#issuecomment-2103900781

   @dongjoon-hyun thanks for your suggestion. I updated the deprecation message, and we can consider removing it in 4.1.0 or later.





Re: [PR] [SPARK-48228][PYTHON][CONNECT] Implement the missing function validation in ApplyInXXX [spark]

2024-05-09 Thread via GitHub


zhengruifeng commented on PR #46519:
URL: https://github.com/apache/spark/pull/46519#issuecomment-2103877686

   @dongjoon-hyun and @HyukjinKwon thanks for reviews





Re: [PR] [SPARK-48201][DOCS][PYTHON] Make some corrections in the docstring of pyspark DataStreamReader methods [spark]

2024-05-09 Thread via GitHub


dongjoon-hyun commented on PR #46416:
URL: https://github.com/apache/spark/pull/46416#issuecomment-2103874418

   Welcome to the Apache Spark community, @chloeh13q .
   I added you to the Apache Spark contributor group and assigned SPARK-48201 to you.
   Congratulations on your first commit!





Re: [PR] [SPARK-48201][DOCS][PYTHON] Make some corrections in the docstring of pyspark DataStreamReader methods [spark]

2024-05-09 Thread via GitHub


dongjoon-hyun closed pull request #46416: [SPARK-48201][DOCS][PYTHON] Make some 
corrections in the docstring of pyspark DataStreamReader methods
URL: https://github.com/apache/spark/pull/46416





Re: [PR] Fix previous reader checks in Vectorized DELTA_BYTE_ARRAY decoder [spark]

2024-05-09 Thread via GitHub


dongjoon-hyun commented on code in PR #46485:
URL: https://github.com/apache/spark/pull/46485#discussion_r1596258635


##
sql/core/src/main/java/org/apache/spark/sql/execution/datasources/parquet/VectorizedColumnReader.java:
##
@@ -353,8 +353,9 @@ private void initDataReader(
   throw new IOException("could not read page in col " + descriptor, e);
 }
 // for PARQUET-246 (See VectorizedDeltaByteArrayReader.setPreviousValues)
-if (CorruptDeltaByteArrays.requiresSequentialReads(writerVersion, 
dataEncoding) &&
-previousReader instanceof RequiresPreviousReader) {
+if (CorruptDeltaByteArrays.requiresSequentialReads(writerVersion, 
dataEncoding)
+&& previousReader != null
+&& dataColumn instanceof RequiresPreviousReader) {

Review Comment:
   Could you file a JIRA issue, @yutsareva ?






Re: [PR] Fix previous reader checks in Vectorized DELTA_BYTE_ARRAY decoder [spark]

2024-05-09 Thread via GitHub


dongjoon-hyun commented on PR #46485:
URL: https://github.com/apache/spark/pull/46485#issuecomment-2103871560

   cc @sunchao 





Re: [PR] [SPARK-48228][PYTHON][CONNECT] Implement the missing function validation in ApplyInXXX [spark]

2024-05-09 Thread via GitHub


dongjoon-hyun commented on PR #46519:
URL: https://github.com/apache/spark/pull/46519#issuecomment-2103868519

   Merged to master for Apache Spark 4.0.0.





Re: [PR] [SPARK-47834][SQL][CONNECT] Mark deprecated functions with `@deprecated` in `SQLImplicits` [spark]

2024-05-09 Thread via GitHub


LuciferYang commented on PR #46029:
URL: https://github.com/apache/spark/pull/46029#issuecomment-2103868266

   Thanks @dongjoon-hyun 





Re: [PR] [SPARK-48228][PYTHON][CONNECT] Implement the missing function validation in ApplyInXXX [spark]

2024-05-09 Thread via GitHub


dongjoon-hyun closed pull request #46519: [SPARK-48228][PYTHON][CONNECT] 
Implement the missing function validation in ApplyInXXX
URL: https://github.com/apache/spark/pull/46519





Re: [PR] [SPARK-48210][DOC]Modify the description of whether dynamic partition… [spark]

2024-05-09 Thread via GitHub


dongjoon-hyun commented on PR #46496:
URL: https://github.com/apache/spark/pull/46496#issuecomment-2103866076

   cc @mridulm and @tgravescs 





Re: [PR] [SPARK-48224][SQL] Disallow map keys from being of variant type [spark]

2024-05-09 Thread via GitHub


dongjoon-hyun closed pull request #46516: [SPARK-48224][SQL] Disallow map keys 
from being of variant type
URL: https://github.com/apache/spark/pull/46516





Re: [PR] [SPARK-47018][BUILD][SQL] Bump built-in Hive to 2.3.10 [spark]

2024-05-09 Thread via GitHub


dongjoon-hyun commented on PR #46468:
URL: https://github.com/apache/spark/pull/46468#issuecomment-2103844998

   Also, cc @cloud-fan and @HyukjinKwon 
   
   This fixes not only the Hive dependency but also a long-standing `libthrift` library issue.





Re: [PR] [SPARK-47018][BUILD][SQL] Bump built-in Hive to 2.3.10 [spark]

2024-05-09 Thread via GitHub


dongjoon-hyun commented on PR #46468:
URL: https://github.com/apache/spark/pull/46468#issuecomment-2103844347

   Merged to master!
   
   Thank you so much, @pan3793 and @sunchao .
   
   From now on, many people will use Hive 2.3.10. I believe we can build more confidence before the Apache Spark 4.0.0 release.





Re: [PR] [SPARK-47018][BUILD][SQL] Bump built-in Hive to 2.3.10 [spark]

2024-05-09 Thread via GitHub


dongjoon-hyun closed pull request #46468: [SPARK-47018][BUILD][SQL] Bump 
built-in Hive to 2.3.10
URL: https://github.com/apache/spark/pull/46468





Re: [PR] [DRAFT][BUILD] Test upgrading built-in Hive to 2.3.10 [spark]

2024-05-09 Thread via GitHub


dongjoon-hyun closed pull request #45372: [DRAFT][BUILD] Test upgrading 
built-in Hive to 2.3.10
URL: https://github.com/apache/spark/pull/45372





Re: [PR] [SPARK-47119][BUILD] Add `hive-jackson-provided` profile [spark]

2024-05-09 Thread via GitHub


dongjoon-hyun commented on PR #45201:
URL: https://github.com/apache/spark/pull/45201#issuecomment-2103839853

   It's supposed to be here as a last resort until we release Apache Spark 
4.0.0 successfully without reverting Hive 2.3.10, @pan3793 .





Re: [PR] [SPARK-48222][INFRA][DOCS] Sync Ruby Bundler to 2.4.22 and refresh Gem lock file [spark]

2024-05-09 Thread via GitHub


dongjoon-hyun commented on PR #46512:
URL: https://github.com/apache/spark/pull/46512#issuecomment-2103835797

   Ya, as @nchammas mentioned, it seems we missed bumping Python to 3.9 in `spark-rm` in the following PR:
   - #46228





Re: [PR] [SPARK-48222][INFRA][DOCS] Sync Ruby Bundler to 2.4.22 and refresh Gem lock file [spark]

2024-05-09 Thread via GitHub


nchammas commented on PR #46512:
URL: https://github.com/apache/spark/pull/46512#issuecomment-2103835083

   Yes, and we dropped support for Python 3.8 in #46228.





Re: [PR] [SPARK-48222][INFRA][DOCS] Sync Ruby Bundler to 2.4.22 and refresh Gem lock file [spark]

2024-05-09 Thread via GitHub


dongjoon-hyun commented on PR #46512:
URL: https://github.com/apache/spark/pull/46512#issuecomment-2103834606

   It seems that the `files` attribute was added in Python 3.9, but the running Python version is 3.8, @cloud-fan.
   
   - https://importlib-resources.readthedocs.io/en/latest/using.html#migrating-from-legacy
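   Since `importlib.resources.files()` only exists from Python 3.9 onward, one common workaround is a version-gated fallback. This is a hedged sketch of the general pattern (illustrative only, not the PySpark code; `load_text_resource` is a hypothetical helper):

   ```python
   # Sketch: read a package resource on both Python 3.8 and 3.9+,
   # since importlib.resources.files() was only added in Python 3.9.
   import sys


   def load_text_resource(package: str, name: str) -> str:
       """Read a text resource bundled with `package`."""
       if sys.version_info >= (3, 9):
           from importlib.resources import files  # added in Python 3.9
           return files(package).joinpath(name).read_text()
       # Python 3.8: fall back to the legacy API (itself deprecated in 3.11)
       from importlib.resources import read_text
       return read_text(package, name)
   ```

   The alternative design choice is to depend on the `importlib_resources` backport package, which provides `files()` on older interpreters.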





Re: [PR] [SPARK-47441][YARN] Do not add log link for unmanaged AM in Spark UI [spark]

2024-05-09 Thread via GitHub


dongjoon-hyun commented on PR #45565:
URL: https://github.com/apache/spark/pull/45565#issuecomment-2103823764

   Also, cc @mridulm 





Re: [PR] [SPARK-47847][CORE] Deprecate spark.network.remoteReadNioBufferConversion [spark]

2024-05-09 Thread via GitHub


dongjoon-hyun commented on PR #46047:
URL: https://github.com/apache/spark/pull/46047#issuecomment-2103818630

   > there are no negative reports,
   
   BTW, we need to give users enough time to report the issue. So we cannot delete this configuration in Apache Spark 4.0.0, because Apache Spark 3.5.2 has not been released yet and we need to wait one whole release after it.





Re: [PR] [SPARK-47847][CORE] Deprecate spark.network.remoteReadNioBufferConversion [spark]

2024-05-09 Thread via GitHub


dongjoon-hyun commented on PR #46047:
URL: https://github.com/apache/spark/pull/46047#issuecomment-2103817471

   To @pan3793 , I've rethought this.
   
   > I fill the deprecated message with "Not used anymore", to be consistent 
with existing items
   > 
   > ```
   > DeprecatedConfig("spark.yarn.am.port", "2.0.0", "Not used anymore"),
   > DeprecatedConfig("spark.executor.port", "2.0.0", "Not used anymore"),
   > ...
   > ```
   
   Since `remoteReadNioBufferConversion` is used in the code, `Not used 
anymore` is technically wrong.
   
   Shall we ask users to report it, as in the following?
   
   
https://github.com/apache/spark/blob/1138b2a68b5408e6d079bdbce8026323694628e5/sql/core/src/main/scala/org/apache/spark/sql/execution/analysis/DetectAmbiguousSelfJoin.scala#L101
   
   For example, we can use `Please open a JIRA ticket to report it if you need 
to use this configuration.`.
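   Putting that suggestion together with the entry shape quoted above, the result would look roughly like the following — a hedged sketch following the existing `DeprecatedConfig(key, version, message)` form, not a verbatim quote of the patch:

   ```scala
   // Sketch only: an entry in SparkConf's deprecated-configs table, using the
   // version (3.5.2) and message suggested in this thread.
   DeprecatedConfig("spark.network.remoteReadNioBufferConversion", "3.5.2",
     "Please open a JIRA ticket to report it if you need to use this configuration."),
   ```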





Re: [PR] [SPARK-48222][INFRA][DOCS] Sync Ruby Bundler to 2.4.22 and refresh Gem lock file [spark]

2024-05-09 Thread via GitHub


cloud-fan commented on PR #46512:
URL: https://github.com/apache/spark/pull/46512#issuecomment-2103814645

   cc @HyukjinKwon 





Re: [PR] [SPARK-48222][INFRA][DOCS] Sync Ruby Bundler to 2.4.22 and refresh Gem lock file [spark]

2024-05-09 Thread via GitHub


cloud-fan commented on PR #46512:
URL: https://github.com/apache/spark/pull/46512#issuecomment-2103814454

   The Bundler issue is resolved, but I hit a new issue when generating the PySpark docs:
   ```
   Configuration error:
   There is a programmable error in your configuration file:
   
   Traceback (most recent call last):
     File "/usr/local/lib/python3.8/dist-packages/sphinx/config.py", line 332, in eval_config_file
       exec(code, namespace)
     File "/opt/spark-rm/output/spark/python/docs/source/conf.py", line 27, in <module>
       from pyspark.pandas.supported_api_gen import generate_supported_api
     File "/opt/spark-rm/output/spark/python/pyspark/__init__.py", line 53, in <module>
       from pyspark.util import is_remote_only
     File "/opt/spark-rm/output/spark/python/pyspark/util.py", line 33, in <module>
       from pyspark.errors import PySparkRuntimeError
     File "/opt/spark-rm/output/spark/python/pyspark/errors/__init__.py", line 21, in <module>
       from pyspark.errors.exceptions.base import (  # noqa: F401
     File "/opt/spark-rm/output/spark/python/pyspark/errors/exceptions/base.py", line 21, in <module>
       from pyspark.errors.utils import ErrorClassesReader
     File "/opt/spark-rm/output/spark/python/pyspark/errors/utils.py", line 23, in <module>
       from pyspark.errors.error_classes import ERROR_CLASSES_MAP
     File "/opt/spark-rm/output/spark/python/pyspark/errors/error_classes.py", line 26, in <module>
       importlib.resources
   AttributeError: module 'importlib.resources' has no attribute 'files'
   ```
   
   Maybe there is some Python version inconsistency between the GitHub Actions environment and the release Docker image.





Re: [PR] [MINOR][BUILD] Remove duplicate configuration of maven-compiler-plugin [spark]

2024-05-09 Thread via GitHub


zml1206 commented on PR #46024:
URL: https://github.com/apache/spark/pull/46024#issuecomment-2103810977

   Thank you for review. @dongjoon-hyun 





Re: [PR] [MINOR][BUILD] Remove duplicate configuration of maven-compiler-plugin [spark]

2024-05-09 Thread via GitHub


dongjoon-hyun closed pull request #46024: [MINOR][BUILD] Remove duplicate 
configuration of maven-compiler-plugin
URL: https://github.com/apache/spark/pull/46024





Re: [PR] [MINOR][BUILD] Remove duplicate configuration of maven-compiler-plugin [spark]

2024-05-09 Thread via GitHub


dongjoon-hyun commented on PR #46024:
URL: https://github.com/apache/spark/pull/46024#issuecomment-2103808514

   Sorry for being late. I missed your ping here, @zml1206 .





Re: [PR] [SPARK-48228][PYTHON][CONNECT] Implement the missing function validation in ApplyInXXX [spark]

2024-05-09 Thread via GitHub


zhengruifeng commented on code in PR #46519:
URL: https://github.com/apache/spark/pull/46519#discussion_r1596217131


##
python/pyspark/sql/connect/group.py:
##
@@ -34,6 +34,7 @@
 from pyspark.util import PythonEvalType
 from pyspark.sql.group import GroupedData as PySparkGroupedData
 from pyspark.sql.pandas.group_ops import PandasCogroupedOps as 
PySparkPandasCogroupedOps
+from pyspark.sql.pandas.functions import _validate_pandas_udf  # type: 
ignore[attr-defined]

Review Comment:
   Spark Classic invokes `pandas_udf` in Pandas Functions (ApplyInXXX), and 
`pandas_udf` includes the function validation.
   In Spark Connect, however, we cannot use `pandas_udf` due to differences in 
the underlying implementations: `pandas_udf` returns a wrapper, while Spark 
Connect requires a `UserDefinedFunction` object.
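
   The arity check being moved up front can be sketched roughly like this (a hypothetical stand-alone helper for illustration; the actual check lives in `pyspark.sql.pandas.functions._validate_pandas_udf`):

```python
import inspect

def validate_group_map_func(func):
    # Illustrative arity check: the function passed to groupby.applyInPandas
    # must take either one argument (data) or two arguments (key, data).
    num_args = len(inspect.getfullargspec(func).args)
    if num_args not in (1, 2):
        raise ValueError(
            "Invalid function: the function in groupby.applyInPandas must "
            "take either one argument (data) or two arguments (key, data)."
        )
    return func

validate_group_map_func(lambda pdf: pdf)       # one argument: accepted
validate_group_map_func(lambda key, pdf: pdf)  # two arguments: accepted
try:
    validate_group_map_func(lambda: 1)         # zero arguments: rejected early
except ValueError as err:
    print(f"rejected early: {err}")
```

   With a check like this, an invalid function fails at definition time instead of deep inside the Arrow serializer at execution time.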






[PR] [SPARK-48228][PYTHON][CONNECT] Implement the missing function validation in ApplyInXXX [spark]

2024-05-09 Thread via GitHub


zhengruifeng opened a new pull request, #46519:
URL: https://github.com/apache/spark/pull/46519

   ### What changes were proposed in this pull request?
   Implement the missing function validation in ApplyInXXX
   
   https://github.com/apache/spark/pull/46397 fixed this issue for 
`Cogrouped.ApplyInPandas`; this PR fixes the remaining methods.
   
   ### Why are the changes needed?
   for better error message:
   
   ```
   In [12]: df1 = spark.range(11)
   
   In [13]: df2 = df1.groupby("id").applyInPandas(lambda: 1, 
StructType([StructField("d", DoubleType())]))
   
   In [14]: df2.show()
   ```
   
   before this PR, an invalid function causes weird execution errors:
   ```
   24/05/10 11:37:36 ERROR Executor: Exception in task 0.0 in stage 10.0 (TID 
36)
   org.apache.spark.api.python.PythonException: Traceback (most recent call 
last):
 File 
"/Users/ruifeng.zheng/Dev/spark/python/lib/pyspark.zip/pyspark/worker.py", line 
1834, in main
   process()
 File 
"/Users/ruifeng.zheng/Dev/spark/python/lib/pyspark.zip/pyspark/worker.py", line 
1826, in process
   serializer.dump_stream(out_iter, outfile)
 File 
"/Users/ruifeng.zheng/Dev/spark/python/lib/pyspark.zip/pyspark/sql/pandas/serializers.py",
 line 531, in dump_stream
   return ArrowStreamSerializer.dump_stream(self, 
init_stream_yield_batches(), stream)
  

 File 
"/Users/ruifeng.zheng/Dev/spark/python/lib/pyspark.zip/pyspark/sql/pandas/serializers.py",
 line 104, in dump_stream
   for batch in iterator:
 File 
"/Users/ruifeng.zheng/Dev/spark/python/lib/pyspark.zip/pyspark/sql/pandas/serializers.py",
 line 524, in init_stream_yield_batches
   for series in iterator:
 File 
"/Users/ruifeng.zheng/Dev/spark/python/lib/pyspark.zip/pyspark/worker.py", line 
1610, in mapper
   return f(keys, vals)
  ^
 File 
"/Users/ruifeng.zheng/Dev/spark/python/lib/pyspark.zip/pyspark/worker.py", line 
488, in 
   return lambda k, v: [(wrapped(k, v), to_arrow_type(return_type))]
 ^
 File 
"/Users/ruifeng.zheng/Dev/spark/python/lib/pyspark.zip/pyspark/worker.py", line 
483, in wrapped
   result, return_type, _assign_cols_by_name, truncate_return_schema=False
   ^^
   UnboundLocalError: cannot access local variable 'result' where it is not 
associated with a value
   
at 
org.apache.spark.api.python.BasePythonRunner$ReaderIterator.handlePythonException(PythonRunner.scala:523)
at 
org.apache.spark.sql.execution.python.PythonArrowOutput$$anon$1.read(PythonArrowOutput.scala:117)
at 
org.apache.spark.api.python.BasePythonRunner$ReaderIterator.hasNext(PythonRunner.scala:479)
at 
org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37)
at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:601)
at scala.collection.Iterator$$anon$9.hasNext(Iterator.scala:583)
at 
org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage2.processNext(Unknown
 Source)
at 
org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
at 
org.apache.spark.sql.execution.WholeStageCodegenEvaluatorFactory$WholeStageCodegenPartitionEvaluator$$anon$1.hasNext(WholeStageCodegenEvaluatorFactory.scala:50)
at 
org.apache.spark.sql.execution.SparkPlan.$anonfun$getByteArrayRdd$1(SparkPlan.scala:388)
at 
org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2(RDD.scala:896)
   
...
   ```
   
   After this PR, the error happens before execution, which is consistent with 
Spark Classic, and is much clearer:
   ```
   PySparkValueError: [INVALID_PANDAS_UDF] Invalid function: the function in 
groupby.applyInArrow must take either one argument (data) or two arguments 
(key, data).
   
   ```
   
   ### Does this PR introduce _any_ user-facing change?
   yes, error message changes
   
   ### How was this patch tested?
   added tests
   
   
   ### Was this patch authored or co-authored using generative AI tooling?
   no
   





Re: [PR] [SPARK-47834][SQL][CONNECT] Mark deprecated functions with `@deprecated` in `SQLImplicits` [spark]

2024-05-09 Thread via GitHub


dongjoon-hyun commented on PR #46029:
URL: https://github.com/apache/spark/pull/46029#issuecomment-2103806851

   Merged to master for Apache Spark 4.0.0.





Re: [PR] [SPARK-47834][SQL][CONNECT] Mark deprecated functions with `@deprecated` in `SQLImplicits` [spark]

2024-05-09 Thread via GitHub


dongjoon-hyun closed pull request #46029: [SPARK-47834][SQL][CONNECT] Mark 
deprecated functions with `@deprecated` in `SQLImplicits`
URL: https://github.com/apache/spark/pull/46029





Re: [PR] [SPARK-47954][K8S] Support creating ingress entry for external UI access [spark]

2024-05-09 Thread via GitHub


dongjoon-hyun commented on PR #46184:
URL: https://github.com/apache/spark/pull/46184#issuecomment-2103803949

   Just FYI, please take your time. We can target this for Apache Spark 4.0.0.





Re: [PR] [SPARK-48144][SQL] Fix `canPlanAsBroadcastHashJoin` to respect shuffle join hints [spark]

2024-05-09 Thread via GitHub


dongjoon-hyun commented on PR #46401:
URL: https://github.com/apache/spark/pull/46401#issuecomment-2103802765

   Gentle ping, @fred-db .





Re: [PR] [SPARK-47995][PYTHON][INFRA][TESTS] Upgrade `pyarrow` to 16.0.0 in GitHub Action CI [spark]

2024-05-09 Thread via GitHub


dongjoon-hyun commented on PR #46232:
URL: https://github.com/apache/spark/pull/46232#issuecomment-2103795700

   This is still blocked by `mlflow 2.12.2`
   ```
   mlflow 2.12.2 requires pyarrow<16,>=4.0.0, but you have pyarrow 16.0.0 which 
is incompatible.
   ```





Re: [PR] [SPARK-27900][CORE][K8s] Add `spark.driver.killOnOOMError` flag in cluster mode [spark]

2024-05-09 Thread via GitHub


dimon222 commented on PR #26161:
URL: https://github.com/apache/spark/pull/26161#issuecomment-2103793011

   Was this ever fixed?





Re: [PR] [SPARK-48225][BUILD] Upgrade `sbt` to 1.10.0 [spark]

2024-05-09 Thread via GitHub


dongjoon-hyun closed pull request #46517: [SPARK-48225][BUILD] Upgrade `sbt` to 
1.10.0
URL: https://github.com/apache/spark/pull/46517





Re: [PR] [SPARK-48176][SQL] Adjust name of FIELD_ALREADY_EXISTS error condition [spark]

2024-05-09 Thread via GitHub


HyukjinKwon closed pull request #46510: [SPARK-48176][SQL] Adjust name of 
FIELD_ALREADY_EXISTS error condition
URL: https://github.com/apache/spark/pull/46510





Re: [PR] [SPARK-48176][SQL] Adjust name of FIELD_ALREADY_EXISTS error condition [spark]

2024-05-09 Thread via GitHub


HyukjinKwon commented on PR #46510:
URL: https://github.com/apache/spark/pull/46510#issuecomment-2103788152

   Merged to master.





Re: [PR] [SPARK-48214][INFRA] Ban import `org.slf4j.Logger` & `org.slf4j.LoggerFactory` [spark]

2024-05-09 Thread via GitHub


LuciferYang commented on PR #46502:
URL: https://github.com/apache/spark/pull/46502#issuecomment-2103783129

   Or how about having these modules depend on the `common/utils` module? 
`common/utils` doesn't seem to be a heavyweight module.





Re: [PR] [SPARK-47018][BUILD][SQL] Bump built-in Hive to 2.3.10 [spark]

2024-05-09 Thread via GitHub


dongjoon-hyun commented on PR #46468:
URL: https://github.com/apache/spark/pull/46468#issuecomment-2103781151

   Thank you!





Re: [PR] [SPARK-48225][BUILD] Upgrade `sbt` to 1.10.0 [spark]

2024-05-09 Thread via GitHub


dongjoon-hyun commented on PR #46517:
URL: https://github.com/apache/spark/pull/46517#issuecomment-2103780307

   Thank you so much for sharing!





Re: [PR] [SPARK-48197][SQL][TESTS][FOLLOWUP][3.5] Regenerate golden files [spark]

2024-05-09 Thread via GitHub


cloud-fan commented on PR #46514:
URL: https://github.com/apache/spark/pull/46514#issuecomment-2103779630

   thanks for the fix!





Re: [PR] [SPARK-48209][CORE] Common (java side): Migrate `error/warn/info` with variables to structured logging framework [spark]

2024-05-09 Thread via GitHub


panbingkun commented on PR #46493:
URL: https://github.com/apache/spark/pull/46493#issuecomment-2103774486

   cc @gengliangwang 





Re: [PR] [SPARK-48225][BUILD] Upgrade `sbt` to 1.10.0 [spark]

2024-05-09 Thread via GitHub


LuciferYang commented on PR #46517:
URL: https://github.com/apache/spark/pull/46517#issuecomment-2103771513

   @dongjoon-hyun I was discussing this issue with @panbingkun  offline 
yesterday. From the responses of sbt and coursier, it seems difficult to solve 
this problem in the short term without some handling of `Resolver.mavenLocal`. 
I suggest that @panbingkun conduct some performance tests to determine whether 
this issue can be bypassed by disabling `Resolver.mavenLocal`:
   
   - https://github.com/sbt/sbt/issues/7506#issuecomment-1972591578
   - https://github.com/coursier/coursier/issues/2942
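
   If disabling it proves viable, a minimal sketch of what that could look like 
in the sbt build (hypothetical; the exact file and resolver wiring in Spark's 
build may differ):

```scala
// build.sbt (sketch): drop Resolver.mavenLocal from the resolver chain so
// coursier no longer scans the local ~/.m2 repository during resolution.
resolvers := resolvers.value.filterNot(_ == Resolver.mavenLocal)
```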
   
   





Re: [PR] [SPARK-48209][CORE] Common (java side): Migrate `error/warn/info` with variables to structured logging framework [spark]

2024-05-09 Thread via GitHub


panbingkun commented on code in PR #46493:
URL: https://github.com/apache/spark/pull/46493#discussion_r1596192395


##
common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/RemoteBlockPushResolver.java:
##
@@ -1999,7 +2042,9 @@ private AppPathsInfo(
   this.subDirsPerLocalDir = subDirsPerLocalDir;
   if (logger.isInfoEnabled()) {
 logger.info("Updated active local dirs {} and sub dirs {} for 
application {}",
-  Arrays.toString(activeLocalDirs),subDirsPerLocalDir, appId);
+  MDC.of(LogKeys.LOCAL_DIRS$.MODULE$, 
Arrays.toString(activeLocalDirs)),
+  MDC.of(LogKeys.NUM_SUB_DIRS$.MODULE$, subDirsPerLocalDir),

Review Comment:
   The `subDirsPerLocalDir` is actually a `number` (the number of 
sub-directories), not a sub-directory `path`.






Re: [PR] [SPARK-48226][BUILD] Add `spark-ganglia-lgpl` to `lint-java` & `spark-ganglia-lgpl` and `jvm-profiler` to `sbt-checkstyle` [spark]

2024-05-09 Thread via GitHub


LuciferYang commented on PR #46501:
URL: https://github.com/apache/spark/pull/46501#issuecomment-2103760020

   late LGTM





Re: [PR] [SPARK-48209][CORE] Common (java side): Migrate `error/warn/info` with variables to structured logging framework [spark]

2024-05-09 Thread via GitHub


panbingkun commented on code in PR #46493:
URL: https://github.com/apache/spark/pull/46493#discussion_r1596186119


##
common/network-yarn/src/main/java/org/apache/spark/network/yarn/YarnShuffleService.java:
##
@@ -363,7 +367,8 @@ static MergedShuffleFileManager 
newMergedShuffleFileManagerInstance(
   return mergeManagerSubClazz.getConstructor(TransportConf.class, 
File.class)
 .newInstance(conf, mergeManagerFile);
 } catch (Exception e) {
-  defaultLogger.error("Unable to create an instance of {}", 
mergeManagerImplClassName);
+  defaultLogger.error("Unable to create an instance of {}",
+MDC.of(LogKeys.CLASS_NAME$.MODULE$, mergeManagerImplClassName));

Review Comment:
   Or call it `MERGED_SHUFFLE_FILE_MANAGER`?






Re: [PR] [SPARK-48031][SQL] Support view schema evolution [spark]

2024-05-09 Thread via GitHub


srielau commented on code in PR #46267:
URL: https://github.com/apache/spark/pull/46267#discussion_r1596184905


##
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala:
##
@@ -945,54 +945,73 @@ class SessionCatalog(
   throw QueryCompilationErrors.invalidViewText(viewText, 
metadata.qualifiedName)
   }
 }
-val projectList = if (!isHiveCreatedView(metadata)) {
-  val viewColumnNames = if (metadata.viewQueryColumnNames.isEmpty) {
-// For view created before Spark 2.2.0, the view text is already fully 
qualified, the plan
-// output is the same with the view output.
-metadata.schema.fieldNames.toImmutableArraySeq
-  } else {
-assert(metadata.viewQueryColumnNames.length == metadata.schema.length)
-metadata.viewQueryColumnNames
-  }
+val schemaMode = metadata.viewSchemaMode
+if (schemaMode == SchemaEvolution) {
+  View(desc = metadata, isTempView = isTempView, child = parsedPlan)
+} else {
+  val projectList = if (!isHiveCreatedView(metadata)) {
+val viewColumnNames = if (metadata.viewQueryColumnNames.isEmpty) {
+  // For view created before Spark 2.2.0, the view text is already 
fully qualified, the plan
+  // output is the same with the view output.
+  metadata.schema.fieldNames.toImmutableArraySeq
+} else {
+  assert(metadata.viewQueryColumnNames.length == 
metadata.schema.length)
+  metadata.viewQueryColumnNames
+}
 
-  // For view queries like `SELECT * FROM t`, the schema of the referenced 
table/view may
-  // change after the view has been created. We need to add an extra 
SELECT to pick the columns
-  // according to the recorded column names (to get the correct view 
column ordering and omit
-  // the extra columns that we don't require), with UpCast (to make sure 
the type change is
-  // safe) and Alias (to respect user-specified view column names) 
according to the view schema
-  // in the catalog.
-  // Note that, the column names may have duplication, e.g. `CREATE VIEW 
v(x, y) AS
-  // SELECT 1 col, 2 col`. We need to make sure that the matching 
attributes have the same
-  // number of duplications, and pick the corresponding attribute by 
ordinal.
-  val viewConf = View.effectiveSQLConf(metadata.viewSQLConfigs, isTempView)
-  val normalizeColName: String => String = if 
(viewConf.caseSensitiveAnalysis) {
-identity
+// For view queries like `SELECT * FROM t`, the schema of the 
referenced table/view may
+// change after the view has been created. We need to add an extra 
SELECT to pick the
+// columns according to the recorded column names (to get the correct 
view column ordering
+// and omit the extra columns that we don't require), with UpCast (to 
make sure the type
+// change is safe) and Alias (to respect user-specified view column 
names) according to the
+// view schema in the catalog.
+// Note that, the column names may have duplication, e.g. `CREATE VIEW 
v(x, y) AS
+// SELECT 1 col, 2 col`. We need to make sure that the matching 
attributes have the same
+// number of duplications, and pick the corresponding attribute by 
ordinal.
+val viewConf = View.effectiveSQLConf(metadata.viewSQLConfigs, 
isTempView)
+val normalizeColName: String => String = if 
(viewConf.caseSensitiveAnalysis) {
+  identity
+} else {
+  _.toLowerCase(Locale.ROOT)
+}
+val nameToCounts = 
viewColumnNames.groupBy(normalizeColName).transform((_, v) => v.length)
+val nameToCurrentOrdinal = 
scala.collection.mutable.HashMap.empty[String, Int]
+val viewDDL = buildViewDDL(metadata, isTempView)
+
+viewColumnNames.zip(metadata.schema).map { case (name, field) =>
+  val normalizedName = normalizeColName(name)
+  val count = nameToCounts(normalizedName)
+  val ordinal = nameToCurrentOrdinal.getOrElse(normalizedName, 0)
+  nameToCurrentOrdinal(normalizedName) = ordinal + 1
+  val col = GetViewColumnByNameAndOrdinal(
+metadata.identifier.toString, name, ordinal, count, viewDDL)
+  val cast = schemaMode match {
+/*
+** For schema binding, we cast the column to the expected type 
using safe cast only.
+** For legacy behavior, we cast the column to the expected type 
using safe cast only.
+** For schema compensation, we cast the column to the expected 
type using any cast
+*  in ansi mode.
+** For schema (type) evolution, we take the column as is.
+*/
+case SchemaBinding => UpCast(col, field.dataType)
+case SchemaUnsupported => UpCast(col, field.dataType)
+case SchemaCompensation => Cast(col, field.dataType, ansiEnabled = 
true)
+

Re: [PR] [SPARK-48031][SQL] Support view schema evolution [spark]

2024-05-09 Thread via GitHub


srielau commented on code in PR #46267:
URL: https://github.com/apache/spark/pull/46267#discussion_r1596184739


##
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala:
##
@@ -945,54 +945,73 @@ class SessionCatalog(
   throw QueryCompilationErrors.invalidViewText(viewText, 
metadata.qualifiedName)
   }
 }
-val projectList = if (!isHiveCreatedView(metadata)) {
-  val viewColumnNames = if (metadata.viewQueryColumnNames.isEmpty) {
-// For view created before Spark 2.2.0, the view text is already fully 
qualified, the plan
-// output is the same with the view output.
-metadata.schema.fieldNames.toImmutableArraySeq
-  } else {
-assert(metadata.viewQueryColumnNames.length == metadata.schema.length)
-metadata.viewQueryColumnNames
-  }
+val schemaMode = metadata.viewSchemaMode
+if (schemaMode == SchemaEvolution) {
+  View(desc = metadata, isTempView = isTempView, child = parsedPlan)
+} else {
+  val projectList = if (!isHiveCreatedView(metadata)) {
+val viewColumnNames = if (metadata.viewQueryColumnNames.isEmpty) {
+  // For view created before Spark 2.2.0, the view text is already 
fully qualified, the plan
+  // output is the same with the view output.
+  metadata.schema.fieldNames.toImmutableArraySeq
+} else {
+  assert(metadata.viewQueryColumnNames.length == 
metadata.schema.length)
+  metadata.viewQueryColumnNames
+}
 
-  // For view queries like `SELECT * FROM t`, the schema of the referenced 
table/view may
-  // change after the view has been created. We need to add an extra 
SELECT to pick the columns
-  // according to the recorded column names (to get the correct view 
column ordering and omit
-  // the extra columns that we don't require), with UpCast (to make sure 
the type change is
-  // safe) and Alias (to respect user-specified view column names) 
according to the view schema
-  // in the catalog.
-  // Note that, the column names may have duplication, e.g. `CREATE VIEW 
v(x, y) AS
-  // SELECT 1 col, 2 col`. We need to make sure that the matching 
attributes have the same
-  // number of duplications, and pick the corresponding attribute by 
ordinal.
-  val viewConf = View.effectiveSQLConf(metadata.viewSQLConfigs, isTempView)
-  val normalizeColName: String => String = if 
(viewConf.caseSensitiveAnalysis) {
-identity
+// For view queries like `SELECT * FROM t`, the schema of the 
referenced table/view may
+// change after the view has been created. We need to add an extra 
SELECT to pick the
+// columns according to the recorded column names (to get the correct 
view column ordering
+// and omit the extra columns that we don't require), with UpCast (to 
make sure the type
+// change is safe) and Alias (to respect user-specified view column 
names) according to the
+// view schema in the catalog.
+// Note that, the column names may have duplication, e.g. `CREATE VIEW 
v(x, y) AS
+// SELECT 1 col, 2 col`. We need to make sure that the matching 
attributes have the same
+// number of duplications, and pick the corresponding attribute by 
ordinal.
+val viewConf = View.effectiveSQLConf(metadata.viewSQLConfigs, 
isTempView)
+val normalizeColName: String => String = if 
(viewConf.caseSensitiveAnalysis) {
+  identity
+} else {
+  _.toLowerCase(Locale.ROOT)
+}
+val nameToCounts = 
viewColumnNames.groupBy(normalizeColName).transform((_, v) => v.length)
+val nameToCurrentOrdinal = 
scala.collection.mutable.HashMap.empty[String, Int]
+val viewDDL = buildViewDDL(metadata, isTempView)
+
+viewColumnNames.zip(metadata.schema).map { case (name, field) =>
+  val normalizedName = normalizeColName(name)
+  val count = nameToCounts(normalizedName)
+  val ordinal = nameToCurrentOrdinal.getOrElse(normalizedName, 0)
+  nameToCurrentOrdinal(normalizedName) = ordinal + 1
+  val col = GetViewColumnByNameAndOrdinal(
+metadata.identifier.toString, name, ordinal, count, viewDDL)
+  val cast = schemaMode match {
+/*
+** For schema binding, we cast the column to the expected type 
using safe cast only.
+** For legacy behavior, we cast the column to the expected type 
using safe cast only.
+** For schema compensation, we cast the column to the expected 
type using any cast
+*  in ansi mode.
+** For schema (type) evolution, we take the column as is.
+*/
+case SchemaBinding => UpCast(col, field.dataType)
+case SchemaUnsupported => UpCast(col, field.dataType)
+case SchemaCompensation => Cast(col, field.dataType, ansiEnabled = 
true)
+

Re: [PR] [SPARK-48031][SQL] Support view schema evolution [spark]

2024-05-09 Thread via GitHub


srielau commented on code in PR #46267:
URL: https://github.com/apache/spark/pull/46267#discussion_r1596180917


##
sql/core/src/main/scala/org/apache/spark/sql/internal/BaseSessionStateBuilder.scala:
##
@@ -224,6 +224,7 @@ abstract class BaseSessionStateBuilder(
 TableCapabilityCheck +:
 CommandCheck +:
 CollationCheck +:
+SyncViewsCheck +:

Review Comment:
   "ViewSyncSchemaToMetaStore" coming






Re: [PR] [SPARK-48209][CORE] Common (java side): Migrate `error/warn/info` with variables to structured logging framework [spark]

2024-05-09 Thread via GitHub


panbingkun commented on code in PR #46493:
URL: https://github.com/apache/spark/pull/46493#discussion_r1596176953


##
common/network-common/src/main/java/org/apache/spark/network/server/TransportRequestHandler.java:
##
@@ -298,7 +303,9 @@ public void onFailure(Throwable e) {
   });
 } catch (Exception e) {
   logger.error("Error while invoking receiveMergeBlockMetaReq() for appId 
{} shuffleId {} "
-+ "reduceId {}", req.appId, req.shuffleId, req.appId, e);

Review Comment:
   fix typo
   `reduceId {req.appId}` -> `reduceId {req.reduceId}`






Re: [PR] [SPARK-46632][SQL] Fix subexpression elimination when equivalent ternary expressions have different children [spark]

2024-05-09 Thread via GitHub


zml1206 commented on PR #46135:
URL: https://github.com/apache/spark/pull/46135#issuecomment-2103726355

   @peter-toth Can this PR be merged?





Re: [PR] [SPARK-48209][CORE] Common (java side): Migrate `error/warn/info` with variables to structured logging framework [spark]

2024-05-09 Thread via GitHub


panbingkun commented on code in PR #46493:
URL: https://github.com/apache/spark/pull/46493#discussion_r1596171652


##
common/network-common/src/main/java/org/apache/spark/network/ssl/ReloadingX509TrustManager.java:
##
@@ -211,13 +210,13 @@ public void run() {
 this.reloadCount += 1;
   } catch (Exception ex) {
 logger.warn(
-  "Could not load truststore (keep using existing one) : " + 
ex.toString(),

Review Comment:
   Remove the redundant `ex.toString()`






Re: [PR] [SPARK-48031][SQL] Support view schema evolution [spark]

2024-05-09 Thread via GitHub


cloud-fan commented on code in PR #46267:
URL: https://github.com/apache/spark/pull/46267#discussion_r1596169480


##
sql/core/src/main/scala/org/apache/spark/sql/internal/BaseSessionStateBuilder.scala:
##
@@ -224,6 +224,7 @@ abstract class BaseSessionStateBuilder(
 TableCapabilityCheck +:
 CommandCheck +:
 CollationCheck +:
+SyncViewsCheck +:

Review Comment:
   Unfortunately we don't have an analyzer extension point that runs at the very 
end of the analysis phase. We can add one later, but for now `CheckAnalysis` is 
the best place to do it. Maybe we can rename this rule to indicate that it has 
side effects?






Re: [PR] [SPARK-47119][BUILD] Add `hive-jackson-provided` profile [spark]

2024-05-09 Thread via GitHub


pan3793 commented on PR #45201:
URL: https://github.com/apache/spark/pull/45201#issuecomment-2103713964

   @dongjoon-hyun Jackson 1.x can be removed after SPARK-47018 (bump Hive 
2.3.10), what should we do for `hive-jackson-provided`?





Re: [PR] [SPARK-48209][CORE] Common (java side): Migrate `error/warn/info` with variables to structured logging framework [spark]

2024-05-09 Thread via GitHub


panbingkun commented on code in PR #46493:
URL: https://github.com/apache/spark/pull/46493#discussion_r1596167842


##
common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/ExternalShuffleBlockResolver.java:
##
@@ -472,7 +487,8 @@ static ConcurrentMap 
reloadRegisteredExecutors(D
 break;
   }
   AppExecId id = parseDbAppExecKey(key);
-  logger.info("Reloading registered executors: " +  id.toString());
+  logger.info("Reloading registered executors: {}",
+MDC.of(LogKeys.APP_EXECUTOR_ID$.MODULE$, id.toString()));

Review Comment:
   Remove the redundant `.toString()`
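   For context, the `.toString()` is redundant because formatting APIs call `String.valueOf(arg)`, which invokes `toString()` itself (and handles `null` safely, which an explicit call would not). A small sketch, with `AppExecId` as a simplified stand-in for Spark's class:

```java
// Sketch: formatting already calls toString() via String.valueOf, so an
// explicit .toString() on the argument adds nothing (and would NPE on null).
// AppExecId here is a hypothetical stand-in, not the Spark class.
public class ToStringDemo {
    static final class AppExecId {
        final String app, exec;
        AppExecId(String app, String exec) { this.app = app; this.exec = exec; }
        @Override public String toString() { return "AppExecId[" + app + "," + exec + "]"; }
    }

    public static void main(String[] args) {
        AppExecId id = new AppExecId("app-1", "exec-3");
        // Both produce the same message text:
        String explicit = "Reloading registered executors: " + id.toString();
        String implicit = String.format("Reloading registered executors: %s", id);
        System.out.println(explicit.equals(implicit));       // true
        // String.valueOf(null) yields "null" instead of throwing:
        System.out.println(String.valueOf((Object) null));   // null
    }
}
```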






Re: [PR] [SPARK-48222][INFRA][DOCS] Sync Ruby Bundler to 2.4.22 and refresh Gem lock file [spark]

2024-05-09 Thread via GitHub


cloud-fan closed pull request #46512: [SPARK-48222][INFRA][DOCS] Sync Ruby 
Bundler to 2.4.22 and refresh Gem lock file
URL: https://github.com/apache/spark/pull/46512





Re: [PR] [SPARK-48209][CORE] Common (java side): Migrate `error/warn/info` with variables to structured logging framework [spark]

2024-05-09 Thread via GitHub


panbingkun commented on code in PR #46493:
URL: https://github.com/apache/spark/pull/46493#discussion_r1596166648


##
common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/ExternalShuffleBlockResolver.java:
##
@@ -368,7 +382,8 @@ public int removeBlocks(String appId, String execId, 
String[] blockIds) {
   if (file.delete()) {
 numRemovedBlocks++;
   } else {
-logger.warn("Failed to delete block: " + file.getAbsolutePath());
+logger.warn("Failed to delete block: {}",

Review Comment:
   I'm not sure whether `LogKeys.PATH$.MODULE$` is the appropriate key to use here?






Re: [PR] [SPARK-48222][INFRA][DOCS] Sync Ruby Bundler to 2.4.22 and refresh Gem lock file [spark]

2024-05-09 Thread via GitHub


cloud-fan commented on PR #46512:
URL: https://github.com/apache/spark/pull/46512#issuecomment-2103711389

   thanks, merging to master! (it's easier for me to test after merging it)





Re: [PR] [SPARK-47018][BUILD][SQL] Bump built-in Hive to 2.3.10 [spark]

2024-05-09 Thread via GitHub


pan3793 commented on PR #46468:
URL: https://github.com/apache/spark/pull/46468#issuecomment-2103709988

   Hive 2.3.10 jars should be available on Google Maven Central Mirror now, 
re-triggered CI





Re: [PR] [SPARK-48219][CORE] StreamReader Charset fix with UTF8 [spark]

2024-05-09 Thread via GitHub


xuzifu666 commented on PR #46509:
URL: https://github.com/apache/spark/pull/46509#issuecomment-2103709532

   > Do you think you can provide a test coverage to protect your contribution 
from potential future regression, @xuzifu666 ?
   > 
   > > Not need
   
   @dongjoon-hyun Thanks for your attention. In my opinion this change does not 
need a dedicated test, because it is a correctness fix for the ReadStream usage: 
without an explicit UTF-8 charset, reading can fail when the system default 
charset cannot represent Chinese characters. Other frameworks such as Calcite 
and Hive also set UTF-8 when this method is called.





Re: [PR] [SPARK-48209][CORE] Common (java side): Migrate `error/warn/info` with variables to structured logging framework [spark]

2024-05-09 Thread via GitHub


panbingkun commented on code in PR #46493:
URL: https://github.com/apache/spark/pull/46493#discussion_r1596163455


##
common/utils/src/main/java/org/apache/spark/internal/LoggerFactory.java:
##
@@ -19,6 +19,11 @@
 
 public class LoggerFactory {
 
+  public static Logger getLogger(String name) {

Review Comment:
   `YarnShuffleService` will use it:
   https://github.com/apache/spark/assets/15246973/8a4e79bd-43e2-4995-9e0a-57a047bd1e50;>
   






Re: [PR] [SPARK-48209][CORE] Common (java side): Migrate `error/warn/info` with variables to structured logging framework [spark]

2024-05-09 Thread via GitHub


panbingkun commented on code in PR #46493:
URL: https://github.com/apache/spark/pull/46493#discussion_r1596162272


##
core/src/main/scala/org/apache/spark/network/netty/NettyBlockTransferService.scala:
##
@@ -30,7 +30,7 @@ import com.codahale.metrics.{Metric, MetricSet}
 
 import org.apache.spark.{SecurityManager, SparkConf}
 import org.apache.spark.ExecutorDeadException
-import org.apache.spark.internal.config
+import org.apache.spark.internal.{config, LogKeys, MDC}

Review Comment:
   Because `NettyBlockTransferService` extends `BlockTransferService`, and 
`BlockTransferService` is Java code, the Java-side change triggered the update here.






Re: [PR] [SPARK-47018][BUILD][SQL] Bump built-in Hive to 2.3.10 [spark]

2024-05-09 Thread via GitHub


pan3793 commented on code in PR #46468:
URL: https://github.com/apache/spark/pull/46468#discussion_r1596161241


##
sql/hive/src/test/scala/org/apache/spark/sql/hive/HiveExternalCatalogVersionsSuite.scala:
##
@@ -211,7 +211,7 @@ class HiveExternalCatalogVersionsSuite extends 
SparkSubmitTestUtils {
 tryDownloadSpark(version, sparkTestingDir.getCanonicalPath)
   }
 
-  // Extract major.minor for testing Spark 3.1.x and 3.0.x with metastore 
2.3.9 and Java 11.
+  // Extract major.minor for testing Spark 3.1.x and 3.0.x with metastore 
2.3.10 and Java 11.

Review Comment:
   removed






Re: [PR] [SPARK-48209][CORE] Common (java side): Migrate `error/warn/info` with variables to structured logging framework [spark]

2024-05-09 Thread via GitHub


panbingkun commented on code in PR #46493:
URL: https://github.com/apache/spark/pull/46493#discussion_r1596158815


##
connector/connect/common/src/main/scala/org/apache/spark/sql/connect/client/GrpcRetryHandler.scala:
##
@@ -200,7 +200,7 @@ private[sql] object GrpcRetryHandler extends Logging {
 if (time.isDefined) {
   logWarning(
 log"Non-Fatal error during RPC execution: ${MDC(ERROR, 
lastException)}, " +
-  log"retrying (wait=${MDC(WAIT_TIME, time.get.toMillis)} ms, " +

Review Comment:
   Unify `WAIT_TIME` into `RETRY_WAIT_TIME`






Re: [PR] [SPARK-48209][CORE] Common (java side): Migrate `error/warn/info` with variables to structured logging framework [spark]

2024-05-09 Thread via GitHub


panbingkun commented on code in PR #46493:
URL: https://github.com/apache/spark/pull/46493#discussion_r1596157433


##
common/utils/src/main/java/org/apache/spark/internal/LoggerFactory.java:
##
@@ -19,6 +19,11 @@
 
 public class LoggerFactory {
 
+  public static Logger getLogger(String name) {

Review Comment:
   `NettyLogger` uses it
   https://github.com/apache/spark/assets/15246973/e11bf41b-9dd8-4fcf-81a2-77986f15d8b1;>
   









Re: [PR] [SPARK-48209][CORE] Common (java side): Migrate `error/warn/info` with variables to structured logging framework [spark]

2024-05-09 Thread via GitHub


panbingkun commented on code in PR #46493:
URL: https://github.com/apache/spark/pull/46493#discussion_r1596151348


##
common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/RetryingBlockTransferor.java:
##
@@ -177,10 +179,16 @@ private void transferAllOutstanding() {
 try {
   transferStarter.createAndStart(blockIdsToTransfer, myListener);
 } catch (Exception e) {
-  logger.error(String.format("Exception while beginning %s of %s 
outstanding blocks %s",
-listener.getTransferType(), blockIdsToTransfer.length,
-numRetries > 0 ? "(after " + numRetries + " retries)" : ""), e);
-
+  if (numRetries > 0) {

Review Comment:
   Print a different log message depending on the value of `numRetries`
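   A hypothetical sketch of the branching described above (not the actual Spark code): the `(after N retries)` suffix is appended only when at least one retry has already happened.

```java
// Hypothetical helper mirroring the review's intent: build a different error
// message depending on whether any retries have occurred yet.
public class RetryMessageDemo {
    static String transferErrorMessage(String transferType, int numBlocks, int numRetries) {
        if (numRetries > 0) {
            // Retried at least once: include the retry count.
            return String.format(
                "Exception while beginning %s of %d outstanding blocks (after %d retries)",
                transferType, numBlocks, numRetries);
        } else {
            // First attempt: no retry suffix.
            return String.format(
                "Exception while beginning %s of %d outstanding blocks",
                transferType, numBlocks);
        }
    }

    public static void main(String[] args) {
        System.out.println(transferErrorMessage("fetch", 3, 0));
        System.out.println(transferErrorMessage("fetch", 3, 2));
    }
}
```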






Re: [PR] [SPARK-48209][CORE] Common (java side): Migrate `error/warn/info` with variables to structured logging framework [spark]

2024-05-09 Thread via GitHub


panbingkun commented on code in PR #46493:
URL: https://github.com/apache/spark/pull/46493#discussion_r1596148922


##
common/network-common/src/main/java/org/apache/spark/network/ssl/SSLFactory.java:
##
@@ -136,7 +135,7 @@ public void destroy() {
   try {
 manager.destroy();
   } catch (InterruptedException ex) {
-logger.info("Interrupted while destroying trust manager: " + 
ex.toString(), ex);

Review Comment:
   Remove redundant `ex.toString()`.






Re: [PR] [SPARK-48031][SQL] Support view schema evolution [spark]

2024-05-09 Thread via GitHub


gengliangwang commented on code in PR #46267:
URL: https://github.com/apache/spark/pull/46267#discussion_r1596122256


##
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala:
##
@@ -945,54 +945,73 @@ class SessionCatalog(
   throw QueryCompilationErrors.invalidViewText(viewText, 
metadata.qualifiedName)
   }
 }
-val projectList = if (!isHiveCreatedView(metadata)) {
-  val viewColumnNames = if (metadata.viewQueryColumnNames.isEmpty) {
-// For view created before Spark 2.2.0, the view text is already fully 
qualified, the plan
-// output is the same with the view output.
-metadata.schema.fieldNames.toImmutableArraySeq
-  } else {
-assert(metadata.viewQueryColumnNames.length == metadata.schema.length)
-metadata.viewQueryColumnNames
-  }
+val schemaMode = metadata.viewSchemaMode
+if (schemaMode == SchemaEvolution) {
+  View(desc = metadata, isTempView = isTempView, child = parsedPlan)
+} else {
+  val projectList = if (!isHiveCreatedView(metadata)) {
+val viewColumnNames = if (metadata.viewQueryColumnNames.isEmpty) {
+  // For view created before Spark 2.2.0, the view text is already 
fully qualified, the plan
+  // output is the same with the view output.
+  metadata.schema.fieldNames.toImmutableArraySeq
+} else {
+  assert(metadata.viewQueryColumnNames.length == 
metadata.schema.length)
+  metadata.viewQueryColumnNames
+}
 
-  // For view queries like `SELECT * FROM t`, the schema of the referenced 
table/view may
-  // change after the view has been created. We need to add an extra 
SELECT to pick the columns
-  // according to the recorded column names (to get the correct view 
column ordering and omit
-  // the extra columns that we don't require), with UpCast (to make sure 
the type change is
-  // safe) and Alias (to respect user-specified view column names) 
according to the view schema
-  // in the catalog.
-  // Note that, the column names may have duplication, e.g. `CREATE VIEW 
v(x, y) AS
-  // SELECT 1 col, 2 col`. We need to make sure that the matching 
attributes have the same
-  // number of duplications, and pick the corresponding attribute by 
ordinal.
-  val viewConf = View.effectiveSQLConf(metadata.viewSQLConfigs, isTempView)
-  val normalizeColName: String => String = if 
(viewConf.caseSensitiveAnalysis) {
-identity
+// For view queries like `SELECT * FROM t`, the schema of the 
referenced table/view may
+// change after the view has been created. We need to add an extra 
SELECT to pick the
+// columns according to the recorded column names (to get the correct 
view column ordering
+// and omit the extra columns that we don't require), with UpCast (to 
make sure the type
+// change is safe) and Alias (to respect user-specified view column 
names) according to the
+// view schema in the catalog.
+// Note that, the column names may have duplication, e.g. `CREATE VIEW 
v(x, y) AS
+// SELECT 1 col, 2 col`. We need to make sure that the matching 
attributes have the same
+// number of duplications, and pick the corresponding attribute by 
ordinal.
+val viewConf = View.effectiveSQLConf(metadata.viewSQLConfigs, 
isTempView)
+val normalizeColName: String => String = if 
(viewConf.caseSensitiveAnalysis) {
+  identity
+} else {
+  _.toLowerCase(Locale.ROOT)
+}
+val nameToCounts = 
viewColumnNames.groupBy(normalizeColName).transform((_, v) => v.length)
+val nameToCurrentOrdinal = 
scala.collection.mutable.HashMap.empty[String, Int]
+val viewDDL = buildViewDDL(metadata, isTempView)
+
+viewColumnNames.zip(metadata.schema).map { case (name, field) =>
+  val normalizedName = normalizeColName(name)
+  val count = nameToCounts(normalizedName)
+  val ordinal = nameToCurrentOrdinal.getOrElse(normalizedName, 0)
+  nameToCurrentOrdinal(normalizedName) = ordinal + 1
+  val col = GetViewColumnByNameAndOrdinal(
+metadata.identifier.toString, name, ordinal, count, viewDDL)
+  val cast = schemaMode match {
+/*
+** For schema binding, we cast the column to the expected type 
using safe cast only.
+** For legacy behavior, we cast the column to the expected type 
using safe cast only.
+** For schema compensation, we cast the column to the expected 
type using any cast
+*  in ansi mode.
+** For schema (type) evolution, we take the column as is.
+*/
+case SchemaBinding => UpCast(col, field.dataType)
+case SchemaUnsupported => UpCast(col, field.dataType)
+case SchemaCompensation => Cast(col, field.dataType, ansiEnabled = 
true)
+  

Re: [PR] [SPARK-44609][K8S] Remove executor pod from PodsAllocator if it was removed from scheduler backend [spark]

2024-05-09 Thread via GitHub


github-actions[bot] closed pull request #42297: [SPARK-44609][K8S] Remove 
executor pod from PodsAllocator if it was removed from scheduler backend
URL: https://github.com/apache/spark/pull/42297





Re: [PR] [SPARK-46885][SQL] Push down filters through `TypedFilter` [spark]

2024-05-09 Thread via GitHub


github-actions[bot] closed pull request #44911: [SPARK-46885][SQL] Push down 
filters through `TypedFilter`
URL: https://github.com/apache/spark/pull/44911





Re: [PR] [SPARK-46108][SQL] keepInnerXmlAsRaw option for Built-in XML Data Source [spark]

2024-05-09 Thread via GitHub


github-actions[bot] commented on PR #44022:
URL: https://github.com/apache/spark/pull/44022#issuecomment-2103637588

   We're closing this PR because it hasn't been updated in a while. This isn't 
a judgement on the merit of the PR in any way. It's just a way of keeping the 
PR queue manageable.
   If you'd like to revive this PR, please reopen it and ask a committer to 
remove the Stale tag!





Re: [PR] [WIP] docs: restructure the docs index page [spark]

2024-05-09 Thread via GitHub


github-actions[bot] commented on PR #44812:
URL: https://github.com/apache/spark/pull/44812#issuecomment-2103637571

   We're closing this PR because it hasn't been updated in a while. This isn't 
a judgement on the merit of the PR in any way. It's just a way of keeping the 
PR queue manageable.
   If you'd like to revive this PR, please reopen it and ask a committer to 
remove the Stale tag!





Re: [PR] [SPARK-45708][BUILD] Retry mvn deploy [spark]

2024-05-09 Thread via GitHub


github-actions[bot] commented on PR #43559:
URL: https://github.com/apache/spark/pull/43559#issuecomment-2103637610

   We're closing this PR because it hasn't been updated in a while. This isn't 
a judgement on the merit of the PR in any way. It's just a way of keeping the 
PR queue manageable.
   If you'd like to revive this PR, please reopen it and ask a committer to 
remove the Stale tag!





Re: [PR] [SPARK-48214][INFRA] Ban import `org.slf4j.Logger` & `org.slf4j.LoggerFactory` [spark]

2024-05-09 Thread via GitHub


panbingkun commented on PR #46502:
URL: https://github.com/apache/spark/pull/46502#issuecomment-2103636458

   > I am +1 for the idea. However, I wonder if there will be suggestions about 
why the two imports are not allowed and how to fix the style error. If that's 
not feasible with `IllegalImport`, shall we use `RegexpSinglelineJava` and show 
proper suggestions instead?
   
   Well, that makes sense: `IllegalImport` does not provide a way to show a 
friendly suggestion message. Let me update it later.





Re: [PR] [SPARK-48227][PYTHON][DOC] Document the requirement of seed in protos [spark]

2024-05-09 Thread via GitHub


zhengruifeng commented on PR #46518:
URL: https://github.com/apache/spark/pull/46518#issuecomment-2103633099

   thanks @HyukjinKwon and @dongjoon-hyun for reviews





Re: [PR] [SPARK-48214][INFRA] Ban import `org.slf4j.Logger` & `org.slf4j.LoggerFactory` [spark]

2024-05-09 Thread via GitHub


gengliangwang commented on PR #46502:
URL: https://github.com/apache/spark/pull/46502#issuecomment-2103626513

   I am +1 for the idea.
   However, I wonder if there will be suggestions about why the two imports are 
not allowed and how to fix the style error.
   If that's not feasible with `IllegalImport`, shall we use 
`RegexpSinglelineJava` and show proper suggestions instead?





Re: [PR] [SPARK-48227][PYTHON][DOC] Document the requirement of seed in protos [spark]

2024-05-09 Thread via GitHub


dongjoon-hyun commented on PR #46518:
URL: https://github.com/apache/spark/pull/46518#issuecomment-2103621276

   Merged to master.





Re: [PR] [SPARK-48227][PYTHON][DOC] Document the requirement of seed in protos [spark]

2024-05-09 Thread via GitHub


dongjoon-hyun closed pull request #46518: [SPARK-48227][PYTHON][DOC] Document 
the requirement of seed in protos
URL: https://github.com/apache/spark/pull/46518





Re: [PR] [SPARK-48214][INFRA] Ban import `org.slf4j.Logger` & `org.slf4j.LoggerFactory` [spark]

2024-05-09 Thread via GitHub


dongjoon-hyun commented on PR #46502:
URL: https://github.com/apache/spark/pull/46502#issuecomment-2103617009

   cc @gengliangwang 





Re: [PR] [SPARK-48214][INFRA] Ban import `org.slf4j.Logger` & `org.slf4j.LoggerFactory` [spark]

2024-05-09 Thread via GitHub


panbingkun commented on PR #46502:
URL: https://github.com/apache/spark/pull/46502#issuecomment-2103613069

   - Following @gengliangwang's suggestion, we did not migrate the `test` code to 
the structured logging framework, so we need to exclude it, e.g.:
   https://github.com/apache/spark/assets/15246973/374ec683-5f13-439a-bafa-b7deafdb23dd;>
   
   - The other exclusion is: 
`common/kvstore/src/main/java/org/apache/spark/util/kvstore/LevelDBIterator.java`
   https://github.com/apache/spark/assets/15246973/8068a294-a631-4ccc-a081-052f72aeb43a;>
   A.the module `common/kvstore`, because it does not rely on 'utils' when 
compiling, if we want to `migrate` it, we need to add a `dependency` on 'utils' 
in `pom.xml`
   https://github.com/apache/spark/blob/master/common/kvstore/pom.xml#L38-L66
   
   B.And only one place in this module use 'slf4j', as shown below:
   
https://github.com/apache/spark/blob/e1fb1d7e063af7e8eb6e992c800902aff6e19e15/common/kvstore/src/main/java/org/apache/spark/util/kvstore/LevelDBIterator.java#L324
   And we found that this `error` level log does not use `variables`
   So at present, it seems that migration is `not necessary`. Of course, 
migration is also possible.
   
   - After complete the `last` structured log migration pr 
https://github.com/apache/spark/pull/46493/files on the java side, this rule 
should be applied to the spark code base (I have tested it on local env)
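
   The exclusion logic described above can be sketched as a simple check. This 
is a hypothetical illustration only; the actual rule in Spark's 
checkstyle/scalastyle configuration may be expressed quite differently:

   ```python
   import re

   # Hypothetical sketch of what such a ban rule checks: flag direct slf4j
   # imports, but skip the exclusions discussed above (test sources and
   # LevelDBIterator.java in the kvstore module).
   BANNED_IMPORT = re.compile(r"import org\.slf4j\.(Logger|LoggerFactory)\b")
   EXCLUDED_PATHS = {
       "common/kvstore/src/main/java/org/apache/spark/util/kvstore/LevelDBIterator.java",
   }

   def violates_ban(path, source):
       # Excluded files and test code may keep using slf4j directly.
       if path in EXCLUDED_PATHS or "/src/test/" in path:
           return False
       return bool(BANNED_IMPORT.search(source))
   ```

   Under this sketch, a main-source file importing `org.slf4j.Logger` would be 
flagged, while the kvstore iterator and test suites would pass.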
   
   





Re: [PR] [SPARK-47793][TEST][FOLLOWUP] Fix flaky test for Python data source exactly once. [spark]

2024-05-09 Thread via GitHub


dongjoon-hyun commented on PR #46481:
URL: https://github.com/apache/spark/pull/46481#issuecomment-2103611989

   Could you do the final review and sign-off, please, @HyukjinKwon ?





Re: [PR] [SPARK-48180][SQL] Improve error when UDTF call with TABLE arg forgets parentheses around multiple PARTITION/ORDER BY exprs [spark]

2024-05-09 Thread via GitHub


HyukjinKwon closed pull request #46451: [SPARK-48180][SQL] Improve error when 
UDTF call with TABLE arg forgets parentheses around multiple PARTITION/ORDER BY 
exprs
URL: https://github.com/apache/spark/pull/46451





Re: [PR] [SPARK-48180][SQL] Improve error when UDTF call with TABLE arg forgets parentheses around multiple PARTITION/ORDER BY exprs [spark]

2024-05-09 Thread via GitHub


HyukjinKwon commented on PR #46451:
URL: https://github.com/apache/spark/pull/46451#issuecomment-2103611091

   Merged to master.





Re: [PR] [SPARK-48226][BUILD] Add `spark-ganglia-lgpl` to `lint-java` & `spark-ganglia-lgpl` and `jvm-profiler` to `sbt-checkstyle` [spark]

2024-05-09 Thread via GitHub


dongjoon-hyun closed pull request #46501: [SPARK-48226][BUILD] Add 
`spark-ganglia-lgpl` to `lint-java` & `spark-ganglia-lgpl` and `jvm-profiler` 
to `sbt-checkstyle`
URL: https://github.com/apache/spark/pull/46501





Re: [PR] [SPARK-48227][PYTHON][DOC] Document the requirement of seed in protos [spark]

2024-05-09 Thread via GitHub


dongjoon-hyun commented on code in PR #46518:
URL: https://github.com/apache/spark/pull/46518#discussion_r1596102379


##
connector/connect/common/src/main/protobuf/spark/connect/relations.proto:
##
@@ -467,7 +467,9 @@ message Sample {
   // (Optional) Whether to sample with replacement.
   optional bool with_replacement = 4;
 
-  // (Optional) The random seed.
+  // (Required) The random seed.
+  // This field is required to avoid generating mutable dataframes (see 
SPARK-48184 for details);
+  // however, it is still kept 'optional' here for backward compatibility.
   optional int64 seed = 5;

Review Comment:
   Ya, this looks inevitable.
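
   A minimal sketch of the client-side contract being documented here 
(hypothetical helper name, not the actual Spark Connect client code): the 
proto field stays `optional` for wire compatibility, but the client always 
resolves a concrete seed before sending the plan, so re-executing the same 
dataframe samples the same rows.

   ```python
   import random

   # Hypothetical sketch: the client resolves a concrete seed once, at plan
   # construction time, instead of letting the server pick one per execution.
   def resolve_seed(seed=None):
       # An explicit seed is kept as-is; otherwise one is drawn once and reused.
       return seed if seed is not None else random.randint(0, 2**63 - 1)

   explicit = resolve_seed(42)   # always 42
   generated = resolve_seed()    # fixed for the lifetime of the plan
   ```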






Re: [PR] [SPARK-48148][CORE] JSON objects should not be modified when read as STRING [spark]

2024-05-09 Thread via GitHub


HyukjinKwon commented on PR #46408:
URL: https://github.com/apache/spark/pull/46408#issuecomment-2103609498

   btw, you can trigger it yourself at 
https://github.com/eric-maynard/spark/runs/24789350525; I can't trigger it :-).





Re: [PR] [SPARK-48148][CORE] JSON objects should not be modified when read as STRING [spark]

2024-05-09 Thread via GitHub


HyukjinKwon closed pull request #46408: [SPARK-48148][CORE] JSON objects should 
not be modified when read as STRING
URL: https://github.com/apache/spark/pull/46408





Re: [PR] [SPARK-48148][CORE] JSON objects should not be modified when read as STRING [spark]

2024-05-09 Thread via GitHub


HyukjinKwon commented on PR #46408:
URL: https://github.com/apache/spark/pull/46408#issuecomment-2103609100

   Merged to master.





Re: [PR] [SPARK-48089][SS][CONNECT] Fix 3.5 <> 4.0 StreamingQueryListener compatibility test [spark]

2024-05-09 Thread via GitHub


HyukjinKwon commented on PR #46513:
URL: https://github.com/apache/spark/pull/46513#issuecomment-2103607909

   Merged to branch-3.5.





Re: [PR] [SPARK-48089][SS][CONNECT] Fix 3.5 <> 4.0 StreamingQueryListener compatibility test [spark]

2024-05-09 Thread via GitHub


HyukjinKwon closed pull request #46513: [SPARK-48089][SS][CONNECT] Fix 3.5 <> 
4.0 StreamingQueryListener compatibility test
URL: https://github.com/apache/spark/pull/46513





[PR] [SPARK-48227][PYTHON][DOC] Document the requirement of seed in protos [spark]

2024-05-09 Thread via GitHub


zhengruifeng opened a new pull request, #46518:
URL: https://github.com/apache/spark/pull/46518

   ### What changes were proposed in this pull request?
   Document the requirement of seed in protos
   
   
   ### Why are the changes needed?
   the seed should be set at client side
   
   
   ### Does this PR introduce _any_ user-facing change?
   no
   
   
   ### How was this patch tested?
   ci
   
   
   ### Was this patch authored or co-authored using generative AI tooling?
   no
   





  1   2   3   >