[GitHub] [spark] gaoyajun02 commented on pull request #38333: [SPARK-40872] Fallback to original shuffle block when a push-merged shuffle chunk is zero-size

2022-11-01 Thread GitBox
gaoyajun02 commented on PR #38333: URL: https://github.com/apache/spark/pull/38333#issuecomment-1298452168 We have now located the cause of the zero-size chunk problem on the shuffle service node, and there is the following information

[GitHub] [spark] beliefer commented on pull request #38461: [SPARK-34079][SQL][FOLLOWUP] Improve the readability and simplify the code for MergeScalarSubqueries

2022-11-01 Thread GitBox
beliefer commented on PR #38461: URL: https://github.com/apache/spark/pull/38461#issuecomment-1298447704 ping @peter-toth cc @cloud-fan -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the

[GitHub] [spark] wangyum commented on pull request #38458: [SPARK-40983][DOC] Remove Hadoop requirements for zstd mentioned in Parquet compression codec

2022-11-01 Thread GitBox
wangyum commented on PR #38458: URL: https://github.com/apache/spark/pull/38458#issuecomment-1298446195 Merged to master, branch-3.3 and branch-3.2.

[GitHub] [spark] wangyum closed pull request #38458: [SPARK-40983][DOC] Remove Hadoop requirements for zstd mentioned in Parquet compression codec

2022-11-01 Thread GitBox
wangyum closed pull request #38458: [SPARK-40983][DOC] Remove Hadoop requirements for zstd mentioned in Parquet compression codec URL: https://github.com/apache/spark/pull/38458

[GitHub] [spark] MaxGekk commented on pull request #38454: [SPARK-40978][SQL] Migrate `failAnalysis()` w/o a context onto error classes

2022-11-01 Thread GitBox
MaxGekk commented on PR #38454: URL: https://github.com/apache/spark/pull/38454#issuecomment-1298428612 @cloud-fan @srielau @itholic @LuciferYang @panbingkun Could you review this PR, please.

[GitHub] [spark] EnricoMi commented on pull request #38356: [SPARK-40885] `Sort` may not take effect when it is the last 'Transform' operator

2022-11-01 Thread GitBox
EnricoMi commented on PR #38356: URL: https://github.com/apache/spark/pull/38356#issuecomment-1298425304 @allisonwang-db can you elaborate on mapping `write.requiredOrdering` to the projected columns that you introduced in f98f9f8566243a8a01edcaad3b847bbd2f52305b? Was that existing code

[GitHub] [spark] zhengruifeng commented on pull request #38459: [SPARK-40980][CONNECT][TEST] Support session.sql in Connect DSL

2022-11-01 Thread GitBox
zhengruifeng commented on PR #38459: URL: https://github.com/apache/spark/pull/38459#issuecomment-1298422766 merged into master

[GitHub] [spark] zhengruifeng closed pull request #38459: [SPARK-40980][CONNECT][TEST] Support session.sql in Connect DSL

2022-11-01 Thread GitBox
zhengruifeng closed pull request #38459: [SPARK-40980][CONNECT][TEST] Support session.sql in Connect DSL URL: https://github.com/apache/spark/pull/38459

[GitHub] [spark] panbingkun commented on a diff in pull request #38463: [SPARK-40374][SQL] Migrate type check failures of type creators onto error classes

2022-11-01 Thread GitBox
panbingkun commented on code in PR #38463: URL: https://github.com/apache/spark/pull/38463#discussion_r1010356813 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeCreator.scala: ## @@ -444,17 +460,32 @@ case class CreateNamedStruct(children:

[GitHub] [spark] panbingkun commented on a diff in pull request #38463: [SPARK-40374][SQL] Migrate type check failures of type creators onto error classes

2022-11-01 Thread GitBox
panbingkun commented on code in PR #38463: URL: https://github.com/apache/spark/pull/38463#discussion_r1010355945 ## core/src/main/resources/error/error-classes.json: ## @@ -155,6 +160,21 @@ "To convert values from to , you can use the functions instead."

[GitHub] [spark] panbingkun commented on a diff in pull request #38438: [SPARK-40748][SQL] Migrate type check failures of conditions onto error classes

2022-11-01 Thread GitBox
panbingkun commented on code in PR #38438: URL: https://github.com/apache/spark/pull/38438#discussion_r1010349649 ## sql/core/src/test/java/test/org/apache/spark/sql/JavaColumnExpressionSuite.java: ## @@ -79,12 +80,8 @@ public void isInCollectionCheckExceptionMessage() {

[GitHub] [spark] eejbyfeldt commented on a diff in pull request #38428: [SPARK-40912][CORE][WIP] Overhead of Exceptions in DeserializationStream

2022-11-01 Thread GitBox
eejbyfeldt commented on code in PR #38428: URL: https://github.com/apache/spark/pull/38428#discussion_r1010340881 ## core/src/main/scala/org/apache/spark/serializer/KryoSerializer.scala: ## @@ -301,15 +300,18 @@ class KryoDeserializationStream( private[this] var kryo: Kryo

[GitHub] [spark] zhengruifeng opened a new pull request, #38468: [WIP][CONNECT][PYTHON] Arrow-based collect

2022-11-01 Thread GitBox
zhengruifeng opened a new pull request, #38468: URL: https://github.com/apache/spark/pull/38468 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ###

[GitHub] [spark] cxzl25 opened a new pull request, #38467: [SPARK-40987][CORE] Avoid creating a directory when deleting a block, causing DAGScheduler to not work

2022-11-01 Thread GitBox
cxzl25 opened a new pull request, #38467: URL: https://github.com/apache/spark/pull/38467 ### What changes were proposed in this pull request? Avoid creating a directory when deleting a block. ### Why are the changes needed? When the driver submits a job, DAGScheduler calls

[GitHub] [spark] panbingkun commented on a diff in pull request #38463: [SPARK-40374][SQL] Migrate type check failures of type creators onto error classes

2022-11-01 Thread GitBox
panbingkun commented on code in PR #38463: URL: https://github.com/apache/spark/pull/38463#discussion_r1010317628 ## core/src/main/resources/error/error-classes.json: ## @@ -155,6 +160,21 @@ "To convert values from to , you can use the functions instead."

[GitHub] [spark] beliefer opened a new pull request, #38466: [WIP][SPARK-40986][SQL] Using distinct to reduce the data size for bloom filter

2022-11-01 Thread GitBox
beliefer opened a new pull request, #38466: URL: https://github.com/apache/spark/pull/38466 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How was this

[GitHub] [spark] dongjoon-hyun commented on pull request #38464: [SPARK-32628][SQL] Use bloom filter to improve dynamic partition pruning

2022-11-01 Thread GitBox
dongjoon-hyun commented on PR #38464: URL: https://github.com/apache/spark/pull/38464#issuecomment-1298287656 Thank you for pinging me, @wangyum .

[GitHub] [spark] LuciferYang opened a new pull request, #38465: [SPARK-40985][BUILD] Upgrade RoaringBitmap to 0.9.35

2022-11-01 Thread GitBox
LuciferYang opened a new pull request, #38465: URL: https://github.com/apache/spark/pull/38465 ### What changes were proposed in this pull request? This PR aims to upgrade RoaringBitmap to 0.9.35. ### Why are the changes needed? This version brings some bug fixes: -

[GitHub] [spark] wangyum commented on pull request #38464: [SPARK-32628][SQL] Use bloom filter to improve dynamic partition pruning

2022-11-01 Thread GitBox
wangyum commented on PR #38464: URL: https://github.com/apache/spark/pull/38464#issuecomment-1298235836 cc @cloud-fan @sigmod @aokolnychyi @dongjoon-hyun @huaxingao @viirya

[GitHub] [spark] wangyum opened a new pull request, #38464: [SPARK-32628][SQL] Use bloom filter to improve dynamic partition pruning

2022-11-01 Thread GitBox
wangyum opened a new pull request, #38464: URL: https://github.com/apache/spark/pull/38464 ### What changes were proposed in this pull request? This PR enhances DPP to use bloom filters if `spark.sql.optimizer.dynamicPartitionPruning.reuseBroadcastOnly` is disabled and build plan

[GitHub] [spark] zhengruifeng commented on pull request #38460: [SPARK-40981][CONNECT][PYTHON] Support session.range in Python client

2022-11-01 Thread GitBox
zhengruifeng commented on PR #38460: URL: https://github.com/apache/spark/pull/38460#issuecomment-1298229126 merged into master

[GitHub] [spark] zhengruifeng closed pull request #38460: [SPARK-40981][CONNECT][PYTHON] Support session.range in Python client

2022-11-01 Thread GitBox
zhengruifeng closed pull request #38460: [SPARK-40981][CONNECT][PYTHON] Support session.range in Python client URL: https://github.com/apache/spark/pull/38460

[GitHub] [spark] MaxGekk closed pull request #38439: [SPARK-40890][SQL][TESTS] Check error classes in DataSourceV2SQLSuite

2022-11-01 Thread GitBox
MaxGekk closed pull request #38439: [SPARK-40890][SQL][TESTS] Check error classes in DataSourceV2SQLSuite URL: https://github.com/apache/spark/pull/38439

[GitHub] [spark] MaxGekk commented on pull request #38439: [SPARK-40890][SQL][TESTS] Check error classes in DataSourceV2SQLSuite

2022-11-01 Thread GitBox
MaxGekk commented on PR #38439: URL: https://github.com/apache/spark/pull/38439#issuecomment-1298219391 +1, LGTM. Merging to master. I ran the test locally. Thank you, @panbingkun.

[GitHub] [spark] LuciferYang commented on a diff in pull request #38463: [SPARK-40374][SQL] Migrate type check failures of type creators onto error classes

2022-11-01 Thread GitBox
LuciferYang commented on code in PR #38463: URL: https://github.com/apache/spark/pull/38463#discussion_r1010207329 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeCreator.scala: ## @@ -444,17 +460,32 @@ case class CreateNamedStruct(children:

[GitHub] [spark] LuciferYang commented on pull request #38457: [SPARK-40371][SQL] Migrate type check failures of NthValue and NTile onto error classes

2022-11-01 Thread GitBox
LuciferYang commented on PR #38457: URL: https://github.com/apache/spark/pull/38457#issuecomment-1298209124 Thanks @MaxGekk

[GitHub] [spark] MaxGekk closed pull request #38457: [SPARK-40371][SQL] Migrate type check failures of NthValue and NTile onto error classes

2022-11-01 Thread GitBox
MaxGekk closed pull request #38457: [SPARK-40371][SQL] Migrate type check failures of NthValue and NTile onto error classes URL: https://github.com/apache/spark/pull/38457

[GitHub] [spark] MaxGekk commented on pull request #38457: [SPARK-40371][SQL] Migrate type check failures of NthValue and NTile onto error classes

2022-11-01 Thread GitBox
MaxGekk commented on PR #38457: URL: https://github.com/apache/spark/pull/38457#issuecomment-1298207749 +1, LGTM. Merging to master. Thank you, @LuciferYang.

[GitHub] [spark] itholic commented on a diff in pull request #38170: [WIP][SPARK-40663][SQL] Migrate execution errors onto error classes: _LEGACY_ERROR_TEMP_2201-2225

2022-11-01 Thread GitBox
itholic commented on code in PR #38170: URL: https://github.com/apache/spark/pull/38170#discussion_r1010170342 ## sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryExecutionErrors.scala: ## @@ -2171,11 +2168,7 @@ private[sql] object QueryExecutionErrors extends

[GitHub] [spark] panbingkun opened a new pull request, #38463: [SPARK-40374][SQL] Migrate type check failures of type creators onto error classes

2022-11-01 Thread GitBox
panbingkun opened a new pull request, #38463: URL: https://github.com/apache/spark/pull/38463 ### What changes were proposed in this pull request? This PR replaces TypeCheckFailure with DataTypeMismatch in type checks in the complex type creator expressions, including: 1. CreateMap

[GitHub] [spark] cloud-fan closed pull request #35594: [SPARK-38270][SQL] Spark SQL CLI's AM should keep same exit code with client side

2022-11-01 Thread GitBox
cloud-fan closed pull request #35594: [SPARK-38270][SQL] Spark SQL CLI's AM should keep same exit code with client side URL: https://github.com/apache/spark/pull/35594

[GitHub] [spark] cloud-fan commented on pull request #35594: [SPARK-38270][SQL] Spark SQL CLI's AM should keep same exit code with client side

2022-11-01 Thread GitBox
cloud-fan commented on PR #35594: URL: https://github.com/apache/spark/pull/35594#issuecomment-1298126483 thanks, merging to master!

[GitHub] [spark] grundprinzip opened a new pull request, #38462: [SPARK-40533] [CONNECT] [PYTHON] Support most built-in literal types for Python in Spark Connect

2022-11-01 Thread GitBox
grundprinzip opened a new pull request, #38462: URL: https://github.com/apache/spark/pull/38462 ### What changes were proposed in this pull request? This PR implements the client-side serialization of most Python literals into Spark Connect literals. ### Why are the changes

[GitHub] [spark] beliefer opened a new pull request, #38461: [SPARK-34079][SQL][FOLLOWUP] Improve the readability and simplify the code for MergeScalarSubqueries

2022-11-01 Thread GitBox
beliefer opened a new pull request, #38461: URL: https://github.com/apache/spark/pull/38461 ### What changes were proposed in this pull request? Recently, I read `MergeScalarSubqueries` because it is a feature used to improve performance. I found the parameters of

[GitHub] [spark] MaxGekk commented on a diff in pull request #38438: [SPARK-40748][SQL] Migrate type check failures of conditions onto error classes

2022-11-01 Thread GitBox
MaxGekk commented on code in PR #38438: URL: https://github.com/apache/spark/pull/38438#discussion_r1010145931 ## sql/core/src/test/java/test/org/apache/spark/sql/JavaColumnExpressionSuite.java: ## @@ -79,12 +80,8 @@ public void isInCollectionCheckExceptionMessage() {

[GitHub] [spark] MaxGekk commented on a diff in pull request #38447: [SPARK-40973][SQL] Rename `_LEGACY_ERROR_TEMP_0055` to `UNCLOSED_BRACKETED_COMMENT`

2022-11-01 Thread GitBox
MaxGekk commented on code in PR #38447: URL: https://github.com/apache/spark/pull/38447#discussion_r1010126072 ## sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryParsingErrors.scala: ## @@ -608,8 +608,12 @@ private[sql] object QueryParsingErrors extends

[GitHub] [spark] MaxGekk commented on pull request #38170: [WIP][SPARK-40663][SQL] Migrate execution errors onto error classes: _LEGACY_ERROR_TEMP_2201-2225

2022-11-01 Thread GitBox
MaxGekk commented on PR #38170: URL: https://github.com/apache/spark/pull/38170#issuecomment-1298094912 @itholic Can you fix GAs? Some test failures are related to your changes: ``` 2022-10-30 23:01:03.735 - stderr> org.apache.spark.SparkException: spark.sql.catalog.default

[GitHub] [spark] MaxGekk closed pull request #38448: [SPARK-40975][SQL] Rename the error class `_LEGACY_ERROR_TEMP_0021` to `UNSUPPORTED_TYPED_LITERAL`

2022-11-01 Thread GitBox
MaxGekk closed pull request #38448: [SPARK-40975][SQL] Rename the error class `_LEGACY_ERROR_TEMP_0021` to `UNSUPPORTED_TYPED_LITERAL` URL: https://github.com/apache/spark/pull/38448

[GitHub] [spark] MaxGekk commented on pull request #38448: [SPARK-40975][SQL] Rename the error class `_LEGACY_ERROR_TEMP_0021` to `UNSUPPORTED_TYPED_LITERAL`

2022-11-01 Thread GitBox
MaxGekk commented on PR #38448: URL: https://github.com/apache/spark/pull/38448#issuecomment-1298090370 Merging to master. Thank you, @cloud-fan @LuciferYang @itholic @srielau for review.

[GitHub] [spark] MaxGekk commented on a diff in pull request #38448: [SPARK-40975][SQL] Rename the error class `_LEGACY_ERROR_TEMP_0021` to `UNSUPPORTED_TYPED_LITERAL`

2022-11-01 Thread GitBox
MaxGekk commented on code in PR #38448: URL: https://github.com/apache/spark/pull/38448#discussion_r1010108090 ## sql/core/src/test/resources/sql-tests/results/literals.sql.out: ## @@ -442,9 +442,10 @@ struct<> -- !query output

[GitHub] [spark] LuciferYang commented on a diff in pull request #38448: [SPARK-40975][SQL] Rename the error class `_LEGACY_ERROR_TEMP_0021` to `UNSUPPORTED_TYPED_LITERAL`

2022-11-01 Thread GitBox
LuciferYang commented on code in PR #38448: URL: https://github.com/apache/spark/pull/38448#discussion_r1010108242 ## sql/core/src/test/resources/sql-tests/results/literals.sql.out: ## @@ -442,9 +442,10 @@ struct<> -- !query output

[GitHub] [spark] HyukjinKwon closed pull request #38455: [SPARK-40827][PS][TESTS] Re-enable the DataFrame.corrwith test after fixing in future pandas.

2022-11-01 Thread GitBox
HyukjinKwon closed pull request #38455: [SPARK-40827][PS][TESTS] Re-enable the DataFrame.corrwith test after fixing in future pandas. URL: https://github.com/apache/spark/pull/38455

[GitHub] [spark] HyukjinKwon commented on pull request #38455: [SPARK-40827][PS][TESTS] Re-enable the DataFrame.corrwith test after fixing in future pandas.

2022-11-01 Thread GitBox
HyukjinKwon commented on PR #38455: URL: https://github.com/apache/spark/pull/38455#issuecomment-1298081042 Merged to master.

[GitHub] [spark] cloud-fan commented on a diff in pull request #38448: [SPARK-40975][SQL] Rename the error class `_LEGACY_ERROR_TEMP_0021` to `UNSUPPORTED_TYPED_LITERAL`

2022-11-01 Thread GitBox
cloud-fan commented on code in PR #38448: URL: https://github.com/apache/spark/pull/38448#discussion_r1010103819 ## sql/core/src/test/resources/sql-tests/results/literals.sql.out: ## @@ -442,9 +442,10 @@ struct<> -- !query output

[GitHub] [spark] cloud-fan commented on pull request #38263: [SPARK-40692][SQL] Support data masking built-in function 'mask_hash'

2022-11-01 Thread GitBox
cloud-fan commented on PR #38263: URL: https://github.com/apache/spark/pull/38263#issuecomment-1298074997 I'd like to reach a consensus on https://github.com/apache/spark/pull/38263#discussion_r1009055117 before moving forward.

[GitHub] [spark] amaliujia commented on a diff in pull request #38347: [SPARK-40883][CONNECT] Support Range in Connect proto

2022-11-01 Thread GitBox
amaliujia commented on code in PR #38347: URL: https://github.com/apache/spark/pull/38347#discussion_r1010097145 ## connector/connect/src/main/protobuf/spark/connect/relations.proto: ## @@ -217,3 +218,23 @@ message Sample { int64 seed = 1; } } + +// Relation of type

[GitHub] [spark] amaliujia commented on pull request #38459: [SPARK-40980][CONNECT][DSL] Support session.sql in Connect DSL

2022-11-01 Thread GitBox
amaliujia commented on PR #38459: URL: https://github.com/apache/spark/pull/38459#issuecomment-1298066340 R: @cloud-fan
