[GitHub] [spark] grundprinzip commented on pull request #38468: [SPARK-41005][CONNECT][PYTHON] Arrow-based collect

2022-11-03 Thread GitBox
grundprinzip commented on PR #38468: URL: https://github.com/apache/spark/pull/38468#issuecomment-1303010466 @zhengruifeng can you please add test cases for things like `select * from table limit 0` where the optimizer decides there are no qualifying rows but we have to return an
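The empty-result case raised above can be illustrated with a small stand-in. This is a minimal sketch, not Spark Connect code: it assumes SQLite (via Python's standard `sqlite3` module) as a substitute engine, and the table `t` and the helper `run_query` are hypothetical names. It only shows the property such a test would check: a `LIMIT 0` query yields no rows but still exposes its column names.

```python
import sqlite3


def run_query(conn, sql):
    """Return (column_names, rows) for a query, even when no rows qualify."""
    cur = conn.execute(sql)
    # cursor.description carries the result columns independently of row count.
    columns = [d[0] for d in cur.description]
    return columns, cur.fetchall()


# Hypothetical table used only for this illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, name TEXT)")
conn.execute("INSERT INTO t VALUES (1, 'a')")

columns, rows = run_query(conn, "SELECT * FROM t LIMIT 0")
```

A test along these lines would assert that `rows` is empty while `columns` still lists `id` and `name`.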

[GitHub] [spark] sadikovi commented on pull request #38277: [SPARK-40815][SQL] Add `DelegateSymlinkTextInputFormat` to workaround `SymlinkTextInputSplit` bug

2022-11-03 Thread GitBox
sadikovi commented on PR #38277: URL: https://github.com/apache/spark/pull/38277#issuecomment-1303009433 @dongjoon-hyun I am trying to repro with JDK 11 (11.0.16) and the test passes just fine. Did you have to do any special setup to trigger the problem? -- This is an automated message

[GitHub] [spark] grundprinzip commented on a diff in pull request #38468: [SPARK-41005][CONNECT][PYTHON] Arrow-based collect

2022-11-03 Thread GitBox
grundprinzip commented on code in PR #38468: URL: https://github.com/apache/spark/pull/38468#discussion_r1013643400 ## sql/core/src/main/scala/org/apache/spark/sql/execution/arrow/ArrowConverters.scala: ## @@ -128,6 +128,65 @@ private[sql] object ArrowConverters extends Logging

[GitHub] [spark] grundprinzip commented on a diff in pull request #38468: [SPARK-41005][CONNECT][PYTHON] Arrow-based collect

2022-11-03 Thread GitBox
grundprinzip commented on code in PR #38468: URL: https://github.com/apache/spark/pull/38468#discussion_r1013642456 ## sql/core/src/main/scala/org/apache/spark/sql/execution/arrow/ArrowConverters.scala: ## @@ -128,6 +128,65 @@ private[sql] object ArrowConverters extends Logging

[GitHub] [spark] mridulm commented on pull request #38064: [SPARK-40622][SQL][CORE]Result of a single task in collect() must fit in 2GB

2022-11-03 Thread GitBox
mridulm commented on PR #38064: URL: https://github.com/apache/spark/pull/38064#issuecomment-1303008250 If the failures are unrelated to your changes, try triggering the tests again. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to

[GitHub] [spark] mridulm commented on pull request #38377: [SPARK-40901][CORE] Unable to store Spark Driver logs with Absolute Hadoop based URI FS Path

2022-11-03 Thread GitBox
mridulm commented on PR #38377: URL: https://github.com/apache/spark/pull/38377#issuecomment-1303007426 Two points: * `spark.driver.log.dfsDir` is typically expected to be a path on HDFS, so resolving it relative to the current working directory does not make sense * If `rootDir` is

[GitHub] [spark] HeartSaVioR closed pull request #38344: [SPARK-40777][SQL][PROTOBUF] Protobuf import support and move error-classes.

2022-11-03 Thread GitBox
HeartSaVioR closed pull request #38344: [SPARK-40777][SQL][PROTOBUF] Protobuf import support and move error-classes. URL: https://github.com/apache/spark/pull/38344

[GitHub] [spark] HeartSaVioR commented on pull request #38344: [SPARK-40777][SQL][PROTOBUF] Protobuf import support and move error-classes.

2022-11-03 Thread GitBox
HeartSaVioR commented on PR #38344: URL: https://github.com/apache/spark/pull/38344#issuecomment-1303006260 Thanks! Merging to master.

[GitHub] [spark] MaxGekk closed pull request #38170: [SPARK-40663][SQL] Migrate execution errors onto error classes: _LEGACY_ERROR_TEMP_2201-2225

2022-11-03 Thread GitBox
MaxGekk closed pull request #38170: [SPARK-40663][SQL] Migrate execution errors onto error classes: _LEGACY_ERROR_TEMP_2201-2225 URL: https://github.com/apache/spark/pull/38170

[GitHub] [spark] MaxGekk commented on pull request #38170: [SPARK-40663][SQL] Migrate execution errors onto error classes: _LEGACY_ERROR_TEMP_2201-2225

2022-11-03 Thread GitBox
MaxGekk commented on PR #38170: URL: https://github.com/apache/spark/pull/38170#issuecomment-1302998081 +1, LGTM. Merging to master. Thank you, @itholic.

[GitHub] [spark] LuciferYang commented on a diff in pull request #38507: [WIP][SPARK-40372][SQL] Migrate failures of array type checks onto error classes

2022-11-03 Thread GitBox
LuciferYang commented on code in PR #38507: URL: https://github.com/apache/spark/pull/38507#discussion_r1013586770 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala: ## @@ -2810,15 +2875,15 @@ case class Sequence( if

[GitHub] [spark] MaxGekk commented on pull request #38490: [SPARK-41009][SQL] Rename the error class `_LEGACY_ERROR_TEMP_1070` to `LOCATION_ALREADY_EXISTS`

2022-11-03 Thread GitBox
MaxGekk commented on PR #38490: URL: https://github.com/apache/spark/pull/38490#issuecomment-1302993756 @cloud-fan @srielau Any objections to the changes?

[GitHub] [spark] itholic commented on pull request #38508: [SPARK-41012][SQL] Rename `_LEGACY_ERROR_TEMP_1022` to `ORDER_BY_POS_OUT_OF_RANGE`

2022-11-03 Thread GitBox
itholic commented on PR #38508: URL: https://github.com/apache/spark/pull/38508#issuecomment-1302986435 cc @MaxGekk

[GitHub] [spark] jzhuge commented on a diff in pull request #37556: [SPARK-39799][SQL] DataSourceV2: View catalog interface

2022-11-03 Thread GitBox
jzhuge commented on code in PR #37556: URL: https://github.com/apache/spark/pull/37556#discussion_r1013618805 ## sql/catalyst/src/main/java/org/apache/spark/sql/connector/catalog/View.java: ## @@ -0,0 +1,69 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one

[GitHub] [spark] itholic opened a new pull request, #38508: [SPARK-41012][SQL] Rename _LEGACY_ERROR_TEMP_1022 to ORDER_BY_POS_OUT_OF_RANGE

2022-11-03 Thread GitBox
itholic opened a new pull request, #38508: URL: https://github.com/apache/spark/pull/38508 ### What changes were proposed in this pull request? This PR proposes to rename `_LEGACY_ERROR_TEMP_1022` to `ORDER_BY_POS_OUT_OF_RANGE` ### Why are the changes needed? Error

[GitHub] [spark] grundprinzip commented on pull request #38485: [SPARK-41001] [CONNECT] [PYTHON] Implementing Connection String for Python Client

2022-11-03 Thread GitBox
grundprinzip commented on PR #38485: URL: https://github.com/apache/spark/pull/38485#issuecomment-1302976715 Accepted the suggestion and fixed a doc example with a missing quote.

[GitHub] [spark] wmoustafa commented on a diff in pull request #37556: [SPARK-39799][SQL] DataSourceV2: View catalog interface

2022-11-03 Thread GitBox
wmoustafa commented on code in PR #37556: URL: https://github.com/apache/spark/pull/37556#discussion_r1013611947 ## sql/catalyst/src/main/java/org/apache/spark/sql/connector/catalog/View.java: ## @@ -0,0 +1,69 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under

[GitHub] [spark] SandishKumarHN commented on pull request #38344: [SPARK-40777][SQL][PROTOBUF] Protobuf import support and move error-classes.

2022-11-03 Thread GitBox
SandishKumarHN commented on PR #38344: URL: https://github.com/apache/spark/pull/38344#issuecomment-1302961885 > @MaxGekk are you still reviewing this? @SandishKumarHN is there any more review to be addressed? If we are ready, I can ask @HeartSaVioR to merge this (before his weekend starts

[GitHub] [spark] itholic commented on pull request #38170: [SPARK-40663][SQL] Migrate execution errors onto error classes: _LEGACY_ERROR_TEMP_2201-2225

2022-11-03 Thread GitBox
itholic commented on PR #38170: URL: https://github.com/apache/spark/pull/38170#issuecomment-1302937080 @MaxGekk CI finally passed. Could you take a look when you find some time?

[GitHub] [spark] LuciferYang commented on pull request #38507: [WIP][SPARK-40372][SQL] Migrate failures of array type checks onto error classes

2022-11-03 Thread GitBox
LuciferYang commented on PR #38507: URL: https://github.com/apache/spark/pull/38507#issuecomment-1302927257 Still WIP.

[GitHub] [spark] LuciferYang opened a new pull request, #38507: [SPARK-40372][SQL] Migrate failures of array type checks onto error classes

2022-11-03 Thread GitBox
LuciferYang opened a new pull request, #38507: URL: https://github.com/apache/spark/pull/38507 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ###

[GitHub] [spark] rangadi commented on pull request #38344: [SPARK-40777][SQL][PROTOBUF] Protobuf import support and move error-classes.

2022-11-03 Thread GitBox
rangadi commented on PR #38344: URL: https://github.com/apache/spark/pull/38344#issuecomment-1302907015 @MaxGekk are you still reviewing this? @SandishKumarHN is there any more review to be addressed? If we are ready, I can ask @HeartSaVioR to merge this (before his weekend starts in

[GitHub] [spark] LuciferYang commented on pull request #38502: [SPARK-40976][BUILD] Upgrade sbt to 1.7.3

2022-11-03 Thread GitBox
LuciferYang commented on PR #38502: URL: https://github.com/apache/spark/pull/38502#issuecomment-1302901630 @linhongliu-db Please ping me if you encounter any problems.

[GitHub] [spark] zhengruifeng commented on a diff in pull request #38468: [SPARK-41005][CONNECT][PYTHON] Arrow-based collect

2022-11-03 Thread GitBox
zhengruifeng commented on code in PR #38468: URL: https://github.com/apache/spark/pull/38468#discussion_r1013560372 ## python/pyspark/sql/connect/client.py: ## @@ -182,6 +191,10 @@ def _to_pandas(self, plan: pb2.Plan) -> Optional[pandas.DataFrame]: req = pb2.Request()

[GitHub] [spark] wangyum commented on pull request #38071: [SPARK-36290][SQL] Pull out complex join condition

2022-11-03 Thread GitBox
wangyum commented on PR #38071: URL: https://github.com/apache/spark/pull/38071#issuecomment-1302892265 Case from production: Before | After -- | -- ![image](https://user-images.githubusercontent.com/5399861/199874725-8e8b3111-bf57-47fe-b123-b95fe1f0ed01.png) |

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #38468: [SPARK-41005][CONNECT][PYTHON] Arrow-based collect

2022-11-03 Thread GitBox
HyukjinKwon commented on code in PR #38468: URL: https://github.com/apache/spark/pull/38468#discussion_r1013556642 ## sql/core/src/main/scala/org/apache/spark/sql/execution/arrow/ArrowConverters.scala: ## @@ -128,6 +128,65 @@ private[sql] object ArrowConverters extends Logging

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #38468: [SPARK-41005][CONNECT][PYTHON] Arrow-based collect

2022-11-03 Thread GitBox
HyukjinKwon commented on code in PR #38468: URL: https://github.com/apache/spark/pull/38468#discussion_r1013555606 ## python/pyspark/sql/connect/client.py: ## @@ -182,6 +191,10 @@ def _to_pandas(self, plan: pb2.Plan) -> Optional[pandas.DataFrame]: req = pb2.Request()

[GitHub] [spark] LuciferYang commented on a diff in pull request #38490: [SPARK-41009][SQL] Rename the error class `_LEGACY_ERROR_TEMP_1070` to `LOCATION_ALREADY_EXISTS`

2022-11-03 Thread GitBox
LuciferYang commented on code in PR #38490: URL: https://github.com/apache/spark/pull/38490#discussion_r1013555153 ## sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryExecutionErrors.scala: ## @@ -2750,4 +2750,26 @@ private[sql] object QueryExecutionErrors extends

[GitHub] [spark] zhengruifeng commented on a diff in pull request #38468: [SPARK-41005][CONNECT][PYTHON] Arrow-based collect

2022-11-03 Thread GitBox
zhengruifeng commented on code in PR #38468: URL: https://github.com/apache/spark/pull/38468#discussion_r1013550540 ## connector/connect/src/main/scala/org/apache/spark/sql/connect/service/SparkConnectStreamHandler.scala: ## @@ -117,7 +131,36 @@ class

[GitHub] [spark] zhengruifeng commented on a diff in pull request #38468: [SPARK-41005][CONNECT][PYTHON] Arrow-based collect

2022-11-03 Thread GitBox
zhengruifeng commented on code in PR #38468: URL: https://github.com/apache/spark/pull/38468#discussion_r1013549587 ## sql/core/src/main/scala/org/apache/spark/sql/execution/arrow/ArrowConverters.scala: ## @@ -128,6 +128,65 @@ private[sql] object ArrowConverters extends Logging

[GitHub] [spark] zhengruifeng commented on a diff in pull request #38468: [SPARK-41005][CONNECT][PYTHON] Arrow-based collect

2022-11-03 Thread GitBox
zhengruifeng commented on code in PR #38468: URL: https://github.com/apache/spark/pull/38468#discussion_r1013547085 ## connector/connect/src/main/scala/org/apache/spark/sql/connect/service/SparkConnectStreamHandler.scala: ## @@ -49,21 +51,33 @@ class

[GitHub] [spark] zhengruifeng commented on a diff in pull request #38468: [SPARK-41005][CONNECT][PYTHON] Arrow-based collect

2022-11-03 Thread GitBox
zhengruifeng commented on code in PR #38468: URL: https://github.com/apache/spark/pull/38468#discussion_r1013540451 ## connector/connect/src/main/scala/org/apache/spark/sql/connect/service/SparkConnectStreamHandler.scala: ## @@ -117,7 +131,36 @@ class

[GitHub] [spark] AmplabJenkins commented on pull request #38485: [SPARK-41001] [CONNECT] [PYTHON] Implementing Connection String for Python Client

2022-11-03 Thread GitBox
AmplabJenkins commented on PR #38485: URL: https://github.com/apache/spark/pull/38485#issuecomment-1302866609 Can one of the admins verify this patch?

[GitHub] [spark] AmplabJenkins commented on pull request #38486: [SPARK-41000][SQL] Make CommandResult extend Command trait

2022-11-03 Thread GitBox
AmplabJenkins commented on PR #38486: URL: https://github.com/apache/spark/pull/38486#issuecomment-1302866589 Can one of the admins verify this patch?

[GitHub] [spark] AmplabJenkins commented on pull request #38488: [SPARK-41002][CONNECT][PYTHON] Compatible `take`, `head` and `first` API in Python client

2022-11-03 Thread GitBox
AmplabJenkins commented on PR #38488: URL: https://github.com/apache/spark/pull/38488#issuecomment-1302866570 Can one of the admins verify this patch?

[GitHub] [spark] AmplabJenkins commented on pull request #38489: [SPARK-41003][SQL] BHJ LeftAnti does not update numOutputRows when codegen is disabled

2022-11-03 Thread GitBox
AmplabJenkins commented on PR #38489: URL: https://github.com/apache/spark/pull/38489#issuecomment-1302866545 Can one of the admins verify this patch?

[GitHub] [spark] pan3793 commented on pull request #38483: [SPARK-40997][K8S] K8s resource name prefix should start w/ alphanumeric

2022-11-03 Thread GitBox
pan3793 commented on PR #38483: URL: https://github.com/apache/spark/pull/38483#issuecomment-1302865058 Closing as a duplicate, thanks all.

[GitHub] [spark] pan3793 closed pull request #38483: [SPARK-40997][K8S] K8s resource name prefix should start w/ alphanumeric

2022-11-03 Thread GitBox
pan3793 closed pull request #38483: [SPARK-40997][K8S] K8s resource name prefix should start w/ alphanumeric URL: https://github.com/apache/spark/pull/38483

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #38468: [SPARK-41005][CONNECT][PYTHON] Arrow-based collect

2022-11-03 Thread GitBox
HyukjinKwon commented on code in PR #38468: URL: https://github.com/apache/spark/pull/38468#discussion_r1013528536 ## connector/connect/src/main/scala/org/apache/spark/sql/connect/service/SparkConnectStreamHandler.scala: ## @@ -117,7 +131,36 @@ class

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #38485: [SPARK-41001] [CONNECT] [PYTHON] Implementing Connection String for Python Client

2022-11-03 Thread GitBox
HyukjinKwon commented on code in PR #38485: URL: https://github.com/apache/spark/pull/38485#discussion_r1013528124 ## python/pyspark/sql/connect/client.py: ## @@ -35,13 +35,146 @@ from pyspark.sql.connect.plan import SQL, Range from pyspark.sql.types import DataType,

[GitHub] [spark] HyukjinKwon commented on pull request #38497: [SPARK-40999] Hint propagation to subqueries

2022-11-03 Thread GitBox
HyukjinKwon commented on PR #38497: URL: https://github.com/apache/spark/pull/38497#issuecomment-1302849566 cc @allisonwang-db FYI

[GitHub] [spark] HyukjinKwon closed pull request #38501: [SPARK-41001] [CONNECT] [DOC] Note: Connection string parameters are case-sensitive.

2022-11-03 Thread GitBox
HyukjinKwon closed pull request #38501: [SPARK-41001] [CONNECT] [DOC] Note: Connection string parameters are case-sensitive. URL: https://github.com/apache/spark/pull/38501

[GitHub] [spark] HyukjinKwon commented on pull request #38501: [SPARK-41001] [CONNECT] [DOC] Note: Connection string parameters are case-sensitive.

2022-11-03 Thread GitBox
HyukjinKwon commented on PR #38501: URL: https://github.com/apache/spark/pull/38501#issuecomment-1302848407 Merged to master.

[GitHub] [spark] HyukjinKwon closed pull request #38504: [SPARK-40815][SQL][FOLLOW-UP] Disable DelegateSymlinkTextInputFormat tests for JDK 9+

2022-11-03 Thread GitBox
HyukjinKwon closed pull request #38504: [SPARK-40815][SQL][FOLLOW-UP] Disable DelegateSymlinkTextInputFormat tests for JDK 9+ URL: https://github.com/apache/spark/pull/38504

[GitHub] [spark] HyukjinKwon commented on pull request #38504: [SPARK-40815][SQL][FOLLOW-UP] Disable DelegateSymlinkTextInputFormat tests for JDK 9+

2022-11-03 Thread GitBox
HyukjinKwon commented on PR #38504: URL: https://github.com/apache/spark/pull/38504#issuecomment-1302848120 Merged to master.

[GitHub] [spark] HyukjinKwon closed pull request #38502: [SPARK-40976][BUILD] Upgrade sbt to 1.7.3

2022-11-03 Thread GitBox
HyukjinKwon closed pull request #38502: [SPARK-40976][BUILD] Upgrade sbt to 1.7.3 URL: https://github.com/apache/spark/pull/38502

[GitHub] [spark] HyukjinKwon commented on pull request #38502: [SPARK-40976][BUILD] Upgrade sbt to 1.7.3

2022-11-03 Thread GitBox
HyukjinKwon commented on PR #38502: URL: https://github.com/apache/spark/pull/38502#issuecomment-1302847765 Merged to master.

[GitHub] [spark] beliefer commented on pull request #38461: [SPARK-34079][SQL][FOLLOWUP] Improve the readability and simplify the code for MergeScalarSubqueries

2022-11-03 Thread GitBox
beliefer commented on PR #38461: URL: https://github.com/apache/spark/pull/38461#issuecomment-1302842869 @cloud-fan @peter-toth Thank you!

[GitHub] [spark] liuzqt commented on pull request #38064: [SPARK-40622][SQL][CORE]Result of a single task in collect() must fit in 2GB

2022-11-03 Thread GitBox
liuzqt commented on PR #38064: URL: https://github.com/apache/spark/pull/38064#issuecomment-1302834138 @mridulm I'm looking into it, but I don't have any clue so far. It's weird that many of the compilation errors seem unrelated to this PR, and I was able to build on my local machine, not sure

[GitHub] [spark] amaliujia commented on pull request #38488: [SPARK-41002][CONNECT][PYTHON] Compatible `take`, `head` and `first` API in Python client

2022-11-03 Thread GitBox
amaliujia commented on PR #38488: URL: https://github.com/apache/spark/pull/38488#issuecomment-1302821036 Actually, I will follow https://spark.apache.org/contributing.html to add a short description of the test cases in this PR.

[GitHub] [spark] amaliujia commented on pull request #38506: [SPARK-41010][CONNECT][PYTHON] Complete Support for Except and Intersect in Python client

2022-11-03 Thread GitBox
amaliujia commented on PR #38506: URL: https://github.com/apache/spark/pull/38506#issuecomment-1302820909 Actually, I will follow https://spark.apache.org/contributing.html to add a short description of the test cases in this PR.

[GitHub] [spark] github-actions[bot] closed pull request #37265: [SPARK-39850][YARN]Print applicationId once applied from yarn rm

2022-11-03 Thread GitBox
github-actions[bot] closed pull request #37265: [SPARK-39850][YARN]Print applicationId once applied from yarn rm URL: https://github.com/apache/spark/pull/37265

[GitHub] [spark] github-actions[bot] commented on pull request #37239: [SPARK-39825][SQL] Fix PushDownLeftSemiAntiJoin push through project

2022-11-03 Thread GitBox
github-actions[bot] commented on PR #37239: URL: https://github.com/apache/spark/pull/37239#issuecomment-1302818434 We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.

[GitHub] [spark] github-actions[bot] closed pull request #37163: [SPARK-39750][SQL] Enable `spark.sql.cbo.enabled` by default

2022-11-03 Thread GitBox
github-actions[bot] closed pull request #37163: [SPARK-39750][SQL] Enable `spark.sql.cbo.enabled` by default URL: https://github.com/apache/spark/pull/37163

[GitHub] [spark] github-actions[bot] commented on pull request #37226: [MINOR][SQL] Simplify the description of built-in function.

2022-11-03 Thread GitBox
github-actions[bot] commented on PR #37226: URL: https://github.com/apache/spark/pull/37226#issuecomment-1302818443 We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.

[GitHub] [spark] AmplabJenkins commented on pull request #38491: [MINOR][CONNECT] Remove unused import in commands.proto

2022-11-03 Thread GitBox
AmplabJenkins commented on PR #38491: URL: https://github.com/apache/spark/pull/38491#issuecomment-1302809788 Can one of the admins verify this patch?

[GitHub] [spark] AmplabJenkins commented on pull request #38494: [SPARK-41004][CONNECT][TESTS] Check error classes in InterceptorRegistrySuite

2022-11-03 Thread GitBox
AmplabJenkins commented on PR #38494: URL: https://github.com/apache/spark/pull/38494#issuecomment-1302809759 Can one of the admins verify this patch?

[GitHub] [spark] AmplabJenkins commented on pull request #38495: [SPARK-35531][SQL] Update hive table stats without unnecessary convert

2022-11-03 Thread GitBox
AmplabJenkins commented on PR #38495: URL: https://github.com/apache/spark/pull/38495#issuecomment-1302809732 Can one of the admins verify this patch?

[GitHub] [spark] AmplabJenkins commented on pull request #38496: [SPARK-40708][SQL] Auto update table statistics based on write metrics

2022-11-03 Thread GitBox
AmplabJenkins commented on PR #38496: URL: https://github.com/apache/spark/pull/38496#issuecomment-1302809709 Can one of the admins verify this patch?

[GitHub] [spark] AmplabJenkins commented on pull request #38497: [SPARK-40999] Hint propagation to subqueries

2022-11-03 Thread GitBox
AmplabJenkins commented on PR #38497: URL: https://github.com/apache/spark/pull/38497#issuecomment-1302809681 Can one of the admins verify this patch?

[GitHub] [spark] amaliujia commented on a diff in pull request #38488: [SPARK-41002][CONNECT][PYTHON] Compatible `take`, `head` and `first` API in Python client

2022-11-03 Thread GitBox
amaliujia commented on code in PR #38488: URL: https://github.com/apache/spark/pull/38488#discussion_r1013499840 ## python/pyspark/sql/connect/dataframe.py: ## @@ -211,14 +212,66 @@ def filter(self, condition: Expression) -> "DataFrame":

[GitHub] [spark] ueshin commented on a diff in pull request #38223: [SPARK-40770][PYTHON] Improved error messages for applyInPandas for schema mismatch

2022-11-03 Thread GitBox
ueshin commented on code in PR #38223: URL: https://github.com/apache/spark/pull/38223#discussion_r1013488987 ## python/pyspark/worker.py: ## @@ -188,22 +241,7 @@ def wrapped(key_series, value_series): elif len(argspec.args) == 2: key = tuple(s[0] for s in

[GitHub] [spark] amaliujia commented on a diff in pull request #38488: [SPARK-41002][CONNECT][PYTHON] Compatible `take`, `head` and `first` API in Python client

2022-11-03 Thread GitBox
amaliujia commented on code in PR #38488: URL: https://github.com/apache/spark/pull/38488#discussion_r1013486011 ## python/pyspark/sql/connect/dataframe.py: ## @@ -211,14 +212,66 @@ def filter(self, condition: Expression) -> "DataFrame":

[GitHub] [spark] amaliujia commented on pull request #38506: [SPARK-41010][CONNECT][PYTHON] Complete Support for Except and Intersect in Python client

2022-11-03 Thread GitBox
amaliujia commented on PR #38506: URL: https://github.com/apache/spark/pull/38506#issuecomment-1302788809 R: @HyukjinKwon

[GitHub] [spark] amaliujia opened a new pull request, #38506: [SPARK-41010][CONNECT][PYTHON] Complete Support for Except and Intersect in Python client

2022-11-03 Thread GitBox
amaliujia opened a new pull request, #38506: URL: https://github.com/apache/spark/pull/38506 ### What changes were proposed in this pull request? 1. Add support for intersect and except. 2. Unify union, intersect and except into `SetOperation`. ### Why are the
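The idea of folding union, intersect and except into one `SetOperation` node can be sketched as follows. This is an illustrative sketch only, not the code from the PR: the `SetOpType` enum, the field names and the `to_sql` helper are all hypothetical, and the inputs are plain SQL strings rather than logical plans.

```python
from dataclasses import dataclass
from enum import Enum


class SetOpType(Enum):
    """Hypothetical tag distinguishing the three set operations."""
    UNION = "union"
    INTERSECT = "intersect"
    EXCEPT = "except"


@dataclass(frozen=True)
class SetOperation:
    """One node type for all set operations; `op` selects the behavior."""
    left: str          # SQL text of the left input (assumption: plain strings)
    right: str         # SQL text of the right input
    op: SetOpType
    is_all: bool = False  # True keeps duplicates (e.g. UNION ALL)

    def to_sql(self) -> str:
        keyword = self.op.value.upper() + (" ALL" if self.is_all else "")
        return f"({self.left}) {keyword} ({self.right})"


except_sql = SetOperation("SELECT 1", "SELECT 2", SetOpType.EXCEPT).to_sql()
union_all_sql = SetOperation("SELECT 1", "SELECT 2", SetOpType.UNION, is_all=True).to_sql()
```

With a single node type, adding a new set operation means adding an enum member rather than a new plan class, which is one plausible motivation for the unification.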

[GitHub] [spark] liuzqt opened a new pull request, #38505: [SPARK-40622][WIP]do not merge(try to fix build error)

2022-11-03 Thread GitBox
liuzqt opened a new pull request, #38505: URL: https://github.com/apache/spark/pull/38505 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How was

[GitHub] [spark] swamirishi commented on pull request #38377: [SPARK-40901][CORE] Unable to store Spark Driver logs with Absolute Hadoop based URI FS Path

2022-11-03 Thread GitBox
swamirishi commented on PR #38377: URL: https://github.com/apache/spark/pull/38377#issuecomment-1302754420 > If we do necessarily need the fully qualified path, we can use `fileSystem.resolvePath(new Path ...)` instead. FileUtils.get().absolutePath() would resolve file path based on

[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #38504: [SPARK-40815][SQL][FOLLOW-UP] Disable DelegateSymlinkTextInputSplit feature and ignore the test

2022-11-03 Thread GitBox
dongjoon-hyun commented on code in PR #38504: URL: https://github.com/apache/spark/pull/38504#discussion_r1013445606 ## sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveSerDeReadWriteSuite.scala: ## @@ -226,7 +226,8 @@ class HiveSerDeReadWriteSuite extends

[GitHub] [spark] sadikovi commented on a diff in pull request #38504: [SPARK-40815][SQL][FOLLOW-UP] Disable DelegateSymlinkTextInputSplit feature and ignore the test

2022-11-03 Thread GitBox
sadikovi commented on code in PR #38504: URL: https://github.com/apache/spark/pull/38504#discussion_r1013444207 ## sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveSerDeReadWriteSuite.scala: ## @@ -226,7 +226,8 @@ class HiveSerDeReadWriteSuite extends QueryTest

[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #38504: [SPARK-40815][SQL][FOLLOW-UP] Disable DelegateSymlinkTextInputSplit feature and ignore the test

2022-11-03 Thread GitBox
dongjoon-hyun commented on code in PR #38504: URL: https://github.com/apache/spark/pull/38504#discussion_r1013443220 ## sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveSerDeReadWriteSuite.scala: ## @@ -226,7 +226,8 @@ class HiveSerDeReadWriteSuite extends

[GitHub] [spark] sadikovi commented on pull request #38504: [SPARK-40815][SQL][FOLLOW-UP] Disable DelegateSymlinkTextInputSplit feature and ignore the test

2022-11-03 Thread GitBox
sadikovi commented on PR #38504: URL: https://github.com/apache/spark/pull/38504#issuecomment-1302735079 No, I don't know yet. I have opened a PR to unblock the CI and I am planning to debug the failure later today. -- This is an automated message from the Apache Git Service. To

[GitHub] [spark] AmplabJenkins commented on pull request #38499: [MINOR][DOC] updated some grammar and a missed period in the tuning doc

2022-11-03 Thread GitBox
AmplabJenkins commented on PR #38499: URL: https://github.com/apache/spark/pull/38499#issuecomment-1302732856 Can one of the admins verify this patch? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to

[GitHub] [spark] AmplabJenkins commented on pull request #38500: [SPARK-41007][SQL] Add missing serializer for java.math.BigInteger

2022-11-03 Thread GitBox
AmplabJenkins commented on PR #38500: URL: https://github.com/apache/spark/pull/38500#issuecomment-1302732806 Can one of the admins verify this patch? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to

[GitHub] [spark] AmplabJenkins commented on pull request #38501: [SPARK-41001] [CONNECT] [DOC] Note: Connection string parameters are case-sensitive.

2022-11-03 Thread GitBox
AmplabJenkins commented on PR #38501: URL: https://github.com/apache/spark/pull/38501#issuecomment-1302732762 Can one of the admins verify this patch? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to

[GitHub] [spark] grundprinzip commented on pull request #38475: [SPARK-40992][CONNECT] Support toDF(columnNames) in Connect DSL

2022-11-03 Thread GitBox
grundprinzip commented on PR #38475: URL: https://github.com/apache/spark/pull/38475#issuecomment-1302731774 Yes, probably my question is whether (1) will work and we don't have to do heavy lifting? Once you did (1) and have the schema, you could wrap the input plan with the select. --

[GitHub] [spark] sadikovi commented on pull request #38504: [SPARK-40815][SQL][FOLLOW-UP] Disable DelegateSymlinkTextInputSplit feature and ignore the test

2022-11-03 Thread GitBox
sadikovi commented on PR #38504: URL: https://github.com/apache/spark/pull/38504#issuecomment-1302731459 @HyukjinKwon @dongjoon-hyun I have ignored the test and also disabled the feature by default. Let me know if you would like to revert the change completely instead. -- This is an

[GitHub] [spark] sadikovi opened a new pull request, #38504: [SPARK-40815][SQL][FOLLOW-UP] Disable DelegateSymlinkTextInputSplit feature and ignore the test

2022-11-03 Thread GitBox
sadikovi opened a new pull request, #38504: URL: https://github.com/apache/spark/pull/38504 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How

[GitHub] [spark] srowen closed pull request #38352: [SPARK-40801][BUILD][3.2] Upgrade `Apache commons-text` to 1.10

2022-11-03 Thread GitBox
srowen closed pull request #38352: [SPARK-40801][BUILD][3.2] Upgrade `Apache commons-text` to 1.10 URL: https://github.com/apache/spark/pull/38352 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to

[GitHub] [spark] srowen commented on pull request #38352: [SPARK-40801][BUILD][3.2] Upgrade `Apache commons-text` to 1.10

2022-11-03 Thread GitBox
srowen commented on PR #38352: URL: https://github.com/apache/spark/pull/38352#issuecomment-1302727893 OK, merged to 3.2 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To

[GitHub] [spark] amaliujia commented on pull request #38475: [SPARK-40992][CONNECT] Support toDF(columnNames) in Connect DSL

2022-11-03 Thread GitBox
amaliujia commented on PR #38475: URL: https://github.com/apache/spark/pull/38475#issuecomment-1302727166 To have select with alias, we need to know the previous schema first. I think it is covered by my previous comment if you were talking about the same thing
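
The rewrite discussed in this thread can be sketched roughly as follows. This is a minimal illustration in plain Python, not the code from the PR: the helper name `to_df_as_select` and the SQL-style string projections are assumptions made for the example. It shows why the existing schema has to be known first: each current column name is needed to pair it with its new name.

```python
from typing import List


def to_df_as_select(current_columns: List[str], new_names: List[str]) -> List[str]:
    """Build a projection list that renames every column by position.

    Hypothetical helper: it only produces SQL-style strings to illustrate the idea.
    """
    if len(current_columns) != len(new_names):
        raise ValueError(
            f"expected {len(current_columns)} column names, got {len(new_names)}"
        )
    # Pair each existing column with its new name: `old` AS `new`.
    return [f"`{old}` AS `{new}`" for old, new in zip(current_columns, new_names)]


# The current column names stand in for the schema that must be resolved first.
projections = to_df_as_select(["id", "name"], ["user_id", "user_name"])
print("SELECT " + ", ".join(projections))
```

Without the list of current columns there is nothing to alias, which is the dependency on the previous schema that the comment above points out.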

[GitHub] [spark] grundprinzip commented on pull request #38475: [SPARK-40992][CONNECT] Support toDF(columnNames) in Connect DSL

2022-11-03 Thread GitBox
grundprinzip commented on PR #38475: URL: https://github.com/apache/spark/pull/38475#issuecomment-1302724487 @amaliujia quick question, looking at the code in Dataset, why do we actually need to implement this on the server? The renaming could in theory be rewritten into a select with

[GitHub] [spark] grundprinzip commented on a diff in pull request #38488: [SPARK-41002][CONNECT][PYTHON] Compatible `take`, `head` and `first` API in Python client

2022-11-03 Thread GitBox
grundprinzip commented on code in PR #38488: URL: https://github.com/apache/spark/pull/38488#discussion_r1013432734 ## python/pyspark/sql/connect/dataframe.py: ## @@ -211,14 +212,66 @@ def filter(self, condition: Expression) -> "DataFrame":

[GitHub] [spark] vinodkc commented on a diff in pull request #38146: [SPARK-40687][SQL] Support data masking built-in function 'mask'

2022-11-03 Thread GitBox
vinodkc commented on code in PR #38146: URL: https://github.com/apache/spark/pull/38146#discussion_r1013416152 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/maskExpressions.scala: ## @@ -0,0 +1,222 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] [spark] vinodkc commented on a diff in pull request #38146: [SPARK-40687][SQL] Support data masking built-in function 'mask'

2022-11-03 Thread GitBox
vinodkc commented on code in PR #38146: URL: https://github.com/apache/spark/pull/38146#discussion_r1013415700 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/maskExpressions.scala: ## @@ -0,0 +1,222 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] [spark] vinodkc commented on a diff in pull request #38146: [SPARK-40687][SQL] Support data masking built-in function 'mask'

2022-11-03 Thread GitBox
vinodkc commented on code in PR #38146: URL: https://github.com/apache/spark/pull/38146#discussion_r1013415091 ## sql/core/src/test/resources/sql-tests/inputs/string-functions.sql: ## @@ -58,6 +60,54 @@ SELECT substring('Spark SQL' from 5); SELECT substring('Spark SQL' from

[GitHub] [spark] vinodkc commented on a diff in pull request #38146: [SPARK-40687][SQL] Support data masking built-in function 'mask'

2022-11-03 Thread GitBox
vinodkc commented on code in PR #38146: URL: https://github.com/apache/spark/pull/38146#discussion_r1013413707 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/maskExpressions.scala: ## @@ -0,0 +1,222 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] [spark] vinodkc commented on a diff in pull request #38146: [SPARK-40687][SQL] Support data masking built-in function 'mask'

2022-11-03 Thread GitBox
vinodkc commented on code in PR #38146: URL: https://github.com/apache/spark/pull/38146#discussion_r1013412397 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/maskExpressions.scala: ## @@ -0,0 +1,222 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] [spark] sunchao commented on pull request #38352: [SPARK-40801][BUILD][3.2] Upgrade `Apache commons-text` to 1.10

2022-11-03 Thread GitBox
sunchao commented on PR #38352: URL: https://github.com/apache/spark/pull/38352#issuecomment-1302682518 I'm actually holding 3.2.3 for this PR. Once it's merged I'll start the release process. -- This is an automated message from the Apache Git Service. To respond to the message, please

[GitHub] [spark] sadikovi commented on pull request #38277: [SPARK-40815][SQL] Add `DelegateSymlinkTextInputFormat` to workaround `SymlinkTextInputSplit` bug

2022-11-03 Thread GitBox
sadikovi commented on PR #38277: URL: https://github.com/apache/spark/pull/38277#issuecomment-1302660409 Let me open a PR to disable the test and I will open a fix as a follow-up. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to

[GitHub] [spark] MaxGekk commented on a diff in pull request #38490: [SPARK-41009][SQL] Rename the error class `_LEGACY_ERROR_TEMP_1070` to `LOCATION_ALREADY_EXISTS`

2022-11-03 Thread GitBox
MaxGekk commented on code in PR #38490: URL: https://github.com/apache/spark/pull/38490#discussion_r1013389572 ## sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryExecutionErrors.scala: ## @@ -2750,4 +2750,26 @@ private[sql] object QueryExecutionErrors extends

[GitHub] [spark] carlfu-db commented on pull request #38404: [SPARK-40956] SQL Equivalent for Dataframe overwrite command

2022-11-03 Thread GitBox
carlfu-db commented on PR #38404: URL: https://github.com/apache/spark/pull/38404#issuecomment-1302645064 I am able to repro the problem locally. I verified that the commit this PR is based on doesn't have the test failure. I'm not sure how to proceed, as the change in this PR seems

[GitHub] [spark] AmplabJenkins commented on pull request #38503: [SPARK-40940] Remove Multi-stateful operator checkers for streaming queries.

2022-11-03 Thread GitBox
AmplabJenkins commented on PR #38503: URL: https://github.com/apache/spark/pull/38503#issuecomment-1302620081 Can one of the admins verify this patch? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to

[GitHub] [spark] tobiasstadler commented on pull request #38331: [SPARK-40869][K8S] Resource name prefix should not start with a hyphen

2022-11-03 Thread GitBox
tobiasstadler commented on PR #38331: URL: https://github.com/apache/spark/pull/38331#issuecomment-1302567569 Thank You! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To

[GitHub] [spark] EnricoMi commented on pull request #38356: [SPARK-40885] `Sort` may not take effect when it is the last 'Transform' operator

2022-11-03 Thread GitBox
EnricoMi commented on PR #38356: URL: https://github.com/apache/spark/pull/38356#issuecomment-1302534716 I had another deeper look into this issue. The `V1Writes` rule introduced in Spark 3.4 adds the `empty2null` to all nullable string partition columns:

[GitHub] [spark] WweiL opened a new pull request, #38503: [SPARK-40940] remove multi state checkers

2022-11-03 Thread GitBox
WweiL opened a new pull request, #38503: URL: https://github.com/apache/spark/pull/38503 ### What changes were proposed in this pull request? As a follow-up to [SPARK-40925] ([GitHub PR](https://github.com/apache/spark/pull/38405)), remove the corresponding checks in
