[GitHub] [spark] mridulm commented on pull request #38371: [SPARK-40968] Fix a few wrong/misleading comments in DAGSchedulerSuite

2022-11-01 Thread GitBox
mridulm commented on PR #38371: URL: https://github.com/apache/spark/pull/38371#issuecomment-1299617022 Merged to master, thanks for fixing this @JiexingLi ! Thanks for looking into this @HyukjinKwon :-) -- This is an automated message from the Apache Git Service. To respond to the

[GitHub] [spark] asfgit closed pull request #38371: [SPARK-40968] Fix a few wrong/misleading comments in DAGSchedulerSuite

2022-11-01 Thread GitBox
asfgit closed pull request #38371: [SPARK-40968] Fix a few wrong/misleading comments in DAGSchedulerSuite URL: https://github.com/apache/spark/pull/38371

[GitHub] [spark] panbingkun commented on pull request #38463: [SPARK-40374][SQL] Migrate type check failures of type creators onto error classes

2022-11-01 Thread GitBox
panbingkun commented on PR #38463: URL: https://github.com/apache/spark/pull/38463#issuecomment-1299615584 cc @MaxGekk

[GitHub] [spark] mridulm commented on pull request #38377: [SPARK-40901][CORE] Unable to store Spark Driver logs with Absolute Hadoop based URI FS Path

2022-11-01 Thread GitBox
mridulm commented on PR #38377: URL: https://github.com/apache/spark/pull/38377#issuecomment-1299613189 Makes sense ... why not simply `val dfsLogFile = new Path(rootDir, appId + DRIVER_LOG_FILE_SUFFIX)` instead btw ? I am trying to see if I am missing anything here ...

[GitHub] [spark] grundprinzip commented on pull request #38470: [CONNECT] [DOC] Defining Spark Connect Client Connection String

2022-11-01 Thread GitBox
grundprinzip commented on PR #38470: URL: https://github.com/apache/spark/pull/38470#issuecomment-1299609859 @HyukjinKwon I will add a Jira; this is just the starting point to align on where we want to go. My idea would be that once this is merged I will create a PR for the python

[GitHub] [spark] cloud-fan commented on pull request #38171: [SPARK-9213] [SQL] Improve regular expression performance (via joni)

2022-11-01 Thread GitBox
cloud-fan commented on PR #38171: URL: https://github.com/apache/spark/pull/38171#issuecomment-1299607243 How much confidence do we have in joni? Is it widely adopted by other open-source projects? I'm a bit concerned about moving away from JDK regex and picking a project that I just heard

[GitHub] [spark] LuciferYang commented on pull request #38476: Revert "[SPARK-40976][BUILD] Upgrade sbt to 1.7.3"

2022-11-01 Thread GitBox
LuciferYang commented on PR #38476: URL: https://github.com/apache/spark/pull/38476#issuecomment-1299589185 Sorry for the late reply. I want to know why GA doesn't have this issue? master CI always seems healthy, how can we reproduce this? Let me investigate this.

[GitHub] [spark] MaxGekk closed pull request #38478: [MINOR][SQL] Wrap `given` in backticks to fix compilation warning

2022-11-01 Thread GitBox
MaxGekk closed pull request #38478: [MINOR][SQL] Wrap `given` in backticks to fix compilation warning URL: https://github.com/apache/spark/pull/38478

[GitHub] [spark] MaxGekk commented on pull request #38478: [MINOR][SQL] Wrap `given` in backticks to fix compilation warning

2022-11-01 Thread GitBox
MaxGekk commented on PR #38478: URL: https://github.com/apache/spark/pull/38478#issuecomment-1299585051 +1, LGTM. Merging to master. Thank you, @LuciferYang.

[GitHub] [spark] MaxGekk closed pull request #38438: [SPARK-40748][SQL] Migrate type check failures of conditions onto error classes

2022-11-01 Thread GitBox
MaxGekk closed pull request #38438: [SPARK-40748][SQL] Migrate type check failures of conditions onto error classes URL: https://github.com/apache/spark/pull/38438

[GitHub] [spark] MaxGekk commented on pull request #38438: [SPARK-40748][SQL] Migrate type check failures of conditions onto error classes

2022-11-01 Thread GitBox
MaxGekk commented on PR #38438: URL: https://github.com/apache/spark/pull/38438#issuecomment-1299581375 +1, LGTM. Merging to master. Thank you, @panbingkun.

[GitHub] [spark] HeartSaVioR commented on pull request #38404: [SPARK-40956] SQL Equivalent for Dataframe overwrite command

2022-11-01 Thread GitBox
HeartSaVioR commented on PR #38404: URL: https://github.com/apache/spark/pull/38404#issuecomment-1299562145 (Just to remind, please update PR title and description as this PR is no longer a draft.)

[GitHub] [spark] amaliujia commented on pull request #38477: [SPARK-40993][CONNECT]PYTHON[DOCS] Migrate markdown style README to PySpark Development Documentation

2022-11-01 Thread GitBox
amaliujia commented on PR #38477: URL: https://github.com/apache/spark/pull/38477#issuecomment-1299548329 cc @HyukjinKwon @grundprinzip

[GitHub] [spark] dongjoon-hyun commented on pull request #38476: Revert "[SPARK-40976][BUILD] Upgrade sbt to 1.7.3"

2022-11-01 Thread GitBox
dongjoon-hyun commented on PR #38476: URL: https://github.com/apache/spark/pull/38476#issuecomment-1299540936 Oh, thank you for reverting, @linhongliu-db and @HyukjinKwon .

[GitHub] [spark] WeichenXu123 commented on a diff in pull request #37734: [SPARK-40264][ML] add batch_infer_udf function to pyspark.ml.functions

2022-11-01 Thread GitBox
WeichenXu123 commented on code in PR #37734: URL: https://github.com/apache/spark/pull/37734#discussion_r109299 ## python/pyspark/ml/functions.py: ## @@ -106,6 +117,474 @@ def array_to_vector(col: Column) -> Column: return

[GitHub] [spark] WeichenXu123 commented on a diff in pull request #37734: [SPARK-40264][ML] add batch_infer_udf function to pyspark.ml.functions

2022-11-01 Thread GitBox
WeichenXu123 commented on code in PR #37734: URL: https://github.com/apache/spark/pull/37734#discussion_r108516 ## python/pyspark/ml/functions.py: ## @@ -106,6 +117,474 @@ def array_to_vector(col: Column) -> Column: return

[GitHub] [spark] dongjoon-hyun commented on pull request #38474: [SPARK-40991][PYTHON] Update `cloudpickle` to v2.2.0

2022-11-01 Thread GitBox
dongjoon-hyun commented on PR #38474: URL: https://github.com/apache/spark/pull/38474#issuecomment-1299514591 Thank you for review, @HyukjinKwon and @itholic .

[GitHub] [spark] lyy-pineapple commented on pull request #38171: [SPARK-9213] [SQL] Improve regular expression performance (via joni)

2022-11-01 Thread GitBox
lyy-pineapple commented on PR #38171: URL: https://github.com/apache/spark/pull/38171#issuecomment-1299505071 Added a new benchmark comparison with Java 11 and Java 17. cc @cloud-fan @LuciferYang

[GitHub] [spark] LuciferYang opened a new pull request, #38478: [MINOR][SQL] Wrap `given` in backticks to fix compilation warning

2022-11-01 Thread GitBox
LuciferYang opened a new pull request, #38478: URL: https://github.com/apache/spark/pull/38478 ### What changes were proposed in this pull request? A minor change to fix a Scala-related compilation warning ``` [WARNING]

[GitHub] [spark] amaliujia opened a new pull request, #38477: [SPARK-40993][CONNECT]PYTHON[DOCS] Migrate markdown style README to PySpark Development Documentation

2022-11-01 Thread GitBox
amaliujia opened a new pull request, #38477: URL: https://github.com/apache/spark/pull/38477 ### What changes were proposed in this pull request? This PR consolidates the development facing documentation of Spark Connect Python client into existing PySpark development doc

[GitHub] [spark] LuciferYang commented on a diff in pull request #38465: [SPARK-40985][BUILD] Upgrade RoaringBitmap to 0.9.35

2022-11-01 Thread GitBox
LuciferYang commented on code in PR #38465: URL: https://github.com/apache/spark/pull/38465#discussion_r1011088808 ## core/benchmarks/MapStatusesConvertBenchmark-jdk11-results.txt: ## @@ -2,12 +2,12 @@ MapStatuses Convert Benchmark

[GitHub] [spark] beliefer commented on a diff in pull request #38461: [SPARK-34079][SQL][FOLLOWUP] Improve the readability and simplify the code for MergeScalarSubqueries

2022-11-01 Thread GitBox
beliefer commented on code in PR #38461: URL: https://github.com/apache/spark/pull/38461#discussion_r1011086001 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/MergeScalarSubqueries.scala: ## @@ -346,25 +346,19 @@ object MergeScalarSubqueries extends

[GitHub] [spark] ulysses-you commented on pull request #36698: [SPARK-39316][SQL] Merge PromotePrecision and CheckOverflow into decimal binary arithmetic

2022-11-01 Thread GitBox
ulysses-you commented on PR #36698: URL: https://github.com/apache/spark/pull/36698#issuecomment-1299431744 @gengliangwang it is a bug fix and also an improvement that saves an unnecessary cast. The query will produce the unexpected precision and scale. before: `decimal(28,2)`, after:

[GitHub] [spark] itholic commented on pull request #38474: [SPARK-40991][PYTHON] Update `cloudpickle` to v2.2.0

2022-11-01 Thread GitBox
itholic commented on PR #38474: URL: https://github.com/apache/spark/pull/38474#issuecomment-1299422328 +1 for upgrading the `cloudpickle` version

[GitHub] [spark] itholic commented on a diff in pull request #38465: [SPARK-40985][BUILD] Upgrade RoaringBitmap to 0.9.35

2022-11-01 Thread GitBox
itholic commented on code in PR #38465: URL: https://github.com/apache/spark/pull/38465#discussion_r1011048696 ## core/benchmarks/MapStatusesConvertBenchmark-jdk11-results.txt: ## @@ -2,12 +2,12 @@ MapStatuses Convert Benchmark

[GitHub] [spark] HyukjinKwon commented on pull request #38470: [CONNECT] [DOC] Defining Spark Connect Client Connection String

2022-11-01 Thread GitBox
HyukjinKwon commented on PR #38470: URL: https://github.com/apache/spark/pull/38470#issuecomment-1299418917 Maybe it's better to have a JIRA. BTW, I wonder if we have an e2e example that users can copy and paste to try. (e.g., like most of docs in

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #38470: [CONNECT] [DOC] Defining Spark Connect Client Connection String

2022-11-01 Thread GitBox
HyukjinKwon commented on code in PR #38470: URL: https://github.com/apache/spark/pull/38470#discussion_r1011045449 ## connector/connect/doc/client_connection_string.md: ## @@ -0,0 +1,110 @@ +# Connecting to Spark Connect using Clients Review Comment: The usage documentation

[GitHub] [spark] HyukjinKwon closed pull request #38473: [SPARK-40990][PYTHON] DataFrame creation from 2d NumPy array with arbitrary columns

2022-11-01 Thread GitBox
HyukjinKwon closed pull request #38473: [SPARK-40990][PYTHON] DataFrame creation from 2d NumPy array with arbitrary columns URL: https://github.com/apache/spark/pull/38473

[GitHub] [spark] HyukjinKwon commented on pull request #38473: [SPARK-40990][PYTHON] DataFrame creation from 2d NumPy array with arbitrary columns

2022-11-01 Thread GitBox
HyukjinKwon commented on PR #38473: URL: https://github.com/apache/spark/pull/38473#issuecomment-1299413739 Merged to master.

[GitHub] [spark] HyukjinKwon closed pull request #38476: Revert "[SPARK-40976][BUILD] Upgrade sbt to 1.7.3"

2022-11-01 Thread GitBox
HyukjinKwon closed pull request #38476: Revert "[SPARK-40976][BUILD] Upgrade sbt to 1.7.3" URL: https://github.com/apache/spark/pull/38476

[GitHub] [spark] HyukjinKwon commented on pull request #38476: Revert "[SPARK-40976][BUILD] Upgrade sbt to 1.7.3"

2022-11-01 Thread GitBox
HyukjinKwon commented on PR #38476: URL: https://github.com/apache/spark/pull/38476#issuecomment-1299411174 Merged to master since this is a clean revert.

[GitHub] [spark] HyukjinKwon closed pull request #38409: [SPARK-40930][CONNECT] Support Collect() in Python client

2022-11-01 Thread GitBox
HyukjinKwon closed pull request #38409: [SPARK-40930][CONNECT] Support Collect() in Python client URL: https://github.com/apache/spark/pull/38409

[GitHub] [spark] HyukjinKwon commented on pull request #38409: [SPARK-40930][CONNECT] Support Collect() in Python client

2022-11-01 Thread GitBox
HyukjinKwon commented on PR #38409: URL: https://github.com/apache/spark/pull/38409#issuecomment-1299410618 Merged to master.

[GitHub] [spark] linhongliu-db commented on pull request #38476: Revert "[SPARK-40976][BUILD] Upgrade sbt to 1.7.3"

2022-11-01 Thread GitBox
linhongliu-db commented on PR #38476: URL: https://github.com/apache/spark/pull/38476#issuecomment-1299401798 BTW, I really couldn't understand how this is problematic: https://github.com/sbt/sbt/compare/v1.7.2...v1.7.3

[GitHub] [spark] linhongliu-db commented on pull request #38476: Revert "[SPARK-40976][BUILD] Upgrade sbt to 1.7.3"

2022-11-01 Thread GitBox
linhongliu-db commented on PR #38476: URL: https://github.com/apache/spark/pull/38476#issuecomment-1299401226 cc @LuciferYang, maybe you'll have a fix so we won't need to revert it.

[GitHub] [spark] linhongliu-db opened a new pull request, #38476: Revert "[SPARK-40976][BUILD] Upgrade sbt to 1.7.3"

2022-11-01 Thread GitBox
linhongliu-db opened a new pull request, #38476: URL: https://github.com/apache/spark/pull/38476 ### What changes were proposed in this pull request? This reverts commit 9fc3aa0b1c092ab1f13b26582e3ece7440fbfc3b. ### Why are the changes needed? The upgrade breaks

[GitHub] [spark] github-actions[bot] commented on pull request #37259: spark-submit: throw an error when duplicate argument is provided

2022-11-01 Thread GitBox
github-actions[bot] commented on PR #37259: URL: https://github.com/apache/spark/pull/37259#issuecomment-1299388827 We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.

[GitHub] [spark] AmplabJenkins commented on pull request #38452: [SPARK-40802][SQL] Resolve JDBCRelation's schema with preparing the statement

2022-11-01 Thread GitBox
AmplabJenkins commented on PR #38452: URL: https://github.com/apache/spark/pull/38452#issuecomment-1299377919 Can one of the admins verify this patch?

[GitHub] [spark] AmplabJenkins commented on pull request #38453: [SPARK-40977][CONNECT][PYTHON] Complete Support for Union in Python client

2022-11-01 Thread GitBox
AmplabJenkins commented on PR #38453: URL: https://github.com/apache/spark/pull/38453#issuecomment-1299377890 Can one of the admins verify this patch?

[GitHub] [spark] amaliujia commented on pull request #38475: [SPARK-40992][CONNECT] Support toDF(columnNames) in Connect DSL

2022-11-01 Thread GitBox
amaliujia commented on PR #38475: URL: https://github.com/apache/spark/pull/38475#issuecomment-1299371686 @cloud-fan This is a good example that one API can be implemented with or without a plan. Basically if we don't add a new plan to the proto, clients can still implement

[GitHub] [spark] amaliujia opened a new pull request, #38475: [SPARK-40992][CONNECT] Support toDF(columnNames) in Connect DSL

2022-11-01 Thread GitBox
amaliujia opened a new pull request, #38475: URL: https://github.com/apache/spark/pull/38475 ### What changes were proposed in this pull request? Add `RenameColumns` to proto to support the implementation for `toDF(columnNames: String*)` which renames the input relation to a

[GitHub] [spark] leewyang commented on a diff in pull request #37734: [SPARK-40264][ML] add batch_infer_udf function to pyspark.ml.functions

2022-11-01 Thread GitBox
leewyang commented on code in PR #37734: URL: https://github.com/apache/spark/pull/37734#discussion_r1010991074 ## python/pyspark/ml/functions.py: ## @@ -106,6 +117,474 @@ def array_to_vector(col: Column) -> Column: return

[GitHub] [spark] srowen commented on pull request #38469: [MINOR][BUILD] Correct the `files` contend in `checkstyle-suppressions.xml`

2022-11-01 Thread GitBox
srowen commented on PR #38469: URL: https://github.com/apache/spark/pull/38469#issuecomment-1299341398 Merged to master/3.3/3.2

[GitHub] [spark] srowen closed pull request #38469: [MINOR][BUILD] Correct the `files` contend in `checkstyle-suppressions.xml`

2022-11-01 Thread GitBox
srowen closed pull request #38469: [MINOR][BUILD] Correct the `files` contend in `checkstyle-suppressions.xml` URL: https://github.com/apache/spark/pull/38469

[GitHub] [spark] dongjoon-hyun opened a new pull request, #38474: [SPARK-XXX][PYTHON] Update cloudpickle to v2.2.0

2022-11-01 Thread GitBox
dongjoon-hyun opened a new pull request, #38474: URL: https://github.com/apache/spark/pull/38474 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ###

[GitHub] [spark] xinrong-meng opened a new pull request, #38473: [SPARK-40990][PYTHON] DataFrame creation from 2d NumPy array with arbitrary columns

2022-11-01 Thread GitBox
xinrong-meng opened a new pull request, #38473: URL: https://github.com/apache/spark/pull/38473 ### What changes were proposed in this pull request? Support DataFrame creation from 2d NumPy array with arbitrary columns. ### Why are the changes needed? Currently, DataFrame

[GitHub] [spark] AmplabJenkins commented on pull request #38462: [SPARK-40533] [CONNECT] [PYTHON] Support most built-in literal types for Python in Spark Connect

2022-11-01 Thread GitBox
AmplabJenkins commented on PR #38462: URL: https://github.com/apache/spark/pull/38462#issuecomment-1299294257 Can one of the admins verify this patch?

[GitHub] [spark] AmplabJenkins commented on pull request #38463: [SPARK-40374][SQL] Migrate type check failures of type creators onto error classes

2022-11-01 Thread GitBox
AmplabJenkins commented on PR #38463: URL: https://github.com/apache/spark/pull/38463#issuecomment-1299294202 Can one of the admins verify this patch?

[GitHub] [spark] amaliujia commented on a diff in pull request #38409: [SPARK-40930][CONNECT] Support Collect() in Python client

2022-11-01 Thread GitBox
amaliujia commented on code in PR #38409: URL: https://github.com/apache/spark/pull/38409#discussion_r1010931805 ## python/pyspark/sql/connect/dataframe.py: ## @@ -305,8 +308,12 @@ def _print_plan(self) -> str: return self._plan.print() return "" -

[GitHub] [spark] dtenedor commented on a diff in pull request #38418: [SPARK-40944][SQL] Relax ordering constraint for CREATE TABLE column options

2022-11-01 Thread GitBox
dtenedor commented on code in PR #38418: URL: https://github.com/apache/spark/pull/38418#discussion_r1010897010 ## sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBaseParser.g4: ## @@ -1001,7 +1001,13 @@ createOrReplaceTableColTypeList ;

[GitHub] [spark] amaliujia commented on a diff in pull request #38418: [SPARK-40944][SQL] Relax ordering constraint for CREATE TABLE column options

2022-11-01 Thread GitBox
amaliujia commented on code in PR #38418: URL: https://github.com/apache/spark/pull/38418#discussion_r1010890792 ## sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBaseParser.g4: ## @@ -1001,7 +1001,13 @@ createOrReplaceTableColTypeList ;

[GitHub] [spark] kristopherkane commented on pull request #38358: [SPARK-40588] FileFormatWriter materializes AQE plan before accessing outputOrdering

2022-11-01 Thread GitBox
kristopherkane commented on PR #38358: URL: https://github.com/apache/spark/pull/38358#issuecomment-1299119179 Thanks for the fix! Is it possible this could land in 3.1 as well?

[GitHub] [spark] grundprinzip commented on pull request #38470: [CONNECT] [DOC] Defining Spark Connect Client Connection String

2022-11-01 Thread GitBox
grundprinzip commented on PR #38470: URL: https://github.com/apache/spark/pull/38470#issuecomment-1299084859 Good point, I will incorporate that into the doc.

[GitHub] [spark] anchovYu commented on pull request #38169: [SPARK-40663][SQL] Migrate execution errors onto error classes: _LEGACY_ERROR_TEMP_2176-2220

2022-11-01 Thread GitBox
anchovYu commented on PR #38169: URL: https://github.com/apache/spark/pull/38169#issuecomment-1299073908 the title needs to be updated from 2220 to 2200 :)

[GitHub] [spark] amaliujia commented on pull request #38470: [CONNECT] [DOC] Defining Spark Connect Client Connection String

2022-11-01 Thread GitBox
amaliujia commented on PR #38470: URL: https://github.com/apache/spark/pull/38470#issuecomment-1299054036 Overall LGTM. Is the `user_id` (or the user session token) relevant to this

[GitHub] [spark] amaliujia commented on pull request #38472: [SPARK-40989][CONNECT][PYTHON][TESTS] Improve `session.sql` testing coverage in Python client

2022-11-01 Thread GitBox
amaliujia commented on PR #38472: URL: https://github.com/apache/spark/pull/38472#issuecomment-1299035641 R: @zhengruifeng

[GitHub] [spark] amaliujia opened a new pull request, #38472: [SPARK-40989][CONNECT][PYTHON][TESTS] Improve `session.sql` testing coverage in Python client

2022-11-01 Thread GitBox
amaliujia opened a new pull request, #38472: URL: https://github.com/apache/spark/pull/38472 ### What changes were proposed in this pull request? This PR tests `session.sql` in Python client both in `toProto` path and the data collection path. ### Why are the

[GitHub] [spark] gengliangwang commented on pull request #36698: [SPARK-39316][SQL] Merge PromotePrecision and CheckOverflow into decimal binary arithmetic

2022-11-01 Thread GitBox
gengliangwang commented on PR #36698: URL: https://github.com/apache/spark/pull/36698#issuecomment-1299022752 @ulysses-you Is the following query an actual bug before the refactor? Or did the refactor just remove the redundant cast? ``` SELECT CAST(1 AS DECIMAL(28, 2)) UNION ALL

[GitHub] [spark] amaliujia opened a new pull request, #38471: [SC-114545][SPARK-40883][CONNECT] Range.step is required and Python client should have a default value=1

2022-11-01 Thread GitBox
amaliujia opened a new pull request, #38471: URL: https://github.com/apache/spark/pull/38471 ### What changes were proposed in this pull request? To match the existing Python DataFrame API, this PR makes `Range.step` required, and the Python client keeps `1` as a default value

[GitHub] [spark] amaliujia commented on pull request #38471: [SC-114545][SPARK-40883][CONNECT] Range.step is required and Python client should have a default value=1

2022-11-01 Thread GitBox
amaliujia commented on PR #38471: URL: https://github.com/apache/spark/pull/38471#issuecomment-1299015533 R: @zhengruifeng I sent out this PR based on your suggestion.

[GitHub] [spark] carlfu-db commented on pull request #38404: [SPARK-40956] SQL Equivalent for Dataframe overwrite command

2022-11-01 Thread GitBox
carlfu-db commented on PR #38404: URL: https://github.com/apache/spark/pull/38404#issuecomment-1298960790 https://user-images.githubusercontent.com/114777395/199313517-3122d622-ba62-4ac5-8fbf-d01b4e59c394.png I have rebased the PR onto the latest apache/master, not sure how to

[GitHub] [spark] SandishKumarHN commented on a diff in pull request #38344: [SPARK-40777][SQL][PROTOBUF] Protobuf import support and move error-classes.

2022-11-01 Thread GitBox
SandishKumarHN commented on code in PR #38344: URL: https://github.com/apache/spark/pull/38344#discussion_r1010721467 ## connector/protobuf/src/main/scala/org/apache/spark/sql/protobuf/utils/ProtobufUtils.scala: ## @@ -178,46 +176,73 @@ private[sql] object ProtobufUtils extends

[GitHub] [spark] jerrypeng commented on a diff in pull request #38430: [SPARK-40957] Add in memory cache in HDFSMetadataLog

2022-11-01 Thread GitBox
jerrypeng commented on code in PR #38430: URL: https://github.com/apache/spark/pull/38430#discussion_r1010692304 ## sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/HDFSMetadataLog.scala: ## @@ -277,10 +295,34 @@ class HDFSMetadataLog[T <: AnyRef :

[GitHub] [spark] leewyang commented on a diff in pull request #37734: [SPARK-40264][ML] add batch_infer_udf function to pyspark.ml.functions

2022-11-01 Thread GitBox
leewyang commented on code in PR #37734: URL: https://github.com/apache/spark/pull/37734#discussion_r1010663824 ## python/pyspark/ml/functions.py: ## @@ -106,6 +117,474 @@ def array_to_vector(col: Column) -> Column: return

[GitHub] [spark] MaxGekk commented on a diff in pull request #38438: [SPARK-40748][SQL] Migrate type check failures of conditions onto error classes

2022-11-01 Thread GitBox
MaxGekk commented on code in PR #38438: URL: https://github.com/apache/spark/pull/38438#discussion_r1010683264 ## sql/core/src/test/java/test/org/apache/spark/sql/JavaColumnExpressionSuite.java: ## @@ -79,12 +83,16 @@ public void isInCollectionCheckExceptionMessage() {

[GitHub] [spark] leewyang commented on a diff in pull request #37734: [SPARK-40264][ML] add batch_infer_udf function to pyspark.ml.functions

2022-11-01 Thread GitBox
leewyang commented on code in PR #37734: URL: https://github.com/apache/spark/pull/37734#discussion_r1010681773 ## python/pyspark/ml/functions.py: ## @@ -106,6 +117,474 @@ def array_to_vector(col: Column) -> Column: return

[GitHub] [spark] rangadi commented on a diff in pull request #38344: [SPARK-40777][SQL][PROTOBUF] Protobuf import support and move error-classes.

2022-11-01 Thread GitBox
rangadi commented on code in PR #38344: URL: https://github.com/apache/spark/pull/38344#discussion_r1010672856 ## connector/protobuf/src/main/scala/org/apache/spark/sql/protobuf/utils/ProtobufUtils.scala: ## @@ -178,46 +176,73 @@ private[sql] object ProtobufUtils extends

[GitHub] [spark] leewyang commented on a diff in pull request #37734: [SPARK-40264][ML] add batch_infer_udf function to pyspark.ml.functions

2022-11-01 Thread GitBox
leewyang commented on code in PR #37734: URL: https://github.com/apache/spark/pull/37734#discussion_r1010660127 ## python/pyspark/ml/functions.py: ## @@ -106,6 +117,474 @@ def array_to_vector(col: Column) -> Column: return

[GitHub] [spark] MaxGekk commented on pull request #38175: [SPARK-40663][SQL] Migrate execution errors onto error classes: _LEGACY_ERROR_TEMP_2251-2275

2022-11-01 Thread GitBox
MaxGekk commented on PR #38175: URL: https://github.com/apache/spark/pull/38175#issuecomment-1298834297 +1, LGTM. Merged to master. Thank you, @itholic.

[GitHub] [spark] leewyang commented on a diff in pull request #37734: [SPARK-40264][ML] add batch_infer_udf function to pyspark.ml.functions

2022-11-01 Thread GitBox
leewyang commented on code in PR #37734: URL: https://github.com/apache/spark/pull/37734#discussion_r1010670125 ## python/pyspark/ml/functions.py: ## @@ -106,6 +117,474 @@ def array_to_vector(col: Column) -> Column: return

[GitHub] [spark] MaxGekk closed pull request #38175: [SPARK-40663][SQL] Migrate execution errors onto error classes: _LEGACY_ERROR_TEMP_2251-2275

2022-11-01 Thread GitBox
MaxGekk closed pull request #38175: [SPARK-40663][SQL] Migrate execution errors onto error classes: _LEGACY_ERROR_TEMP_2251-2275 URL: https://github.com/apache/spark/pull/38175 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and

[GitHub] [spark] leewyang commented on a diff in pull request #37734: [SPARK-40264][ML] add batch_infer_udf function to pyspark.ml.functions

2022-11-01 Thread GitBox
leewyang commented on code in PR #37734: URL: https://github.com/apache/spark/pull/37734#discussion_r1010663824 ## python/pyspark/ml/functions.py: ## @@ -106,6 +117,474 @@ def array_to_vector(col: Column) -> Column: return

[GitHub] [spark] AmplabJenkins commented on pull request #38467: [SPARK-40987][CORE] Avoid creating a directory when deleting a block, causing DAGScheduler to not work

2022-11-01 Thread GitBox
AmplabJenkins commented on PR #38467: URL: https://github.com/apache/spark/pull/38467#issuecomment-1298786146 Can one of the admins verify this patch? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to

[GitHub] [spark] cloud-fan closed pull request #38400: [SPARK-40921][SQL] Add WHEN NOT MATCHED BY SOURCE clause to MERGE INTO

2022-11-01 Thread GitBox
cloud-fan closed pull request #38400: [SPARK-40921][SQL] Add WHEN NOT MATCHED BY SOURCE clause to MERGE INTO URL: https://github.com/apache/spark/pull/38400 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above

[GitHub] [spark] cloud-fan commented on pull request #38400: [SPARK-40921][SQL] Add WHEN NOT MATCHED BY SOURCE clause to MERGE INTO

2022-11-01 Thread GitBox
cloud-fan commented on PR #38400: URL: https://github.com/apache/spark/pull/38400#issuecomment-1298784075 thanks, merging to master! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific

[GitHub] [spark] pan3793 commented on pull request #32456: [SPARK-35328][Core] Use 'SPARK_DRIVER_LOG_URL_' as env prefix for getting driver log urls by default

2022-11-01 Thread GitBox
pan3793 commented on PR #32456: URL: https://github.com/apache/spark/pull/32456#issuecomment-1298769165 I'm working on https://github.com/apache/spark/pull/38357, it covers this functionality. -- This is an automated message from the Apache Git Service. To respond to the message, please

[GitHub] [spark] grundprinzip opened a new pull request, #38470: [CONNECT] [DOC] Defining Spark Connect Client Connection String

2022-11-01 Thread GitBox
grundprinzip opened a new pull request, #38470: URL: https://github.com/apache/spark/pull/38470 ### What changes were proposed in this pull request? This patch adds documentation to describe how clients should implement handling connecting to the Spark Connect endpoint. GRPC as a

[GitHub] [spark] srowen commented on pull request #38469: [MINOR][BUILD] Correct the `files` contend in `checkstyle-suppressions.xml`

2022-11-01 Thread GitBox
srowen commented on PR #38469: URL: https://github.com/apache/spark/pull/38469#issuecomment-1298759901 Oops yeah that looks like the right fix. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to

[GitHub] [spark] cloud-fan commented on a diff in pull request #38461: [SPARK-34079][SQL][FOLLOWUP] Improve the readability and simplify the code for MergeScalarSubqueries

2022-11-01 Thread GitBox
cloud-fan commented on code in PR #38461: URL: https://github.com/apache/spark/pull/38461#discussion_r1010607741 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/MergeScalarSubqueries.scala: ## @@ -346,25 +346,19 @@ object MergeScalarSubqueries extends

[GitHub] [spark] LuciferYang commented on pull request #38469: [MINOR][BUILD] Correct the `files` contend in `checkstyle-suppressions.xml`

2022-11-01 Thread GitBox
LuciferYang commented on PR #38469: URL: https://github.com/apache/spark/pull/38469#issuecomment-1298758798 cc @dongjoon-hyun @srowen Is this change correct? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the

[GitHub] [spark] LuciferYang opened a new pull request, #38469: [MINOR][BUILD] Correct the `files` contend in `checkstyle-suppressions.xml`

2022-11-01 Thread GitBox
LuciferYang opened a new pull request, #38469: URL: https://github.com/apache/spark/pull/38469 ### What changes were proposed in this pull request? The pr aims to change the suppress files from `sql/core/src/main/java/org/apache/spark/sql/api.java/*` to

[GitHub] [spark] thejdeep commented on pull request #35969: [SPARK-38651][SQL] Add configuration to support writing out empty schemas in supported filebased datasources

2022-11-01 Thread GitBox
thejdeep commented on PR #35969: URL: https://github.com/apache/spark/pull/35969#issuecomment-1298735509 @cloud-fan We have had users writing data with empty schemas in production and changing the schema of a non-trivial number of rows seems like a big change. Spark allows creating empty

[GitHub] [spark] LuciferYang opened a new pull request, #37646: [DON'T MERGE] investigate flaky test in ImageFileFormatSuite

2022-11-01 Thread GitBox
LuciferYang opened a new pull request, #37646: URL: https://github.com/apache/spark/pull/37646 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ###

[GitHub] [spark] LuciferYang closed pull request #37646: [DON'T MERGE] investigate flaky test in ImageFileFormatSuite

2022-11-01 Thread GitBox
LuciferYang closed pull request #37646: [DON'T MERGE] investigate flaky test in ImageFileFormatSuite URL: https://github.com/apache/spark/pull/37646 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to

[GitHub] [spark] thejdeep commented on a diff in pull request #36165: [SPARK-36620][SHUFFLE] Add Push Based Shuffle client side metrics

2022-11-01 Thread GitBox
thejdeep commented on code in PR #36165: URL: https://github.com/apache/spark/pull/36165#discussion_r1010567174 ## core/src/main/scala/org/apache/spark/executor/Executor.scala: ## @@ -654,6 +654,27 @@ private[spark] class Executor(

[GitHub] [spark] thejdeep commented on a diff in pull request #36165: [SPARK-36620][SHUFFLE] Add Push Based Shuffle client side metrics

2022-11-01 Thread GitBox
thejdeep commented on code in PR #36165: URL: https://github.com/apache/spark/pull/36165#discussion_r1010566832 ## core/src/main/scala/org/apache/spark/status/storeTypes.scala: ## @@ -138,6 +138,16 @@ private[spark] object TaskIndexNames { final val SHUFFLE_WRITE_RECORDS =

[GitHub] [spark] thejdeep commented on a diff in pull request #36165: [SPARK-36620][SHUFFLE] Add Push Based Shuffle client side metrics

2022-11-01 Thread GitBox
thejdeep commented on code in PR #36165: URL: https://github.com/apache/spark/pull/36165#discussion_r1010566209 ## core/src/test/scala/org/apache/spark/util/JsonProtocolSuite.scala: ## @@ -2623,48 +2677,118 @@ private[spark] object JsonProtocolSuite extends Assertions {

[GitHub] [spark] LuciferYang commented on pull request #38467: [SPARK-40987][CORE] Avoid creating a directory when deleting a block, causing DAGScheduler to not work

2022-11-01 Thread GitBox
LuciferYang commented on PR #38467: URL: https://github.com/apache/spark/pull/38467#issuecomment-1298700972 As I recall, I have seen a similar scenario: due to disk problems (such as no disk space left), the driver hangs and does not exit. Also ping @Yikf, have you seen a similar issue

[GitHub] [spark] grundprinzip commented on pull request #38462: [SPARK-40533] [CONNECT] [PYTHON] Support most built-in literal types for Python in Spark Connect

2022-11-01 Thread GitBox
grundprinzip commented on PR #38462: URL: https://github.com/apache/spark/pull/38462#issuecomment-1298698629 R: @HyukjinKwon @zhengruifeng @amaliujia -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to

[GitHub] [spark] LuciferYang commented on pull request #38467: [SPARK-40987][CORE] Avoid creating a directory when deleting a block, causing DAGScheduler to not work

2022-11-01 Thread GitBox
LuciferYang commented on PR #38467: URL: https://github.com/apache/spark/pull/38467#issuecomment-1298690325 cc @Ngone51 FYI -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

[GitHub] [spark] peter-toth commented on a diff in pull request #38461: [SPARK-34079][SQL][FOLLOWUP] Improve the readability and simplify the code for MergeScalarSubqueries

2022-11-01 Thread GitBox
peter-toth commented on code in PR #38461: URL: https://github.com/apache/spark/pull/38461#discussion_r1010548429 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/MergeScalarSubqueries.scala: ## @@ -346,25 +346,19 @@ object MergeScalarSubqueries extends

[GitHub] [spark] cloud-fan closed pull request #38429: [SPARK-40800][SQL][FOLLOW-UP] Add a config to control whether to always inline one-row relation subquery

2022-11-01 Thread GitBox
cloud-fan closed pull request #38429: [SPARK-40800][SQL][FOLLOW-UP] Add a config to control whether to always inline one-row relation subquery URL: https://github.com/apache/spark/pull/38429 -- This is an automated message from the Apache Git Service. To respond to the message, please log on

[GitHub] [spark] cloud-fan commented on pull request #38429: [SPARK-40800][SQL][FOLLOW-UP] Add a config to control whether to always inline one-row relation subquery

2022-11-01 Thread GitBox
cloud-fan commented on PR #38429: URL: https://github.com/apache/spark/pull/38429#issuecomment-1298586825 thanks, merging to master! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific

[GitHub] [spark] MaxGekk closed pull request #38454: [SPARK-40978][SQL] Migrate `failAnalysis()` w/o a context onto error classes

2022-11-01 Thread GitBox
MaxGekk closed pull request #38454: [SPARK-40978][SQL] Migrate `failAnalysis()` w/o a context onto error classes URL: https://github.com/apache/spark/pull/38454 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL

[GitHub] [spark] MaxGekk commented on pull request #38454: [SPARK-40978][SQL] Migrate `failAnalysis()` w/o a context onto error classes

2022-11-01 Thread GitBox
MaxGekk commented on PR #38454: URL: https://github.com/apache/spark/pull/38454#issuecomment-1298505198 Merging to master. Thank you, @LuciferYang @cloud-fan for review. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use
