[GitHub] [spark] heyihong commented on a diff in pull request #42377: [SPARK-44622][SQL][CONNECT] Implement error enrichment and setting server-side stacktrace

2023-08-30 Thread via GitHub
heyihong commented on code in PR #42377: URL: https://github.com/apache/spark/pull/42377#discussion_r1310143393 ## connector/connect/common/src/main/scala/org/apache/spark/sql/connect/client/GrpcExceptionConverter.scala: ## @@ -64,6 +69,94 @@ private[client] object

[GitHub] [spark] vicennial commented on a diff in pull request #42731: [SPARK-45014][CONNECT] Clean up fileserver when cleaning up files, jars and archives in SparkContext

2023-08-30 Thread via GitHub
vicennial commented on code in PR #42731: URL: https://github.com/apache/spark/pull/42731#discussion_r1310015820 ## connector/connect/server/src/main/scala/org/apache/spark/sql/connect/artifact/SparkConnectArtifactManager.scala: ## @@ -208,12 +209,38 @@ class

[GitHub] [spark] Hisoka-X commented on a diff in pull request #42737: [SPARK-44987][SQL] Assign a name to the error class `_LEGACY_ERROR_TEMP_1100`

2023-08-30 Thread via GitHub
Hisoka-X commented on code in PR #42737: URL: https://github.com/apache/spark/pull/42737#discussion_r1310155968 ## common/utils/src/main/resources/error/error-classes.json: ## @@ -2210,6 +2210,12 @@ ], "sqlState" : "42607" }, + "NON_FOLDABLE_ARGUMENT" : { +

[GitHub] [spark] Hisoka-X opened a new pull request, #42738: [SPARK-44990][SQL] Reduce the frequency of get `spark.sql.legacy.nullValueWrittenAsQuotedEmptyStringCsv`

2023-08-30 Thread via GitHub
Hisoka-X opened a new pull request, #42738: URL: https://github.com/apache/spark/pull/42738 ### What changes were proposed in this pull request? This PR moves the lookup of the config `spark.sql.legacy.nullValueWrittenAsQuotedEmptyStringCsv` into a lazy val of `UnivocityGenerator`. To reduce the
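As a rough illustration of the pattern described above, here is a minimal Scala sketch, assuming a simplified stand-in class and option name rather than the actual `UnivocityGenerator` code: the legacy CSV config is read once into a lazy val instead of on every written value.

```scala
// Hedged sketch only: a simplified stand-in for the UnivocityGenerator change.
// The real code reads spark.sql.legacy.nullValueWrittenAsQuotedEmptyStringCsv via SQLConf;
// the option map and names used here are illustrative.
class CsvGeneratorSketch(options: Map[String, String]) {
  // Evaluated at most once per generator instance, not once per written value.
  private lazy val nullAsQuotedEmptyString: Boolean =
    options.getOrElse("nullValueWrittenAsQuotedEmptyStringCsv", "false").toBoolean

  def toCsvField(value: String): String =
    if (value == null) {
      if (nullAsQuotedEmptyString) "\"\"" else ""
    } else {
      value
    }
}
```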

[GitHub] [spark] panbingkun opened a new pull request, #42733: [WIP] test scala213 run on container

2023-08-30 Thread via GitHub
panbingkun opened a new pull request, #42733: URL: https://github.com/apache/spark/pull/42733 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How

[GitHub] [spark] MaxGekk opened a new pull request, #42737: [SPARK-44987][SQL] Assign a name to the error class `_LEGACY_ERROR_TEMP_1100`

2023-08-30 Thread via GitHub
MaxGekk opened a new pull request, #42737: URL: https://github.com/apache/spark/pull/42737 ### What changes were proposed in this pull request? In the PR, I propose to assign the name `NON_FOLDABLE_ARGUMENT` to the legacy error class `_LEGACY_ERROR_TEMP_1100`, and improve the error

[GitHub] [spark] Hisoka-X commented on pull request #42738: [SPARK-44990][SQL] Reduce the frequency of get `spark.sql.legacy.nullValueWrittenAsQuotedEmptyStringCsv`

2023-08-30 Thread via GitHub
Hisoka-X commented on PR #42738: URL: https://github.com/apache/spark/pull/42738#issuecomment-1699101489 Also, I'm trying to add a benchmark of CSV writes for the case where most values are null.

[GitHub] [spark] zhengruifeng opened a new pull request, #42734: [SPARK-45016][PYTHON][CONNECT] Add missing `try_remote_functions` annotations

2023-08-30 Thread via GitHub
zhengruifeng opened a new pull request, #42734: URL: https://github.com/apache/spark/pull/42734 ### What changes were proposed in this pull request? Add missing `try_remote_functions` annotations ### Why are the changes needed? to enable these functions in Connect

[GitHub] [spark] zhengruifeng opened a new pull request, #42735: [SPARK-45015][PYTHON][DOCS] Refine DocStrings of `try_{add, subtract, multiply, divide, avg, sum}`

2023-08-30 Thread via GitHub
zhengruifeng opened a new pull request, #42735: URL: https://github.com/apache/spark/pull/42735 ### What changes were proposed in this pull request? Refine DocStrings of `try_{add, subtract, multiply, divide, avg, sum}`: 1, unify the import `import pyspark.sql.functions as sf` 2,

[GitHub] [spark] Hisoka-X commented on pull request #42738: [SPARK-44990][SQL] Reduce the frequency of get `spark.sql.legacy.nullValueWrittenAsQuotedEmptyStringCsv`

2023-08-30 Thread via GitHub
Hisoka-X commented on PR #42738: URL: https://github.com/apache/spark/pull/42738#issuecomment-1699104352 It came from #36110, so cc @cloud-fan @anchovYu

[GitHub] [spark] wankunde commented on a diff in pull request #42488: [SPARK-44804][SQL] SortMergeJoin should respect the streamed side ordering

2023-08-30 Thread via GitHub
wankunde commented on code in PR #42488: URL: https://github.com/apache/spark/pull/42488#discussion_r1310006439 ## sql/core/src/main/scala/org/apache/spark/sql/execution/joins/SortMergeJoinExec.scala: ## @@ -53,12 +53,12 @@ case class SortMergeJoinExec( // For inner join,

[GitHub] [spark] zhengruifeng opened a new pull request, #42736: [SPARK-45017][PYTHON] Add `CalendarIntervalType` to PySpark

2023-08-30 Thread via GitHub
zhengruifeng opened a new pull request, #42736: URL: https://github.com/apache/spark/pull/42736 ### What changes were proposed in this pull request? Add `CalendarIntervalType` to PySpark ### Why are the changes needed? in scala: ``` scala> spark.sql("SELECT
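The example query in the description is truncated above; as a hedged, illustrative Scala sketch of the behaviour the PR mirrors in PySpark (assuming `make_interval` as one expression whose result type is `CalendarIntervalType`):

```scala
// Illustrative only; assumes a local Spark session is acceptable for the demo.
// The PR itself adds the corresponding CalendarIntervalType to PySpark's type system.
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.types.CalendarIntervalType

object CalendarIntervalTypeDemo extends App {
  val spark = SparkSession.builder().master("local[1]").appName("demo").getOrCreate()
  // make_interval mixes year-month and day-time fields, so its result type is
  // the calendar interval type rather than one of the ANSI interval types.
  val field = spark.sql("SELECT make_interval(1, 2, 0, 3, 4, 5, 6.7) AS i").schema.head
  assert(field.dataType == CalendarIntervalType)
  println(s"${field.name}: ${field.dataType}")
  spark.stop()
}
```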

[GitHub] [spark] panbingkun commented on pull request #42733: [WIP] [SPARK-45019][BUILD] Make workflow scala213 on container & clean env

2023-08-30 Thread via GitHub
panbingkun commented on PR #42733: URL: https://github.com/apache/spark/pull/42733#issuecomment-1698946601 cc @dongjoon-hyun @LuciferYang

[GitHub] [spark] MaxGekk commented on a diff in pull request #42737: [SPARK-44987][SQL] Assign a name to the error class `_LEGACY_ERROR_TEMP_1100`

2023-08-30 Thread via GitHub
MaxGekk commented on code in PR #42737: URL: https://github.com/apache/spark/pull/42737#discussion_r1310134304 ## common/utils/src/main/resources/error/error-classes.json: ## @@ -2210,6 +2210,12 @@ ], "sqlState" : "42607" }, + "NON_FOLDABLE_ARGUMENT" : { +

[GitHub] [spark] wankunde commented on a diff in pull request #42488: [SPARK-44804][SQL] SortMergeJoin should respect the streamed side ordering

2023-08-30 Thread via GitHub
wankunde commented on code in PR #42488: URL: https://github.com/apache/spark/pull/42488#discussion_r1309972508 ## sql/core/src/main/scala/org/apache/spark/sql/execution/joins/SortMergeJoinExec.scala: ## @@ -53,12 +53,12 @@ case class SortMergeJoinExec( // For inner join,

[GitHub] [spark] Hisoka-X commented on a diff in pull request #42737: [SPARK-44987][SQL] Assign a name to the error class `_LEGACY_ERROR_TEMP_1100`

2023-08-30 Thread via GitHub
Hisoka-X commented on code in PR #42737: URL: https://github.com/apache/spark/pull/42737#discussion_r1310125178 ## common/utils/src/main/resources/error/error-classes.json: ## @@ -2210,6 +2210,12 @@ ], "sqlState" : "42607" }, + "NON_FOLDABLE_ARGUMENT" : { +

[GitHub] [spark] panbingkun commented on a diff in pull request #42109: [SPARK-44404][SQL] Assign names to the error class _LEGACY_ERROR_TEMP_[1009,1010,1013,1015,1016,1278]

2023-09-05 Thread via GitHub
panbingkun commented on code in PR #42109: URL: https://github.com/apache/spark/pull/42109#discussion_r1315780224 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala: ## @@ -1134,20 +1134,21 @@ class Analyzer(override val catalogManager:

[GitHub] [spark] zzzzming95 commented on a diff in pull request #42574: [SPARK-43149][SQL] `CreateDataSourceTableCommand` should create metadata first

2023-09-05 Thread via GitHub
zzzzming95 commented on code in PR #42574: URL: https://github.com/apache/spark/pull/42574#discussion_r1315907482 ## sql/core/src/main/scala/org/apache/spark/sql/execution/command/createDataSourceTables.scala: ## @@ -191,16 +193,26 @@ case class

[GitHub] [spark] cloud-fan commented on a diff in pull request #42810: [SPARK-45075][SQL] Fix alter table with invalid default value will not report error

2023-09-05 Thread via GitHub
cloud-fan commented on code in PR #42810: URL: https://github.com/apache/spark/pull/42810#discussion_r1315957135 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/v2AlterTableCommands.scala: ## @@ -230,6 +230,15 @@ case class AlterColumn( val

[GitHub] [spark] cloud-fan commented on a diff in pull request #42810: [SPARK-45075][SQL] Fix alter table with invalid default value will not report error

2023-09-05 Thread via GitHub
cloud-fan commented on code in PR #42810: URL: https://github.com/apache/spark/pull/42810#discussion_r1315981102 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/v2AlterTableCommands.scala: ## @@ -228,6 +228,15 @@ case class AlterColumn(

[GitHub] [spark] zzzzming95 commented on pull request #42804: [SPARK-45071][SQL] Optimize the processing speed of `BinaryArithmetic#dataType` when processing multi-column data

2023-09-05 Thread via GitHub
zzzzming95 commented on PR #42804: URL: https://github.com/apache/spark/pull/42804#issuecomment-1706594949 @cloud-fan @wangyum Please merge it to master, thanks

[GitHub] [spark] cloud-fan commented on pull request #41683: [SPARK-36680][SQL] Supports Dynamic Table Options for Spark SQL

2023-09-05 Thread via GitHub
cloud-fan commented on PR #41683: URL: https://github.com/apache/spark/pull/41683#issuecomment-1706680230 Let's spend more time on the API design first, as different people may have different opinions and we should collect as much feedback as possible. Taking a step back, I think

[GitHub] [spark] cloud-fan commented on a diff in pull request #42481: [SPARK-44801][SQL][UI] Capture analyzing failed queries in Listener and UI

2023-09-05 Thread via GitHub
cloud-fan commented on code in PR #42481: URL: https://github.com/apache/spark/pull/42481#discussion_r1315972356 ## sql/core/src/main/scala/org/apache/spark/sql/execution/SQLExecution.scala: ## @@ -124,7 +136,7 @@ object SQLExecution { physicalPlanDescription =

[GitHub] [spark] MaxGekk closed pull request #42816: [WIP][SPARK-45022][SQL] Provide context for dataset API errors

2023-09-05 Thread via GitHub
MaxGekk closed pull request #42816: [WIP][SPARK-45022][SQL] Provide context for dataset API errors URL: https://github.com/apache/spark/pull/42816

[GitHub] [spark] zzzzming95 commented on a diff in pull request #42804: [SPARK-45071][SQL] Optimize the processing speed of `BinaryArithmetic#dataType` when processing multi-column data

2023-09-05 Thread via GitHub
zzzzming95 commented on code in PR #42804: URL: https://github.com/apache/spark/pull/42804#discussion_r1316079522 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/arithmetic.scala: ## @@ -234,11 +240,7 @@ abstract class BinaryArithmetic extends

[GitHub] [spark] hvanhovell commented on pull request #42807: [SPARK-45072][CONNECT] Fix outer scopes for ammonite classes

2023-09-05 Thread via GitHub
hvanhovell commented on PR #42807: URL: https://github.com/apache/spark/pull/42807#issuecomment-1706856355 I think it is a bug.

[GitHub] [spark] panbingkun commented on a diff in pull request #42109: [SPARK-44404][SQL] Assign names to the error class _LEGACY_ERROR_TEMP_[1009,1010,1013,1015,1016,1278]

2023-09-05 Thread via GitHub
panbingkun commented on code in PR #42109: URL: https://github.com/apache/spark/pull/42109#discussion_r1315778389 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/v2ResolutionPlans.scala: ## @@ -44,7 +44,7 @@ case class

[GitHub] [spark] MaxGekk commented on a diff in pull request #42801: [SPARK-45070][SQL][DOCS] Describe the binary and datetime formats of `to_char`/`to_varchar`

2023-09-05 Thread via GitHub
MaxGekk commented on code in PR #42801: URL: https://github.com/apache/spark/pull/42801#discussion_r1315744531 ## connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/functions.scala: ## @@ -4284,12 +4285,22 @@ object functions { * prints '+' for positive

[GitHub] [spark] wangyum commented on a diff in pull request #42804: [SPARK-45071][SQL] Optimize the processing speed of `BinaryArithmetic#dataType` when processing multi-column data

2023-09-05 Thread via GitHub
wangyum commented on code in PR #42804: URL: https://github.com/apache/spark/pull/42804#discussion_r1315905477 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/arithmetic.scala: ## @@ -234,11 +240,7 @@ abstract class BinaryArithmetic extends

[GitHub] [spark] dongjoon-hyun commented on pull request #42807: [SPARK-45072][CONNECT] Fix outer scopes for ammonite classes

2023-09-05 Thread via GitHub
dongjoon-hyun commented on PR #42807: URL: https://github.com/apache/spark/pull/42807#issuecomment-1706851416 Let me fix that for you, @hvanhovell. If you don't think this is a bug, please let me know, @hvanhovell.

[GitHub] [spark] MaxGekk opened a new pull request, #42817: [SPARK-45079][SQL] Fix an internal error from `percentile_approx()`on `NULL` accuracy

2023-09-05 Thread via GitHub
MaxGekk opened a new pull request, #42817: URL: https://github.com/apache/spark/pull/42817 ### What changes were proposed in this pull request? In the PR, I propose to check the `accuracy` argument is not a NULL in `ApproximatePercentile`. And if it is, throw an `AnalysisException` with
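A minimal sketch of that kind of argument check, assuming a simplified stand-in rather than the actual `ApproximatePercentile` code (the real fix raises an `AnalysisException` with a named error class):

```scala
// Illustrative validation only: reject a NULL (or out-of-range) accuracy up front
// with a user-facing message instead of letting it surface as an internal error.
def validateAccuracy(accuracy: Option[Double]): Unit = accuracy match {
  case None =>
    throw new IllegalArgumentException(
      "The accuracy argument of percentile_approx must be a non-NULL numeric literal.")
  case Some(a) if a <= 0 || a > Int.MaxValue =>
    throw new IllegalArgumentException(s"accuracy must be in (0, ${Int.MaxValue}], got $a")
  case _ => () // valid
}
```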

[GitHub] [spark] Hisoka-X commented on a diff in pull request #42810: [SPARK-45075][SQL] Fix alter table with invalid default value will not report error

2023-09-05 Thread via GitHub
Hisoka-X commented on code in PR #42810: URL: https://github.com/apache/spark/pull/42810#discussion_r1315970129 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/v2AlterTableCommands.scala: ## @@ -230,6 +230,15 @@ case class AlterColumn( val

[GitHub] [spark] dtenedor commented on a diff in pull request #42810: [SPARK-45075][SQL] Fix alter table with invalid default value will not report error

2023-09-05 Thread via GitHub
dtenedor commented on code in PR #42810: URL: https://github.com/apache/spark/pull/42810#discussion_r1316137847 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/v2AlterTableCommands.scala: ## @@ -228,6 +228,15 @@ case class AlterColumn(

[GitHub] [spark] Hisoka-X commented on a diff in pull request #42810: [SPARK-45075][SQL] Fix alter table with invalid default value will not report error

2023-09-05 Thread via GitHub
Hisoka-X commented on code in PR #42810: URL: https://github.com/apache/spark/pull/42810#discussion_r1316039979 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/v2AlterTableCommands.scala: ## @@ -228,6 +228,15 @@ case class AlterColumn(

[GitHub] [spark] juliuszsompolski commented on pull request #42806: [SPARK-44833][CONNECT] Fix sending Reattach too fast after Execute

2023-09-05 Thread via GitHub
juliuszsompolski commented on PR #42806: URL: https://github.com/apache/spark/pull/42806#issuecomment-1706640901 https://github.com/juliuszsompolski/apache-spark/actions/runs/6076122424/job/16483638602 This module timed out. All Connect-related tests finished successfully.

[GitHub] [spark] hvanhovell closed pull request #42807: [SPARK-45072][CONNECT] Fix outer scopes for ammonite classes

2023-09-05 Thread via GitHub
hvanhovell closed pull request #42807: [SPARK-45072][CONNECT] Fix outer scopes for ammonite classes URL: https://github.com/apache/spark/pull/42807

[GitHub] [spark] ueshin commented on pull request #42726: [SPARK-44640][PYTHON][FOLLOW-UP] Update UDTF error messages to include method name

2023-09-02 Thread via GitHub
ueshin commented on PR #42726: URL: https://github.com/apache/spark/pull/42726#issuecomment-1703960044 Thanks! merging to master/3.5.

[GitHub] [spark] ueshin commented on pull request #42726: [SPARK-44640][PYTHON][FOLLOW-UP] Update UDTF error messages to include method name

2023-09-02 Thread via GitHub
ueshin commented on PR #42726: URL: https://github.com/apache/spark/pull/42726#issuecomment-1703960647 @allisonwang-db There is a conflict with 3.5. Could you help fix it and submit another PR? Thanks.

[GitHub] [spark] ueshin closed pull request #42726: [SPARK-44640][PYTHON][FOLLOW-UP] Update UDTF error messages to include method name

2023-09-02 Thread via GitHub
ueshin closed pull request #42726: [SPARK-44640][PYTHON][FOLLOW-UP] Update UDTF error messages to include method name URL: https://github.com/apache/spark/pull/42726

[GitHub] [spark] wangyum commented on pull request #42777: [SPARK-45054][SQL] HiveExternalCatalog.listPartitions should restore partition statistics

2023-09-02 Thread via GitHub
wangyum commented on PR #42777: URL: https://github.com/apache/spark/pull/42777#issuecomment-1703987072 LGTM.

[GitHub] [spark] panbingkun commented on pull request #42764: [SPARK-45043][BUILD] Upgrade `scalafmt` to 3.7.13

2023-09-02 Thread via GitHub
panbingkun commented on PR #42764: URL: https://github.com/apache/spark/pull/42764#issuecomment-1703844726 > I think this is fine to update plugins in general. Some stuff like this doesn't seem like it does anything for Spark, so isn't that important. As long as we're not generating a

[GitHub] [spark] github-actions[bot] closed pull request #41168: [SPARK-43454][CORE] support substitution for SparkConf's get and getAllWithPrefix

2023-09-02 Thread via GitHub
github-actions[bot] closed pull request #41168: [SPARK-43454][CORE] support substitution for SparkConf's get and getAllWithPrefix URL: https://github.com/apache/spark/pull/41168

[GitHub] [spark] github-actions[bot] commented on pull request #40312: [SPARK-42695][SQL] Skew join handling in stream side of broadcast hash join

2023-09-02 Thread via GitHub
github-actions[bot] commented on PR #40312: URL: https://github.com/apache/spark/pull/40312#issuecomment-1703970923 We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.

[GitHub] [spark] mridulm commented on a diff in pull request #42742: [SPARK-45025][CORE] Allow block manager memory store iterator to handle thread interrupt and perform task completion gracefully

2023-09-02 Thread via GitHub
mridulm commented on code in PR #42742: URL: https://github.com/apache/spark/pull/42742#discussion_r1314096518 ## core/src/main/scala/org/apache/spark/storage/memory/MemoryStore.scala: ## @@ -220,7 +220,8 @@ private[spark] class MemoryStore( } // Unroll this block

[GitHub] [spark] heyihong commented on a diff in pull request #42377: [SPARK-44622][SQL][CONNECT] Implement error enrichment and setting server-side stacktrace

2023-09-02 Thread via GitHub
heyihong commented on code in PR #42377: URL: https://github.com/apache/spark/pull/42377#discussion_r1313864858 ## connector/connect/server/src/main/scala/org/apache/spark/sql/connect/config/Connect.scala: ## @@ -213,4 +205,19 @@ object Connect { .version("3.5.0")

[GitHub] [spark] heyihong commented on a diff in pull request #42377: [SPARK-44622][SQL][CONNECT] Implement error enrichment and setting server-side stacktrace

2023-09-02 Thread via GitHub
heyihong commented on code in PR #42377: URL: https://github.com/apache/spark/pull/42377#discussion_r1313864811 ## connector/connect/server/src/main/scala/org/apache/spark/sql/connect/config/Connect.scala: ## @@ -213,4 +205,19 @@ object Connect { .version("3.5.0")

[GitHub] [spark] allisonwang-db commented on a diff in pull request #42235: [SPARK-44424][CONNECT][PYTHON] Python client for reattaching to existing execute in Spark Connect

2023-09-02 Thread via GitHub
allisonwang-db commented on code in PR #42235: URL: https://github.com/apache/spark/pull/42235#discussion_r1313969873 ## python/pyspark/testing/connectutils.py: ## @@ -170,6 +170,10 @@ def conf(cls): # Disable JVM stack trace in Spark Connect tests to prevent the

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #42235: [SPARK-44424][CONNECT][PYTHON] Python client for reattaching to existing execute in Spark Connect

2023-09-02 Thread via GitHub
HyukjinKwon commented on code in PR #42235: URL: https://github.com/apache/spark/pull/42235#discussion_r1314102126 ## python/pyspark/testing/connectutils.py: ## @@ -170,6 +170,10 @@ def conf(cls): # Disable JVM stack trace in Spark Connect tests to prevent the

[GitHub] [spark] allisonwang-db opened a new pull request, #42782: [SPARK-45058][PYTHON][DOCS] Refine docstring of DataFrame.distinct

2023-09-02 Thread via GitHub
allisonwang-db opened a new pull request, #42782: URL: https://github.com/apache/spark/pull/42782 ### What changes were proposed in this pull request? This PR refines the docstring of `DataFrame.distinct` by adding more examples. ### Why are the changes needed?

[GitHub] [spark] allisonwang-db commented on pull request #42726: [SPARK-44640][PYTHON][FOLLOW-UP] Update UDTF error messages to include method name

2023-09-02 Thread via GitHub
allisonwang-db commented on PR #42726: URL: https://github.com/apache/spark/pull/42726#issuecomment-1703927817 We need this PR in branch-3.5 as well.

[GitHub] [spark] Hisoka-X opened a new pull request, #42783: [SPARK-45059][CONNECT][PYTHON] Add `try_reflect` functions to Scala and Python

2023-09-02 Thread via GitHub
Hisoka-X opened a new pull request, #42783: URL: https://github.com/apache/spark/pull/42783 ### What changes were proposed in this pull request? Add a new `try_reflect` function to Python and Connect. ### Why are the changes needed? For parity ###

[GitHub] [spark] MaxGekk opened a new pull request, #42781: [WIP][SPARK-45060][SQL] Fix an internal error from `to_char()`on `NULL` format

2023-09-02 Thread via GitHub
MaxGekk opened a new pull request, #42781: URL: https://github.com/apache/spark/pull/42781 ### What changes were proposed in this pull request? ### Why are the changes needed? To fix the issue demonstrated by the example: ``` $ spark-sql (default)> SELECT

[GitHub] [spark] allisonwang-db commented on a diff in pull request #42770: [SPARK-45049][CONNECT][DOCS][TESTS] Refine docstrings of `coalesce/repartition/repartitionByRange`

2023-09-02 Thread via GitHub
allisonwang-db commented on code in PR #42770: URL: https://github.com/apache/spark/pull/42770#discussion_r1313926440 ## python/pyspark/sql/dataframe.py: ## @@ -1809,18 +1810,27 @@ def repartition( # type: ignore[misc] Repartition the data into 10 partitions. -

[GitHub] [spark] srowen commented on pull request #42815: [SPARK-45077][UI] Upgrade dagre-d3.js from 0.4.3 to 0.6.4

2023-09-05 Thread via GitHub
srowen commented on PR #42815: URL: https://github.com/apache/spark/pull/42815#issuecomment-1707031614 Are you saying 0.6.4 doesn't work well? Is this just for your testing then, and not to merge?

[GitHub] [spark] srowen commented on pull request #42819: [SPARK-45082][DOC] Review and fix issues in API docs for 3.5.0

2023-09-05 Thread via GitHub
srowen commented on PR #42819: URL: https://github.com/apache/spark/pull/42819#issuecomment-1707033259 Seems OK in principle, just need tests to pass

[GitHub] [spark] xuanyuanking commented on pull request #42819: [SPARK-45082][DOC] Review and fix issues in API docs for 3.5.0

2023-09-05 Thread via GitHub
xuanyuanking commented on PR #42819: URL: https://github.com/apache/spark/pull/42819#issuecomment-1707142988 The test `On pull request update / Notify test workflow (pull_request_target)` that failed has passed for the initial commit. I think it's good to go. @srowen, could you give your

[GitHub] [spark] andylam-db commented on a diff in pull request #42725: [SPARK-45009][SQL] Decorrelate predicate subqueries in join condition

2023-09-05 Thread via GitHub
andylam-db commented on code in PR #42725: URL: https://github.com/apache/spark/pull/42725#discussion_r1316384305 ## sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryCompilationErrors.scala: ## @@ -3751,4 +3751,14 @@ private[sql] object QueryCompilationErrors

[GitHub] [spark] dtenedor commented on a diff in pull request #42810: [SPARK-45075][SQL] Fix alter table with invalid default value will not report error

2023-09-05 Thread via GitHub
dtenedor commented on code in PR #42810: URL: https://github.com/apache/spark/pull/42810#discussion_r1316435058 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/v2AlterTableCommands.scala: ## @@ -228,6 +228,15 @@ case class AlterColumn(

[GitHub] [spark] srielau commented on pull request #41683: [SPARK-36680][SQL] Supports Dynamic Table Options for Spark SQL

2023-09-05 Thread via GitHub
srielau commented on PR #41683: URL: https://github.com/apache/spark/pull/41683#issuecomment-1706980076 +1 on using a WITH clause. For UPDATE: > WITH (OPTIONS ( Why the nesting?

[GitHub] [spark] juliuszsompolski commented on pull request #42772: [SPARK-45051][CONNECT] Use UUIDv7 by default for operation IDs to make operations chronologically sortable

2023-09-05 Thread via GitHub
juliuszsompolski commented on PR #42772: URL: https://github.com/apache/spark/pull/42772#issuecomment-170760 We maintain backwards compatibility, where older clients can connect to newer server. These older clients will not provide such UUIDs. What will happen then? Does it break any

[GitHub] [spark] mridulm commented on a diff in pull request #42529: [SPARK-44845][YARN][DEPLOY] Fix file system uri comparison function

2023-09-05 Thread via GitHub
mridulm commented on code in PR #42529: URL: https://github.com/apache/spark/pull/42529#discussion_r1316224064 ## resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala: ## @@ -1618,9 +1618,9 @@ private[spark] object Client extends Logging {

[GitHub] [spark] xuanyuanking commented on pull request #42819: [SPARK-45082][DOC] Review and fix issues in API docs for 3.5.0

2023-09-05 Thread via GitHub
xuanyuanking commented on PR #42819: URL: https://github.com/apache/spark/pull/42819#issuecomment-1707175717 @srowen Checked manually for both master and branch-3.5, the Mima test passed.

[GitHub] [spark] juliuszsompolski commented on pull request #42772: [SPARK-45051][CONNECT] Use UUIDv7 by default for operation IDs to make operations chronologically sortable

2023-09-05 Thread via GitHub
juliuszsompolski commented on PR #42772: URL: https://github.com/apache/spark/pull/42772#issuecomment-1707008850 ... although, the currently existing (Spark 3.4) clients never generate operationId client side, so we can get away with adding an assertion that the client side id is UUID7 in

[GitHub] [spark] xuanyuanking closed pull request #42819: [SPARK-45082][DOC] Review and fix issues in API docs for 3.5.0

2023-09-05 Thread via GitHub
xuanyuanking closed pull request #42819: [SPARK-45082][DOC] Review and fix issues in API docs for 3.5.0 URL: https://github.com/apache/spark/pull/42819

[GitHub] [spark] planga82 commented on a diff in pull request #42759: [SPARK-45039][SQL] Include full identifier in Storage tab

2023-09-05 Thread via GitHub
planga82 commented on code in PR #42759: URL: https://github.com/apache/spark/pull/42759#discussion_r1316342993 ## sql/core/src/main/scala/org/apache/spark/sql/execution/CacheManager.scala: ## @@ -117,12 +117,20 @@ class CacheManager extends Logging with

[GitHub] [spark] xuanyuanking commented on pull request #42819: [SPARK-45082][DOC] Review and fix issues in API docs for 3.5.0

2023-09-05 Thread via GitHub
xuanyuanking commented on PR #42819: URL: https://github.com/apache/spark/pull/42819#issuecomment-1707160026 good point, let me run Mima test manually on both master and 3.5

[GitHub] [spark] agubichev commented on a diff in pull request #42778: [SPARK-45055] [SQL] Do not transpose windows if they conflict on ORDER BY / PROJECT clauses

2023-09-05 Thread via GitHub
agubichev commented on code in PR #42778: URL: https://github.com/apache/spark/pull/42778#discussion_r1316176643 ## sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/TransposeWindowSuite.scala: ## @@ -160,4 +160,18 @@ class TransposeWindowSuite extends

[GitHub] [spark] pan3793 opened a new pull request, #42820: [WIP][TEST] Remove obsolete repo of DB2 JDBC driver

2023-09-05 Thread via GitHub
pan3793 opened a new pull request, #42820: URL: https://github.com/apache/spark/pull/42820 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How

[GitHub] [spark] dillitz commented on pull request #42772: [SPARK-45051][CONNECT] Use UUIDv7 by default for operation IDs to make operations chronologically sortable

2023-09-05 Thread via GitHub
dillitz commented on PR #42772: URL: https://github.com/apache/spark/pull/42772#issuecomment-1707033407 We agreed that the benefits of adding this are not big enough because we cannot rely on the operation ID being UUIDv7 and need to sort by startDate anyway. Closing this PR.

[GitHub] [spark] dillitz closed pull request #42772: [SPARK-45051][CONNECT] Use UUIDv7 by default for operation IDs to make operations chronologically sortable

2023-09-05 Thread via GitHub
dillitz closed pull request #42772: [SPARK-45051][CONNECT] Use UUIDv7 by default for operation IDs to make operations chronologically sortable URL: https://github.com/apache/spark/pull/42772

[GitHub] [spark] juliuszsompolski opened a new pull request, #42818: [SPARK-44835] Make INVALID_CURSOR.DISCONNECTED a retriable error

2023-09-05 Thread via GitHub
juliuszsompolski opened a new pull request, #42818: URL: https://github.com/apache/spark/pull/42818 ### What changes were proposed in this pull request? Make INVALID_CURSOR.DISCONNECTED a retriable error. ### Why are the changes needed? This error can happen if two RPCs
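For illustration only, a minimal Scala sketch of what treating an error as retriable means on the client side; the helper name and the error-detection check are assumptions, not the actual Spark Connect retry policy:

```scala
// Hedged sketch: re-issue an RPC when it fails with an error the client treats as
// retriable, such as INVALID_CURSOR.DISCONNECTED in this PR. Detection via the
// message string is an assumption made for the sake of a self-contained example.
def withRetries[T](maxRetries: Int)(rpc: => T): T = {
  def isRetriable(e: Throwable): Boolean =
    Option(e.getMessage).exists(_.contains("INVALID_CURSOR.DISCONNECTED"))

  var attempt = 0
  var result: Option[T] = None
  while (result.isEmpty) {
    try {
      result = Some(rpc)
    } catch {
      case e: Throwable if isRetriable(e) && attempt < maxRetries =>
        attempt += 1 // transient: another RPC briefly held the cursor, so retry
    }
  }
  result.get
}
```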

[GitHub] [spark] xuanyuanking opened a new pull request, #42819: [SPARK-45082][DOC] Review and fix issues in API docs for 3.5.0

2023-09-05 Thread via GitHub
xuanyuanking opened a new pull request, #42819: URL: https://github.com/apache/spark/pull/42819 ### What changes were proposed in this pull request? Compare the 3.4 API doc with the 3.5 RC3 cut. Fix the following issues: - Remove the leaking class/object in API doc ###

[GitHub] [spark] WweiL commented on pull request #42664: [SPARK-44435][SPARK-44484][3.5][SS][CONNECT] Tests for foreachBatch and Listener

2023-09-05 Thread via GitHub
WweiL commented on PR #42664: URL: https://github.com/apache/spark/pull/42664#issuecomment-1707179668 fixed in https://github.com/apache/spark/commit/7be69bf7da036282c2c7c0b62c32e7666fa1b579

[GitHub] [spark] WweiL closed pull request #42664: [SPARK-44435][SPARK-44484][3.5][SS][CONNECT] Tests for foreachBatch and Listener

2023-09-05 Thread via GitHub
WweiL closed pull request #42664: [SPARK-44435][SPARK-44484][3.5][SS][CONNECT] Tests for foreachBatch and Listener URL: https://github.com/apache/spark/pull/42664

[GitHub] [spark] xuanyuanking commented on pull request #42819: [SPARK-45082][DOC] Review and fix issues in API docs for 3.5.0

2023-09-05 Thread via GitHub
xuanyuanking commented on PR #42819: URL: https://github.com/apache/spark/pull/42819#issuecomment-1707217892 Thanks, merged in master and branch-3.5

[GitHub] [spark] allisonwang-db opened a new pull request, #42821: [SPARK-45083][PYTHON][DOCS] Refine the docstring of function `min`

2023-09-05 Thread via GitHub
allisonwang-db opened a new pull request, #42821: URL: https://github.com/apache/spark/pull/42821 ### What changes were proposed in this pull request? This PR refines the function `min` docstring by adding more examples. ### Why are the changes needed? To improve

[GitHub] [spark] sunchao commented on pull request #42757: [SPARK-45036][SQL] SPJ: Simplify the logic to handle partially clustered distribution

2023-09-05 Thread via GitHub
sunchao commented on PR #42757: URL: https://github.com/apache/spark/pull/42757#issuecomment-1706951663 Thanks all!

[GitHub] [spark] srowen commented on pull request #42819: [SPARK-45082][DOC] Review and fix issues in API docs for 3.5.0

2023-09-05 Thread via GitHub
srowen commented on PR #42819: URL: https://github.com/apache/spark/pull/42819#issuecomment-1707147907 I'm concerned that this may not pass mima tests and I don't see that this test was run because of other errors. Have you checked that it passes in both branches?

[GitHub] [spark] zhengruifeng commented on a diff in pull request #42770: [SPARK-45049][CONNECT][DOCS][TESTS] Refine docstrings of `coalesce/repartition/repartitionByRange`

2023-09-03 Thread via GitHub
zhengruifeng commented on code in PR #42770: URL: https://github.com/apache/spark/pull/42770#discussion_r1314345886 ## python/pyspark/sql/dataframe.py: ## @@ -1809,18 +1810,27 @@ def repartition( # type: ignore[misc] Repartition the data into 10 partitions. -

[GitHub] [spark] itholic opened a new pull request, #42787: [SPARK-43241][PS] `MultiIndex.append` not checking names for equality

2023-09-03 Thread via GitHub
itholic opened a new pull request, #42787: URL: https://github.com/apache/spark/pull/42787 ### What changes were proposed in this pull request? This PR proposes to fix the behavior of `MultiIndex.append` so that it does not check names. ### Why are the changes needed? To match

[GitHub] [spark] LuciferYang commented on pull request #42598: [SPARK-44890][BUILD]Update miswritten remarks

2023-09-03 Thread via GitHub
LuciferYang commented on PR #42598: URL: https://github.com/apache/spark/pull/42598#issuecomment-1704566124 (screenshot: https://github.com/apache/spark/assets/1475305/913cfb25-6bab-4a33-ba73-60a2f4f4f43a) Is there a problem with your GitHub Action configuration? Why does the GA page look like

[GitHub] [spark] sadikovi opened a new pull request, #42790: [SPARK-44940][SQL] Improve performance of JSON parsing when "spark.sql.json.enablePartialResults" is enabled

2023-09-03 Thread via GitHub
sadikovi opened a new pull request, #42790: URL: https://github.com/apache/spark/pull/42790 ### What changes were proposed in this pull request? Backport of https://github.com/apache/spark/pull/42667 to branch-3.5. The PR improves JSON parsing when

[GitHub] [spark] sadikovi opened a new pull request, #42792: [SPARK-44940][SQL][3.4] Improve performance of JSON parsing when "spark.sql.json.enablePartialResults" is enabled

2023-09-03 Thread via GitHub
sadikovi opened a new pull request, #42792: URL: https://github.com/apache/spark/pull/42792 ### What changes were proposed in this pull request? Backport of https://github.com/apache/spark/pull/42667 to branch-3.4. The PR improves JSON parsing when

[GitHub] [spark] zhengruifeng commented on a diff in pull request #42791: [SPARK-45064][PYTHON][CONNECT] Add the missing `scale` parameter in `ceil/ceiling`

2023-09-03 Thread via GitHub
zhengruifeng commented on code in PR #42791: URL: https://github.com/apache/spark/pull/42791#discussion_r1314419285 ## python/pyspark/sql/connect/functions.py: ## @@ -552,15 +552,23 @@ def cbrt(col: "ColumnOrName") -> Column: cbrt.__doc__ = pysparkfuncs.cbrt.__doc__ -def

[GitHub] [spark] cloud-fan commented on a diff in pull request #42752: [SPARK-45033][SQL] Support maps by parameterized `sql()`

2023-09-03 Thread via GitHub
cloud-fan commented on code in PR #42752: URL: https://github.com/apache/spark/pull/42752#discussion_r1314459460 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/parameters.scala: ## @@ -96,7 +96,11 @@ case class PosParameterizedQuery(child: LogicalPlan,

[GitHub] [spark] panbingkun commented on pull request #42761: [SPARK-45042][BUILD] Upgrade jetty to 9.4.52.v20230823

2023-09-03 Thread via GitHub
panbingkun commented on PR #42761: URL: https://github.com/apache/spark/pull/42761#issuecomment-1704645884 > Merged into master. There are conflicts with 3.5, could you please give a separate PR? @panbingkun Sure, let me do it now.

[GitHub] [spark] panbingkun opened a new pull request, #42795: [SPARK-45042][BUILD][3.5] Upgrade jetty to 9.4.52.v20230823

2023-09-04 Thread via GitHub
panbingkun opened a new pull request, #42795: URL: https://github.com/apache/spark/pull/42795 ### What changes were proposed in this pull request? The PR aims to upgrade Jetty from 9.4.51.v20230217 to 9.4.52.v20230823. (Backport to Spark 3.5.0) ### Why are the changes needed? -

[GitHub] [spark] panbingkun opened a new pull request, #42797: [SPARK-45068][SQL] Make function output column name consistent in case

2023-09-04 Thread via GitHub
panbingkun opened a new pull request, #42797: URL: https://github.com/apache/spark/pull/42797 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #42770: [SPARK-45049][CONNECT][DOCS][TESTS] Refine docstrings of `coalesce/repartition/repartitionByRange`

2023-09-03 Thread via GitHub
HyukjinKwon commented on code in PR #42770: URL: https://github.com/apache/spark/pull/42770#discussion_r1314343527 ## python/pyspark/sql/dataframe.py: ## @@ -1809,18 +1810,27 @@ def repartition( # type: ignore[misc] Repartition the data into 10 partitions. -

[GitHub] [spark] zhengruifeng commented on a diff in pull request #42783: [SPARK-45059][CONNECT][PYTHON] Add `try_reflect` functions to Scala and Python

2023-09-03 Thread via GitHub
zhengruifeng commented on code in PR #42783: URL: https://github.com/apache/spark/pull/42783#discussion_r1314365390 ## python/pyspark/sql/functions.py: ## @@ -15748,6 +15749,33 @@ def java_method(*cols: "ColumnOrName") -> Column: return

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #42775: [SPARK-45052][SQL][PYTHON][CONNECT] Make function aliases output column name consistent with SQL

2023-09-03 Thread via GitHub
HyukjinKwon commented on code in PR #42775: URL: https://github.com/apache/spark/pull/42775#discussion_r1314366796 ## sql/core/src/main/scala/org/apache/spark/sql/functions.scala: ## @@ -1052,15 +1049,15 @@ object functions { * @group agg_funcs * @since 3.5.0 */ -

[GitHub] [spark] ueshin opened a new pull request, #42785: [SPARK-44876][PYTHON][FOLLOWUP][3.5] Fix Arrow-optimized Python UDF to delay wrapping the function with fail_on_stopiteration

2023-09-03 Thread via GitHub
ueshin opened a new pull request, #42785: URL: https://github.com/apache/spark/pull/42785 ### What changes were proposed in this pull request? This is a backport of https://github.com/apache/spark/pull/42784. Fixes Arrow-optimized Python UDF to delay wrapping the function with

[GitHub] [spark] Hisoka-X commented on a diff in pull request #42783: [SPARK-45059][CONNECT][PYTHON] Add `try_reflect` functions to Scala and Python

2023-09-03 Thread via GitHub
Hisoka-X commented on code in PR #42783: URL: https://github.com/apache/spark/pull/42783#discussion_r1314377696 ## python/pyspark/sql/functions.py: ## @@ -15748,6 +15749,33 @@ def java_method(*cols: "ColumnOrName") -> Column: return

[GitHub] [spark] zhengruifeng commented on a diff in pull request #42770: [SPARK-45049][CONNECT][DOCS][TESTS] Refine docstrings of `coalesce/repartition/repartitionByRange`

2023-09-03 Thread via GitHub
zhengruifeng commented on code in PR #42770: URL: https://github.com/apache/spark/pull/42770#discussion_r1314388163 ## python/pyspark/sql/dataframe.py: ## @@ -1809,18 +1810,27 @@ def repartition( # type: ignore[misc] Repartition the data into 10 partitions. -

[GitHub] [spark] itholic opened a new pull request, #42793: [SPARK-45065][PYTHON][PS] Support Pandas 2.1.0

2023-09-03 Thread via GitHub
itholic opened a new pull request, #42793: URL: https://github.com/apache/spark/pull/42793 ### What changes were proposed in this pull request? This PR proposes to support pandas 2.1.0 for PySpark. See [What's new in 2.1.0](https://pandas.pydata.org/docs/dev/whatsnew/v2.1.0.html)

[GitHub] [spark] MaxGekk commented on a diff in pull request #42752: [SPARK-45033][SQL] Support maps by parameterized `sql()`

2023-09-03 Thread via GitHub
MaxGekk commented on code in PR #42752: URL: https://github.com/apache/spark/pull/42752#discussion_r131739 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/parameters.scala: ## @@ -96,7 +96,11 @@ case class PosParameterizedQuery(child: LogicalPlan,
