[GitHub] [spark] cloud-fan commented on pull request #38358: [SPARK-40588] FileFormatWriter materializes AQE plan before accessing outputOrdering

2022-11-08 Thread GitBox
cloud-fan commented on PR #38358: URL: https://github.com/apache/spark/pull/38358#issuecomment-1308354718 thanks, merging to 3.3/3.2! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific

[GitHub] [spark] amaliujia commented on a diff in pull request #38566: [SPARK-41046][CONNECT] Support CreateView in Connect DSL

2022-11-08 Thread GitBox
amaliujia commented on code in PR #38566: URL: https://github.com/apache/spark/pull/38566#discussion_r1017553189 ## connector/connect/src/main/scala/org/apache/spark/sql/connect/command/SparkConnectCommandPlanner.scala: ## @@ -79,6 +85,32 @@ class

[GitHub] [spark] amaliujia commented on a diff in pull request #38566: [SPARK-41046][CONNECT] Support CreateView in Connect DSL

2022-11-08 Thread GitBox
amaliujia commented on code in PR #38566: URL: https://github.com/apache/spark/pull/38566#discussion_r1017551706 ## connector/connect/src/main/scala/org/apache/spark/sql/connect/command/SparkConnectCommandPlanner.scala: ## @@ -79,6 +85,32 @@ class

[GitHub] [spark] cloud-fan commented on pull request #38573: [SPARK-41061][CONNECT] Support SelectExpr which applies Projection by expressions in Strings in Connect DSL

2022-11-08 Thread GitBox
cloud-fan commented on PR #38573: URL: https://github.com/apache/spark/pull/38573#issuecomment-1308348049 shall we add the python client API in this PR as well?

[GitHub] [spark] cloud-fan commented on a diff in pull request #38566: [SPARK-41046][CONNECT] Support CreateView in Connect DSL

2022-11-08 Thread GitBox
cloud-fan commented on code in PR #38566: URL: https://github.com/apache/spark/pull/38566#discussion_r1017549065 ## connector/connect/src/main/scala/org/apache/spark/sql/connect/command/SparkConnectCommandPlanner.scala: ## @@ -79,6 +85,32 @@ class

[GitHub] [spark] cloud-fan closed pull request #38475: [SPARK-40992][CONNECT] Support toDF(columnNames) in Connect DSL

2022-11-08 Thread GitBox
cloud-fan closed pull request #38475: [SPARK-40992][CONNECT] Support toDF(columnNames) in Connect DSL URL: https://github.com/apache/spark/pull/38475

[GitHub] [spark] cloud-fan commented on pull request #38475: [SPARK-40992][CONNECT] Support toDF(columnNames) in Connect DSL

2022-11-08 Thread GitBox
cloud-fan commented on PR #38475: URL: https://github.com/apache/spark/pull/38475#issuecomment-1308344423 thanks, merging to master!

[GitHub] [spark] wangyum commented on a diff in pull request #38577: [SPARK-41071][BUILD] Remove `MaxMetaspaceSize` option from `make-distribution.sh` to make it run successfully

2022-11-08 Thread GitBox
wangyum commented on code in PR #38577: URL: https://github.com/apache/spark/pull/38577#discussion_r1017537591 ## dev/make-distribution.sh: ## @@ -161,7 +161,7 @@ fi # Build uber fat JAR cd "$SPARK_HOME" -export MAVEN_OPTS="${MAVEN_OPTS:--Xss128m -Xmx4g -Xmx4g

[GitHub] [spark] HyukjinKwon closed pull request #38570: [SPARK-41056][R] Fix new R_LIBS_SITE behavior introduced in R 4.2

2022-11-08 Thread GitBox
HyukjinKwon closed pull request #38570: [SPARK-41056][R] Fix new R_LIBS_SITE behavior introduced in R 4.2 URL: https://github.com/apache/spark/pull/38570

[GitHub] [spark] HyukjinKwon commented on pull request #38570: [SPARK-41056][R] Fix new R_LIBS_SITE behavior introduced in R 4.2

2022-11-08 Thread GitBox
HyukjinKwon commented on PR #38570: URL: https://github.com/apache/spark/pull/38570#issuecomment-1308338748 Merged to master.

[GitHub] [spark] HyukjinKwon commented on pull request #38570: [SPARK-41056][R] Fix new R_LIBS_SITE behavior introduced in R 4.2

2022-11-08 Thread GitBox
HyukjinKwon commented on PR #38570: URL: https://github.com/apache/spark/pull/38570#issuecomment-1308337531 Merged to master.

[GitHub] [spark] zhengchenyu commented on pull request #37949: [SPARK-40504][YARN] Make yarn appmaster load config from client

2022-11-08 Thread GitBox
zhengchenyu commented on PR #37949: URL: https://github.com/apache/spark/pull/37949#issuecomment-1308320736 @dongjoon-hyun @srowen Can you please review this PR?

[GitHub] [spark] cloud-fan commented on pull request #38511: [SPARK-41017][SQL] Support column pruning with multiple nondeterministic Filters

2022-11-08 Thread GitBox
cloud-fan commented on PR #38511: URL: https://github.com/apache/spark/pull/38511#issuecomment-1308318576 cc @viirya @sigmod @hvanhovell

[GitHub] [spark] amaliujia commented on pull request #38573: [SPARK-41061][CONNECT] Support SelectExpr which applies Projection by expressions in Strings in Connect DSL

2022-11-08 Thread GitBox
amaliujia commented on PR #38573: URL: https://github.com/apache/spark/pull/38573#issuecomment-1308314938 R: @cloud-fan

[GitHub] [spark] LuciferYang commented on a diff in pull request #38577: [SPARK-41071][BUILD] Remove `MaxMetaspaceSize` option from `make-distribution.sh` to make it run successfully

2022-11-08 Thread GitBox
LuciferYang commented on code in PR #38577: URL: https://github.com/apache/spark/pull/38577#discussion_r1017502280 ## dev/make-distribution.sh: ## @@ -161,7 +161,7 @@ fi # Build uber fat JAR cd "$SPARK_HOME" -export MAVEN_OPTS="${MAVEN_OPTS:--Xss128m -Xmx4g -Xmx4g

[GitHub] [spark] LuciferYang commented on pull request #38577: [SPARK-41071][BUILD] Remove `MaxMetaspaceSize` option from `make-distribution.sh` to make it run successfully

2022-11-08 Thread GitBox
LuciferYang commented on PR #38577: URL: https://github.com/apache/spark/pull/38577#issuecomment-1308298746 > As you mentioned in the PR description, did you hit this only at master branch issue? Let me double check this > Which java version are you using now? test on

[GitHub] [spark] viirya commented on a diff in pull request #38577: [SPARK-41071][BUILD] Remove `MaxMetaspaceSize` option from `make-distribution.sh` to make it run successfully

2022-11-08 Thread GitBox
viirya commented on code in PR #38577: URL: https://github.com/apache/spark/pull/38577#discussion_r1017499034 ## dev/make-distribution.sh: ## @@ -161,7 +161,7 @@ fi # Build uber fat JAR cd "$SPARK_HOME" -export MAVEN_OPTS="${MAVEN_OPTS:--Xss128m -Xmx4g -Xmx4g

[GitHub] [spark] pan3793 commented on pull request #38577: [SPARK-41071][BUILD] Remove `MaxMetaspaceSize` option from `make-distribution.sh` to make it run successfully

2022-11-08 Thread GitBox
pan3793 commented on PR #38577: URL: https://github.com/apache/spark/pull/38577#issuecomment-1308297255 I reproduced the issue w/ openjdk 1.8.0_332 macos-aarch64

[GitHub] [spark] dongjoon-hyun commented on pull request #38577: [SPARK-41071][BUILD] Remove `MaxMetaspaceSize` option from `make-distribution.sh` to make it run successfully

2022-11-08 Thread GitBox
dongjoon-hyun commented on PR #38577: URL: https://github.com/apache/spark/pull/38577#issuecomment-1308293543 To @LuciferYang , - As you mentioned in the PR description, did you hit this only at `master` branch issue? - Which java version are you using now?

[GitHub] [spark] amaliujia commented on a diff in pull request #38566: [SPARK-41046][CONNECT] Support CreateView in Connect DSL

2022-11-08 Thread GitBox
amaliujia commented on code in PR #38566: URL: https://github.com/apache/spark/pull/38566#discussion_r1017494208 ## connector/connect/src/main/scala/org/apache/spark/sql/connect/command/SparkConnectCommandPlanner.scala: ## @@ -79,6 +85,32 @@ class

[GitHub] [spark] dongjoon-hyun commented on pull request #38577: [SPARK-41071][BUILD] Remove `MaxMetaspaceSize` option from `make-distribution.sh` to make it run successfully

2022-11-08 Thread GitBox
dongjoon-hyun commented on PR #38577: URL: https://github.com/apache/spark/pull/38577#issuecomment-1308290786 cc @viirya too

[GitHub] [spark] zhengruifeng commented on a diff in pull request #38506: [SPARK-41010][CONNECT][PYTHON] Complete Support for Except and Intersect in Python client

2022-11-08 Thread GitBox
zhengruifeng commented on code in PR #38506: URL: https://github.com/apache/spark/pull/38506#discussion_r1017492795 ## python/pyspark/sql/connect/dataframe.py: ## @@ -317,7 +319,83 @@ def unionByName(self, other: "DataFrame", allowMissingColumns: bool = False) -> if

[GitHub] [spark] zhengruifeng commented on a diff in pull request #38566: [SPARK-41046][CONNECT] Support CreateView in Connect DSL

2022-11-08 Thread GitBox
zhengruifeng commented on code in PR #38566: URL: https://github.com/apache/spark/pull/38566#discussion_r1017490854 ## connector/connect/src/main/scala/org/apache/spark/sql/connect/command/SparkConnectCommandPlanner.scala: ## @@ -79,6 +85,32 @@ class

[GitHub] [spark] cloud-fan commented on pull request #38491: [SPARK-41058][CONNECT] Remove unused import in commands.proto

2022-11-08 Thread GitBox
cloud-fan commented on PR #38491: URL: https://github.com/apache/spark/pull/38491#issuecomment-1308284185 thanks, merging to master!

[GitHub] [spark] cloud-fan closed pull request #38491: [SPARK-41058][CONNECT] Remove unused import in commands.proto

2022-11-08 Thread GitBox
cloud-fan closed pull request #38491: [SPARK-41058][CONNECT] Remove unused import in commands.proto URL: https://github.com/apache/spark/pull/38491

[GitHub] [spark] cloud-fan commented on a diff in pull request #38490: [SPARK-41009][SQL] Rename the error class `_LEGACY_ERROR_TEMP_1070` to `LOCATION_ALREADY_EXISTS`

2022-11-08 Thread GitBox
cloud-fan commented on code in PR #38490: URL: https://github.com/apache/spark/pull/38490#discussion_r101748 ## core/src/main/resources/error/error-classes.json: ## @@ -668,6 +668,23 @@ } } }, + "LOCATION_ALREADY_EXISTS" : { +"message" : [ + "Cannot

[GitHub] [spark] amaliujia commented on a diff in pull request #38546: [SPARK-41036][CONNECT][PYTHON] `columns` API should use `schema` API to avoid data fetching

2022-11-08 Thread GitBox
amaliujia commented on code in PR #38546: URL: https://github.com/apache/spark/pull/38546#discussion_r1017484449 ## python/pyspark/sql/connect/dataframe.py: ## @@ -139,11 +139,9 @@ def columns(self) -> List[str]: if self._plan is None: return []

[GitHub] [spark] LuciferYang commented on a diff in pull request #38575: [WIP][SPARK-40948][SQL][FOLLOWUP] Restore PATH_NOT_FOUND

2022-11-08 Thread GitBox
LuciferYang commented on code in PR #38575: URL: https://github.com/apache/spark/pull/38575#discussion_r1017484528 ## sql/core/src/test/scala/org/apache/spark/sql/DataFrameSuite.scala: ## @@ -2330,7 +2330,7 @@ class DataFrameSuite extends QueryTest new File(uuid,

[GitHub] [spark] zhengruifeng commented on a diff in pull request #38546: [SPARK-41036][CONNECT][PYTHON] `columns` API should use `schema` API to avoid data fetching

2022-11-08 Thread GitBox
zhengruifeng commented on code in PR #38546: URL: https://github.com/apache/spark/pull/38546#discussion_r1017482919 ## python/pyspark/sql/connect/dataframe.py: ## @@ -139,11 +139,9 @@ def columns(self) -> List[str]: if self._plan is None: return []

[GitHub] [spark] LuciferYang commented on pull request #38091: [SPARK-40096][CORE][TESTS][FOLLOW-UP] Fix flaky test case

2022-11-08 Thread GitBox
LuciferYang commented on PR #38091: URL: https://github.com/apache/spark/pull/38091#issuecomment-1308279804 finally fixed, thanks @wankunde

[GitHub] [spark] zhengruifeng opened a new pull request, #38578: [SPARK-41064][CONNECT][PYTHON] Implement `DataFrame.crosstab` and `DataFrame.stat.crosstab`

2022-11-08 Thread GitBox
zhengruifeng opened a new pull request, #38578: URL: https://github.com/apache/spark/pull/38578 ### What changes were proposed in this pull request? Implement `DataFrame.crosstab` and `DataFrame.stat.crosstab` ### Why are the changes needed? for api coverage ###

[GitHub] [spark] LuciferYang commented on pull request #38577: [SPARK-41071][BUILD] Remove `MaxMetaspaceSize` option from `make-distribution.sh` to make it run successfully

2022-11-08 Thread GitBox
LuciferYang commented on PR #38577: URL: https://github.com/apache/spark/pull/38577#issuecomment-1308270119 cc @HyukjinKwon @wangyum @dongjoon-hyun @srowen @pan3793 @panbingkun Can you reproduce the issue?

[GitHub] [spark] LuciferYang opened a new pull request, #38577: [SPARK-41071][BUILD] Remove `MaxMetaspaceSize` option from `make-distribution.sh` to make it run successfully

2022-11-08 Thread GitBox
LuciferYang opened a new pull request, #38577: URL: https://github.com/apache/spark/pull/38577 ### What changes were proposed in this pull request? Run ``` dev/make-distribution.sh --tgz -Phadoop-3 -Phadoop-cloud -Pmesos -Pyarn -Pkinesis-asl -Phive-thriftserver

[GitHub] [spark] panbingkun commented on pull request #38545: [MINOR][DOCS] Fix links in the sql-pyspark-pandas-with-arrow

2022-11-08 Thread GitBox
panbingkun commented on PR #38545: URL: https://github.com/apache/spark/pull/38545#issuecomment-1308245910 cc @srowen

[GitHub] [spark] itholic opened a new pull request, #38576: [SPARK-41062][SQL] Rename `UNSUPPORTED_CORRELATED_REFERENCE` to `CORRELATED_REFERENCE`

2022-11-08 Thread GitBox
itholic opened a new pull request, #38576: URL: https://github.com/apache/spark/pull/38576 ### What changes were proposed in this pull request? This PR proposes to rename `UNSUPPORTED_CORRELATED_REFERENCE` to `CORRELATED_REFERENCE`. ### Why are the changes needed?

[GitHub] [spark] itholic commented on a diff in pull request #38575: [WIP][SPARK-40948][SQL][FOLLOWUP] Restore PATH_NOT_FOUND

2022-11-08 Thread GitBox
itholic commented on code in PR #38575: URL: https://github.com/apache/spark/pull/38575#discussion_r1017450594 ## sql/core/src/test/scala/org/apache/spark/sql/DataFrameSuite.scala: ## @@ -2330,7 +2330,7 @@ class DataFrameSuite extends QueryTest new File(uuid,

[GitHub] [spark] WweiL commented on a diff in pull request #38503: [SPARK-40940] Remove Multi-stateful operator checkers for streaming queries.

2022-11-08 Thread GitBox
WweiL commented on code in PR #38503: URL: https://github.com/apache/spark/pull/38503#discussion_r1017447511 ## sql/core/src/test/scala/org/apache/spark/sql/streaming/FlatMapGroupsInPandasWithStateSuite.scala: ## @@ -240,25 +240,30 @@ class FlatMapGroupsInPandasWithStateSuite

[GitHub] [spark] WweiL commented on a diff in pull request #38503: [SPARK-40940] Remove Multi-stateful operator checkers for streaming queries.

2022-11-08 Thread GitBox
WweiL commented on code in PR #38503: URL: https://github.com/apache/spark/pull/38503#discussion_r1017447320 ## sql/core/src/test/scala/org/apache/spark/sql/streaming/StreamingDeduplicationSuite.scala: ## @@ -190,20 +190,25 @@ class StreamingDeduplicationSuite extends

[GitHub] [spark] itholic opened a new pull request, #38575: [WIP][SPARK-40948][SQL][FOLLOWUP] Restore PATH_NOT_FOUND

2022-11-08 Thread GitBox
itholic opened a new pull request, #38575: URL: https://github.com/apache/spark/pull/38575 ### What changes were proposed in this pull request? The original PR to introduce the error class `PATH_NOT_FOUND` was reverted since it breaks the tests in different test env. This

[GitHub] [spark] AmplabJenkins commented on pull request #38535: [SPARK-41001] [CONNECT] Make `user_id` optional in SparkRemoteSession.

2022-11-08 Thread GitBox
AmplabJenkins commented on PR #38535: URL: https://github.com/apache/spark/pull/38535#issuecomment-1308229533 Can one of the admins verify this patch?

[GitHub] [spark] gengliangwang commented on pull request #38567: [SPARK-41054][UI][CORE] Support RocksDB as KVStore in live UI

2022-11-08 Thread GitBox
gengliangwang commented on PR #38567: URL: https://github.com/apache/spark/pull/38567#issuecomment-1308227478 > Or part of a larger set of changes Yes, I created Jira https://issues.apache.org/jira/browse/SPARK-41053, and this one is just the beginning. > Maintaining state

[GitHub] [spark] gengliangwang commented on pull request #38567: [SPARK-41054][UI][CORE] Support RocksDB as KVStore in live UI

2022-11-08 Thread GitBox
gengliangwang commented on PR #38567: URL: https://github.com/apache/spark/pull/38567#issuecomment-1308222974 > It seems to not match what is described in the pr description or add equivalent functionality which was reverted I believe it covers the reverted PR

[GitHub] [spark] WweiL commented on a diff in pull request #38503: [SPARK-40940] Remove Multi-stateful operator checkers for streaming queries.

2022-11-08 Thread GitBox
WweiL commented on code in PR #38503: URL: https://github.com/apache/spark/pull/38503#discussion_r1017435393 ## sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/analysis/UnsupportedOperationsSuite.scala: ## @@ -188,17 +194,26 @@ class UnsupportedOperationsSuite extends

[GitHub] [spark] alex-balikov commented on a diff in pull request #38503: [SPARK-40940] Remove Multi-stateful operator checkers for streaming queries.

2022-11-08 Thread GitBox
alex-balikov commented on code in PR #38503: URL: https://github.com/apache/spark/pull/38503#discussion_r1017413562 ## sql/core/src/test/scala/org/apache/spark/sql/streaming/StreamingDeduplicationSuite.scala: ## @@ -190,20 +190,25 @@ class StreamingDeduplicationSuite extends

[GitHub] [spark] 19Serhii99 opened a new pull request, #38574: [SPARK-41060] [K8S] Made the spark submitter generate new names for driver and executor config maps

2022-11-08 Thread GitBox
19Serhii99 opened a new pull request, #38574: URL: https://github.com/apache/spark/pull/38574 ### What changes were proposed in this pull request? There's a problem with submitting spark jobs to K8s cluster: the library generates and reuses the same name for config maps (for drivers

[GitHub] [spark] mridulm commented on pull request #38064: [SPARK-40622][SQL][CORE]Result of a single task in collect() must fit in 2GB

2022-11-08 Thread GitBox
mridulm commented on PR #38064: URL: https://github.com/apache/spark/pull/38064#issuecomment-1308159592 The test failures look like due to the memory requirements for the test 'collect data with single partition larger than 2GB bytes array limit' is too high - causing OOM.

[GitHub] [spark] amaliujia opened a new pull request, #38573: [SPARK-41061][CONNECT] Support SelectExpr which applies Projection by expressions in Strings in Connect DSL

2022-11-08 Thread GitBox
amaliujia opened a new pull request, #38573: URL: https://github.com/apache/spark/pull/38573 ### What changes were proposed in this pull request? 1. support `def selectExpr(exprs: String*)` in Connect DSL. 2. Server side supports translation Expressions in Strings.

[GitHub] [spark] itholic opened a new pull request, #38572: [SPARK-41059][SQL] Rename `_LEGACY_ERROR_TEMP_2420` to `NESTED_AGGREGATE_FUNCTION`

2022-11-08 Thread GitBox
itholic opened a new pull request, #38572: URL: https://github.com/apache/spark/pull/38572 ### What changes were proposed in this pull request? This PR proposes to rename `_LEGACY_ERROR_TEMP_2420` to `NESTED_AGGREGATE_FUNCTION` ### Why are the changes needed? We

[GitHub] [spark] ulysses-you commented on a diff in pull request #38558: [SPARK-41048][SQL] Improve output partitioning and ordering with AQE cache

2022-11-08 Thread GitBox
ulysses-you commented on code in PR #38558: URL: https://github.com/apache/spark/pull/38558#discussion_r1017369252 ## sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/AdaptiveSparkPlanExec.scala: ## @@ -209,6 +209,19 @@ case class AdaptiveSparkPlanExec(

[GitHub] [spark] srielau commented on a diff in pull request #38569: [SPARK-41055][SQL] Rename `_LEGACY_ERROR_TEMP_2424` to `GROUP_BY_AGGREGATE`

2022-11-08 Thread GitBox
srielau commented on code in PR #38569: URL: https://github.com/apache/spark/pull/38569#discussion_r1017369097 ## core/src/main/resources/error/error-classes.json: ## @@ -469,6 +469,11 @@ "Grouping sets size cannot be greater than " ] }, + "GROUP_BY_AGGREGATE" :

[GitHub] [spark] LuciferYang commented on pull request #38550: [SPARK-41039][BUILD] Upgrade `scala-parallel-collections` to 1.0.4 for Scala 2.13

2022-11-08 Thread GitBox
LuciferYang commented on PR #38550: URL: https://github.com/apache/spark/pull/38550#issuecomment-1308146770 Thanks @srowen

[GitHub] [spark] AngersZhuuuu commented on a diff in pull request #34815: [SPARK-37555][SQL] spark-sql should pass last unclosed comment to backend

2022-11-08 Thread GitBox
AngersZhuuuu commented on code in PR #34815: URL: https://github.com/apache/spark/pull/34815#discussion_r1017365581 ## sql/hive-thriftserver/src/test/scala/org/apache/spark/sql/hive/thriftserver/CliSuite.scala: ## @@ -620,4 +620,17 @@ class CliSuite extends SparkFunSuite with

[GitHub] [spark] srowen commented on pull request #38550: [SPARK-41039][BUILD] Upgrade `scala-parallel-collections` to 1.0.4 for Scala 2.13

2022-11-08 Thread GitBox
srowen commented on PR #38550: URL: https://github.com/apache/spark/pull/38550#issuecomment-1308144543 Merged to master

[GitHub] [spark] srowen closed pull request #38550: [SPARK-41039][BUILD] Upgrade `scala-parallel-collections` to 1.0.4 for Scala 2.13

2022-11-08 Thread GitBox
srowen closed pull request #38550: [SPARK-41039][BUILD] Upgrade `scala-parallel-collections` to 1.0.4 for Scala 2.13 URL: https://github.com/apache/spark/pull/38550

[GitHub] [spark] AngersZhuuuu opened a new pull request, #38571: [SPARK-37555][TEST][FOLLOWUP] Increase timeout of CLI test `spark-sql should pass last unclosed comment to backend`

2022-11-08 Thread GitBox
AngersZhuuuu opened a new pull request, #38571: URL: https://github.com/apache/spark/pull/38571 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How was

[GitHub] [spark] cloud-fan commented on a diff in pull request #38557: [SPARK-38959][SQL][FOLLOWUP] Optimizer batch `PartitionPruning` should optimize subqueries

2022-11-08 Thread GitBox
cloud-fan commented on code in PR #38557: URL: https://github.com/apache/spark/pull/38557#discussion_r1017361159 ## sql/core/src/main/scala/org/apache/spark/sql/execution/dynamicpruning/RowLevelOperationRuntimeGroupFiltering.scala: ## @@ -89,10 +88,8 @@ case class

[GitHub] [spark] amaliujia commented on pull request #38491: [SPARK-41058][CONNECT] Remove unused import in commands.proto

2022-11-08 Thread GitBox
amaliujia commented on PR #38491: URL: https://github.com/apache/spark/pull/38491#issuecomment-1308133299 LGTM

[GitHub] [spark] HyukjinKwon opened a new pull request, #38570: [SPARK-41056][R] Fix new R_LIBS_SITE behavior introduced in R 4.2

2022-11-08 Thread GitBox
HyukjinKwon opened a new pull request, #38570: URL: https://github.com/apache/spark/pull/38570 ### What changes were proposed in this pull request? This PR proposes to keep the `R_LIBS_SITE` as was. It has been changed from R 4.2. ### Why are the changes needed? To keep

[GitHub] [spark] itholic commented on a diff in pull request #38569: [SPARK-41055][SQL] Rename `_LEGACY_ERROR_TEMP_2424` to `GROUP_BY_AGGREGATE`

2022-11-08 Thread GitBox
itholic commented on code in PR #38569: URL: https://github.com/apache/spark/pull/38569#discussion_r1017351734 ## core/src/main/resources/error/error-classes.json: ## @@ -469,6 +469,11 @@ "Grouping sets size cannot be greater than " ] }, + "GROUP_BY_AGGREGATE" :

[GitHub] [spark] aokolnychyi commented on a diff in pull request #38557: [SPARK-38959][SQL][FOLLOWUP] Optimizer batch `PartitionPruning` should optimize subqueries

2022-11-08 Thread GitBox
aokolnychyi commented on code in PR #38557: URL: https://github.com/apache/spark/pull/38557#discussion_r1017352098 ## sql/core/src/main/scala/org/apache/spark/sql/execution/dynamicpruning/RowLevelOperationRuntimeGroupFiltering.scala: ## @@ -89,10 +88,8 @@ case class

[GitHub] [spark] LuciferYang commented on a diff in pull request #38569: [SPARK-41055][SQL] Rename `_LEGACY_ERROR_TEMP_2424` to `GROUP_BY_AGGREGATE`

2022-11-08 Thread GitBox
LuciferYang commented on code in PR #38569: URL: https://github.com/apache/spark/pull/38569#discussion_r1017347010 ## core/src/main/resources/error/error-classes.json: ## @@ -469,6 +469,11 @@ "Grouping sets size cannot be greater than " ] }, +

[GitHub] [spark] LuciferYang commented on a diff in pull request #38569: [SPARK-41055][SQL] Rename `_LEGACY_ERROR_TEMP_2424` to `GROUP_BY_AGGREGATE`

2022-11-08 Thread GitBox
LuciferYang commented on code in PR #38569: URL: https://github.com/apache/spark/pull/38569#discussion_r1017346760 ## core/src/main/resources/error/error-classes.json: ## @@ -469,6 +469,11 @@ "Grouping sets size cannot be greater than " ] }, +

[GitHub] [spark] amaliujia commented on pull request #38491: [MINOR][CONNECT] Remove unused import in commands.proto

2022-11-08 Thread GitBox
amaliujia commented on PR #38491: URL: https://github.com/apache/spark/pull/38491#issuecomment-1308120960 @dengziming I have missed this PR because there was no JIRA created under https://issues.apache.org/jira/browse/SPARK-39375 (I was monitoring works happened there). Since you

[GitHub] [spark] mridulm commented on pull request #38567: [SPARK-41054][UI][CORE] Support RocksDB as KVStore in live UI

2022-11-08 Thread GitBox
mridulm commented on PR #38567: URL: https://github.com/apache/spark/pull/38567#issuecomment-1308118224 Also note that we cannot avoid parsing event files at history server with db generated at driver - unless the configs match for both (retained stages, tasks, queries, etc): particularly

[GitHub] [spark] cloud-fan commented on pull request #38491: [MINOR][CONNECT] Remove unused import in commands.proto

2022-11-08 Thread GitBox
cloud-fan commented on PR #38491: URL: https://github.com/apache/spark/pull/38491#issuecomment-1308116015 @amaliujia can you take a look?

[GitHub] [spark] mridulm commented on pull request #38567: [SPARK-41054][UI][CORE] Support RocksDB as KVStore in live UI

2022-11-08 Thread GitBox
mridulm commented on PR #38567: URL: https://github.com/apache/spark/pull/38567#issuecomment-1308113372 To comment on proposal in description, based on past prototypes I have worked on/seen: Maintaining state at driver on disk backed store and copying that to dfs has a few things

[GitHub] [spark] itholic commented on pull request #38569: [SPARK-41055][SQL] Rename `_LEGACY_ERROR_TEMP_2424` to `GROUP_BY_AGGREGATE`

2022-11-08 Thread GitBox
itholic commented on PR #38569: URL: https://github.com/apache/spark/pull/38569#issuecomment-1308111274 cc @MaxGekk @srielau FYI -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific

[GitHub] [spark] itholic opened a new pull request, #38569: [SPARK-41055][SQL] Rename `_LEGACY_ERROR_TEMP_2424` to `GROUP_BY_AGGREGATE`

2022-11-08 Thread GitBox
itholic opened a new pull request, #38569: URL: https://github.com/apache/spark/pull/38569 ### What changes were proposed in this pull request? This PR proposes to rename `_LEGACY_ERROR_TEMP_2424` to `GROUP_BY_AGGREGATE` ### Why are the changes needed? To use proper

[GitHub] [spark] mridulm commented on pull request #38567: [SPARK-41054][UI][CORE] Support RocksDB as KVStore in live UI

2022-11-08 Thread GitBox
mridulm commented on PR #38567: URL: https://github.com/apache/spark/pull/38567#issuecomment-1308103370 This pr is mostly adding ability to use a disk backed store in addition to in memory store. It seems to not match what is described in the pr description - is this wip ? -- This is

[GitHub] [spark] LuciferYang commented on pull request #38507: [SPARK-40372][SQL] Migrate failures of array type checks onto error classes

2022-11-08 Thread GitBox
LuciferYang commented on PR #38507: URL: https://github.com/apache/spark/pull/38507#issuecomment-130810 GA passed, @MaxGekk could you help to review this pr again, thanks -- This is an automated message from the Apache Git Service. To respond to the message, please log on to

[GitHub] [spark] LuciferYang commented on pull request #38550: [SPARK-41039][BUILD] Upgrade `scala-parallel-collections` to 1.0.4 for Scala 2.13

2022-11-08 Thread GitBox
LuciferYang commented on PR #38550: URL: https://github.com/apache/spark/pull/38550#issuecomment-1308100945 maven test all UTs with Scala 2.13 and this pr passed -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL

[GitHub] [spark] maryannxue commented on a diff in pull request #38558: [SPARK-41048][SQL] Improve output partitioning and ordering with AQE cache

2022-11-08 Thread GitBox
maryannxue commented on code in PR #38558: URL: https://github.com/apache/spark/pull/38558#discussion_r1017328200 ## sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/AdaptiveSparkPlanExec.scala: ## @@ -209,6 +209,19 @@ case class AdaptiveSparkPlanExec(

[GitHub] [spark] cloud-fan commented on a diff in pull request #38263: [SPARK-40692][SQL] Support data masking built-in function 'mask_hash'

2022-11-08 Thread GitBox
cloud-fan commented on code in PR #38263: URL: https://github.com/apache/spark/pull/38263#discussion_r1017320620 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/maskExpressions.scala: ## @@ -0,0 +1,99 @@ +/* + * Licensed to the Apache Software

[GitHub] [spark] cloud-fan commented on a diff in pull request #38557: [SPARK-38959][SQL][FOLLOWUP] Optimizer batch `PartitionPruning` should optimize subqueries

2022-11-08 Thread GitBox
cloud-fan commented on code in PR #38557: URL: https://github.com/apache/spark/pull/38557#discussion_r1017318193 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala: ## @@ -320,6 +320,9 @@ abstract class Optimizer(catalogManager:

[GitHub] [spark] bersprockets commented on pull request #38565: [SPARK-41035][SQL] Don't patch foldable children of aggregate functions in `RewriteDistinctAggregates`

2022-11-08 Thread GitBox
bersprockets commented on PR #38565: URL: https://github.com/apache/spark/pull/38565#issuecomment-1308083533 @HyukjinKwon Thanks! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific

[GitHub] [spark] cloud-fan commented on a diff in pull request #34815: [SPARK-37555][SQL] spark-sql should pass last unclosed comment to backend

2022-11-08 Thread GitBox
cloud-fan commented on code in PR #34815: URL: https://github.com/apache/spark/pull/34815#discussion_r1017317387 ## sql/hive-thriftserver/src/test/scala/org/apache/spark/sql/hive/thriftserver/CliSuite.scala: ## @@ -620,4 +620,17 @@ class CliSuite extends SparkFunSuite with

[GitHub] [spark] cloud-fan commented on pull request #38419: [SPARK-40945][SQL] Support built-in function to truncate numbers

2022-11-08 Thread GitBox
cloud-fan commented on PR #38419: URL: https://github.com/apache/spark/pull/38419#issuecomment-1308082603 ceil/floor also takes a second parameter for num of digits to retain. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub

[GitHub] [spark] viirya commented on a diff in pull request #38557: [SPARK-38959][SQL][FOLLOWUP] Optimizer batch `PartitionPruning` should optimize subqueries

2022-11-08 Thread GitBox
viirya commented on code in PR #38557: URL: https://github.com/apache/spark/pull/38557#discussion_r1017317077 ## sql/core/src/main/scala/org/apache/spark/sql/execution/dynamicpruning/RowLevelOperationRuntimeGroupFiltering.scala: ## @@ -66,7 +65,7 @@ case class

[GitHub] [spark] viirya commented on a diff in pull request #38557: [SPARK-38959][SQL][FOLLOWUP] Optimizer batch `PartitionPruning` should optimize subqueries

2022-11-08 Thread GitBox
viirya commented on code in PR #38557: URL: https://github.com/apache/spark/pull/38557#discussion_r1017316276 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala: ## @@ -320,6 +320,9 @@ abstract class Optimizer(catalogManager: CatalogManager)

[GitHub] [spark] HyukjinKwon commented on pull request #38565: [SPARK-41035][SQL] Don't patch foldable children of aggregate functions in `RewriteDistinctAggregates`

2022-11-08 Thread GitBox
HyukjinKwon commented on PR #38565: URL: https://github.com/apache/spark/pull/38565#issuecomment-1308081567 Merged to master, branch-3.3, and branch-3.2. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above

[GitHub] [spark-docker] Yikun closed pull request #21: [SPARK-40569][TESTS] Add smoke test in standalone cluster for spark-docker

2022-11-08 Thread GitBox
Yikun closed pull request #21: [SPARK-40569][TESTS] Add smoke test in standalone cluster for spark-docker URL: https://github.com/apache/spark-docker/pull/21 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above

[GitHub] [spark-docker] Yikun commented on pull request #21: [SPARK-40569][TESTS] Add smoke test in standalone cluster for spark-docker

2022-11-08 Thread GitBox
Yikun commented on PR #21: URL: https://github.com/apache/spark-docker/pull/21#issuecomment-1308072699 Merge to master -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To

[GitHub] [spark-docker] Yikun commented on pull request #21: [SPARK-40569][TESTS] Add smoke test in standalone cluster for spark-docker

2022-11-08 Thread GitBox
Yikun commented on PR #21: URL: https://github.com/apache/spark-docker/pull/21#issuecomment-1308072327 ``` testing/run_tests.sh --image-url ghcr.io/yikun/spark-docker/spark:python3 --scala-version 2.12 --spark-version 3.3.0 ===> Smoke test for

[GitHub] [spark] AmplabJenkins commented on pull request #38541: [SPARK-41034][CONNECT][PYTHON] Connect DataFrame should require a RemoteSparkSession

2022-11-08 Thread GitBox
AmplabJenkins commented on PR #38541: URL: https://github.com/apache/spark/pull/38541#issuecomment-1308071934 Can one of the admins verify this patch? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to

[GitHub] [spark] HyukjinKwon closed pull request #38565: [SPARK-41035][SQL] Don't patch foldable children of aggregate functions in `RewriteDistinctAggregates`

2022-11-08 Thread GitBox
HyukjinKwon closed pull request #38565: [SPARK-41035][SQL] Don't patch foldable children of aggregate functions in `RewriteDistinctAggregates` URL: https://github.com/apache/spark/pull/38565 -- This is an automated message from the Apache Git Service. To respond to the message, please log on

[GitHub] [spark] Narcasserun opened a new pull request, #38568: [SSPARK-41051][CORE] Optimize ProcfsMetrics file acquisition

2022-11-08 Thread GitBox
Narcasserun opened a new pull request, #38568: URL: https://github.com/apache/spark/pull/38568 What changes were proposed in this pull request? Reuse variables from declared procfs files instead of duplicate code Why are the changes needed? The cost of looking up the config is

[GitHub] [spark] cloud-fan commented on a diff in pull request #38557: [SPARK-38959][SQL][FOLLOWUP] Optimizer batch `PartitionPruning` should optimize subqueries

2022-11-08 Thread GitBox
cloud-fan commented on code in PR #38557: URL: https://github.com/apache/spark/pull/38557#discussion_r1017300121 ## sql/core/src/main/scala/org/apache/spark/sql/execution/dynamicpruning/RowLevelOperationRuntimeGroupFiltering.scala: ## @@ -89,10 +88,8 @@ case class

[GitHub] [spark] zhengruifeng commented on pull request #38318: [SPARK-40852][CONNECT][PYTHON] Introduce `StatFunction` in proto and implement `DataFrame.summary`

2022-11-08 Thread GitBox
zhengruifeng commented on PR #38318: URL: https://github.com/apache/spark/pull/38318#issuecomment-1308066292 thanks @cloud-fan @HyukjinKwon @amaliujia for reviews! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the

[GitHub] [spark] ulysses-you commented on a diff in pull request #38558: [SPARK-41048][SQL] Improve output partitioning and ordering with AQE cache

2022-11-08 Thread GitBox
ulysses-you commented on code in PR #38558: URL: https://github.com/apache/spark/pull/38558#discussion_r1017299064 ## sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/AdaptiveSparkPlanExec.scala: ## @@ -209,6 +209,19 @@ case class AdaptiveSparkPlanExec(

[GitHub] [spark] cloud-fan closed pull request #38318: [SPARK-40852][CONNECT][PYTHON] Introduce `StatFunction` in proto and implement `DataFrame.summary`

2022-11-08 Thread GitBox
cloud-fan closed pull request #38318: [SPARK-40852][CONNECT][PYTHON] Introduce `StatFunction` in proto and implement `DataFrame.summary` URL: https://github.com/apache/spark/pull/38318 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to

[GitHub] [spark] gengliangwang closed pull request #38542: Revert "[SPARK-38550][SQL][CORE] Use a disk-based store to save more debug information for live UI"

2022-11-08 Thread GitBox
gengliangwang closed pull request #38542: Revert "[SPARK-38550][SQL][CORE] Use a disk-based store to save more debug information for live UI" URL: https://github.com/apache/spark/pull/38542 -- This is an automated message from the Apache Git Service. To respond to the message, please log on

[GitHub] [spark] cloud-fan commented on pull request #38318: [SPARK-40852][CONNECT][PYTHON] Introduce `StatFunction` in proto and implement `DataFrame.summary`

2022-11-08 Thread GitBox
cloud-fan commented on PR #38318: URL: https://github.com/apache/spark/pull/38318#issuecomment-1308063610 thanks, merging to master! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific
