[GitHub] [spark] MaxGekk closed pull request #38484: [SPARK-40998][SQL] Rename the error class `_LEGACY_ERROR_TEMP_0040` to `INVALID_IDENTIFIER`

2022-11-02 Thread GitBox
MaxGekk closed pull request #38484: [SPARK-40998][SQL] Rename the error class `_LEGACY_ERROR_TEMP_0040` to `INVALID_IDENTIFIER` URL: https://github.com/apache/spark/pull/38484 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and

[GitHub] [spark] MaxGekk commented on pull request #38484: [SPARK-40998][SQL] Rename the error class `_LEGACY_ERROR_TEMP_0040` to `INVALID_IDENTIFIER`

2022-11-02 Thread GitBox
MaxGekk commented on PR #38484: URL: https://github.com/apache/spark/pull/38484#issuecomment-1301673423 Merging to master. Thank you, @LuciferYang @srielau and @cloud-fan for review.

[GitHub] [spark] zzzzming95 commented on a diff in pull request #38356: [SPARK-40885] `Sort` may not take effect when it is the last 'Transform' operator

2022-11-02 Thread GitBox
zzzzming95 commented on code in PR #38356: URL: https://github.com/apache/spark/pull/38356#discussion_r1012507610 ## sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/V1Writes.scala: ## @@ -178,7 +180,15 @@ object V1WritesUtils { } else { // We

[GitHub] [spark] zzzzming95 commented on pull request #38356: [SPARK-40885] `Sort` may not take effect when it is the last 'Transform' operator

2022-11-02 Thread GitBox
zzzzming95 commented on PR #38356: URL: https://github.com/apache/spark/pull/38356#issuecomment-1301649899 > @allisonwang-db can you elaborate on mapping `write.requiredOrdering` to the projected columns via `attrMap` that you introduced in

[GitHub] [spark] cxzl25 commented on pull request #38489: [SPARK-41003][SQL] BHJ LeftAnti does not update numOutputRows when codegen is disabled

2022-11-02 Thread GitBox
cxzl25 commented on PR #38489: URL: https://github.com/apache/spark/pull/38489#issuecomment-1301639710 ## Current ### enable codegen https://user-images.githubusercontent.com/3898450/199650431-d6443f45-03f9-489c-b1de-72c619acb37e.png ### disable codegen

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #38453: [SPARK-40977][CONNECT][PYTHON] Complete Support for Union in Python client

2022-11-02 Thread GitBox
HyukjinKwon commented on code in PR #38453: URL: https://github.com/apache/spark/pull/38453#discussion_r1012493266 ## python/pyspark/sql/tests/connect/test_connect_plan_only.py: ## @@ -128,6 +128,16 @@ def test_all_the_plans(self): self.assertIsNotNone(plan.root, "Root

[GitHub] [spark] cxzl25 opened a new pull request, #38489: [SPARK-41003][SQL] BHJ LeftAnti does not update numOutputRows when codegen is disabled

2022-11-02 Thread GitBox
cxzl25 opened a new pull request, #38489: URL: https://github.com/apache/spark/pull/38489 ### What changes were proposed in this pull request? BHJ LeftAnti does not update numOutputRows when codegen is disabled ### Why are the changes needed? PR #29104 Only update

[GitHub] [spark] HyukjinKwon closed pull request #38453: [SPARK-40977][CONNECT][PYTHON] Complete Support for Union in Python client

2022-11-02 Thread GitBox
HyukjinKwon closed pull request #38453: [SPARK-40977][CONNECT][PYTHON] Complete Support for Union in Python client URL: https://github.com/apache/spark/pull/38453

[GitHub] [spark] HyukjinKwon commented on pull request #38453: [SPARK-40977][CONNECT][PYTHON] Complete Support for Union in Python client

2022-11-02 Thread GitBox
HyukjinKwon commented on PR #38453: URL: https://github.com/apache/spark/pull/38453#issuecomment-1301637894 Merged to master.

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #38485: [SPARK-41001] [CONNECT] [PYTHON] Implementing Connection String for Python Client

2022-11-02 Thread GitBox
HyukjinKwon commented on code in PR #38485: URL: https://github.com/apache/spark/pull/38485#discussion_r1012492270 ## python/pyspark/sql/connect/client.py: ## @@ -42,6 +43,125 @@ logging.basicConfig(level=logging.INFO) +class ChannelBuilder: +""" +This is a helper

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #38485: [SPARK-41001] [CONNECT] [PYTHON] Implementing Connection String for Python Client

2022-11-02 Thread GitBox
HyukjinKwon commented on code in PR #38485: URL: https://github.com/apache/spark/pull/38485#discussion_r1012492460 ## python/pyspark/sql/connect/client.py: ## @@ -42,6 +43,125 @@ logging.basicConfig(level=logging.INFO) +class ChannelBuilder: +""" +This is a helper

[GitHub] [spark] gaoyajun02 commented on a diff in pull request #38333: [SPARK-40872] Fallback to original shuffle block when a push-merged shuffle chunk is zero-size

2022-11-02 Thread GitBox
gaoyajun02 commented on code in PR #38333: URL: https://github.com/apache/spark/pull/38333#discussion_r1012466373 ## core/src/main/scala/org/apache/spark/storage/ShuffleBlockFetcherIterator.scala: ## @@ -794,7 +794,15 @@ final class ShuffleBlockFetcherIterator( //

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #38462: [SPARK-40533] [CONNECT] [PYTHON] Support most built-in literal types for Python in Spark Connect

2022-11-02 Thread GitBox
HyukjinKwon commented on code in PR #38462: URL: https://github.com/apache/spark/pull/38462#discussion_r1012491071 ## python/pyspark/sql/connect/_typing.py: ## @@ -15,5 +15,7 @@ # limitations under the License. # from typing import Union +from datetime import date, time,

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #38462: [SPARK-40533] [CONNECT] [PYTHON] Support most built-in literal types for Python in Spark Connect

2022-11-02 Thread GitBox
HyukjinKwon commented on code in PR #38462: URL: https://github.com/apache/spark/pull/38462#discussion_r1012489260 ## python/pyspark/sql/connect/column.py: ## @@ -99,11 +101,59 @@ def to_plan(self, session: Optional["RemoteSparkSession"]) -> "proto.Expression"

[GitHub] [spark] itholic commented on a diff in pull request #38447: [SPARK-40973][SQL] Rename `_LEGACY_ERROR_TEMP_0055` to `UNCLOSED_BRACKETED_COMMENT`

2022-11-02 Thread GitBox
itholic commented on code in PR #38447: URL: https://github.com/apache/spark/pull/38447#discussion_r1012485332 ## sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryParsingErrors.scala: ## @@ -608,8 +608,12 @@ private[sql] object QueryParsingErrors extends

[GitHub] [spark] amaliujia opened a new pull request, #38488: [SPARK-41002][CONNECT][PYTHON] Compatible `take` and `head` API in Python client

2022-11-02 Thread GitBox
amaliujia opened a new pull request, #38488: URL: https://github.com/apache/spark/pull/38488 ### What changes were proposed in this pull request? 1. Add `take(n)` API. 2. Change `head(n)` API to return `Union[Optional[Row], List[Row]]`. ### Why are the changes
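The return type quoted above, `Union[Optional[Row], List[Row]]`, can be illustrated with a minimal sketch. This is not the code from the pull request: it assumes `head()` without an argument yields a single row (or `None` when there is no data) and `head(n)` yields a list, as the existing PySpark `DataFrame.head` does. The `Row` alias and the plain list standing in for a DataFrame are hypothetical stand-ins.

```python
from typing import List, Optional, Union

# Hypothetical stand-in for pyspark.sql.Row; a plain tuple is enough here.
Row = tuple


def take(rows: List[Row], n: int) -> List[Row]:
    """Return at most the first n rows, always as a list."""
    return rows[:n]


def head(rows: List[Row], n: Optional[int] = None) -> Union[Optional[Row], List[Row]]:
    """Return one row (or None) when n is omitted, otherwise a list of rows."""
    if n is None:
        first = take(rows, 1)
        return first[0] if first else None
    return take(rows, n)
```

Under these assumptions, `head(rows)` and `head(rows, 1)` differ in shape: the first returns a bare row, the second a one-element list.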

[GitHub] [spark] beliefer commented on pull request #38466: [WIP][SPARK-40986][SQL] Add aggregate to reduce the data size for bloom filter

2022-11-02 Thread GitBox
beliefer commented on PR #38466: URL: https://github.com/apache/spark/pull/38466#issuecomment-1301628080 This PR is only good for q93. `q93 251.88 281.774 29.894 111.87%`

[GitHub] [spark] beliefer closed pull request #38466: [WIP][SPARK-40986][SQL] Add aggregate to reduce the data size for bloom filter

2022-11-02 Thread GitBox
beliefer closed pull request #38466: [WIP][SPARK-40986][SQL] Add aggregate to reduce the data size for bloom filter URL: https://github.com/apache/spark/pull/38466

[GitHub] [spark] amaliujia commented on a diff in pull request #38475: [SPARK-40992][CONNECT] Support toDF(columnNames) in Connect DSL

2022-11-02 Thread GitBox
amaliujia commented on code in PR #38475: URL: https://github.com/apache/spark/pull/38475#discussion_r1012477099 ## connector/connect/src/main/scala/org/apache/spark/sql/connect/planner/SparkConnectPlanner.scala: ## @@ -123,6 +125,24 @@ class SparkConnectPlanner(plan:

[GitHub] [spark] gaoyajun02 commented on a diff in pull request #38333: [SPARK-40872] Fallback to original shuffle block when a push-merged shuffle chunk is zero-size

2022-11-02 Thread GitBox
gaoyajun02 commented on code in PR #38333: URL: https://github.com/apache/spark/pull/38333#discussion_r1012468152 ## core/src/main/scala/org/apache/spark/storage/ShuffleBlockFetcherIterator.scala: ## @@ -794,7 +794,15 @@ final class ShuffleBlockFetcherIterator( //

[GitHub] [spark] zhengruifeng commented on a diff in pull request #38475: [SPARK-40992][CONNECT] Support toDF(columnNames) in Connect DSL

2022-11-02 Thread GitBox
zhengruifeng commented on code in PR #38475: URL: https://github.com/apache/spark/pull/38475#discussion_r1012465284 ## connector/connect/src/main/scala/org/apache/spark/sql/connect/planner/SparkConnectPlanner.scala: ## @@ -123,6 +125,24 @@ class SparkConnectPlanner(plan:

[GitHub] [spark] amaliujia commented on a diff in pull request #38475: [SPARK-40992][CONNECT] Support toDF(columnNames) in Connect DSL

2022-11-02 Thread GitBox
amaliujia commented on code in PR #38475: URL: https://github.com/apache/spark/pull/38475#discussion_r1012464469 ## connector/connect/src/main/scala/org/apache/spark/sql/connect/planner/SparkConnectPlanner.scala: ## @@ -123,6 +125,24 @@ class SparkConnectPlanner(plan:

[GitHub] [spark] jzhuge commented on pull request #37556: [SPARK-39799][SQL] DataSourceV2: View catalog interface

2022-11-02 Thread GitBox
jzhuge commented on PR #37556: URL: https://github.com/apache/spark/pull/37556#issuecomment-1301598157 Looking at error: ``` SparkThrowableSuite.Error classes are correctly formatted ```

[GitHub] [spark] LuciferYang commented on a diff in pull request #38481: [SPARK-40996][BUILD] Upgrade `sbt-checkstyle-plugin` to 4.0.0 to resolve `dev/sbt-checkstyle` run failed with sbt 1.7.3

2022-11-02 Thread GitBox
LuciferYang commented on code in PR #38481: URL: https://github.com/apache/spark/pull/38481#discussion_r1012461316 ## project/build.properties: ## @@ -15,4 +15,4 @@ # limitations under the License. # # Please update the version in appveyor-install-dependencies.ps1 together.

[GitHub] [spark] amaliujia commented on a diff in pull request #38485: [SPARK-41001] [CONNECT] [PYTHON] Implementing Connection String for Python Client

2022-11-02 Thread GitBox
amaliujia commented on code in PR #38485: URL: https://github.com/apache/spark/pull/38485#discussion_r1012460587 ## python/pyspark/sql/connect/client.py: ## @@ -42,6 +43,125 @@ logging.basicConfig(level=logging.INFO) +class ChannelBuilder: +""" +This is a helper

[GitHub] [spark] LuciferYang commented on pull request #38465: [SPARK-40985][BUILD] Upgrade RoaringBitmap to 0.9.35

2022-11-02 Thread GitBox
LuciferYang commented on PR #38465: URL: https://github.com/apache/spark/pull/38465#issuecomment-1301588097 Thanks @srowen @HyukjinKwon @itholic

[GitHub] [spark] LuciferYang commented on pull request #38469: [MINOR][BUILD] Correct the `files` contend in `checkstyle-suppressions.xml`

2022-11-02 Thread GitBox
LuciferYang commented on PR #38469: URL: https://github.com/apache/spark/pull/38469#issuecomment-1301587767 Thanks @srowen @dongjoon-hyun

[GitHub] [spark] pan3793 commented on pull request #38483: [SPARK-40997][K8S] K8s resource name prefix should start w/ alphanumeric

2022-11-02 Thread GitBox
pan3793 commented on PR #38483: URL: https://github.com/apache/spark/pull/38483#issuecomment-1301579448 cc @dongjoon-hyun @Yikun, would you please take a look?

[GitHub] [spark] amaliujia commented on a diff in pull request #38475: [SPARK-40992][CONNECT] Support toDF(columnNames) in Connect DSL

2022-11-02 Thread GitBox
amaliujia commented on code in PR #38475: URL: https://github.com/apache/spark/pull/38475#discussion_r1012449031 ## connector/connect/src/main/protobuf/spark/connect/relations.proto: ## @@ -250,3 +251,15 @@ message SubqueryAlias { // Optional. Qualifier of the alias.

[GitHub] [spark] zhengruifeng commented on a diff in pull request #38468: [WIP][CONNECT][PYTHON] Arrow-based collect

2022-11-02 Thread GitBox
zhengruifeng commented on code in PR #38468: URL: https://github.com/apache/spark/pull/38468#discussion_r1012441577 ## connector/connect/src/main/scala/org/apache/spark/sql/connect/service/SparkConnectStreamHandler.scala: ## @@ -117,7 +121,38 @@ class

[GitHub] [spark] HyukjinKwon closed pull request #38487: [SPARK-40995][CONNECT][DOC][FOLLOW-UP] Fix the type in the doc name

2022-11-02 Thread GitBox
HyukjinKwon closed pull request #38487: [SPARK-40995][CONNECT][DOC][FOLLOW-UP] Fix the type in the doc name URL: https://github.com/apache/spark/pull/38487

[GitHub] [spark] HyukjinKwon commented on pull request #38487: [SPARK-40995][CONNECT][DOC][FOLLOW-UP] Fix the type in the doc name

2022-11-02 Thread GitBox
HyukjinKwon commented on PR #38487: URL: https://github.com/apache/spark/pull/38487#issuecomment-1301565811 Merged to master.

[GitHub] [spark] amaliujia commented on pull request #38487: [SPARK-40995][CONNECT][DOC][FOLLOW-UP] Fix the type in the doc name

2022-11-02 Thread GitBox
amaliujia commented on PR #38487: URL: https://github.com/apache/spark/pull/38487#issuecomment-1301565529 R: @HyukjinKwon cc @grundprinzip

[GitHub] [spark] HyukjinKwon closed pull request #38472: [SPARK-40989][CONNECT][PYTHON][TESTS] Improve `session.sql` testing coverage in Python client

2022-11-02 Thread GitBox
HyukjinKwon closed pull request #38472: [SPARK-40989][CONNECT][PYTHON][TESTS] Improve `session.sql` testing coverage in Python client URL: https://github.com/apache/spark/pull/38472

[GitHub] [spark] amaliujia opened a new pull request, #38487: [SPARK-40995][CONNECT][DOC][FOLLOW-UP] Fix the type in the doc name

2022-11-02 Thread GitBox
amaliujia opened a new pull request, #38487: URL: https://github.com/apache/spark/pull/38487 ### What changes were proposed in this pull request? Fix the type in the doc filename: `coient` -> `client`. ### Why are the changes needed? Fix typo. ### Does

[GitHub] [spark] HyukjinKwon commented on pull request #38472: [SPARK-40989][CONNECT][PYTHON][TESTS] Improve `session.sql` testing coverage in Python client

2022-11-02 Thread GitBox
HyukjinKwon commented on PR #38472: URL: https://github.com/apache/spark/pull/38472#issuecomment-1301565409 Merged to master.

[GitHub] [spark] cloud-fan commented on pull request #28450: [SPARK-31639] Revert SPARK-27528 Use Parquet logical type TIMESTAMP_MICROS by default

2022-11-02 Thread GitBox
cloud-fan commented on PR #28450: URL: https://github.com/apache/spark/pull/28450#issuecomment-1301565132 At that time, the ecosystem did not yet fully support the standard Parquet timestamp. We can recheck now. If the latest version of popular data systems (Hive, Presto, Flink, etc.) all

[GitHub] [spark] holdenk commented on pull request #37556: [SPARK-39799][SQL] DataSourceV2: View catalog interface

2022-11-02 Thread GitBox
holdenk commented on PR #37556: URL: https://github.com/apache/spark/pull/37556#issuecomment-1301564329 pending CI and any other concerns I plan to merge this on Friday.

[GitHub] [spark] zzzzming95 commented on pull request #38358: [SPARK-40588] FileFormatWriter materializes AQE plan before accessing outputOrdering

2022-11-02 Thread GitBox
zzzzming95 commented on PR #38358: URL: https://github.com/apache/spark/pull/38358#issuecomment-1301560852 > @kristopherkane 3.1 is EOL unfortunately. > > @zzzzming95 Does this PR fix your problem? I have no problem with this issue. Another similar issue I found

[GitHub] [spark] hvanhovell commented on a diff in pull request #38468: [WIP][CONNECT][PYTHON] Arrow-based collect

2022-11-02 Thread GitBox
hvanhovell commented on code in PR #38468: URL: https://github.com/apache/spark/pull/38468#discussion_r1012428719 ## connector/connect/src/main/scala/org/apache/spark/sql/connect/service/SparkConnectStreamHandler.scala: ## @@ -117,7 +121,38 @@ class

[GitHub] [spark] srowen commented on pull request #38427: [SPARK-40950][CORE] Fix isRemoteAddressMaxedOut performance overhead on scala 2.13

2022-11-02 Thread GitBox
srowen commented on PR #38427: URL: https://github.com/apache/spark/pull/38427#issuecomment-1301541574 Probably, if it's internal, and being mutable/immutable doesn't matter in the API

[GitHub] [spark] LuciferYang commented on pull request #38427: [SPARK-40950][CORE] Fix isRemoteAddressMaxedOut performance overhead on scala 2.13

2022-11-02 Thread GitBox
LuciferYang commented on PR #38427: URL: https://github.com/apache/spark/pull/38427#issuecomment-1301540958 @srowen Does this mean that for similar cases, if it is an internal api, we can explicitly specify `Seq` as `collection.Seq` to avoid unnecessary memory copying for Scala 2.13? There

[GitHub] [spark] zzzzming95 commented on a diff in pull request #38358: [SPARK-40588] FileFormatWriter materializes AQE plan before accessing outputOrdering

2022-11-02 Thread GitBox
zzzzming95 commented on code in PR #38358: URL: https://github.com/apache/spark/pull/38358#discussion_r1012409686 ## sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileFormatWriter.scala: ## @@ -187,8 +188,17 @@ object FileFormatWriter extends Logging {

[GitHub] [spark] HyukjinKwon closed pull request #38477: [SPARK-40993][CONNECT][PYTHON][DOCS] Migrate markdown style README to PySpark Development Documentation

2022-11-02 Thread GitBox
HyukjinKwon closed pull request #38477: [SPARK-40993][CONNECT][PYTHON][DOCS] Migrate markdown style README to PySpark Development Documentation URL: https://github.com/apache/spark/pull/38477

[GitHub] [spark] HyukjinKwon commented on pull request #38477: [SPARK-40993][CONNECT][PYTHON][DOCS] Migrate markdown style README to PySpark Development Documentation

2022-11-02 Thread GitBox
HyukjinKwon commented on PR #38477: URL: https://github.com/apache/spark/pull/38477#issuecomment-1301524099 Fixed in https://github.com/apache/spark/pull/38470 for now. Per https://github.com/apache/spark/pull/38477#issuecomment-1299705536, let me close this for now.

[GitHub] [spark] HyukjinKwon closed pull request #38470: [SPARK-40995] [CONNECT] [DOC] Defining Spark Connect Client Connection String

2022-11-02 Thread GitBox
HyukjinKwon closed pull request #38470: [SPARK-40995] [CONNECT] [DOC] Defining Spark Connect Client Connection String URL: https://github.com/apache/spark/pull/38470

[GitHub] [spark] HyukjinKwon commented on pull request #38470: [SPARK-40995] [CONNECT] [DOC] Defining Spark Connect Client Connection String

2022-11-02 Thread GitBox
HyukjinKwon commented on PR #38470: URL: https://github.com/apache/spark/pull/38470#issuecomment-1301523665 Merged to master.

[GitHub] [spark] kelvinjian-db commented on pull request #38486: [SPARK-41000][SQL] Make CommandResult extend Command trait

2022-11-02 Thread GitBox
kelvinjian-db commented on PR #38486: URL: https://github.com/apache/spark/pull/38486#issuecomment-1301517613 cc @sigmod @cloud-fan

[GitHub] [spark] github-actions[bot] commented on pull request #37163: [SPARK-39750][SQL] Enable `spark.sql.cbo.enabled` by default

2022-11-02 Thread GitBox
github-actions[bot] commented on PR #37163: URL: https://github.com/apache/spark/pull/37163#issuecomment-1301517583 We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.

[GitHub] [spark] github-actions[bot] commented on pull request #37083: [SPARK-39678][SQL] Improve stats estimation for v2 tables

2022-11-02 Thread GitBox
github-actions[bot] commented on PR #37083: URL: https://github.com/apache/spark/pull/37083#issuecomment-1301517598 We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.

[GitHub] [spark] github-actions[bot] closed pull request #37259: spark-submit: throw an error when duplicate argument is provided

2022-11-02 Thread GitBox
github-actions[bot] closed pull request #37259: spark-submit: throw an error when duplicate argument is provided URL: https://github.com/apache/spark/pull/37259

[GitHub] [spark] github-actions[bot] commented on pull request #37265: [SPARK-39850][YARN]Print applicationId once applied from yarn rm

2022-11-02 Thread GitBox
github-actions[bot] commented on PR #37265: URL: https://github.com/apache/spark/pull/37265#issuecomment-1301517562 We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.

[GitHub] [spark] SandishKumarHN commented on a diff in pull request #38344: [SPARK-40777][SQL][PROTOBUF] Protobuf import support and move error-classes.

2022-11-02 Thread GitBox
SandishKumarHN commented on code in PR #38344: URL: https://github.com/apache/spark/pull/38344#discussion_r1012398593 ## core/src/main/resources/error/error-classes.json: ## @@ -29,12 +44,22 @@ ], "sqlState" : "22007" }, + "CANNOT_LOAD_PROTOBUF_CLASS" : { +

[GitHub] [spark] srielau commented on a diff in pull request #38344: [SPARK-40777][SQL][PROTOBUF] Protobuf import support and move error-classes.

2022-11-02 Thread GitBox
srielau commented on code in PR #38344: URL: https://github.com/apache/spark/pull/38344#discussion_r1012385639 ## core/src/main/resources/error/error-classes.json: ## @@ -742,6 +832,11 @@ ], "sqlState" : "22023" }, + "SQL_TYPE_TO_PROTOBUF_ENUM_TYPE_ERROR" : {

[GitHub] [spark] AmplabJenkins commented on pull request #38475: [SPARK-40992][CONNECT] Support toDF(columnNames) in Connect DSL

2022-11-02 Thread GitBox
AmplabJenkins commented on PR #38475: URL: https://github.com/apache/spark/pull/38475#issuecomment-1301480681 Can one of the admins verify this patch?

[GitHub] [spark] AmplabJenkins commented on pull request #38472: [SPARK-40989][CONNECT][PYTHON][TESTS] Improve `session.sql` testing coverage in Python client

2022-11-02 Thread GitBox
AmplabJenkins commented on PR #38472: URL: https://github.com/apache/spark/pull/38472#issuecomment-1301480708 Can one of the admins verify this patch?

[GitHub] [spark] AmplabJenkins commented on pull request #38477: [SPARK-40993][CONNECT][PYTHON][DOCS] Migrate markdown style README to PySpark Development Documentation

2022-11-02 Thread GitBox
AmplabJenkins commented on PR #38477: URL: https://github.com/apache/spark/pull/38477#issuecomment-1301480649 Can one of the admins verify this patch?

[GitHub] [spark] AmplabJenkins commented on pull request #38470: [SPARK-40995] [CONNECT] [DOC] Defining Spark Connect Client Connection String

2022-11-02 Thread GitBox
AmplabJenkins commented on PR #38470: URL: https://github.com/apache/spark/pull/38470#issuecomment-1301480736 Can one of the admins verify this patch?

[GitHub] [spark] mridulm commented on pull request #38333: [SPARK-40872] Fallback to original shuffle block when a push-merged shuffle chunk is zero-size

2022-11-02 Thread GitBox
mridulm commented on PR #38333: URL: https://github.com/apache/spark/pull/38333#issuecomment-1301464492 If there are hardware issues which are causing failures - it is better to move the nodes to deny list and prevent them from getting used: we will keep seeing more failures, including for

[GitHub] [spark] zhouyejoe commented on a diff in pull request #37638: [SPARK-33573][SHUFFLE][YARN] Shuffle server side metrics for Push-based shuffle

2022-11-02 Thread GitBox
zhouyejoe commented on code in PR #37638: URL: https://github.com/apache/spark/pull/37638#discussion_r1012349885 ## common/network-shuffle/src/test/java/org/apache/spark/network/shuffle/RemoteBlockPushResolverSuite.java: ## @@ -413,6 +437,7 @@ public void

[GitHub] [spark] mridulm commented on a diff in pull request #36165: [SPARK-36620][SHUFFLE] Add Push Based Shuffle client side metrics

2022-11-02 Thread GitBox
mridulm commented on code in PR #36165: URL: https://github.com/apache/spark/pull/36165#discussion_r1012357823 ## core/src/main/scala/org/apache/spark/executor/ShuffleReadMetrics.scala: ## @@ -146,6 +268,16 @@ private[spark] class TempShuffleReadMetrics extends

[GitHub] [spark] mridulm commented on pull request #38377: [SPARK-40901][CORE] Unable to store Spark Driver logs with Absolute Hadoop based URI FS Path

2022-11-02 Thread GitBox
mridulm commented on PR #38377: URL: https://github.com/apache/spark/pull/38377#issuecomment-1301454290 I am not sure I follow - can you give an example ? Based on what the PR is doing, I would expect both to be equivalent.

[GitHub] [spark] otterc commented on a diff in pull request #38333: [SPARK-40872] Fallback to original shuffle block when a push-merged shuffle chunk is zero-size

2022-11-02 Thread GitBox
otterc commented on code in PR #38333: URL: https://github.com/apache/spark/pull/38333#discussion_r1012336511 ## core/src/main/scala/org/apache/spark/storage/ShuffleBlockFetcherIterator.scala: ## @@ -794,7 +794,15 @@ final class ShuffleBlockFetcherIterator( //

[GitHub] [spark] zhouyejoe commented on a diff in pull request #36165: [SPARK-36620][SHUFFLE] Add Push Based Shuffle client side metrics

2022-11-02 Thread GitBox
zhouyejoe commented on code in PR #36165: URL: https://github.com/apache/spark/pull/36165#discussion_r1012326931 ## core/src/main/scala/org/apache/spark/executor/TaskMetrics.scala: ## @@ -227,6 +227,16 @@ class TaskMetrics private[spark] () extends Serializable {

[GitHub] [spark] zhouyejoe commented on a diff in pull request #36165: [SPARK-36620][SHUFFLE] Add Push Based Shuffle client side metrics

2022-11-02 Thread GitBox
zhouyejoe commented on code in PR #36165: URL: https://github.com/apache/spark/pull/36165#discussion_r1012322498 ## core/src/main/scala/org/apache/spark/executor/TaskMetrics.scala: ## @@ -227,6 +227,16 @@ class TaskMetrics private[spark] () extends Serializable {

[GitHub] [spark] kelvinjian-db opened a new pull request, #38486: [SPARK-41000][SQL] Make CommandResult extend Command trait

2022-11-02 Thread GitBox
kelvinjian-db opened a new pull request, #38486: URL: https://github.com/apache/spark/pull/38486 ### What changes were proposed in this pull request? We change `CommandResult` to extend `LeafCommand` (which extends `Command`) instead of `LeafNode`. ### Why are the

[GitHub] [spark] dongjoon-hyun commented on pull request #38433: [SPARK-40943][SQL] Make the MSCK keyword optional in REPAIR TABLE commands

2022-11-02 Thread GitBox
dongjoon-hyun commented on PR #38433: URL: https://github.com/apache/spark/pull/38433#issuecomment-1301158936 Thank you so much, @ben-zhang . -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the

[GitHub] [spark] ben-zhang commented on pull request #38433: [SPARK-40943][SQL] Make the MSCK keyword optional in REPAIR TABLE commands

2022-11-02 Thread GitBox
ben-zhang commented on PR #38433: URL: https://github.com/apache/spark/pull/38433#issuecomment-1301158208 @dongjoon-hyun , sounds good. I will get back to you when the Databricks docs are updated. -- This is an automated message from the Apache Git Service. To respond to the message,

[GitHub] [spark] grundprinzip opened a new pull request, #38485: [SPARK-41001] [CONNECT] [PYTHON] Implementing Connection String for Python Client

2022-11-02 Thread GitBox
grundprinzip opened a new pull request, #38485: URL: https://github.com/apache/spark/pull/38485 ### What changes were proposed in this pull request? This PR implements the connection string for Spark Connect clients according to the documentation. ### Why are the changes

[GitHub] [spark] HeartSaVioR commented on pull request #38430: [SPARK-40957] Add in memory cache in HDFSMetadataLog

2022-11-02 Thread GitBox
HeartSaVioR commented on PR #38430: URL: https://github.com/apache/spark/pull/38430#issuecomment-1301097792 @jerrypeng Could you please address post-review comments as followup PR? You don't need a new JIRA ticket. (Sigh, I forgot to check. Please make sure your PR title has a

[GitHub] [spark] clee704 commented on pull request #28450: [SPARK-31639] Revert SPARK-27528 Use Parquet logical type TIMESTAMP_MICROS by default

2022-11-02 Thread GitBox
clee704 commented on PR #28450: URL: https://github.com/apache/spark/pull/28450#issuecomment-1301097763 Can someone explain why we reverted to INT96? I read https://issues.apache.org/jira/browse/SPARK-31085 but want to know how the discussion happened. To me the cost of breaking the API

[GitHub] [spark] MaxGekk commented on pull request #38484: [SPARK-40998][SQL] Rename the error class `_LEGACY_ERROR_TEMP_0040` to `INVALID_IDENTIFIER`

2022-11-02 Thread GitBox
MaxGekk commented on PR #38484: URL: https://github.com/apache/spark/pull/38484#issuecomment-1301046405 @srielau @itholic @panbingkun @LuciferYang @cloud-fan Could you review this PR, please. -- This is an automated message from the Apache Git Service. To respond to the message, please

[GitHub] [spark] swamirishi commented on pull request #38377: [SPARK-40901][CORE] Unable to store Spark Driver logs with Absolute Hadoop based URI FS Path

2022-11-02 Thread GitBox
swamirishi commented on PR #38377: URL: https://github.com/apache/spark/pull/38377#issuecomment-1301028102 > Makes sense ... why not simply `val dfsLogFile = new Path(rootDir, appId + DRIVER_LOG_FILE_SUFFIX)` instead btw ? If we do necessarily need the fully qualified path, we can use

[GitHub] [spark] leewyang commented on a diff in pull request #37734: [SPARK-40264][ML] add batch_infer_udf function to pyspark.ml.functions

2022-11-02 Thread GitBox
leewyang commented on code in PR #37734: URL: https://github.com/apache/spark/pull/37734#discussion_r1012129510 ## python/pyspark/ml/functions.py: ## @@ -106,6 +117,474 @@ def array_to_vector(col: Column) -> Column: return

[GitHub] [spark] AmplabJenkins commented on pull request #38482: [WIP][SPARK-40749][SQL] Migrate type check failures of generators onto error classes

2022-11-02 Thread GitBox
AmplabJenkins commented on PR #38482: URL: https://github.com/apache/spark/pull/38482#issuecomment-1300983786 Can one of the admins verify this patch? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to

[GitHub] [spark] AmplabJenkins commented on pull request #38483: [SPARK-40997][K8S] K8s resource name prefix should start w/ alphanumeric

2022-11-02 Thread GitBox
AmplabJenkins commented on PR #38483: URL: https://github.com/apache/spark/pull/38483#issuecomment-1300983649 Can one of the admins verify this patch? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to

[GitHub] [spark] srowen commented on pull request #38465: [SPARK-40985][BUILD] Upgrade RoaringBitmap to 0.9.35

2022-11-02 Thread GitBox
srowen commented on PR #38465: URL: https://github.com/apache/spark/pull/38465#issuecomment-1300728818 Merged to master -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To

[GitHub] [spark] srowen closed pull request #38465: [SPARK-40985][BUILD] Upgrade RoaringBitmap to 0.9.35

2022-11-02 Thread GitBox
srowen closed pull request #38465: [SPARK-40985][BUILD] Upgrade RoaringBitmap to 0.9.35 URL: https://github.com/apache/spark/pull/38465 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific

[GitHub] [spark] srowen commented on pull request #38427: [SPARK-40950][CORE] Fix isRemoteAddressMaxedOut performance overhead on scala 2.13

2022-11-02 Thread GitBox
srowen commented on PR #38427: URL: https://github.com/apache/spark/pull/38427#issuecomment-1300679989 The latter sounds better, right? Is there any downside? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL

[GitHub] [spark] eejbyfeldt commented on pull request #38427: [SPARK-40950][CORE] Fix isRemoteAddressMaxedOut performance overhead on scala 2.13

2022-11-02 Thread GitBox
eejbyfeldt commented on PR #38427: URL: https://github.com/apache/spark/pull/38427#issuecomment-1300650411 > OK, I think we can't accept that much perf degradation. If there's a simple way to refactor the code to make both faster, that seems OK. Ideally we avoid separate code branches for

[GitHub] [spark] srowen commented on pull request #38427: [SPARK-40950][CORE] Fix isRemoteAddressMaxedOut performance overhead on scala 2.13

2022-11-02 Thread GitBox
srowen commented on PR #38427: URL: https://github.com/apache/spark/pull/38427#issuecomment-1300620236 OK, I think we can't accept that much perf degradation. If there's a simple way to refactor the code to make both faster, that seems OK. Ideally we avoid separate code branches for 2.12

[GitHub] [spark] LuciferYang commented on pull request #38427: [SPARK-40950][CORE] Fix isRemoteAddressMaxedOut performance overhead on scala 2.13

2022-11-02 Thread GitBox
LuciferYang commented on PR #38427: URL: https://github.com/apache/spark/pull/38427#issuecomment-1300602808 > What's the current state here - this change is still much slower on 2.12? @srowen ``` The overall trend is consistent with local tests. Using toIndexedSeq improves the

[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #38430: [SPARK-40957] Add in memory cache in HDFSMetadataLog

2022-11-02 Thread GitBox
dongjoon-hyun commented on code in PR #38430: URL: https://github.com/apache/spark/pull/38430#discussion_r1011874757 ## sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/HDFSMetadataLog.scala: ## @@ -277,10 +295,34 @@ class HDFSMetadataLog[T <: AnyRef :

[GitHub] [spark] srowen commented on pull request #38427: [SPARK-40950][CORE] Fix isRemoteAddressMaxedOut performance overhead on scala 2.13

2022-11-02 Thread GitBox
srowen commented on PR #38427: URL: https://github.com/apache/spark/pull/38427#issuecomment-1300578770 What's the current state here - this change is still much slower on 2.12? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub

[GitHub] [spark] MaxGekk opened a new pull request, #38484: [WIP][SQL] Rename the error class `_LEGACY_ERROR_TEMP_0040` to `INVALID_IDENTIFIER`

2022-11-02 Thread GitBox
MaxGekk opened a new pull request, #38484: URL: https://github.com/apache/spark/pull/38484 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How

[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #38430: [SPARK-40957] Add in memory cache in HDFSMetadataLog

2022-11-02 Thread GitBox
dongjoon-hyun commented on code in PR #38430: URL: https://github.com/apache/spark/pull/38430#discussion_r1011869341 ## sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/HDFSMetadataLog.scala: ## @@ -168,7 +191,13 @@ class HDFSMetadataLog[T <: AnyRef :

[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #38430: [SPARK-40957] Add in memory cache in HDFSMetadataLog

2022-11-02 Thread GitBox
dongjoon-hyun commented on code in PR #38430: URL: https://github.com/apache/spark/pull/38430#discussion_r1011867576 ## sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/HDFSMetadataLog.scala: ## @@ -64,6 +67,17 @@ class HDFSMetadataLog[T <: AnyRef :

[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #38430: [SPARK-40957] Add in memory cache in HDFSMetadataLog

2022-11-02 Thread GitBox
dongjoon-hyun commented on code in PR #38430: URL: https://github.com/apache/spark/pull/38430#discussion_r1011867108 ## sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala: ## @@ -2007,6 +2007,14 @@ object SQLConf { .booleanConf

[GitHub] [spark] LuciferYang commented on pull request #38427: [SPARK-40950][CORE] Fix isRemoteAddressMaxedOut performance overhead on scala 2.13

2022-11-02 Thread GitBox
LuciferYang commented on PR #38427: URL: https://github.com/apache/spark/pull/38427#issuecomment-1300477025 Move forward? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific

[GitHub] [spark] MaksGS09 commented on pull request #37206: [SPARK-39696][CORE] Ensure Concurrent r/w `TaskMetrics` not throw Exception

2022-11-02 Thread GitBox
MaksGS09 commented on PR #37206: URL: https://github.com/apache/spark/pull/37206#issuecomment-1300440849 Hi! Any updates on this? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific
