[GitHub] [spark] LuciferYang commented on pull request #40283: [SPARK-42673][BUILD] Make `build/mvn` build Spark only with the verified maven version

2023-03-05 Thread via GitHub
LuciferYang commented on PR #40283: URL: https://github.com/apache/spark/pull/40283#issuecomment-1455648868 > https://issues.apache.org/jira/browse/MNG-7697 OK, let me test 3.9.1-SNAPSHOT later. @pan3793 Do you have any other issues besides those in the GA task? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

[GitHub] [spark] hboutemy commented on pull request #40283: [SPARK-42673][BUILD] Make `build/mvn` build Spark only with the verified maven version

2023-03-05 Thread via GitHub
hboutemy commented on PR #40283: URL: https://github.com/apache/spark/pull/40283#issuecomment-1455637895 [@cstamas ](https://github.com/cstamas) do you know if the lax parsing covers that `org.codehaus.plexus.util.xml.pull.XmlPullParserException: UTF-8 BOM plus xml decl of ISO-8859-1 is
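The exception quoted above arises when a POM carries a UTF-8 byte-order mark while its XML declaration claims ISO-8859-1 — the conflict a strict parser rejects. A small, hypothetical sketch (not plexus-utils code) of how such a mismatch can be detected:

```python
import re

def detect_bom_encoding_conflict(data: bytes) -> bool:
    """Return True when a UTF-8 BOM is present but the XML declaration
    claims a different encoding -- the case a strict parser rejects."""
    has_utf8_bom = data.startswith(b"\xef\xbb\xbf")
    if not has_utf8_bom:
        return False
    # Look at the XML declaration just after the BOM.
    head = data[3:100].decode("ascii", errors="ignore")
    match = re.search(r'encoding=["\']([^"\']+)["\']', head)
    return match is not None and match.group(1).upper() != "UTF-8"

doc = b"\xef\xbb\xbf" + b'<?xml version="1.0" encoding="ISO-8859-1"?><project/>'
print(detect_bom_encoding_conflict(doc))  # conflicting BOM and declaration
```

A lax parser would silently prefer the BOM; the stricter plexus-utils behavior surfaces the inconsistency as an error instead.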

[GitHub] [spark] hboutemy commented on pull request #40283: [SPARK-42673][BUILD] Make `build/mvn` build Spark only with the verified maven version

2023-03-05 Thread via GitHub
hboutemy commented on PR #40283: URL: https://github.com/apache/spark/pull/40283#issuecomment-1455633233 there is a known issue in Maven 3.9.0 (related to plexus-utils XML stricter reading https://github.com/codehaus-plexus/plexus-utils/issues/238 ) that is fixed in 3.9.1-SNAPSHOT:

[GitHub] [spark] EnricoMi commented on pull request #38358: [SPARK-40588] FileFormatWriter materializes AQE plan before accessing outputOrdering

2023-03-05 Thread via GitHub
EnricoMi commented on PR #38358: URL: https://github.com/apache/spark/pull/38358#issuecomment-1455620898 Yes, it looks like it removes the **empty** table location after **overwriting** the table failed due to the `ArithmeticException`. @cloud-fan do you consider the removal of an

[GitHub] [spark] itholic commented on pull request #40280: [SPARK-42671][CONNECT] Fix bug for createDataFrame from complex type schema

2023-03-05 Thread via GitHub
itholic commented on PR #40280: URL: https://github.com/apache/spark/pull/40280#issuecomment-1455567210 Thanks @panbingkun for the nice fix! Btw, I think I found another `createDataFrame` bug: it does not work properly with a non-nullable schema, as below: ```python >>> from
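The snippet above is cut off before the repro, so as a hypothetical, pure-Python illustration (not the Spark Connect code path) of what non-nullable enforcement in `createDataFrame` means — the names `Field` and `validate_rows` are invented for this sketch:

```python
from dataclasses import dataclass

@dataclass
class Field:
    name: str
    nullable: bool

def validate_rows(rows, schema):
    """Reject None in any field declared non-nullable, mirroring the
    behaviour the bug report expects createDataFrame to enforce."""
    for row in rows:
        for value, field in zip(row, schema):
            if value is None and not field.nullable:
                raise ValueError(f"field {field.name!r} is non-nullable but got None")
    return rows

schema = [Field("id", nullable=False), Field("name", nullable=True)]
validate_rows([(1, "a"), (2, None)], schema)  # None allowed in nullable field
try:
    validate_rows([(None, "b")], schema)      # None in non-nullable field
except ValueError as e:
    print(e)
```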

[GitHub] [spark] HeartSaVioR closed pull request #40292: [SPARK-42676][SS] Write temp checkpoints for streaming queries to local filesystem even if default FS is set differently

2023-03-05 Thread via GitHub
HeartSaVioR closed pull request #40292: [SPARK-42676][SS] Write temp checkpoints for streaming queries to local filesystem even if default FS is set differently URL: https://github.com/apache/spark/pull/40292

[GitHub] [spark] HeartSaVioR commented on pull request #40292: [SPARK-42676][SS] Write temp checkpoints for streaming queries to local filesystem even if default FS is set differently

2023-03-05 Thread via GitHub
HeartSaVioR commented on PR #40292: URL: https://github.com/apache/spark/pull/40292#issuecomment-1455549225 Thanks! Merging to master.

[GitHub] [spark] wangyum opened a new pull request, #40294: [SPARK-40610][SQL] Support unwrap date type to string type

2023-03-05 Thread via GitHub
wangyum opened a new pull request, #40294: URL: https://github.com/apache/spark/pull/40294

### What changes were proposed in this pull request?
This PR enhances `UnwrapCastInBinaryComparison` to support unwrapping date type to string type.

### Why are the changes needed?
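`UnwrapCastInBinaryComparison` can rewrite a predicate like `cast(date_col as string) > '2023-03-01'` into a comparison on the date itself, because ISO-8601 date strings order the same way the dates do. A quick check of that ordering property (illustrative only, not the optimizer code):

```python
from datetime import date

def iso(d: date) -> str:
    """ISO-8601 rendering, e.g. date(2023, 3, 1) -> '2023-03-01'."""
    return d.isoformat()

pairs = [
    (date(2023, 3, 1), date(2023, 3, 5)),
    (date(1999, 12, 31), date(2000, 1, 1)),
    (date(2023, 3, 5), date(2023, 3, 5)),
]
# For every pair, comparing the dates and comparing their ISO strings agree;
# this agreement is what makes unwrapping the cast safe for such comparisons.
for a, b in pairs:
    assert (a < b) == (iso(a) < iso(b))
    assert (a == b) == (iso(a) == iso(b))
print("string ordering matches date ordering for ISO-8601 dates")
```

The agreement holds because ISO-8601 zero-pads every component, so lexicographic and chronological order coincide; it would not hold for non-ISO formats.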

[GitHub] [spark] LuciferYang commented on a diff in pull request #40291: [WIP][SPARK-42578][CONNECT] Add JDBC to DataFrameWriter

2023-03-05 Thread via GitHub
LuciferYang commented on code in PR #40291: URL: https://github.com/apache/spark/pull/40291#discussion_r1125957292 ## connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala: ## @@ -345,6 +345,37 @@ final class DataFrameWriter[T] private[sql] (ds:

[GitHub] [spark] LuciferYang commented on pull request #40283: [SPARK-42673][BUILD] Make `build/mvn` build Spark only with the verified maven version

2023-03-05 Thread via GitHub
LuciferYang commented on PR #40283: URL: https://github.com/apache/spark/pull/40283#issuecomment-1455497586 also cc @HyukjinKwon

[GitHub] [spark] LuciferYang commented on a diff in pull request #40283: [SPARK-42673][BUILD] Make `build/mvn` build Spark only with the verified maven version

2023-03-05 Thread via GitHub
LuciferYang commented on code in PR #40283: URL: https://github.com/apache/spark/pull/40283#discussion_r1125949503 ## build/mvn: ## @@ -119,7 +119,8 @@ install_mvn() { if [ "$MVN_BIN" ]; then local MVN_DETECTED_VERSION="$(mvn --version | head -n1 | awk '{print $3}')"
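The diff under discussion extends the version check in `build/mvn` around `MVN_DETECTED_VERSION`. A minimal Python sketch of the gating logic the PR title describes — banning the 3.9.x series and requiring the verified version (the function names and the `3.8.7` floor are illustrative, not taken from the PR):

```python
def parse_version(version: str) -> tuple:
    """Split a Maven version string like '3.9.0' into an int tuple."""
    return tuple(int(part) for part in version.split(".")[:3])

def is_allowed_maven(detected: str, required: str = "3.8.7") -> bool:
    """Reject the 3.9.x series (the plexus-utils strict-XML regression
    discussed in this thread) and anything below the verified version."""
    major, minor, _ = parse_version(detected)
    if (major, minor) >= (3, 9):
        return False
    return parse_version(detected) >= parse_version(required)

print(is_allowed_maven("3.8.7"))  # verified version: allowed
print(is_allowed_maven("3.9.0"))  # banned series: rejected
```

The real script compares the output of `mvn --version | head -n1 | awk '{print $3}'` against the pinned version and falls back to downloading the verified Maven when they disagree.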

[GitHub] [spark] xinrong-meng commented on a diff in pull request #40244: [WIP][SPARK-42643][CONNECT][PYTHON] Implement `spark.udf.registerJavaFunction`

2023-03-05 Thread via GitHub
xinrong-meng commented on code in PR #40244: URL: https://github.com/apache/spark/pull/40244#discussion_r1125939747 ## connector/connect/common/src/main/protobuf/spark/connect/expressions.proto: ## @@ -303,14 +303,15 @@ message Expression { message

[GitHub] [spark] LuciferYang commented on a diff in pull request #40283: [SPARK-42673][BUILD] Ban Maven 3.9.x for Spark build

2023-03-05 Thread via GitHub
LuciferYang commented on code in PR #40283: URL: https://github.com/apache/spark/pull/40283#discussion_r1125934336 ## build/mvn: ## @@ -119,7 +119,8 @@ install_mvn() { if [ "$MVN_BIN" ]; then local MVN_DETECTED_VERSION="$(mvn --version | head -n1 | awk '{print $3}')"

[GitHub] [spark] pan3793 commented on a diff in pull request #40283: [SPARK-42673][BUILD] Ban Maven 3.9.x for Spark build

2023-03-05 Thread via GitHub
pan3793 commented on code in PR #40283: URL: https://github.com/apache/spark/pull/40283#discussion_r1125930947 ## build/mvn: ## @@ -119,7 +119,8 @@ install_mvn() { if [ "$MVN_BIN" ]; then local MVN_DETECTED_VERSION="$(mvn --version | head -n1 | awk '{print $3}')" fi

[GitHub] [spark] LuciferYang commented on a diff in pull request #40283: [SPARK-42673][BUILD] Ban Maven 3.9.x for Spark build

2023-03-05 Thread via GitHub
LuciferYang commented on code in PR #40283: URL: https://github.com/apache/spark/pull/40283#discussion_r1125929900 ## build/mvn: ## @@ -119,7 +119,8 @@ install_mvn() { if [ "$MVN_BIN" ]; then local MVN_DETECTED_VERSION="$(mvn --version | head -n1 | awk '{print $3}')"

[GitHub] [spark] pan3793 commented on a diff in pull request #40283: [SPARK-42673][BUILD] Ban Maven 3.9.x for Spark build

2023-03-05 Thread via GitHub
pan3793 commented on code in PR #40283: URL: https://github.com/apache/spark/pull/40283#discussion_r1125927078 ## build/mvn: ## @@ -119,7 +119,8 @@ install_mvn() { if [ "$MVN_BIN" ]; then local MVN_DETECTED_VERSION="$(mvn --version | head -n1 | awk '{print $3}')" fi

[GitHub] [spark] zhengruifeng commented on pull request #40228: [SPARK-41874][CONNECT][PYTHON] Support SameSemantics in Spark Connect

2023-03-05 Thread via GitHub
zhengruifeng commented on PR #40228: URL: https://github.com/apache/spark/pull/40228#issuecomment-1455466444 merged into master/branch-3.4

[GitHub] [spark] zhengruifeng closed pull request #40228: [SPARK-41874][CONNECT][PYTHON] Support SameSemantics in Spark Connect

2023-03-05 Thread via GitHub
zhengruifeng closed pull request #40228: [SPARK-41874][CONNECT][PYTHON] Support SameSemantics in Spark Connect URL: https://github.com/apache/spark/pull/40228

[GitHub] [spark] hvanhovell commented on pull request #40291: [WIP][SPARK-42578][CONNECT] Add JDBC to DataFrameWriter

2023-03-05 Thread via GitHub
hvanhovell commented on PR #40291: URL: https://github.com/apache/spark/pull/40291#issuecomment-1455425240 hmmm - let me think about it.

[GitHub] [spark] beliefer opened a new pull request, #40293: [SPARK-42677][SQL] Fix the invalid tests for broadcast hint

2023-03-05 Thread via GitHub
beliefer opened a new pull request, #40293: URL: https://github.com/apache/spark/pull/40293

### What changes were proposed in this pull request?
Currently, many of the test cases for the broadcast hint are invalid, because the data size is smaller than the broadcast threshold.
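The rationale above is that when a relation is already below the auto-broadcast threshold, the planner broadcasts it with or without the hint, so the test proves nothing about the hint. A sketch of that decision logic (illustrative, not Spark's planner):

```python
def chooses_broadcast(size_bytes: int, auto_threshold: int, hinted: bool) -> bool:
    """Illustrative join-strategy choice: broadcast when hinted OR when
    the relation fits under the auto-broadcast threshold."""
    return hinted or size_bytes <= auto_threshold

threshold = 10 * 1024 * 1024  # e.g. a 10 MB autoBroadcastJoinThreshold

# A tiny relation is broadcast either way -- the hint changes nothing,
# which is why such a test case is vacuous:
small = 4096
assert chooses_broadcast(small, threshold, hinted=True) == \
       chooses_broadcast(small, threshold, hinted=False)

# Only a relation above the threshold actually exercises the hint:
big = 64 * 1024 * 1024
assert chooses_broadcast(big, threshold, hinted=True) != \
       chooses_broadcast(big, threshold, hinted=False)
print("the hint only matters above the threshold")
```

The fix is therefore to make the test data large enough that the hint, not the threshold, determines the join strategy.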

[GitHub] [spark] anishshri-db commented on pull request #40292: [SPARK-42676] Write temp checkpoints for streaming queries to local filesystem even if default FS is set differently

2023-03-05 Thread via GitHub
anishshri-db commented on PR #40292: URL: https://github.com/apache/spark/pull/40292#issuecomment-1455397903 @HeartSaVioR - please take a look. Thx

[GitHub] [spark] anishshri-db opened a new pull request, #40292: [SPARK-42676] Write temp checkpoints for streaming queries to local filesystem even if default FS is set differently

2023-03-05 Thread via GitHub
anishshri-db opened a new pull request, #40292: URL: https://github.com/apache/spark/pull/40292

### What changes were proposed in this pull request?
Write temp checkpoints for streaming queries to local filesystem even if default FS is set differently.

### Why are the changes
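The path-resolution idea the PR title describes — pinning temporary checkpoint directories to the local filesystem even when `fs.defaultFS` points at a remote store — can be sketched as follows (the function name and scheme handling are illustrative, not Spark's code):

```python
import os
import tempfile

def resolve_temp_checkpoint_dir(default_fs: str) -> str:
    """Always place temporary streaming checkpoints on the local FS,
    regardless of the configured default filesystem."""
    local_dir = tempfile.mkdtemp(prefix="temporary-checkpoint-")
    # Qualify the path explicitly with file:// so a default filesystem
    # like hdfs://namenode:8020 cannot redirect it to the remote store.
    return "file://" + os.path.abspath(local_dir)

path = resolve_temp_checkpoint_dir(default_fs="hdfs://namenode:8020")
print(path)  # a file:// URI, independent of default_fs
```

An unqualified relative path would be resolved against the default filesystem; making the `file://` scheme explicit is what keeps the temp checkpoint local.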

[GitHub] [spark] beliefer commented on pull request #40287: [SPARK-42562][CONNECT] UnresolvedNamedLambdaVariable in python do not need unique names

2023-03-05 Thread via GitHub
beliefer commented on PR #40287: URL: https://github.com/apache/spark/pull/40287#issuecomment-1455392063 > I guess we will need to rewrite the lambda function in the spark connect planner. Yeah.

[GitHub] [spark] beliefer commented on pull request #40287: [SPARK-42562][CONNECT] UnresolvedNamedLambdaVariable in python do not need unique names

2023-03-05 Thread via GitHub
beliefer commented on PR #40287: URL: https://github.com/apache/spark/pull/40287#issuecomment-1455390728 ![image](https://user-images.githubusercontent.com/8486025/223014232-bf9b26ee-d0e8-4de4-a8fe-2d252813ac4d.png)

[GitHub] [spark] beliefer commented on a diff in pull request #40277: [SPARK-42555][CONNECT][FOLLOWUP] Add the new proto msg to support the remaining jdbc API

2023-03-05 Thread via GitHub
beliefer commented on code in PR #40277: URL: https://github.com/apache/spark/pull/40277#discussion_r1125854126 ## connector/connect/common/src/main/protobuf/spark/connect/relations.proto: ## @@ -140,6 +141,21 @@ message Read { // (Optional) A list of path for file-system

[GitHub] [spark] LuciferYang commented on a diff in pull request #40218: [SPARK-42579][CONNECT] Part-1: `function.lit` support `Array[_]` dataType

2023-03-05 Thread via GitHub
LuciferYang commented on code in PR #40218: URL: https://github.com/apache/spark/pull/40218#discussion_r1125854357 ## connector/connect/common/src/main/protobuf/spark/connect/expressions.proto: ## @@ -189,6 +190,11 @@ message Expression { int32 days = 2; int64

[GitHub] [spark] zhengruifeng commented on pull request #40287: [SPARK-42562][CONNECT] UnresolvedNamedLambdaVariable in python do not need unique names

2023-03-05 Thread via GitHub
zhengruifeng commented on PR #40287: URL: https://github.com/apache/spark/pull/40287#issuecomment-1455388960 I guess we will need to rewrite the lambda function in the spark connect planner. cc @ueshin as well, since the existing implementation follows the fix in

[GitHub] [spark] huangxiaopingRD closed pull request #40196: [SPARK-42603][SQL] Set spark.sql.legacy.createHiveTableByDefault to false.

2023-03-05 Thread via GitHub
huangxiaopingRD closed pull request #40196: [SPARK-42603][SQL] Set spark.sql.legacy.createHiveTableByDefault to false. URL: https://github.com/apache/spark/pull/40196

[GitHub] [spark] LuciferYang commented on a diff in pull request #40218: [SPARK-42579][CONNECT] Part-1: `function.lit` support `Array[_]` dataType

2023-03-05 Thread via GitHub
LuciferYang commented on code in PR #40218: URL: https://github.com/apache/spark/pull/40218#discussion_r1125852404 ## connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/expressions/LiteralProtoConverter.scala: ## @@ -0,0 +1,297 @@ +/* + * Licensed to the Apache

[GitHub] [spark] beliefer commented on pull request #40291: [WIP][SPARK-42578][CONNECT] Add JDBC to DataFrameWriter

2023-03-05 Thread via GitHub
beliefer commented on PR #40291: URL: https://github.com/apache/spark/pull/40291#issuecomment-1455384866 @hvanhovell It seems there is no way to add test cases.

[GitHub] [spark] beliefer commented on pull request #40287: [SPARK-42562][CONNECT] UnresolvedNamedLambdaVariable in python do not need unique names

2023-03-05 Thread via GitHub
beliefer commented on PR #40287: URL: https://github.com/apache/spark/pull/40287#issuecomment-1455384317 @hvanhovell After my test, `python/run-tests --testnames 'pyspark.sql.connect.dataframe'` does not pass.

[GitHub] [spark] beliefer opened a new pull request, #40291: [WIP][SPARK-42578][CONNECT] Add JDBC to DataFrameWriter

2023-03-05 Thread via GitHub
beliefer opened a new pull request, #40291: URL: https://github.com/apache/spark/pull/40291

### What changes were proposed in this pull request?
Currently, the connect project has the new `DataFrameWriter` API which corresponds to the Spark `DataFrameWriter` API. But the connect's

[GitHub] [spark] Yikf commented on pull request #40290: [SPARK-42478][SQL][3.3] Make a serializable jobTrackerId instead of a non-serializable JobID in FileWriterFactory

2023-03-05 Thread via GitHub
Yikf commented on PR #40290: URL: https://github.com/apache/spark/pull/40290#issuecomment-1455380079 cc @cloud-fan @dongjoon-hyun

[GitHub] [spark] Yikf commented on pull request #40289: [SPARK-42478][SQL][3.2] Make a serializable jobTrackerId instead of a non-serializable JobID in FileWriterFactory

2023-03-05 Thread via GitHub
Yikf commented on PR #40289: URL: https://github.com/apache/spark/pull/40289#issuecomment-1455379959 cc @cloud-fan @dongjoon-hyun

[GitHub] [spark] Yikf opened a new pull request, #40290: [SPARK-42478][SQL][3.3] Make a serializable jobTrackerId instead of a non-serializable JobID in FileWriterFactory

2023-03-05 Thread via GitHub
Yikf opened a new pull request, #40290: URL: https://github.com/apache/spark/pull/40290

This is a backport of https://github.com/apache/spark/pull/40064 for branch-3.3

### What changes were proposed in this pull request?
Make a serializable jobTrackerId instead of a

[GitHub] [spark] Yikf opened a new pull request, #40289: [SPARK-42478][SQL][3.2] Make a serializable jobTrackerId instead of a non-serializable JobID in FileWriterFactory

2023-03-05 Thread via GitHub
Yikf opened a new pull request, #40289: URL: https://github.com/apache/spark/pull/40289

This is a backport of https://github.com/apache/spark/pull/40064

### What changes were proposed in this pull request?
Make a serializable jobTrackerId instead of a non-serializable JobID
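The fix replaces a captured non-serializable Hadoop `JobID` with a plain string (`jobTrackerId`) and rebuilds the object lazily where it is needed. The pattern, sketched in Python with `pickle` standing in for Java serialization (the class names here are stand-ins, not the Spark code):

```python
import pickle
from datetime import datetime

class JobID:
    """Stand-in for Hadoop's JobID; we make it deliberately unpicklable
    to mimic a non-serializable Java class."""
    def __init__(self, tracker_id: str, job_number: int):
        self.tracker_id = tracker_id
        self.job_number = job_number
    def __reduce__(self):
        raise TypeError("JobID is not serializable")

class WriterFactory:
    """Capture only the serializable string and rebuild the JobID lazily,
    mirroring the jobTrackerId approach in FileWriterFactory."""
    def __init__(self, job_tracker_id: str, job_number: int):
        self.job_tracker_id = job_tracker_id  # plain string: serializes fine
        self.job_number = job_number
    def job_id(self) -> JobID:
        return JobID(self.job_tracker_id, self.job_number)

factory = WriterFactory(datetime.now().strftime("%Y%m%d%H%M%S"), 0)
restored = pickle.loads(pickle.dumps(factory))  # round-trips successfully
print(restored.job_id().job_number)
```

Serializing the `JobID` directly would fail; shipping only the string and reconstructing the object on the other side is what makes the factory safe to send to executors.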

[GitHub] [spark] wangyum commented on pull request #38358: [SPARK-40588] FileFormatWriter materializes AQE plan before accessing outputOrdering

2023-03-05 Thread via GitHub
wangyum commented on PR #38358: URL: https://github.com/apache/spark/pull/38358#issuecomment-1455371977 @EnricoMi It seems it will remove the table location if a `java.lang.ArithmeticException` is thrown after this change. How to reproduce: ```scala import

[GitHub] [spark] LuciferYang commented on a diff in pull request #40218: [SPARK-42579][CONNECT] Part-1: `function.lit` support `Array[_]` dataType

2023-03-05 Thread via GitHub
LuciferYang commented on code in PR #40218: URL: https://github.com/apache/spark/pull/40218#discussion_r1125837371 ## connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/expressions/LiteralProtoConverter.scala: ## @@ -0,0 +1,297 @@ +/* + * Licensed to the Apache

[GitHub] [spark] anishshri-db commented on pull request #40273: [SPARK-42668][SS] Catch exception while trying to close compressed stream in HDFSStateStoreProvider abort

2023-03-05 Thread via GitHub
anishshri-db commented on PR #40273: URL: https://github.com/apache/spark/pull/40273#issuecomment-1455371384 > Mind retriggering the build, please? Probably the simplest way to do it is pushing an empty commit. You can retrigger the build in your fork but it won't be reflected here. Sure

[GitHub] [spark] hvanhovell commented on a diff in pull request #40277: [SPARK-42555][CONNECT][FOLLOWUP] Add the new proto msg to support the remaining jdbc API

2023-03-05 Thread via GitHub
hvanhovell commented on code in PR #40277: URL: https://github.com/apache/spark/pull/40277#discussion_r1125835789 ## connector/connect/common/src/main/protobuf/spark/connect/relations.proto: ## @@ -140,6 +141,21 @@ message Read { // (Optional) A list of path for

[GitHub] [spark] hvanhovell commented on pull request #40287: [SPARK-42562][CONNECT] UnresolvedNamedLambdaVariable in python do not need unique names

2023-03-05 Thread via GitHub
hvanhovell commented on PR #40287: URL: https://github.com/apache/spark/pull/40287#issuecomment-1455366786 @HyukjinKwon @zhengruifeng the rationale for this change is that the analyzer takes care of making lambda variables unique.

[GitHub] [spark] Yikf commented on pull request #40064: [SPARK-42478] Make a serializable jobTrackerId instead of a non-serializable JobID in FileWriterFactory

2023-03-05 Thread via GitHub
Yikf commented on PR #40064: URL: https://github.com/apache/spark/pull/40064#issuecomment-1455364691 > @Yikf can you help to open a backport PR for 3.2/3.3? Thanks! Sure

[GitHub] [spark] beliefer commented on pull request #39091: [SPARK-41527][CONNECT][PYTHON] Implement `DataFrame.observe`

2023-03-05 Thread via GitHub
beliefer commented on PR #39091: URL: https://github.com/apache/spark/pull/39091#issuecomment-1455360592 @hvanhovell @grundprinzip @HyukjinKwon @zhengruifeng @amaliujia Thank you.

[GitHub] [spark] amaliujia commented on pull request #40228: [SPARK-41874][CONNECT][PYTHON] Support SameSemantics in Spark Connect

2023-03-05 Thread via GitHub
amaliujia commented on PR #40228: URL: https://github.com/apache/spark/pull/40228#issuecomment-1455359011 @hvanhovell waiting for CI

[GitHub] [spark] beliefer commented on pull request #40275: [SPARK-42557][CONNECT] Add Broadcast to functions

2023-03-05 Thread via GitHub
beliefer commented on PR #40275: URL: https://github.com/apache/spark/pull/40275#issuecomment-1455357573 @hvanhovell @LuciferYang Thank you.

[GitHub] [spark] hvanhovell commented on pull request #40228: [SPARK-41874][CONNECT][PYTHON] Support SameSemantics in Spark Connect

2023-03-05 Thread via GitHub
hvanhovell commented on PR #40228: URL: https://github.com/apache/spark/pull/40228#issuecomment-1455352755 @amaliujia can you update the PR?

[GitHub] [spark] hvanhovell commented on a diff in pull request #40217: [SPARK-42559][CONNECT] Implement DataFrameNaFunctions

2023-03-05 Thread via GitHub
hvanhovell commented on code in PR #40217: URL: https://github.com/apache/spark/pull/40217#discussion_r1125825287 ## connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/DataFrameNaFunctionSuite.scala: ## @@ -0,0 +1,377 @@ +/* + * Licensed to the Apache Software

[GitHub] [spark] hvanhovell commented on pull request #40217: [SPARK-42559][CONNECT] Implement DataFrameNaFunctions

2023-03-05 Thread via GitHub
hvanhovell commented on PR #40217: URL: https://github.com/apache/spark/pull/40217#issuecomment-1455351159 @panbingkun can you update the CompatibilitySuite?

[GitHub] [spark] LuciferYang commented on pull request #40254: [SPARK-42654][BUILD] Upgrade dropwizard metrics 4.2.17

2023-03-05 Thread via GitHub
LuciferYang commented on PR #40254: URL: https://github.com/apache/spark/pull/40254#issuecomment-1455349598 Thanks @srowen

[GitHub] [spark] itholic commented on pull request #40288: [SPARK-42496][CONNECT][DOCS] Introduction Spark Connect at main page.

2023-03-05 Thread via GitHub
itholic commented on PR #40288: URL: https://github.com/apache/spark/pull/40288#issuecomment-1455348864 cc @tgravescs since this is a Spark Connect introduction including a note about the built-in authentication you mentioned in the JIRA ticket before.

[GitHub] [spark] hvanhovell commented on pull request #40217: [SPARK-42559][CONNECT] Implement DataFrameNaFunctions

2023-03-05 Thread via GitHub
hvanhovell commented on PR #40217: URL: https://github.com/apache/spark/pull/40217#issuecomment-1455348717 @panbingkun can you update your PR?

[GitHub] [spark] hvanhovell commented on a diff in pull request #40218: [SPARK-42579][CONNECT] Part-1: `function.lit` support `Array[_]` dataType

2023-03-05 Thread via GitHub
hvanhovell commented on code in PR #40218: URL: https://github.com/apache/spark/pull/40218#discussion_r1125820525 ## connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/expressions/LiteralProtoConverter.scala: ## @@ -0,0 +1,297 @@ +/* + * Licensed to the Apache

[GitHub] [spark] hvanhovell commented on a diff in pull request #40218: [SPARK-42579][CONNECT] Part-1: `function.lit` support `Array[_]` dataType

2023-03-05 Thread via GitHub
hvanhovell commented on code in PR #40218: URL: https://github.com/apache/spark/pull/40218#discussion_r1125817796 ## connector/connect/server/src/main/scala/org/apache/spark/sql/connect/planner/LiteralValueProtoConverter.scala: ## @@ -130,4 +138,61 @@ object

[GitHub] [spark] itholic opened a new pull request, #40288: [SPARK-42496][CONNECT][DOCS] Introduction Spark Connect at main page.

2023-03-05 Thread via GitHub
itholic opened a new pull request, #40288: URL: https://github.com/apache/spark/pull/40288

### What changes were proposed in this pull request?
This PR proposes to add a brief description of Spark Connect to the PySpark main page.

[GitHub] [spark] hvanhovell commented on a diff in pull request #40270: [SPARK-42662][CONNECT][PYTHON][PS] Support `withSequenceColumn` as PySpark DataFrame internal function.

2023-03-05 Thread via GitHub
hvanhovell commented on code in PR #40270: URL: https://github.com/apache/spark/pull/40270#discussion_r1125815690 ## connector/connect/common/src/main/protobuf/spark/connect/relations.proto: ## @@ -781,3 +782,10 @@ message FrameMap { CommonInlineUserDefinedFunction func = 2;

[GitHub] [spark] srowen commented on pull request #40254: [SPARK-42654][BUILD] Upgrade dropwizard metrics 4.2.17

2023-03-05 Thread via GitHub
srowen commented on PR #40254: URL: https://github.com/apache/spark/pull/40254#issuecomment-1455341403 Merged to master

[GitHub] [spark] srowen closed pull request #40254: [SPARK-42654][BUILD] Upgrade dropwizard metrics 4.2.17

2023-03-05 Thread via GitHub
srowen closed pull request #40254: [SPARK-42654][BUILD] Upgrade dropwizard metrics 4.2.17 URL: https://github.com/apache/spark/pull/40254

[GitHub] [spark] cloud-fan commented on pull request #40064: [SPARK-42478] Make a serializable jobTrackerId instead of a non-serializable JobID in FileWriterFactory

2023-03-05 Thread via GitHub
cloud-fan commented on PR #40064: URL: https://github.com/apache/spark/pull/40064#issuecomment-1455335925 @Yikf can you help to open a backport PR for 3.2/3.3? Thanks!

[GitHub] [spark] LuciferYang commented on pull request #40254: [SPARK-42654][BUILD] Upgrade dropwizard metrics 4.2.17

2023-03-05 Thread via GitHub
LuciferYang commented on PR #40254: URL: https://github.com/apache/spark/pull/40254#issuecomment-1455328473 friendly ping @srowen

[GitHub] [spark] hvanhovell closed pull request #39091: [SPARK-41527][CONNECT][PYTHON] Implement `DataFrame.observe`

2023-03-05 Thread via GitHub
hvanhovell closed pull request #39091: [SPARK-41527][CONNECT][PYTHON] Implement `DataFrame.observe` URL: https://github.com/apache/spark/pull/39091

[GitHub] [spark] hvanhovell commented on pull request #39091: [SPARK-41527][CONNECT][PYTHON] Implement `DataFrame.observe`

2023-03-05 Thread via GitHub
hvanhovell commented on PR #39091: URL: https://github.com/apache/spark/pull/39091#issuecomment-1455327845 Merging to master/3.4

[GitHub] [spark] LuciferYang commented on pull request #40285: [SPARK-42675][CONNECT][TESTS] Drop temp view after test `test temp view`

2023-03-05 Thread via GitHub
LuciferYang commented on PR #40285: URL: https://github.com/apache/spark/pull/40285#issuecomment-1455325164 Thanks @wangyum

[GitHub] [spark] LuciferYang commented on pull request #40255: [SPARK-42558][CONNECT] Implement `DataFrameStatFunctions` except `bloomFilter` functions

2023-03-05 Thread via GitHub
LuciferYang commented on PR #40255: URL: https://github.com/apache/spark/pull/40255#issuecomment-1455324716 Thanks @hvanhovell @HyukjinKwon @zhengruifeng @amaliujia

[GitHub] [spark] hvanhovell closed pull request #40255: [SPARK-42558][CONNECT] Implement `DataFrameStatFunctions` except `bloomFilter` functions

2023-03-05 Thread via GitHub
hvanhovell closed pull request #40255: [SPARK-42558][CONNECT] Implement `DataFrameStatFunctions` except `bloomFilter` functions URL: https://github.com/apache/spark/pull/40255

[GitHub] [spark] hvanhovell commented on pull request #40255: [SPARK-42558][CONNECT] Implement `DataFrameStatFunctions` except `bloomFilter` functions

2023-03-05 Thread via GitHub
hvanhovell commented on PR #40255: URL: https://github.com/apache/spark/pull/40255#issuecomment-1455323028 Merging.

[GitHub] [spark] hvanhovell closed pull request #40275: [SPARK-42557][CONNECT] Add Broadcast to functions

2023-03-05 Thread via GitHub
hvanhovell closed pull request #40275: [SPARK-42557][CONNECT] Add Broadcast to functions URL: https://github.com/apache/spark/pull/40275

[GitHub] [spark] hvanhovell commented on pull request #40275: [SPARK-42557][CONNECT] Add Broadcast to functions

2023-03-05 Thread via GitHub
hvanhovell commented on PR #40275: URL: https://github.com/apache/spark/pull/40275#issuecomment-1455321694 Merging.

[GitHub] [spark] hvanhovell closed pull request #40279: [MINOR][CONNECT] Remove unused protobuf imports to eliminate build warnings

2023-03-05 Thread via GitHub
hvanhovell closed pull request #40279: [MINOR][CONNECT] Remove unused protobuf imports to eliminate build warnings URL: https://github.com/apache/spark/pull/40279

[GitHub] [spark] hvanhovell closed pull request #40280: [SPARK-42671][CONNECT] Fix bug for createDataFrame from complex type schema

2023-03-05 Thread via GitHub
hvanhovell closed pull request #40280: [SPARK-42671][CONNECT] Fix bug for createDataFrame from complex type schema URL: https://github.com/apache/spark/pull/40280

[GitHub] [spark] hvanhovell commented on a diff in pull request #40280: [SPARK-42671][CONNECT] Fix bug for createDataFrame from complex type schema

2023-03-05 Thread via GitHub
hvanhovell commented on code in PR #40280: URL: https://github.com/apache/spark/pull/40280#discussion_r1125800378 ## connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/SparkSession.scala: ## @@ -115,7 +115,7 @@ class SparkSession private[sql] ( private def

[GitHub] [spark] mridulm commented on a diff in pull request #40286: [SPARK-42577][CORE] Add max attempts limitation for stages to avoid potential infinite retry

2023-03-05 Thread via GitHub
mridulm commented on code in PR #40286: URL: https://github.com/apache/spark/pull/40286#discussion_r1125790750 ## core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala: ## @@ -232,6 +232,13 @@ private[spark] class DAGScheduler(

[GitHub] [spark] mridulm commented on a diff in pull request #40286: [SPARK-42577][CORE] Add max attempts limitation for stages to avoid potential infinite retry

2023-03-05 Thread via GitHub
mridulm commented on code in PR #40286: URL: https://github.com/apache/spark/pull/40286#discussion_r1125790378 ## core/src/main/scala/org/apache/spark/internal/config/package.scala: ## @@ -2479,4 +2479,14 @@ package object config { .version("3.4.0") .booleanConf

[GitHub] [spark] beliefer opened a new pull request, #40287: [SPARK-42562][CONNECT] UnresolvedNamedLambdaVariable in python do not need unique names

2023-03-05 Thread via GitHub
beliefer opened a new pull request, #40287: URL: https://github.com/apache/spark/pull/40287 ### What changes were proposed in this pull request? UnresolvedNamedLambdaVariable do not need unique names in python. We already did this for the scala client, and it is good to have parity

[GitHub] [spark] ulysses-you commented on pull request #40262: [SPARK-42651][SQL] Optimize global sort to driver sort

2023-03-05 Thread via GitHub
ulysses-you commented on PR #40262: URL: https://github.com/apache/spark/pull/40262#issuecomment-1455303198 cc @cloud-fan @viirya thank you

[GitHub] [spark] beliefer commented on a diff in pull request #39091: [SPARK-41527][CONNECT][PYTHON] Implement `DataFrame.observe`

2023-03-05 Thread via GitHub
beliefer commented on code in PR #39091: URL: https://github.com/apache/spark/pull/39091#discussion_r1125777299 ## connector/connect/server/src/main/scala/org/apache/spark/sql/connect/planner/SparkConnectPlanner.scala: ## @@ -338,6 +340,22 @@ class SparkConnectPlanner(session:

[GitHub] [spark] beliefer commented on pull request #40275: [SPARK-42557][CONNECT] Add Broadcast to functions

2023-03-05 Thread via GitHub
beliefer commented on PR #40275: URL: https://github.com/apache/spark/pull/40275#issuecomment-1455280706 ping @HyukjinKwon @zhengruifeng @dongjoon-hyun

[GitHub] [spark] beliefer commented on pull request #40277: [SPARK-42555][CONNECT][FOLLOWUP] Add the new proto msg to support the remaining jdbc API

2023-03-05 Thread via GitHub
beliefer commented on PR #40277: URL: https://github.com/apache/spark/pull/40277#issuecomment-1455280396 ping @hvanhovell @HyukjinKwon @dongjoon-hyun cc @LuciferYang

[GitHub] [spark] beliefer commented on pull request #39091: [SPARK-41527][CONNECT][PYTHON] Implement `DataFrame.observe`

2023-03-05 Thread via GitHub
beliefer commented on PR #39091: URL: https://github.com/apache/spark/pull/39091#issuecomment-1455279364 > @beliefer can you please remove the is_observation code path? And take another look at the protocol. Otherwise I think it looks good. is_observation code path has been removed.

[GitHub] [spark] itholic commented on pull request #40271: [WIP][SPARK-42258][PYTHON] pyspark.sql.functions should not expose typing.cast

2023-03-05 Thread via GitHub
itholic commented on PR #40271: URL: https://github.com/apache/spark/pull/40271#issuecomment-1455275958 Looks good otherwise.

[GitHub] [spark] itholic commented on a diff in pull request #40271: [WIP][SPARK-42258][PYTHON] pyspark.sql.functions should not expose typing.cast

2023-03-05 Thread via GitHub
itholic commented on code in PR #40271: URL: https://github.com/apache/spark/pull/40271#discussion_r1125771590 ## python/pyspark/sql/tests/test_functions.py: ## @@ -1268,6 +1268,12 @@ def test_bucket(self): message_parameters={"arg_name": "numBuckets", "arg_type":

[GitHub] [spark] HyukjinKwon closed pull request #40281: [SPARK-41497][CORE][Follow UP]Modify config `spark.rdd.cache.visibilityTracking.enabled` support version to 3.5.0

2023-03-05 Thread via GitHub
HyukjinKwon closed pull request #40281: [SPARK-41497][CORE][Follow UP]Modify config `spark.rdd.cache.visibilityTracking.enabled` support version to 3.5.0 URL: https://github.com/apache/spark/pull/40281

[GitHub] [spark] HyukjinKwon commented on pull request #40282: [SPARK-42672][PYTHON][DOCS] Document error class list

2023-03-05 Thread via GitHub
HyukjinKwon commented on PR #40282: URL: https://github.com/apache/spark/pull/40282#issuecomment-1455270795 cc @MaxGekk and @srielau

[GitHub] [spark] HyukjinKwon closed pull request #40284: [SPARK-42674][BUILD] Upgrade scalafmt from 3.7.1 to 3.7.2

2023-03-05 Thread via GitHub
HyukjinKwon closed pull request #40284: [SPARK-42674][BUILD] Upgrade scalafmt from 3.7.1 to 3.7.2 URL: https://github.com/apache/spark/pull/40284

[GitHub] [spark] HyukjinKwon commented on pull request #40284: [SPARK-42674][BUILD] Upgrade scalafmt from 3.7.1 to 3.7.2

2023-03-05 Thread via GitHub
HyukjinKwon commented on PR #40284: URL: https://github.com/apache/spark/pull/40284#issuecomment-1455270404 Merged to master.

[GitHub] [spark] github-actions[bot] closed pull request #36265: [SPARK-38951][SQL] Aggregate aliases override field names in ResolveAggregateFunctions

2023-03-05 Thread via GitHub
github-actions[bot] closed pull request #36265: [SPARK-38951][SQL] Aggregate aliases override field names in ResolveAggregateFunctions URL: https://github.com/apache/spark/pull/36265

[GitHub] [spark] github-actions[bot] commented on pull request #38736: [SPARK-41214][SQL] - SQL Metrics are missing from Spark UI when AQE for Cached DataFrame is enabled

2023-03-05 Thread via GitHub
github-actions[bot] commented on PR #38736: URL: https://github.com/apache/spark/pull/38736#issuecomment-1455262719 We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.

[GitHub] [spark] wangyum commented on pull request #40285: [SPARK-42675][CONNECT][TESTS] Drop temp view after test `test temp view`

2023-03-05 Thread via GitHub
wangyum commented on PR #40285: URL: https://github.com/apache/spark/pull/40285#issuecomment-1455258629 Merged to master and branch-3.4.

[GitHub] [spark] wangyum closed pull request #40285: [SPARK-42675][CONNECT][TESTS] Drop temp view after test `test temp view`

2023-03-05 Thread via GitHub
wangyum closed pull request #40285: [SPARK-42675][CONNECT][TESTS] Drop temp view after test `test temp view` URL: https://github.com/apache/spark/pull/40285

[GitHub] [spark] FurcyPin commented on a diff in pull request #40271: [WIP][SPARK-42258][PYTHON] pyspark.sql.functions should not expose typing.cast

2023-03-05 Thread via GitHub
FurcyPin commented on code in PR #40271: URL: https://github.com/apache/spark/pull/40271#discussion_r1125698656 ## python/pyspark/sql/functions.py: ## @@ -22,20 +22,10 @@ import sys import functools import warnings -from typing import ( -Any, -cast, Review Comment:

[GitHub] [spark] FurcyPin commented on a diff in pull request #40271: [WIP][SPARK-42258][PYTHON] pyspark.sql.functions should not expose typing.cast

2023-03-05 Thread via GitHub
FurcyPin commented on code in PR #40271: URL: https://github.com/apache/spark/pull/40271#discussion_r1125695676 ## python/pyspark/sql/functions.py: ## @@ -22,20 +22,10 @@ import sys import functools import warnings -from typing import ( -Any, -cast, Review Comment:

[GitHub] [spark] itholic commented on a diff in pull request #40236: [SPARK-38735][SQL][TESTS] Add tests for the error class: INTERNAL_ERROR

2023-03-05 Thread via GitHub
itholic commented on code in PR #40236: URL: https://github.com/apache/spark/pull/40236#discussion_r1125682909 ## sql/core/src/test/scala/org/apache/spark/sql/errors/QueryExecutionErrorsSuite.scala: ## @@ -765,6 +770,58 @@ class QueryExecutionErrorsSuite ) } } +

[GitHub] [spark] LuciferYang commented on pull request #40274: [SPARK-42215][CONNECT] Simplify Scala Client IT tests

2023-03-05 Thread via GitHub
LuciferYang commented on PR #40274: URL: https://github.com/apache/spark/pull/40274#issuecomment-1455105130 There is another problem that needs to be confirmed, which may not related to current pr: if other Suites inherit `RemoteSparkSession`, they will share the same connect server,

[GitHub] [spark] ivoson opened a new pull request, #40286: [SPARK-42577][CORE] Add max attempts limitation for stages to avoid potential infinite retry

2023-03-05 Thread via GitHub
ivoson opened a new pull request, #40286: URL: https://github.com/apache/spark/pull/40286 ### What changes were proposed in this pull request? Currently a stage will be resubmitted in a few scenarios: 1. Task failed with `FetchFailed` will trigger stage re-submit; 2. Barrier task
