[GitHub] [spark] cloud-fan commented on a diff in pull request #42481: [SPARK-44801][SQL][UI] Capture analyzing failed queries in Listener and UI

2023-09-05 Thread via GitHub


cloud-fan commented on code in PR #42481:
URL: https://github.com/apache/spark/pull/42481#discussion_r1316817958


##
sql/core/src/main/scala/org/apache/spark/sql/execution/SQLExecution.scala:
##
@@ -116,6 +119,15 @@ object SQLExecution {
 var ex: Option[Throwable] = None
 val startTime = System.nanoTime()
 try {
+  val planInfo = try {
+    SparkPlanInfo.fromSparkPlan(queryExecution.executedPlan)
+  } catch {
+    case NonFatal(e) =>
+      logDebug("Failed to generate SparkPlanInfo", e)

Review Comment:
   I think I know where the problem is: we have already triggered the analysis 
and failed, and then we invoke `withNewExecutionId`. However, within 
`withNewExecutionId`, we don't know that the analysis has failed, and we call 
`queryExecution.executedPlan`, which triggers the analysis again.
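   
   A reduced sketch of that flow (`df` is a hypothetical DataFrame whose 
analysis fails):
   ```
   import scala.util.control.NonFatal

   // 1st trigger: analysis runs and throws; the caller records the failure.
   try { df.queryExecution.executedPlan } catch { case NonFatal(_) => }

   SQLExecution.withNewExecutionId(df.queryExecution) {
     // withNewExecutionId has no memory of the earlier failure, so touching
     // queryExecution.executedPlan here re-triggers the failed analysis.
     df.queryExecution.executedPlan
   }
   ```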






[GitHub] [spark] LuciferYang commented on pull request #42757: [SPARK-45036][SQL] SPJ: Simplify the logic to handle partially clustered distribution

2023-09-05 Thread via GitHub


LuciferYang commented on PR #42757:
URL: https://github.com/apache/spark/pull/42757#issuecomment-1707754014

   @sunchao After this PR was merged, test failures appeared in the daily Scala 
2.13 tests. We can reproduce the issue offline by executing the following 
commands:
   
   ```
   dev/change-scala-version.sh 2.13
   build/sbt "sql/testOnly org.apache.spark.sql.connector.KeyGroupedPartitioningSuite" -Pscala-2.13
   ```
   
   ```
   [info] - clustered distribution: output partitioning should be KeyGroupedPartitioning *** FAILED *** (1 second, 895 milliseconds)
   [info]   KeyGroupedPartitioning(List(transformexpression(org.apache.spark.sql.connector.catalog.functions.YearsFunction$@92fa663, ts#11, None)), 3, List([50], [51], [52]), List([50], [51], [52])) did not equal KeyGroupedPartitioning(ArraySeq(transformexpression(org.apache.spark.sql.connector.catalog.functions.YearsFunction$@92fa663, ts#11, None)), 3, List([51], [50], [52]), ArraySeq([50], [51], [52])) (KeyGroupedPartitioningSuite.scala:237)
   [info]   Analysis:
   [info]   KeyGroupedPartitioning(partitionValues: List(0: [50] -> [51], 1: [51] -> [50]), uniquePartitionValues: List(0: [50] -> [51], 1: [51] -> [50]))
   [info]   org.scalatest.exceptions.TestFailedException:
   [info]   at org.scalatest.Assertions.newAssertionFailedException(Assertions.scala:472)
   [info]   at org.scalatest.Assertions.newAssertionFailedException$(Assertions.scala:471)
   [info]   at org.scalatest.Assertions$.newAssertionFailedException(Assertions.scala:1231)
   [info]   at org.scalatest.Assertions$AssertionsHelper.macroAssert(Assertions.scala:1295)
   [info]   at org.apache.spark.sql.connector.KeyGroupedPartitioningSuite.checkQueryPlan(KeyGroupedPartitioningSuite.scala:237)
   [info]   at org.apache.spark.sql.connector.KeyGroupedPartitioningSuite.$anonfun$new$4(KeyGroupedPartitioningSuite.scala:103)
   [info]   at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.scala:18)
   [info]   at org.scalatest.enablers.Timed$$anon$1.timeoutAfter(Timed.scala:127)
   [info]   at org.scalatest.concurrent.TimeLimits$.failAfterImpl(TimeLimits.scala:282)
   [info]   at org.scalatest.concurrent.TimeLimits.failAfter(TimeLimits.scala:231)
   [info]   at org.scalatest.concurrent.TimeLimits.failAfter$(TimeLimits.scala:230)
   [info]   at org.apache.spark.SparkFunSuite.failAfter(SparkFunSuite.scala:69)
   [info]   at org.apache.spark.SparkFunSuite.$anonfun$test$2(SparkFunSuite.scala:155)
   [info]   at org.scalatest.OutcomeOf.outcomeOf(OutcomeOf.scala:85)
   [info]   at org.scalatest.OutcomeOf.outcomeOf$(OutcomeOf.scala:83)
   [info]   at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104)
   [info]   at org.scalatest.Transformer.apply(Transformer.scala:22)
   [info]   at org.scalatest.Transformer.apply(Transformer.scala:20)
   [info]   at org.scalatest.funsuite.AnyFunSuiteLike$$anon$1.apply(AnyFunSuiteLike.scala:226)
   [info]   at org.apache.spark.SparkFunSuite.withFixture(SparkFunSuite.scala:227)
   [info]   at org.scalatest.funsuite.AnyFunSuiteLike.invokeWithFixture$1(AnyFunSuiteLike.scala:224)
   [info]   at org.scalatest.funsuite.AnyFunSuiteLike.$anonfun$runTest$1(AnyFunSuiteLike.scala:236)
   [info]   at org.scalatest.SuperEngine.runTestImpl(Engine.scala:306)
   [info]   at org.scalatest.funsuite.AnyFunSuiteLike.runTest(AnyFunSuiteLike.scala:236)
   [info]   at org.scalatest.funsuite.AnyFunSuiteLike.runTest$(AnyFunSuiteLike.scala:218)
   [info]   at org.apache.spark.SparkFunSuite.org$scalatest$BeforeAndAfterEach$$super$runTest(SparkFunSuite.scala:69)
   [info]   at org.scalatest.BeforeAndAfterEach.runTest(BeforeAndAfterEach.scala:234)
   [info]   at org.scalatest.BeforeAndAfterEach.runTest$(BeforeAndAfterEach.scala:227)
   [info]   at org.apache.spark.sql.connector.DistributionAndOrderingSuiteBase.org$scalatest$BeforeAndAfter$$super$runTest(DistributionAndOrderingSuiteBase.scala:33)
   [info]   at org.scalatest.BeforeAndAfter.runTest(BeforeAndAfter.scala:213)
   [info]   at org.scalatest.BeforeAndAfter.runTest$(BeforeAndAfter.scala:203)
   [info]   at org.apache.spark.sql.connector.DistributionAndOrderingSuiteBase.runTest(DistributionAndOrderingSuiteBase.scala:33)
   [info]   at org.scalatest.funsuite.AnyFunSuiteLike.$anonfun$runTests$1(AnyFunSuiteLike.scala:269)
   [info]   at org.scalatest.SuperEngine.$anonfun$runTestsInBranch$1(Engine.scala:413)
   [info]   at scala.collection.immutable.List.foreach(List.scala:333)
   [info]   at org.scalatest.SuperEngine.traverseSubNodes$1(Engine.scala:401)
   [info]   at org.scalatest.SuperEngine.runTestsInBranch(Engine.scala:396)
   [info]   at org.scalatest.SuperEngine.runTestsImpl(Engine.scala:475)
   [info]   at org.scalatest.funsuite.AnyFunSuiteLike.runTests(AnyFunSuiteLike.scala:269)
   [info]   at org.scalatest.funsuite.AnyFunSuiteLike.runTests$(AnyFunSuiteLike.scala:268)
   [info]   at org.scalatest.funsuite.AnyFunSuite.runTests(AnyFunSuite.scala:1564)
   [info]   at org.s
   ```
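   
   The essence of the assertion failure above, as a standalone sketch: Seq 
equality in Scala is element-wise and order-sensitive, so the same partition 
values grouped in a different order under Scala 2.13 no longer compare equal 
(the List vs ArraySeq wrapper itself is not the problem):
   ```
   val expected = Seq(Seq(50), Seq(51), Seq(52))
   val actual   = Seq(Seq(51), Seq(50), Seq(52))  // same values, different order
   assert(actual != expected)                     // strict equality fails
   assert(actual.sortBy(_.head) == expected)      // equal once re-ordered
   ```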

[GitHub] [spark] cloud-fan commented on a diff in pull request #42481: [SPARK-44801][SQL][UI] Capture analyzing failed queries in Listener and UI

2023-09-05 Thread via GitHub


cloud-fan commented on code in PR #42481:
URL: https://github.com/apache/spark/pull/42481#discussion_r1316797688


##
sql/core/src/main/scala/org/apache/spark/sql/execution/SQLExecution.scala:
##
@@ -116,6 +119,15 @@ object SQLExecution {
 var ex: Option[Throwable] = None
 val startTime = System.nanoTime()
 try {
+  val planInfo = try {
+    SparkPlanInfo.fromSparkPlan(queryExecution.executedPlan)
+  } catch {
+    case NonFatal(e) =>
+      logDebug("Failed to generate SparkPlanInfo", e)

Review Comment:
   nvm, `body` is just a `throw ...` statement so no need to save the execution.






[GitHub] [spark] yaooqinn commented on a diff in pull request #42481: [SPARK-44801][SQL][UI] Capture analyzing failed queries in Listener and UI

2023-09-05 Thread via GitHub


yaooqinn commented on code in PR #42481:
URL: https://github.com/apache/spark/pull/42481#discussion_r1316792164


##
sql/core/src/main/scala/org/apache/spark/sql/execution/SQLExecution.scala:
##
@@ -116,6 +119,15 @@ object SQLExecution {
 var ex: Option[Throwable] = None
 val startTime = System.nanoTime()
 try {
+  val planInfo = try {
+    SparkPlanInfo.fromSparkPlan(queryExecution.executedPlan)
+  } catch {
+    case NonFatal(e) =>
+      logDebug("Failed to generate SparkPlanInfo", e)

Review Comment:
   This doesn't seem right. If the `body` is from an analysis error, `ex = 
Some(e)` would be a deterministic step that prevents executing `body`.






[GitHub] [spark] cloud-fan commented on a diff in pull request #42481: [SPARK-44801][SQL][UI] Capture analyzing failed queries in Listener and UI

2023-09-05 Thread via GitHub


cloud-fan commented on code in PR #42481:
URL: https://github.com/apache/spark/pull/42481#discussion_r1316776270


##
sql/core/src/main/scala/org/apache/spark/sql/execution/SQLExecution.scala:
##
@@ -116,6 +119,15 @@ object SQLExecution {
 var ex: Option[Throwable] = None
 val startTime = System.nanoTime()
 try {
+  val planInfo = try {
+    SparkPlanInfo.fromSparkPlan(queryExecution.executedPlan)
+  } catch {
+    case NonFatal(e) =>
+      logDebug("Failed to generate SparkPlanInfo", e)

Review Comment:
   one more followup: I think we should add `ex = Some(e)` here and skip 
running the `body` if `ex.isDefined`. The reason is to avoid triggering the 
failed analysis twice, which may introduce repeated error logging.
   ```
   ...
   if (ex.isEmpty) body else throw ex.get
   ```
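   
   For clarity, a minimal sketch of that control flow (names are taken from the 
diff above; wrapping planInfo in an Option and the listener-event plumbing are 
illustrative):
   ```
   var ex: Option[Throwable] = None
   val planInfo = try {
     Some(SparkPlanInfo.fromSparkPlan(queryExecution.executedPlan))
   } catch {
     case NonFatal(e) =>
       logDebug("Failed to generate SparkPlanInfo", e)
       ex = Some(e)  // remember the failure instead of re-triggering the analysis
       None
   }
   // ... post the SparkListenerSQLExecutionStart event using planInfo ...
   if (ex.isEmpty) body else throw ex.get  // skip body if analysis already failed
   ```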






[GitHub] [spark] cloud-fan commented on a diff in pull request #42481: [SPARK-44801][SQL][UI] Capture analyzing failed queries in Listener and UI

2023-09-05 Thread via GitHub


cloud-fan commented on code in PR #42481:
URL: https://github.com/apache/spark/pull/42481#discussion_r1316776270


##
sql/core/src/main/scala/org/apache/spark/sql/execution/SQLExecution.scala:
##
@@ -116,6 +119,15 @@ object SQLExecution {
 var ex: Option[Throwable] = None
 val startTime = System.nanoTime()
 try {
+  val planInfo = try {
+    SparkPlanInfo.fromSparkPlan(queryExecution.executedPlan)
+  } catch {
+    case NonFatal(e) =>
+      logDebug("Failed to generate SparkPlanInfo", e)

Review Comment:
   one more followup: I think we should add `ex = Some(e)` here and skip 
running the `body` if `ex.isDefined`. The reason is to avoid triggering the 
failed analysis twice, which may introduce repeated error logging.



##
sql/core/src/main/scala/org/apache/spark/sql/execution/SQLExecution.scala:
##
@@ -116,6 +119,15 @@ object SQLExecution {
 var ex: Option[Throwable] = None
 val startTime = System.nanoTime()
 try {
+  val planInfo = try {
+    SparkPlanInfo.fromSparkPlan(queryExecution.executedPlan)
+  } catch {
+    case NonFatal(e) =>
+      logDebug("Failed to generate SparkPlanInfo", e)

Review Comment:
   cc @yaooqinn 






[GitHub] [spark] pan3793 commented on pull request #42820: [BUILD][TEST] Remove obsolete repo of DB2 JDBC driver

2023-09-05 Thread via GitHub


pan3793 commented on PR #42820:
URL: https://github.com/apache/spark/pull/42820#issuecomment-1707709417

   cc @srowen @LuciferYang 





[GitHub] [spark] MaxGekk closed pull request #42801: [SPARK-45070][SQL][DOCS] Describe the binary and datetime formats of `to_char`/`to_varchar`

2023-09-05 Thread via GitHub


MaxGekk closed pull request #42801: [SPARK-45070][SQL][DOCS] Describe the 
binary and datetime formats of `to_char`/`to_varchar`
URL: https://github.com/apache/spark/pull/42801





[GitHub] [spark] MaxGekk commented on pull request #42801: [SPARK-45070][SQL][DOCS] Describe the binary and datetime formats of `to_char`/`to_varchar`

2023-09-05 Thread via GitHub


MaxGekk commented on PR #42801:
URL: https://github.com/apache/spark/pull/42801#issuecomment-1707683221

   Merging to master. Thank you, @dongjoon-hyun and @cloud-fan for review.





[GitHub] [spark] HyukjinKwon closed pull request #42806: [SPARK-44833][CONNECT] Fix sending Reattach too fast after Execute

2023-09-05 Thread via GitHub


HyukjinKwon closed pull request #42806: [SPARK-44833][CONNECT] Fix sending 
Reattach too fast after Execute
URL: https://github.com/apache/spark/pull/42806





[GitHub] [spark] HyukjinKwon commented on pull request #42806: [SPARK-44833][CONNECT] Fix sending Reattach too fast after Execute

2023-09-05 Thread via GitHub


HyukjinKwon commented on PR #42806:
URL: https://github.com/apache/spark/pull/42806#issuecomment-1707679262

   Merged to master and branch-3.5.





[GitHub] [spark] zhengruifeng opened a new pull request, #42828: [SPARK-45088][PYTHON][CONNECT] Make `getitem` work with duplicated columns

2023-09-05 Thread via GitHub


zhengruifeng opened a new pull request, #42828:
URL: https://github.com/apache/spark/pull/42828

   ### What changes were proposed in this pull request?
   
   - Make `getitem` work with duplicated columns
   - Disallow bool type index
   - Disallow negative index
   
   
   ### Why are the changes needed?
   1, the SQL feature ORDER BY ordinal works with duplicated columns
   ```
   In [4]: df = spark.sql("SELECT * FROM VALUES (1, 1.1, 'a'), (2, 2.2, 'b'), (4, 4.4, 'c') AS TAB(a, a, a)")
   
   In [5]: df.createOrReplaceTempView("v")
   
   In [6]: spark.sql("SELECT * FROM v ORDER BY 1, 2").show()
   +---+---+---+
   |  a|  a|  a|
   +---+---+---+
   |  1|1.1|  a|
   |  2|2.2|  b|
   |  4|4.4|  c|
   +---+---+---+
   ```
   
   To support it in DataFrame APIs, we need to make `getitem` work with 
duplicated columns.
   
   
   2 & 3: the existing support was unintentional
   
   
   ### Does this PR introduce _any_ user-facing change?
   1, Make `getitem` work with duplicated columns
   ```
   In [6]: df = spark.sql("SELECT * FROM VALUES (1, 1.1, 'a'), (2, 2.2, 'b'), (4, 4.4, 'c') AS TAB(a, a, a)")
   
   In [7]: df.orderBy(1, 2).show()
   ---
   AnalysisException Traceback (most recent call last)
   Cell In[7], line 1
   > 1 df.orderBy(1, 2).show()
   
   ...
   
   AnalysisException: [AMBIGUOUS_REFERENCE] Reference `a` is ambiguous, could be: [`TAB`.`a`, `TAB`.`a`, `TAB`.`a`].
   
   ```
   
   after
   
   ```
   In [1]: df = spark.sql("SELECT * FROM VALUES (1, 1.1, 'a'), (2, 2.2, 'b'), (4, 4.4, 'c') AS TAB(a, a, a)")
   
   In [2]: df[0]
   Out[2]: Column<'a'>
   
   In [3]: df[1]
   Out[3]: Column<'a'>
   
   In [4]: df.orderBy(1, 2).show()
   +---+---+---+
   |  a|  a|  a|
   +---+---+---+
   |  1|1.1|  a|
   |  2|2.2|  b|
   |  4|4.4|  c|
   +---+---+---+
   ```
   
   
   2, Disallow bool type index
   before
   ```
   In [1]: df = spark.createDataFrame([(2, "Alice"), (5, "Bob")], schema=["age", "name"],)
   
   In [2]: df[False]
   Out[2]: Column<'age'>
   
   In [3]: df[True]
   Out[3]: Column<'name'>
   ```
   
   after
   ```
   In [2]: df[True]
   ---
   PySparkTypeError  Traceback (most recent call last)
   ...
   PySparkTypeError: [NOT_COLUMN_OR_FLOAT_OR_INT_OR_LIST_OR_STR] Argument `item` should be a column, float, integer, list or string, got bool.
   ```
   
   3, Disallow negative index
   before
   ```
   In [1]: df = spark.createDataFrame([(2, "Alice"), (5, "Bob")], schema=["age", "name"],)
   
   In [4]: df[-1]
   Out[4]: Column<'name'>
   
   In [5]: df[-2]
   Out[5]: Column<'age'>
   ```
   
   after
   ```
   In [3]: df[-1]
   ---
   IndexErrorTraceback (most recent call last)
   ...
   IndexError: Column index must be non-negative but got -1
   ```
   
   
   
   ### How was this patch tested?
   added UTs
   
   
   ### Was this patch authored or co-authored using generative AI tooling?
   NO





[GitHub] [spark] LuciferYang opened a new pull request, #42827: [BUILD] Test build with maven 3.9.4

2023-09-05 Thread via GitHub


LuciferYang opened a new pull request, #42827:
URL: https://github.com/apache/spark/pull/42827

   
   
   ### What changes were proposed in this pull request?
   
   
   
   ### Why are the changes needed?
   
   
   
   ### Does this PR introduce _any_ user-facing change?
   
   
   
   ### How was this patch tested?
   
   
   
   ### Was this patch authored or co-authored using generative AI tooling?
   
   





[GitHub] [spark] yaooqinn opened a new pull request, #42826: [SPARK-45086][UI] Display hexadecimal for thread lock hash code

2023-09-05 Thread via GitHub


yaooqinn opened a new pull request, #42826:
URL: https://github.com/apache/spark/pull/42826

   
   
   ### What changes were proposed in this pull request?
   
   
   This PR fixes the stringify method for MonitorInfo/LockInfo to use 
`toString`, which includes an extra step of `Integer.toHexString`.
   
   ### Why are the changes needed?
   
   
   to be consistent with the lock-holder
   
   ### Does this PR introduce _any_ user-facing change?
   
   
   Yes, the UI and the response of the REST API 
(/applications/[app-id]/executors/[executor-id]/threads) change.
   
   ### How was this patch tested?
   
   
   verified locally
   
   ### Was this patch authored or co-authored using generative AI tooling?
   
   
   no
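   
   For illustration, `LockInfo.toString` already renders the identity hash code 
in hexadecimal, so the change aligns our stringify with it (a sketch against 
the standard java.lang.management API; the printed values are examples):
   ```
   import java.lang.management.ManagementFactory

   val infos = ManagementFactory.getThreadMXBean.dumpAllThreads(true, true)
   for (ti <- infos; lock <- Option(ti.getLockInfo)) {
     // Old stringify style: decimal identity hash code.
     val decimal = s"${lock.getClassName}@${lock.getIdentityHashCode}"
     // LockInfo.toString uses Integer.toHexString, e.g. "java.lang.Object@1b6d3586".
     println(s"$decimal vs ${lock.toString}")
   }
   ```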





[GitHub] [spark] LuciferYang commented on pull request #42766: [SPARK-45046][BUILD] Set `shadeTestJar` of `core` module to `false`

2023-09-05 Thread via GitHub


LuciferYang commented on PR #42766:
URL: https://github.com/apache/spark/pull/42766#issuecomment-1707613904

   @gengliangwang After further investigation, there are three modules that 
depend on `spark-streaming_2.12:test-jar`: `mllib`, `streaming-kafka-0-10` and 
`kinesis-asl`. As I lack the necessary credentials to access AWS, I'm unable to 
verify all Unit Tests of the `kinesis-asl` module, so I've chosen to retain the 
configuration of `true` in the streaming module.





[GitHub] [spark] LuciferYang commented on pull request #42766: [SPARK-45046][BUILD] Set `shadeTestJar` of `core` module to `false`

2023-09-05 Thread via GitHub


LuciferYang commented on PR #42766:
URL: https://github.com/apache/spark/pull/42766#issuecomment-1707607819

   Merged into master. Thanks @gengliangwang @HyukjinKwon 





[GitHub] [spark] LuciferYang closed pull request #42766: [SPARK-45046][BUILD] Set `shadeTestJar` of `core` module to `false`

2023-09-05 Thread via GitHub


LuciferYang closed pull request #42766: [SPARK-45046][BUILD] Set `shadeTestJar` 
of `core` module to `false`
URL: https://github.com/apache/spark/pull/42766





[GitHub] [spark] wangyum commented on pull request #42804: [SPARK-45071][SQL] Optimize the processing speed of `BinaryArithmetic#dataType` when processing multi-column data

2023-09-05 Thread via GitHub


wangyum commented on PR #42804:
URL: https://github.com/apache/spark/pull/42804#issuecomment-1707606993

   Merged to master, branch-3.5 and branch-3.4.





[GitHub] [spark] wangyum closed pull request #42804: [SPARK-45071][SQL] Optimize the processing speed of `BinaryArithmetic#dataType` when processing multi-column data

2023-09-05 Thread via GitHub


wangyum closed pull request #42804: [SPARK-45071][SQL] Optimize the processing 
speed of `BinaryArithmetic#dataType` when processing multi-column data
URL: https://github.com/apache/spark/pull/42804





[GitHub] [spark] Hisoka-X commented on a diff in pull request #42810: [SPARK-45075][SQL] Fix alter table with invalid default value will not report error

2023-09-05 Thread via GitHub


Hisoka-X commented on code in PR #42810:
URL: https://github.com/apache/spark/pull/42810#discussion_r1316669968


##
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/v2AlterTableCommands.scala:
##
@@ -228,6 +228,15 @@ case class AlterColumn(
       TableChange.updateColumnPosition(colName, newPosition.position)
     }
     val defaultValueChange = setDefaultExpression.map { newDefaultExpression =>
+      if (newDefaultExpression.nonEmpty) {

Review Comment:
   > Null means dropping the column default value.
   > @Nullable
   
   ~~In my view, the javadoc description is correct. Seems like there is no need to change it.~~
   
   oh, I confused being empty with being null; it really needs to be changed
   
   






[GitHub] [spark] Hisoka-X commented on a diff in pull request #42810: [SPARK-45075][SQL] Fix alter table with invalid default value will not report error

2023-09-05 Thread via GitHub


Hisoka-X commented on code in PR #42810:
URL: https://github.com/apache/spark/pull/42810#discussion_r1316673105


##
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/v2AlterTableCommands.scala:
##
@@ -228,6 +228,15 @@ case class AlterColumn(
       TableChange.updateColumnPosition(colName, newPosition.position)
     }
     val defaultValueChange = setDefaultExpression.map { newDefaultExpression =>
+      if (newDefaultExpression.nonEmpty) {

Review Comment:
   > ah, so the javadoc in TableChange.java is wrong. @Hisoka-X can you help to 
fix it?
   
   Done








[GitHub] [spark] Hisoka-X commented on a diff in pull request #42810: [SPARK-45075][SQL] Fix alter table with invalid default value will not report error

2023-09-05 Thread via GitHub


Hisoka-X commented on code in PR #42810:
URL: https://github.com/apache/spark/pull/42810#discussion_r1316669968


##
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/v2AlterTableCommands.scala:
##
@@ -228,6 +228,15 @@ case class AlterColumn(
       TableChange.updateColumnPosition(colName, newPosition.position)
     }
     val defaultValueChange = setDefaultExpression.map { newDefaultExpression =>
+      if (newDefaultExpression.nonEmpty) {

Review Comment:
   
   > Null means dropping the column default value.
   > @Nullable
   
   In my view, the javadoc describe correct. Seem like no need to change.
   
   






[GitHub] [spark] cloud-fan commented on a diff in pull request #42810: [SPARK-45075][SQL] Fix alter table with invalid default value will not report error

2023-09-05 Thread via GitHub


cloud-fan commented on code in PR #42810:
URL: https://github.com/apache/spark/pull/42810#discussion_r1316665187


##
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/v2AlterTableCommands.scala:
##
@@ -228,6 +228,15 @@ case class AlterColumn(
       TableChange.updateColumnPosition(colName, newPosition.position)
     }
     val defaultValueChange = setDefaultExpression.map { newDefaultExpression =>
+      if (newDefaultExpression.nonEmpty) {

Review Comment:
   ah, so the javadoc in `TableChange.java` is wrong. @Hisoka-X can you help to 
fix it?






[GitHub] [spark] cloud-fan commented on a diff in pull request #42804: [SPARK-45071][SQL] Optimize the processing speed of `BinaryArithmetic#dataType` when processing multi-column data

2023-09-05 Thread via GitHub


cloud-fan commented on code in PR #42804:
URL: https://github.com/apache/spark/pull/42804#discussion_r1316665134


##
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/arithmetic.scala:
##
@@ -234,11 +240,7 @@ abstract class BinaryArithmetic extends BinaryOperator
     case _ => super.checkInputDataTypes()
   }
 
-  override def dataType: DataType = (left.dataType, right.dataType) match {
-    case (DecimalType.Fixed(p1, s1), DecimalType.Fixed(p2, s2)) =>
-      resultDecimalType(p1, s1, p2, s2)
-    case _ => left.dataType
-  }
+  override def dataType: DataType = internalDataType

Review Comment:
   ah then it's not an option. We are good to go.
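   
   For reference, the caching pattern the diff above introduces can be sketched 
as follows (assuming `internalDataType` is a lazy val holding the old 
computation):
   ```
   // Computed once per expression instance instead of on every dataType call,
   // which matters when wide plans ask for the type repeatedly.
   private lazy val internalDataType: DataType = (left.dataType, right.dataType) match {
     case (DecimalType.Fixed(p1, s1), DecimalType.Fixed(p2, s2)) =>
       resultDecimalType(p1, s1, p2, s2)
     case _ => left.dataType
   }

   override def dataType: DataType = internalDataType
   ```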






[GitHub] [spark] HeartSaVioR commented on pull request #42823: [SPARK-45080][SS] Explicitly call out support for columnar in DSv2 streaming data sources

2023-09-05 Thread via GitHub


HeartSaVioR commented on PR #42823:
URL: https://github.com/apache/spark/pull/42823#issuecomment-1707590753

   cc. @zsxwing @viirya @xuanyuanking as well





[GitHub] [spark] yaooqinn closed pull request #42825: [SPARK-44801][SQL][FOLLOWUP] Remove overdue comments for generating plan info in SQLExecution

2023-09-05 Thread via GitHub


yaooqinn closed pull request #42825: [SPARK-44801][SQL][FOLLOWUP] Remove 
overdue comments for generating plan info in SQLExecution
URL: https://github.com/apache/spark/pull/42825





[GitHub] [spark] yaooqinn commented on pull request #42825: [SPARK-44801][SQL][FOLLOWUP] Remove overdue comments for generating plan info in SQLExecution

2023-09-05 Thread via GitHub


yaooqinn commented on PR #42825:
URL: https://github.com/apache/spark/pull/42825#issuecomment-1707584651

   cc @cloud-fan and thanks for the post review of 
https://github.com/apache/spark/pull/42481/files#r1315972356





[GitHub] [spark] yaooqinn opened a new pull request, #42825: [SPARK-44801][SQL][FOLLOWUP] Remove overdue comments for generating plan info in SQLExecution

2023-09-05 Thread via GitHub


yaooqinn opened a new pull request, #42825:
URL: https://github.com/apache/spark/pull/42825

   
   
   
   
   ### What changes were proposed in this pull request?
   
   
   This is a follow-up to clean up the comment about generating the plan info 
in SQLExecution, which is now wrapped in a try-catch block.
   
   ### Why are the changes needed?
   
   
   outdated and misleading comments cleanup
   
   
   ### Does this PR introduce _any_ user-facing change?
   
   no
   
   ### How was this patch tested?
   
   existing
   
   ### Was this patch authored or co-authored using generative AI tooling?
   
   no





[GitHub] [spark] HeartSaVioR commented on pull request #42823: [SPARK-45080][SS] Explicitly call out support for columnar in DSv2 streaming data sources

2023-09-05 Thread via GitHub


HeartSaVioR commented on PR #42823:
URL: https://github.com/apache/spark/pull/42823#issuecomment-1707580885

   cc. @cloud-fan 





[GitHub] [spark] panbingkun opened a new pull request, #42824: [SPARK-45085][SQL] Merge UNSUPPORTED_TEMP_VIEW_OPERATION into UNSUPPORTED_VIEW_OPERATION and refactor some logic

2023-09-05 Thread via GitHub


panbingkun opened a new pull request, #42824:
URL: https://github.com/apache/spark/pull/42824

   ### What changes were proposed in this pull request?
   The PR aims to:
   - Merge UNSUPPORTED_TEMP_VIEW_OPERATION into UNSUPPORTED_VIEW_OPERATION in 
error classes.
   - Refactor some code.
   - Fix some issues with error prompts.
   
   ### Why are the changes needed?
   - The changes improve the error framework.
   - Make code clear.
   
   
   ### Does this PR introduce _any_ user-facing change?
   No.
   
   
   ### How was this patch tested?
   - Pass GA.
   - Manual test.
   
   ### Was this patch authored or co-authored using generative AI tooling?
   No.





[GitHub] [spark] yaooqinn commented on a diff in pull request #42481: [SPARK-44801][SQL][UI] Capture analyzing failed queries in Listener and UI

2023-09-05 Thread via GitHub


yaooqinn commented on code in PR #42481:
URL: https://github.com/apache/spark/pull/42481#discussion_r1316655772


##
sql/core/src/main/scala/org/apache/spark/sql/execution/SQLExecution.scala:
##
@@ -124,7 +136,7 @@ object SQLExecution {
   physicalPlanDescription = queryExecution.explainString(planDescriptionMode),
   // `queryExecution.executedPlan` triggers query planning. If it fails, the exception
   // will be caught and reported in the `SparkListenerSQLExecutionEnd`

Review Comment:
   Yeah, I'll make a followup






[GitHub] [spark] yliou commented on a diff in pull request #40502: [SPARK-42829] [UI] add repeat identifier to cached RDD on stage page

2023-09-05 Thread via GitHub


yliou commented on code in PR #40502:
URL: https://github.com/apache/spark/pull/40502#discussion_r1316647094


##
core/src/main/scala/org/apache/spark/ui/scope/RDDOperationGraph.scala:
##
@@ -221,6 +226,24 @@ private[spark] object RDDOperationGraph extends Logging {
     RDDOperationGraph(internalEdges.toSeq, outgoingEdges.toSeq, incomingEdges.toSeq, rootCluster)
   }
 
+  def getRepeatIdentifier(node: RDDOperationNode): String = {

Review Comment:
   I don't see how we can add that easily in the code given that 
RDDOperationGraph doesn't have a SqlConf instance to read a config.






[GitHub] [spark] HeartSaVioR opened a new pull request, #42823: [SPARK-45080][SS] Explicitly call out support for columnar in DSv2 streaming data sources

2023-09-05 Thread via GitHub


HeartSaVioR opened a new pull request, #42823:
URL: https://github.com/apache/spark/pull/42823

   ### What changes were proposed in this pull request?
   
   This PR proposes to override `Scan.columnarSupportMode` for DSv2 streaming 
data sources. None of them supports columnar.
   
   Rationalization will be explained in the next section.
   
   ### Why are the changes needed?
   
   The default value for `Scan.columnarSupportMode` is `PARTITION_DEFINED`, 
which requires `inputPartitions` to be called/evaluated. That could be 
referenced multiple times during planning.
   
   In `MicroBatchScanExec`, we define `inputPartitions` as a lazy val so that 
`inputPartitions`, which calls `MicroBatchStream.planInputPartitions`, is not 
evaluated multiple times. But we missed that there is no guarantee the instance 
will be initialized only once (although the actual execution happens only once) 
- for example, executedPlan clones the plan (internally we call the constructor 
to make a deep copy of the node), and explain does too (it is internally called 
to build a SQL execution start event), etc.
   
   I see `MicroBatchStream.planInputPartitions` get called 4 times per 
microbatch, which can be concerning if the overhead of `planInputPartitions` is 
non-trivial.
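   
   A standalone illustration of the pitfall (this is a reduced sketch, not the 
Spark code itself): a lazy val memoizes per instance, so constructing a copy of 
the node re-runs the initializer.
   ```
   class Node {
     lazy val inputPartitions: Seq[Int] = {
       println("planInputPartitions called")  // stands in for the expensive call
       Seq(1, 2, 3)
     }
     def copyNode(): Node = new Node  // what executedPlan/explain effectively do
   }

   val n = new Node
   n.inputPartitions             // prints once; memoized on this instance
   n.copyNode().inputPartitions  // prints again: the clone has its own lazy val
   ```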
   
   ### Does this PR introduce _any_ user-facing change?
   
   No.
   
   ### How was this patch tested?
   
   Existing UTs.
   
   ### Was this patch authored or co-authored using generative AI tooling?
   
   No.
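   
   A minimal sketch of the override being applied (the scan class name here is 
hypothetical; real streaming scans implement more of the `Scan` API):
   ```
   import org.apache.spark.sql.connector.read.Scan
   import org.apache.spark.sql.types.StructType

   class MyStreamingScan extends Scan {
     override def readSchema(): StructType = new StructType()

     // Declare up front that this source never produces columnar batches, so
     // the planner does not need to call planInputPartitions just to find out.
     override def columnarSupportMode(): Scan.ColumnarSupportMode =
       Scan.ColumnarSupportMode.UNSUPPORTED
   }
   ```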





[GitHub] [spark] yaooqinn commented on pull request #42815: [SPARK-45077][UI] Upgrade dagre-d3.js from 0.4.3 to 0.6.4

2023-09-05 Thread via GitHub


yaooqinn commented on PR #42815:
URL: https://github.com/apache/spark/pull/42815#issuecomment-1707529974

   > Are you saying 0.6.4 doesn't work well? is this just for your testing then 
and not to merge?
   
   Hi @srowen, I apologize for any confusion caused. This pull request is 
intended for merging, not just for testing. Neither version 0.6.4 nor 0.4.3 is 
functional without the custom changes applied.





[GitHub] [spark] yaooqinn commented on pull request #42815: [SPARK-45077][UI] Upgrade dagre-d3.js from 0.4.3 to 0.6.4

2023-09-05 Thread via GitHub


yaooqinn commented on PR #42815:
URL: https://github.com/apache/spark/pull/42815#issuecomment-1707530280

   also cc @sarutak @HyukjinKwon and @dongjoon-hyun 





[GitHub] [spark] itholic commented on a diff in pull request #42798: [SPARK-43295][PS] Support string type columns for `DataFrameGroupBy.sum`

2023-09-05 Thread via GitHub


itholic commented on code in PR #42798:
URL: https://github.com/apache/spark/pull/42798#discussion_r1316586840


##
python/pyspark/pandas/groupby.py:
##
@@ -3537,14 +3537,14 @@ def _reduce_for_stat_function(
 if sfun.__name__ == "sum" and isinstance(
     psdf._internal.spark_type_for(label), StringType
 ):
+    input_scol_name = psser._internal.data_spark_column_names[0]
     # Sort data with natural order column to ensure order of data
-    sorted_array = F.array_sort(
-        F.collect_list(F.struct(NATURAL_ORDER_COLUMN_NAME, input_scol))
+    output_scol = F.concat_ws(
+        "",
+        F.array_sort(
+            F.collect_list(F.struct(NATURAL_ORDER_COLUMN_NAME, input_scol))
+        ).getField(input_scol_name),

Review Comment:
   Sounds good. Updated the code with `transform`. Thanks!






[GitHub] [spark] zhengruifeng commented on pull request #42821: [SPARK-45083][PYTHON][DOCS] Refine the docstring of function `min`

2023-09-05 Thread via GitHub


zhengruifeng commented on PR #42821:
URL: https://github.com/apache/spark/pull/42821#issuecomment-1707515898

   merged to master





[GitHub] [spark] zhengruifeng closed pull request #42821: [SPARK-45083][PYTHON][DOCS] Refine the docstring of function `min`

2023-09-05 Thread via GitHub


zhengruifeng closed pull request #42821: [SPARK-45083][PYTHON][DOCS] Refine the 
docstring of function `min`
URL: https://github.com/apache/spark/pull/42821





[GitHub] [spark] panbingkun commented on a diff in pull request #42109: [SPARK-44404][SQL] Assign names to the error class _LEGACY_ERROR_TEMP_[1009,1010,1013,1015,1016,1278]

2023-09-05 Thread via GitHub


panbingkun commented on code in PR #42109:
URL: https://github.com/apache/spark/pull/42109#discussion_r1315780224


##
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala:
##
@@ -1134,20 +1134,21 @@ class Analyzer(override val catalogManager: CatalogManager) extends RuleExecutor
   case v: ResolvedPersistentView =>
     val nameParts = v.catalog.name() +: v.identifier.asMultipartIdentifier
     throw QueryCompilationErrors.expectTableNotViewError(
-      nameParts, isTemp = false, cmd, relationTypeMismatchHint, u)
+      nameParts, isTemp = false, cmd, true, u)

Review Comment:
   Because we put the content of the hint in `error-classes.json`.






[GitHub] [spark] zhengruifeng commented on a diff in pull request #42798: [SPARK-43295][PS] Support string type columns for `DataFrameGroupBy.sum`

2023-09-05 Thread via GitHub


zhengruifeng commented on code in PR #42798:
URL: https://github.com/apache/spark/pull/42798#discussion_r1316574489


##
python/pyspark/pandas/groupby.py:
##
@@ -3537,14 +3537,14 @@ def _reduce_for_stat_function(
 if sfun.__name__ == "sum" and isinstance(
     psdf._internal.spark_type_for(label), StringType
 ):
+    input_scol_name = psser._internal.data_spark_column_names[0]
     # Sort data with natural order column to ensure order of data
-    sorted_array = F.array_sort(
-        F.collect_list(F.struct(NATURAL_ORDER_COLUMN_NAME, input_scol))
+    output_scol = F.concat_ws(
+        "",
+        F.array_sort(
+            F.collect_list(F.struct(NATURAL_ORDER_COLUMN_NAME, input_scol))
+        ).getField(input_scol_name),

Review Comment:
   I think you will need `F.transform` to extract the strings.
   
   Otherwise, you can use `F.reduce` to directly concatenate the strings from 
the structs.
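   
   For reference, the same idea in the Scala API (a sketch only; the column 
names are hypothetical stand-ins for the pandas-on-Spark internals):
   ```
   import org.apache.spark.sql.functions._

   // Sort (order, value) structs by the natural-order column, then use
   // transform to pull the string field back out before concatenating.
   val sorted = array_sort(collect_list(struct(col("__natural_order__"), col("value"))))
   val concatenated = concat_ws("", transform(sorted, s => s.getField("value")))
   ```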






[GitHub] [spark] zhengruifeng commented on a diff in pull request #42798: [SPARK-43295][PS] Support string type columns for `DataFrameGroupBy.sum`

2023-09-05 Thread via GitHub


zhengruifeng commented on code in PR #42798:
URL: https://github.com/apache/spark/pull/42798#discussion_r1316572356


##
python/pyspark/pandas/groupby.py:
##
@@ -3534,7 +3534,12 @@ def _reduce_for_stat_function(
 for label in psdf._internal.column_labels:
     psser = psdf._psser_for(label)
     input_scol = psser._dtype_op.nan_to_null(psser).spark.column
-    output_scol = sfun(input_scol)
+    if sfun.__name__ == "sum" and isinstance(
+        psdf._internal.spark_type_for(label), StringType
+    ):
+        output_scol = F.concat_ws("", F.collect_list(input_scol))

Review Comment:
   yes, should extract the string col






[GitHub] [spark] rangadi commented on a diff in pull request #42779: [SPARK-45056][PYTHON][SS][CONNECT] Termination tests for streamingQueryListener and foreachBatch

2023-09-05 Thread via GitHub


rangadi commented on code in PR #42779:
URL: https://github.com/apache/spark/pull/42779#discussion_r1316564483


##
python/pyspark/sql/tests/connect/streaming/worker_for_testing.py:
##
@@ -0,0 +1,63 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+"""
+A worker for streaming foreachBatch in Spark Connect.
+Usually this is ran on the driver side of the Spark Connect Server.

Review Comment:
   Why do we need this? Better to add a description of that here.



##
python/pyspark/sql/tests/connect/streaming/worker_for_testing.py:
##
@@ -0,0 +1,63 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+"""
+A worker for streaming foreachBatch in Spark Connect.
+Usually this is ran on the driver side of the Spark Connect Server.
+"""
+import os
+
+from pyspark.java_gateway import local_connect_and_auth
+from pyspark.serializers import (
+    write_int,
+    UTF8Deserializer,
+    CPickleSerializer,
+)
+from typing import IO
+from pyspark.worker_util import check_python_version
+
+pickle_ser = CPickleSerializer()
+utf8_deserializer = UTF8Deserializer()
+
+
+def main(infile: IO, outfile: IO) -> None:
+    check_python_version(infile)
+
+    connect_url = os.environ["SPARK_CONNECT_LOCAL_URL"]
+    session_id = utf8_deserializer.loads(infile)
+
+    print(
+        "Streaming test worker is starting with " f"url {connect_url} and sessionId {session_id}."
+    )
+
+    write_int(0, outfile)  # Indicate successful initialization
+    outfile.flush()
+
+    while True:

Review Comment:
   A busy loop like this can add a lot of burden on the test infra and make 
tests flaky.



##
connector/connect/server/src/test/scala/org/apache/spark/sql/connect/service/SparkConnectSessionHodlerSuite.scala:
##
@@ -79,4 +90,135 @@ class SparkConnectSessionHolderSuite extends 
SharedSparkSession {
   sessionHolder.getDataFrameOrThrow(key1)
 }
   }
+
+  private def dummyPythonFunction(sessionHolder: SessionHolder): 
SimplePythonFunction = {
+// Needed because pythonPath in PythonWorkerFactory doesn't include test 
spark home
+val sparkPythonPath = PythonUtils.mergePythonPaths(
+  Seq(sparkHome, "python", "lib", "pyspark.zip").mkString(File.separator),
+  Seq(sparkHome, "python", "lib", 
PythonUtils.PY4J_ZIP_NAME).mkString(File.separator))
+
+SimplePythonFunction(
+  command = Array.emptyByteArray,
+  envVars = mutable.Map("PYTHONPATH" -> sparkPythonPath).asJava,
+  pythonIncludes = 
sessionHolder.artifactManager.getSparkConnectPythonIncludes.asJava,
+  pythonExec = IntegratedUDFTestUtils.pythonExec,
+  pythonVer = IntegratedUDFTestUtils.pythonVer,
+  broadcastVars = Lists.newArrayList(),
+  accumulator = null)
+  }
+
+  test("python foreachBatch process: process terminates after query is 
stopped") {
+val sessionHolder = SessionHolder.forTesting(spark)

Review Comment:
   How does this verify that the actual Python process is not running?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] Hisoka-X commented on a diff in pull request #42810: [SPARK-45075][SQL] Fix alter table with invalid default value will not report error

2023-09-05 Thread via GitHub


Hisoka-X commented on code in PR #42810:
URL: https://github.com/apache/spark/pull/42810#discussion_r1316557800


##
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/v2AlterTableCommands.scala:
##
@@ -228,6 +228,15 @@ case class AlterColumn(
   TableChange.updateColumnPosition(colName, newPosition.position)
 }
 val defaultValueChange = setDefaultExpression.map { newDefaultExpression =>
+  if (newDefaultExpression.nonEmpty) {

Review Comment:
   > Maybe just put a comment here
   
   done



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] Hisoka-X commented on a diff in pull request #42810: [SPARK-45075][SQL] Fix alter table with invalid default value will not report error

2023-09-05 Thread via GitHub


Hisoka-X commented on code in PR #42810:
URL: https://github.com/apache/spark/pull/42810#discussion_r1316557650


##
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/v2AlterTableCommands.scala:
##
@@ -228,6 +228,15 @@ case class AlterColumn(
   TableChange.updateColumnPosition(colName, newPosition.position)
 }
 val defaultValueChange = setDefaultExpression.map { newDefaultExpression =>
+  if (newDefaultExpression.nonEmpty) {
+val newDataType = if (dataType.isDefined) {

Review Comment:
   done.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] zhengruifeng commented on pull request #42815: [SPARK-45077][UI] Upgrade dagre-d3.js from 0.4.3 to 0.6.4

2023-09-05 Thread via GitHub


zhengruifeng commented on PR #42815:
URL: https://github.com/apache/spark/pull/42815#issuecomment-1707476622

   cc @gengliangwang @jasonli-db 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] itholic commented on a diff in pull request #42798: [SPARK-43295][PS] Support string type columns for `DataFrameGroupBy.sum`

2023-09-05 Thread via GitHub


itholic commented on code in PR #42798:
URL: https://github.com/apache/spark/pull/42798#discussion_r1316539867


##
python/pyspark/pandas/groupby.py:
##
@@ -3534,7 +3534,12 @@ def _reduce_for_stat_function(
 for label in psdf._internal.column_labels:
 psser = psdf._psser_for(label)
 input_scol = psser._dtype_op.nan_to_null(psser).spark.column
-output_scol = sfun(input_scol)
+if sfun.__name__ == "sum" and isinstance(
+psdf._internal.spark_type_for(label), StringType
+):
+output_scol = F.concat_ws("", F.collect_list(input_scol))

Review Comment:
   Just adjusted the comments. Thanks!



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] itholic commented on a diff in pull request #42798: [SPARK-43295][PS] Support string type columns for `DataFrameGroupBy.sum`

2023-09-05 Thread via GitHub


itholic commented on code in PR #42798:
URL: https://github.com/apache/spark/pull/42798#discussion_r1316539576


##
python/pyspark/pandas/groupby.py:
##
@@ -3534,7 +3534,12 @@ def _reduce_for_stat_function(
 for label in psdf._internal.column_labels:
 psser = psdf._psser_for(label)
 input_scol = psser._dtype_op.nan_to_null(psser).spark.column
-output_scol = sfun(input_scol)
+if sfun.__name__ == "sum" and isinstance(
+psdf._internal.spark_type_for(label), StringType
+):
+output_scol = F.concat_ws("", F.collect_list(input_scol))

Review Comment:
   Yeah, then maybe we should extract only the string column from the nested 
structs to pass as the argument to `concat_ws`?
   
   ```
   F.concat_ws(
   "",
   F.array_sort(
   F.collect_list(F.struct(NATURAL_ORDER_COLUMN_NAME, input_scol))
   ).getField(input_scol_name),
   )
   ```
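   
   For illustration, a minimal Scala sketch of the same order-preserving trick, 
with a made-up `ord` column standing in for the internal natural-order column:
   
   ```scala
   import org.apache.spark.sql.SparkSession
   import org.apache.spark.sql.functions._
   
   val spark = SparkSession.builder().master("local[*]").getOrCreate()
   import spark.implicits._
   
   // "ord" plays the role of NATURAL_ORDER_COLUMN_NAME here.
   val df = Seq(("a", "x"), ("a", "y"), ("b", "z")).toDF("key", "s")
     .withColumn("ord", monotonically_increasing_id())
   
   // collect_list alone gives no ordering guarantee, so collect (ord, s) structs,
   // sort them by ord, then extract and concatenate the "s" field.
   df.groupBy("key")
     .agg(concat_ws("", array_sort(collect_list(struct($"ord", $"s"))).getField("s")).as("sum_s"))
     .show()
   ```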



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] allisonwang-db opened a new pull request, #42821: [SPARK-45083][PYTHON][DOCS] Refine the docstring of function `min`

2023-09-05 Thread via GitHub


allisonwang-db opened a new pull request, #42821:
URL: https://github.com/apache/spark/pull/42821

   
   
   ### What changes were proposed in this pull request?
   
   This PR refines the function `min` docstring by adding more examples.
   
   ### Why are the changes needed?
   
   To improve PySpark documentation.
   
   ### Does this PR introduce _any_ user-facing change?
   
   No 
   
   ### How was this patch tested?
   
   doctest
   
   ### Was this patch authored or co-authored using generative AI tooling?
   
   No


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] dtenedor commented on a diff in pull request #42810: [SPARK-45075][SQL] Fix alter table with invalid default value will not report error

2023-09-05 Thread via GitHub


dtenedor commented on code in PR #42810:
URL: https://github.com/apache/spark/pull/42810#discussion_r1316435058


##
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/v2AlterTableCommands.scala:
##
@@ -228,6 +228,15 @@ case class AlterColumn(
   TableChange.updateColumnPosition(colName, newPosition.position)
 }
 val defaultValueChange = setDefaultExpression.map { newDefaultExpression =>
+  if (newDefaultExpression.nonEmpty) {
+val newDataType = if (dataType.isDefined) {

Review Comment:
   you can use `dataType.getOrElse` to simplify here



##
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/v2AlterTableCommands.scala:
##
@@ -228,6 +228,15 @@ case class AlterColumn(
   TableChange.updateColumnPosition(colName, newPosition.position)
 }
 val defaultValueChange = setDefaultExpression.map { newDefaultExpression =>
+  if (newDefaultExpression.nonEmpty) {

Review Comment:
   Maybe just put a comment here saying "We call 
'ResolveDefaultColumns.analyze' here to make sure that the default value parses 
successfully, and return an error otherwise"?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] andylam-db commented on a diff in pull request #42725: [SPARK-45009][SQL] Decorrelate predicate subqueries in join condition

2023-09-05 Thread via GitHub


andylam-db commented on code in PR #42725:
URL: https://github.com/apache/spark/pull/42725#discussion_r1316384305


##
sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryCompilationErrors.scala:
##
@@ -3751,4 +3751,14 @@ private[sql] object QueryCompilationErrors extends 
QueryErrorsBase with Compilat
 "supported" -> "constant expressions"),
   cause = cause)
   }
+
+  def unsupportedCorrelatedSubqueryExpressionInJoinConditionError(

Review Comment:
   Fixed!



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] planga82 commented on a diff in pull request #42759: [SPARK-45039][SQL] Include full identifier in Storage tab

2023-09-05 Thread via GitHub


planga82 commented on code in PR #42759:
URL: https://github.com/apache/spark/pull/42759#discussion_r1316342993


##
sql/core/src/main/scala/org/apache/spark/sql/execution/CacheManager.scala:
##
@@ -117,12 +117,20 @@ class CacheManager extends Logging with 
AdaptiveSparkPlanHelper {
   logWarning("Asked to cache already cached data.")
 } else {
   val sessionWithConfigsOff = getOrCloneSessionWithConfigsOff(spark)
+  val completeTableName = tableName.map { table =>
+import spark.sessionState.analyzer.CatalogAndIdentifier
+spark.sessionState.sqlParser.parseMultipartIdentifier(table) match {
+  case Seq(_) if spark.catalog.getTable(table).isTemporary => table
+
+  case CatalogAndIdentifier(cat, ident) => 
s"${cat.name()}.${ident.toString}"

Review Comment:
   makes sense, I will make the changes, thank you!



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] xuanyuanking commented on pull request #42819: [SPARK-45082][DOC] Review and fix issues in API docs for 3.5.0

2023-09-05 Thread via GitHub


xuanyuanking commented on PR #42819:
URL: https://github.com/apache/spark/pull/42819#issuecomment-1707217892

   Thanks, merged in master and branch-3.5


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] xuanyuanking closed pull request #42819: [SPARK-45082][DOC] Review and fix issues in API docs for 3.5.0

2023-09-05 Thread via GitHub


xuanyuanking closed pull request #42819: [SPARK-45082][DOC] Review and fix 
issues in API docs for 3.5.0
URL: https://github.com/apache/spark/pull/42819


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] pan3793 opened a new pull request, #42820: [WIP][TEST] Remove obsolete repo of DB2 JDBC driver

2023-09-05 Thread via GitHub


pan3793 opened a new pull request, #42820:
URL: https://github.com/apache/spark/pull/42820

   
   
   ### What changes were proposed in this pull request?
   
   
   
   ### Why are the changes needed?
   
   
   
   ### Does this PR introduce _any_ user-facing change?
   
   
   
   ### How was this patch tested?
   
   
   
   ### Was this patch authored or co-authored using generative AI tooling?
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] WweiL closed pull request #42664: [SPARK-44435][SPARK-44484][3.5][SS][CONNECT] Tests for foreachBatch and Listener

2023-09-05 Thread via GitHub


WweiL closed pull request #42664: [SPARK-44435][SPARK-44484][3.5][SS][CONNECT] 
Tests for foreachBatch and Listener
URL: https://github.com/apache/spark/pull/42664


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] WweiL commented on pull request #42664: [SPARK-44435][SPARK-44484][3.5][SS][CONNECT] Tests for foreachBatch and Listener

2023-09-05 Thread via GitHub


WweiL commented on PR #42664:
URL: https://github.com/apache/spark/pull/42664#issuecomment-1707179668

   fixed in 
https://github.com/apache/spark/commit/7be69bf7da036282c2c7c0b62c32e7666fa1b579


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] xuanyuanking commented on pull request #42819: [SPARK-45082][DOC] Review and fix issues in API docs for 3.5.0

2023-09-05 Thread via GitHub


xuanyuanking commented on PR #42819:
URL: https://github.com/apache/spark/pull/42819#issuecomment-1707175717

   @srowen Checked manually for both master and branch-3.5, the Mima test 
passed.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] xuanyuanking commented on pull request #42819: [SPARK-45082][DOC] Review and fix issues in API docs for 3.5.0

2023-09-05 Thread via GitHub


xuanyuanking commented on PR #42819:
URL: https://github.com/apache/spark/pull/42819#issuecomment-1707160026

   good point, let me run Mima test manually on both master and 3.5


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] srowen commented on pull request #42819: [SPARK-45082][DOC] Review and fix issues in API docs for 3.5.0

2023-09-05 Thread via GitHub


srowen commented on PR #42819:
URL: https://github.com/apache/spark/pull/42819#issuecomment-1707147907

   I'm concerned that this may not pass mima tests and I don't see that this 
test was run because of other errors. Have you checked that it passes in both 
branches?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] xuanyuanking commented on pull request #42819: [SPARK-45082][DOC] Review and fix issues in API docs for 3.5.0

2023-09-05 Thread via GitHub


xuanyuanking commented on PR #42819:
URL: https://github.com/apache/spark/pull/42819#issuecomment-1707142988

   The test `On pull request update / Notify test workflow 
(pull_request_target)` that failed has passed for the initial commit. I think 
it's good to go. @srowen, could you give your approval? I aim to include this 
in the 3.5 RC4 cut.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] mridulm commented on a diff in pull request #42529: [SPARK-44845][YARN][DEPLOY] Fix file system uri comparison function

2023-09-05 Thread via GitHub


mridulm commented on code in PR #42529:
URL: https://github.com/apache/spark/pull/42529#discussion_r1316224064


##
resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala:
##
@@ -1618,9 +1618,9 @@ private[spark] object Client extends Logging {
   return false
 }
 
-val srcAuthority = srcUri.getAuthority()
-val dstAuthority = dstUri.getAuthority()
-if (srcAuthority != null && !srcAuthority.equalsIgnoreCase(dstAuthority)) {
+val srcUserInfo = srcUri.getUserInfo()
+val dstUserInfo = dstUri.getUserInfo()
+if (srcUserInfo != null && !srcUserInfo.equalsIgnoreCase(dstUserInfo)) {

Review Comment:
   `getAuthority` is not null since it includes the host - `getUserInfo` can be 
`null`
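   
   A quick `java.net.URI` sketch of that difference (the hdfs URI is made up):
   
   ```scala
   val src = new java.net.URI("hdfs://alice@namenode:8020/path")
   src.getAuthority  // "alice@namenode:8020", includes the host, so non-null here
   src.getUserInfo   // "alice"
   new java.net.URI("hdfs://namenode:8020/path").getUserInfo  // null when no user info
   ```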



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] dillitz closed pull request #42772: [SPARK-45051][CONNECT] Use UUIDv7 by default for operation IDs to make operations chronologically sortable

2023-09-05 Thread via GitHub


dillitz closed pull request #42772: [SPARK-45051][CONNECT] Use UUIDv7 by 
default for operation IDs to make operations chronologically sortable
URL: https://github.com/apache/spark/pull/42772


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] dillitz commented on pull request #42772: [SPARK-45051][CONNECT] Use UUIDv7 by default for operation IDs to make operations chronologically sortable

2023-09-05 Thread via GitHub


dillitz commented on PR #42772:
URL: https://github.com/apache/spark/pull/42772#issuecomment-1707033407

   We agreed that the benefits of adding this are not big enough because we 
cannot rely on the operation ID being UUIDv7 and need to sort by startDate anyway. 
Closing this PR.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] srowen commented on pull request #42819: [SPARK-45082][DOC] Review and fix issues in API docs for 3.5.0

2023-09-05 Thread via GitHub


srowen commented on PR #42819:
URL: https://github.com/apache/spark/pull/42819#issuecomment-1707033259

   Seems OK in principle, just need tests to pass


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] srowen commented on pull request #42815: [SPARK-45077][UI] Upgrade dagre-d3.js from 0.4.3 to 0.6.4

2023-09-05 Thread via GitHub


srowen commented on PR #42815:
URL: https://github.com/apache/spark/pull/42815#issuecomment-1707031614

   Are you saying 0.6.4 doesn't work well? Is this just for your testing then, 
and not to merge?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] juliuszsompolski commented on pull request #42772: [SPARK-45051][CONNECT] Use UUIDv7 by default for operation IDs to make operations chronologically sortable

2023-09-05 Thread via GitHub


juliuszsompolski commented on PR #42772:
URL: https://github.com/apache/spark/pull/42772#issuecomment-1707008850

   ... although, the currently existing (Spark 3.4) clients never generate an 
operationId client-side, so we can get away with adding an assertion that the 
client-side id is UUIDv7 in `ExecuteHolder.operationId`.
   But then this has to go into Spark 3.5.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] agubichev commented on a diff in pull request #42778: [SPARK-45055] [SQL] Do not transpose windows if they conflict on ORDER BY / PROJECT clauses

2023-09-05 Thread via GitHub


agubichev commented on code in PR #42778:
URL: https://github.com/apache/spark/pull/42778#discussion_r1316176643


##
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/TransposeWindowSuite.scala:
##
@@ -160,4 +160,18 @@ class TransposeWindowSuite extends PlanTest {
 comparePlans(optimized, correctAnswer.analyze)
   }
 
+  test("two windows with overlapping project/order by lists") {

Review Comment:
   Modified the test to reflect the bug better. Now it fails on the latest 
master.
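   
   For context, a minimal sketch (made-up schema) of the conflicting shape: the 
second window orders by the first window's output, so the two Window operators 
must not be swapped:
   
   ```scala
   import org.apache.spark.sql.SparkSession
   import org.apache.spark.sql.expressions.Window
   import org.apache.spark.sql.functions._
   
   val spark = SparkSession.builder().master("local[*]").getOrCreate()
   import spark.implicits._
   
   val df = Seq((1, 2, 3), (1, 3, 4), (2, 1, 5)).toDF("a", "b", "c")
   
   val w1 = Window.partitionBy($"a").orderBy($"b")
   val w2 = Window.partitionBy($"a").orderBy($"rnk")  // orders by w1's output
   
   df.withColumn("rnk", rank().over(w1))   // must run first
     .withColumn("s", sum($"c").over(w2))  // consumes rnk, so no transpose
     .show()
   ```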



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] juliuszsompolski commented on pull request #42772: [SPARK-45051][CONNECT] Use UUIDv7 by default for operation IDs to make operations chronologically sortable

2023-09-05 Thread via GitHub


juliuszsompolski commented on PR #42772:
URL: https://github.com/apache/spark/pull/42772#issuecomment-170760

   We maintain backwards compatibility, where older clients can connect to a 
newer server. These older clients will not provide such UUIDs.
   What will happen then? Does it break any critical assumptions?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] srielau commented on pull request #41683: [SPARK-36680][SQL] Supports Dynamic Table Options for Spark SQL

2023-09-05 Thread via GitHub


srielau commented on PR #41683:
URL: https://github.com/apache/spark/pull/41683#issuecomment-1706980076

   +1 on using a WITH clause.
   For UPDATE:
   > WITH (OPTIONS (
   Why the nesting?
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] xuanyuanking opened a new pull request, #42819: [SPARK-45082][DOC] Review and fix issues in API docs for 3.5.0

2023-09-05 Thread via GitHub


xuanyuanking opened a new pull request, #42819:
URL: https://github.com/apache/spark/pull/42819

   ### What changes were proposed in this pull request?
   
   Compare the 3.4 API doc with the 3.5 RC3 cut. Fix the following issues:
   
   - Remove the leaking class/object in API doc
   
   ### Why are the changes needed?
   Fix the issues in the Spark 3.5.0 release API docs.
   
   ### Does this PR introduce _any_ user-facing change?
   No, API doc changes only.
   
   ### How was this patch tested?
   Manually test.
   
   ### Was this patch authored or co-authored using generative AI tooling?
   no


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] juliuszsompolski opened a new pull request, #42818: [SPARK-44835] Make INVALID_CURSOR.DISCONNECTED a retriable error

2023-09-05 Thread via GitHub


juliuszsompolski opened a new pull request, #42818:
URL: https://github.com/apache/spark/pull/42818

   ### What changes were proposed in this pull request?
   
   Make INVALID_CURSOR.DISCONNECTED a retriable error.
   
   ### Why are the changes needed?
   
   This error can happen if two RPCs are racing to reattach to the query, and 
the client is still using the losing one. SPARK-44833 was a bug that exposed 
such a situation. That was fixed, but to be more robust, we can make this error 
retryable.
   
   ### Does this PR introduce _any_ user-facing change?
   
   No.
   
   ### How was this patch tested?
   
   Tests will be added in https://github.com/apache/spark/pull/42560
   
   ### Was this patch authored or co-authored using generative AI tooling?
   
   No.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] sunchao commented on pull request #42757: [SPARK-45036][SQL] SPJ: Simplify the logic to handle partially clustered distribution

2023-09-05 Thread via GitHub


sunchao commented on PR #42757:
URL: https://github.com/apache/spark/pull/42757#issuecomment-1706951663

   Thanks all!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] dtenedor commented on a diff in pull request #42810: [SPARK-45075][SQL] Fix alter table with invalid default value will not report error

2023-09-05 Thread via GitHub


dtenedor commented on code in PR #42810:
URL: https://github.com/apache/spark/pull/42810#discussion_r1316137847


##
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/v2AlterTableCommands.scala:
##
@@ -228,6 +228,15 @@ case class AlterColumn(
   TableChange.updateColumnPosition(colName, newPosition.position)
 }
 val defaultValueChange = setDefaultExpression.map { newDefaultExpression =>
+  if (newDefaultExpression.nonEmpty) {

Review Comment:
   The new default expression can't be an empty string because that will not 
parse as a SQL expression. Therefore it is acceptable to use the empty string 
as a sentinel value to indicate that no default value is associated with the 
target column.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] hvanhovell commented on pull request #42807: [SPARK-45072][CONNECT] Fix outer scopes for ammonite classes

2023-09-05 Thread via GitHub


hvanhovell commented on PR #42807:
URL: https://github.com/apache/spark/pull/42807#issuecomment-1706856355

   I think it is a bug.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] zzzzming95 commented on a diff in pull request #42804: [SPARK-45071][SQL] Optimize the processing speed of `BinaryArithmetic#dataType` when processing multi-column data

2023-09-05 Thread via GitHub


zzzzming95 commented on code in PR #42804:
URL: https://github.com/apache/spark/pull/42804#discussion_r1316079522


##
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/arithmetic.scala:
##
@@ -234,11 +240,7 @@ abstract class BinaryArithmetic extends BinaryOperator
 case _ => super.checkInputDataTypes()
   }
 
-  override def dataType: DataType = (left.dataType, right.dataType) match {
-case (DecimalType.Fixed(p1, s1), DecimalType.Fixed(p2, s2)) =>
-  resultDecimalType(p1, s1, p2, s2)
-case _ => left.dataType
-  }
+  override def dataType: DataType = internalDataType

Review Comment:
   Sorry for the late reply. Changing `dataType` is a big change: we would need 
to change every implementation of `Expression`. I think we should not make this 
modification in this issue. What do you think?
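   
   For reference, a minimal sketch of the memoization this PR takes instead, 
following the diff above (not a drop-in patch):
   
   ```scala
   // Cache the computed result type once per expression instance, so repeated
   // dataType calls don't re-run the decimal pattern match.
   private lazy val internalDataType: DataType = (left.dataType, right.dataType) match {
     case (DecimalType.Fixed(p1, s1), DecimalType.Fixed(p2, s2)) =>
       resultDecimalType(p1, s1, p2, s2)
     case _ => left.dataType
   }
   
   override def dataType: DataType = internalDataType
   ```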
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] dongjoon-hyun commented on pull request #42807: [SPARK-45072][CONNECT] Fix outer scopes for ammonite classes

2023-09-05 Thread via GitHub


dongjoon-hyun commented on PR #42807:
URL: https://github.com/apache/spark/pull/42807#issuecomment-1706851416

   Let me fix that for you, @hvanhovell. If you think this is not a bug, please 
let me know, @hvanhovell.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] Hisoka-X commented on a diff in pull request #42810: [SPARK-45075][SQL] Fix alter table with invalid default value will not report error

2023-09-05 Thread via GitHub


Hisoka-X commented on code in PR #42810:
URL: https://github.com/apache/spark/pull/42810#discussion_r1316039979


##
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/v2AlterTableCommands.scala:
##
@@ -228,6 +228,15 @@ case class AlterColumn(
   TableChange.updateColumnPosition(colName, newPosition.position)
 }
 val defaultValueChange = setDefaultExpression.map { newDefaultExpression =>
+  if (newDefaultExpression.nonEmpty) {

Review Comment:
   It will be empty when executing `alter table alter column drop default`.
   
![image](https://github.com/apache/spark/assets/32387433/3bb88a55-7021-4ce0-ab89-fb2011c02fbb)
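   
   For reference, a sketch of the two paths (hypothetical table/column names):
   
   ```scala
   // SET DEFAULT carries a non-empty expression; DROP DEFAULT arrives here
   // as an empty string, which is what the check above guards against.
   spark.sql("ALTER TABLE t ALTER COLUMN c SET DEFAULT 'abc'")
   spark.sql("ALTER TABLE t ALTER COLUMN c DROP DEFAULT")
   ```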
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] cloud-fan commented on a diff in pull request #42810: [SPARK-45075][SQL] Fix alter table with invalid default value will not report error

2023-09-05 Thread via GitHub


cloud-fan commented on code in PR #42810:
URL: https://github.com/apache/spark/pull/42810#discussion_r1315981102


##
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/v2AlterTableCommands.scala:
##
@@ -228,6 +228,15 @@ case class AlterColumn(
   TableChange.updateColumnPosition(colName, newPosition.position)
 }
 val defaultValueChange = setDefaultExpression.map { newDefaultExpression =>
+  if (newDefaultExpression.nonEmpty) {

Review Comment:
   If I read the javadoc correctly, I think the new default expression can't be 
an empty string
   ```
   /**
* Returns the column default value SQL string (Spark SQL dialect). The 
default value literal
* is not provided as updating column default values does not need to 
back-fill existing data.
* Null means dropping the column default value.
*/
   @Nullable
   public String newDefaultValue() { return newDefaultValue; }
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] MaxGekk closed pull request #42816: [WIP][SPARK-45022][SQL] Provide context for dataset API errors

2023-09-05 Thread via GitHub


MaxGekk closed pull request #42816: [WIP][SPARK-45022][SQL] Provide context for 
dataset API errors
URL: https://github.com/apache/spark/pull/42816


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] cloud-fan commented on a diff in pull request #42481: [SPARK-44801][SQL][UI] Capture analyzing failed queries in Listener and UI

2023-09-05 Thread via GitHub


cloud-fan commented on code in PR #42481:
URL: https://github.com/apache/spark/pull/42481#discussion_r1315972356


##
sql/core/src/main/scala/org/apache/spark/sql/execution/SQLExecution.scala:
##
@@ -124,7 +136,7 @@ object SQLExecution {
 physicalPlanDescription = 
queryExecution.explainString(planDescriptionMode),
 // `queryExecution.executedPlan` triggers query planning. If it 
fails, the exception
 // will be caught and reported in the 
`SparkListenerSQLExecutionEnd`

Review Comment:
   shall we remove this comment now?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] Hisoka-X commented on a diff in pull request #42810: [SPARK-45075][SQL] Fix alter table with invalid default value will not report error

2023-09-05 Thread via GitHub


Hisoka-X commented on code in PR #42810:
URL: https://github.com/apache/spark/pull/42810#discussion_r1315970129


##
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/v2AlterTableCommands.scala:
##
@@ -230,6 +230,15 @@ case class AlterColumn(
 val defaultValueChange = setDefaultExpression.map { newDefaultExpression =>
   TableChange.updateColumnDefaultValue(colName, newDefaultExpression)

Review Comment:
   done.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] cloud-fan commented on a diff in pull request #42810: [SPARK-45075][SQL] Fix alter table with invalid default value will not report error

2023-09-05 Thread via GitHub


cloud-fan commented on code in PR #42810:
URL: https://github.com/apache/spark/pull/42810#discussion_r1315957135


##
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/v2AlterTableCommands.scala:
##
@@ -230,6 +230,15 @@ case class AlterColumn(
 val defaultValueChange = setDefaultExpression.map { newDefaultExpression =>
   TableChange.updateColumnDefaultValue(colName, newDefaultExpression)

Review Comment:
   shall we apply the check here? then we can save the `if` condition.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] cloud-fan commented on a diff in pull request #42810: [SPARK-45075][SQL] Fix alter table with invalid default value will not report error

2023-09-05 Thread via GitHub


cloud-fan commented on code in PR #42810:
URL: https://github.com/apache/spark/pull/42810#discussion_r1315957135


##
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/v2AlterTableCommands.scala:
##
@@ -230,6 +230,15 @@ case class AlterColumn(
 val defaultValueChange = setDefaultExpression.map { newDefaultExpression =>
   TableChange.updateColumnDefaultValue(colName, newDefaultExpression)

Review Comment:
   shall we apply the check? then we can save the `if` condition.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] cloud-fan commented on pull request #41683: [SPARK-36680][SQL] Supports Dynamic Table Options for Spark SQL

2023-09-05 Thread via GitHub


cloud-fan commented on PR #41683:
URL: https://github.com/apache/spark/pull/41683#issuecomment-1706680230

   Let's spend more time on the API design first, as different people may have 
different opinions and we should collect as much feedback as possible.
   
   Taking a step back, I think what we need is an SQL API to specify per-scan 
options, like `spark.read.options(...)`. The SQL API should be general as it's 
very likely that people will ask for something similar for `df.write.options` 
and `spark.readStream.options`.
   
   TVF can only be used in the FROM clause, so a new SQL syntax may be better 
here. Inspired by the [pgsql 
syntax](https://www.postgresql.org/docs/current/sql-createtable.html), we can 
add a WITH clause to Spark SQL:
   ```
   ... FROM tbl_name WITH (optionA = v1, optionB = v2, ...)
   INSERT INTO tbl_name WITH (optionA = v1, optionB = v2, ...) SELECT ...
   ```
   
   Streaming is orthogonal to this, and this new WITH clause won't conflict 
with it. E.g. we can probably do `... FROM STREAM tbl_name WITH (...)`. It's 
out of the scope of this PR though, as streaming SQL is a big topic.
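   
   For comparison, the DataFrame-level per-scan options that already exist today, 
which the WITH clause would mirror in SQL (a sketch; the option key is just 
illustrative):
   
   ```scala
   // Per-scan read options in the DataFrame API.
   val df = spark.read
     .options(Map("mergeSchema" -> "true"))
     .table("tbl_name")
   ```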
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] juliuszsompolski commented on pull request #42806: [SPARK-44833][CONNECT] Fix sending Reattach too fast after Execute

2023-09-05 Thread via GitHub


juliuszsompolski commented on PR #42806:
URL: https://github.com/apache/spark/pull/42806#issuecomment-1706640901

   
https://github.com/juliuszsompolski/apache-spark/actions/runs/6076122424/job/16483638602
   This module timed out. All connect-related tests finished successfully.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] hvanhovell closed pull request #42807: [SPARK-45072][CONNECT] Fix outer scopes for ammonite classes

2023-09-05 Thread via GitHub


hvanhovell closed pull request #42807: [SPARK-45072][CONNECT] Fix outer scopes 
for ammonite classes
URL: https://github.com/apache/spark/pull/42807


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] zzzzming95 commented on a diff in pull request #42574: [SPARK-43149][SQL] `CreateDataSourceTableCommand` should create metadata first

2023-09-05 Thread via GitHub


zzzzming95 commented on code in PR #42574:
URL: https://github.com/apache/spark/pull/42574#discussion_r1315907482


##
sql/core/src/main/scala/org/apache/spark/sql/execution/command/createDataSourceTables.scala:
##
@@ -191,16 +193,26 @@ case class CreateDataSourceTableAsSelectCommand(
 schema = tableSchema)
   // Table location is already validated. No need to check it again during 
table creation.
   sessionState.catalog.createTable(newTable, ignoreIfExists = false, 
validateLocation = false)
+  try {
+val result = saveDataIntoTable(
+  sparkSession, table, tableLocation, SaveMode.Overwrite, tableExists 
= false)
 
-  result match {
-case fs: HadoopFsRelation if table.partitionColumnNames.nonEmpty &&
+result match {
+  case fs: HadoopFsRelation if table.partitionColumnNames.nonEmpty &&
 sparkSession.sqlContext.conf.manageFilesourcePartitions =>
-  // Need to recover partitions into the metastore so our saved data 
is visible.
-  sessionState.executePlan(RepairTableCommand(
-table.identifier,
-enableAddPartitions = true,
-enableDropPartitions = false), CommandExecutionMode.SKIP).toRdd
-case _ =>
+// Need to recover partitions into the metastore so our saved data 
is visible.
+sessionState.executePlan(RepairTableCommand(
+  table.identifier,
+  enableAddPartitions = true,
+  enableDropPartitions = false), CommandExecutionMode.SKIP).toRdd
+  case _ =>
+}
+  } catch {
+case NonFatal(e) =>
+  // drop the created table.
+  sessionState.catalog.dropTable(newTable.identifier,

Review Comment:
   > There is no good solution here, changing the order introduces a new issue: 
we may have metadata created but no data exist
   
   Drop the table when an exception is hit during writing. @cloud-fan 
   
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] wangyum commented on a diff in pull request #42804: [SPARK-45071][SQL] Optimize the processing speed of `BinaryArithmetic#dataType` when processing multi-column data

2023-09-05 Thread via GitHub


wangyum commented on code in PR #42804:
URL: https://github.com/apache/spark/pull/42804#discussion_r1315905477


##
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/arithmetic.scala:
##
@@ -234,11 +240,7 @@ abstract class BinaryArithmetic extends BinaryOperator
 case _ => super.checkInputDataTypes()
   }
 
-  override def dataType: DataType = (left.dataType, right.dataType) match {
-case (DecimalType.Fixed(p1, s1), DecimalType.Fixed(p2, s2)) =>
-  resultDecimalType(p1, s1, p2, s2)
-case _ => left.dataType
-  }
+  override def dataType: DataType = internalDataType

Review Comment:
   @zzzzming95 Please address this comment.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] zzzzming95 commented on pull request #42804: [SPARK-45071][SQL] Optimize the processing speed of `BinaryArithmetic#dataType` when processing multi-column data

2023-09-05 Thread via GitHub


zzzzming95 commented on PR #42804:
URL: https://github.com/apache/spark/pull/42804#issuecomment-1706594949

   @cloud-fan @wangyum 
   
   Please merge it to master, thanks


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] MaxGekk opened a new pull request, #42817: [SPARK-45079][SQL] Fix an internal error from `percentile_approx()`on `NULL` accuracy

2023-09-05 Thread via GitHub


MaxGekk opened a new pull request, #42817:
URL: https://github.com/apache/spark/pull/42817

   ### What changes were proposed in this pull request?
   In the PR, I propose to check that the `accuracy` argument is not NULL in 
`ApproximatePercentile`, and if it is, throw an `AnalysisException` with the new 
error class `DATATYPE_MISMATCH.UNEXPECTED_NULL`.
   
   ### Why are the changes needed?
   To fix the issue demonstrated by the example:
   ```sql
   $ spark-sql (default)> SELECT percentile_approx(col, array(0.5, 0.4, 0.1), 
NULL) FROM VALUES (0), (1), (2), (10) AS tab(col);
   [INTERNAL_ERROR] The Spark SQL phase analysis failed with an internal error. 
You hit a bug in Spark or the Spark plugins you use. Please, report this bug to 
the corresponding communities or vendors, and provide the full stack trace.
   ``` 
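   For contrast, a sketch of the same query with a literal accuracy, which works 
as expected:
   
   ```scala
   spark.sql("""
     SELECT percentile_approx(col, array(0.5, 0.4, 0.1), 100)
     FROM VALUES (0), (1), (2), (10) AS tab(col)
   """).show()
   ```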
   
   ### Does this PR introduce _any_ user-facing change?
   No.
   
   ### How was this patch tested?
   By running new test:
   ```
   $ build/sbt "test:testOnly *.ApproximatePercentileQuerySuite"
   ```
   
   ### Was this patch authored or co-authored using generative AI tooling?
   No.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] panbingkun commented on a diff in pull request #42109: [SPARK-44404][SQL] Assign names to the error class _LEGACY_ERROR_TEMP_[1009,1010,1013,1015,1016,1278]

2023-09-05 Thread via GitHub


panbingkun commented on code in PR #42109:
URL: https://github.com/apache/spark/pull/42109#discussion_r1315780224


##
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala:
##
@@ -1134,20 +1134,21 @@ class Analyzer(override val catalogManager: 
CatalogManager) extends RuleExecutor
   case v: ResolvedPersistentView =>
 val nameParts = v.catalog.name() +: 
v.identifier.asMultipartIdentifier
 throw QueryCompilationErrors.expectTableNotViewError(
-  nameParts, isTemp = false, cmd, relationTypeMismatchHint, u)
+  nameParts, isTemp = false, cmd, true, u)

Review Comment:
   Because we put the content of the hint in error-classes.json.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] panbingkun commented on a diff in pull request #42109: [SPARK-44404][SQL] Assign names to the error class _LEGACY_ERROR_TEMP_[1009,1010,1013,1015,1016,1278]

2023-09-05 Thread via GitHub


panbingkun commented on code in PR #42109:
URL: https://github.com/apache/spark/pull/42109#discussion_r1315778389


##
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/v2ResolutionPlans.scala:
##
@@ -44,7 +44,7 @@ case class UnresolvedNamespace(multipartIdentifier: 
Seq[String]) extends Unresol
 case class UnresolvedTable(
 multipartIdentifier: Seq[String],
 commandName: String,
-relationTypeMismatchHint: Option[String]) extends UnresolvedLeafNode
+hint: Boolean = false) extends UnresolvedLeafNode

Review Comment:
   Okay, let me refactor it.






[GitHub] [spark] MaxGekk commented on a diff in pull request #42801: [SPARK-45070][SQL][DOCS] Describe the binary and datetime formats of `to_char`/`to_varchar`

2023-09-05 Thread via GitHub


MaxGekk commented on code in PR #42801:
URL: https://github.com/apache/spark/pull/42801#discussion_r1315744531


##
connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/functions.scala:
##
@@ -4284,12 +4285,22 @@ object functions {
*   prints '+' for positive values but 'MI' prints a space. 'PR': 
Only allowed at the
*   end of the format string; specifies that the result string will be 
wrapped by angle
*   brackets if the input value is negative. 
+   *   If `e` is a datetime, `format` shall be a valid datetime pattern, see
+   *   <a href="https://spark.apache.org/docs/latest/sql-ref-datetime-pattern.html">Datetime Patterns</a>.
+   *   If `e` is a binary, it is converted to a string in one of the formats:
+   *   <ul>
+   *     <li>'base64': a base 64 string.</li>
+   *     <li>'hex': a string in the hexadecimal format.</li>
+   *     <li>'utf-8': the input binary is decoded to UTF-8 string.</li>
+   *   </ul>

Review Comment:
   Thank you, @dongjoon-hyun 
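
   A usage sketch of the formats documented above (assumptions: a Spark 
version with the binary `to_char(e, format)` overload and an active 
`SparkSession` named `spark`; outputs not verified here):
   ```scala
   import org.apache.spark.sql.functions.{col, lit, to_char}
   import spark.implicits._ // assumed active SparkSession

   val df = Seq("Spark".getBytes("UTF-8")).toDF("b")
   df.select(
     to_char(col("b"), lit("base64")), // base 64 rendering of the bytes
     to_char(col("b"), lit("hex")),    // hexadecimal rendering
     to_char(col("b"), lit("utf-8"))   // decodes back to the string "Spark"
   ).show()
   ```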






[GitHub] [spark] whcdjj commented on pull request #28280: [SPARK-31438][CORE][SQL] Support JobCleaned Status in SparkListener

2023-09-05 Thread via GitHub


whcdjj commented on PR #28280:
URL: https://github.com/apache/spark/pull/28280#issuecomment-1706282059

   Is there any progress on this issue? When I was inserting into a Hive table 
with speculation enabled, I encountered the same problem.





[GitHub] [spark] zhengruifeng closed pull request #42811: [SPARK-43241][PS][FOLLOWUP] Add migration guide for behavior change

2023-09-05 Thread via GitHub


zhengruifeng closed pull request #42811: [SPARK-43241][PS][FOLLOWUP] Add 
migration guide for behavior change
URL: https://github.com/apache/spark/pull/42811





[GitHub] [spark] zhengruifeng commented on pull request #42811: [SPARK-43241][PS][FOLLOWUP] Add migration guide for behavior change

2023-09-05 Thread via GitHub


zhengruifeng commented on PR #42811:
URL: https://github.com/apache/spark/pull/42811#issuecomment-1706252804

   merged to master





[GitHub] [spark] Hisoka-X commented on a diff in pull request #42810: [SPARK-45075][SQL] Fix alter table with invalid default value will not report error

2023-09-05 Thread via GitHub


Hisoka-X commented on code in PR #42810:
URL: https://github.com/apache/spark/pull/42810#discussion_r1315586396


##
sql/catalyst/src/main/scala/org/apache/spark/sql/connector/catalog/CatalogV2Util.scala:
##
@@ -213,7 +213,9 @@ private[sql] object CatalogV2Util {
 // enforced by the parser). On the other hand, commands that drop 
the default value such
 // as "ALTER TABLE t ALTER COLUMN c DROP DEFAULT" will set this 
string to empty.
 if (update.newDefaultValue().nonEmpty) {
-  Some(field.withCurrentDefaultValue(update.newDefaultValue()))
+  val result = 
field.withCurrentDefaultValue(update.newDefaultValue())
+  ResolveDefaultColumns.analyze(result, "ALTER TABLE ALTER COLUMN")

Review Comment:
   Hi, I found that `ALTER TABLE ADD COLUMN` checks the default expression in 
https://github.com/apache/spark/blob/6c885a7cf57df328b03308cff2eed814bda156e4/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/statements.scala#L144
 when `changes` 
https://github.com/apache/spark/blob/6c885a7cf57df328b03308cff2eed814bda156e4/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/v2AlterTableCommands.scala#L123
 is invoked from `AlterTable` in `DataSourceV2Strategy`. I think adding the 
default-expression check for `ALTER TABLE ALTER COLUMN` into `changes` of 
`AlterColumn` looks more reasonable. After this, matching `AlterTableCommand` 
in `DataSourceV2Strategy` will throw the exception. What do you think about 
this approach?
   
   I already updated the code, you can check it.
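
   A self-contained sketch of that idea (assumed simplified stand-ins; the 
real code lives in `AlterColumn.changes` and `ResolveDefaultColumns.analyze`):
   ```scala
   // Validate the new DEFAULT expression at the point where the column change
   // is produced, so an invalid default fails during planning.
   final case class AlterColumnChange(column: String, newDefault: Option[String])

   // Crude stand-in for ResolveDefaultColumns.analyze: accept simple literals only.
   def analyzeDefault(column: String, sql: String): String =
     if (sql.matches("""-?\d+(\.\d+)?|'[^']*'|true|false""")) sql
     else throw new IllegalArgumentException(
       s"invalid DEFAULT value for column `$column` in ALTER TABLE ALTER COLUMN: $sql")

   def changes(c: AlterColumnChange): Seq[String] =
     c.newDefault.toSeq.map(d => s"SET DEFAULT ${analyzeDefault(c.column, d)} FOR `${c.column}`")
   ```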






[GitHub] [spark] MaxGekk opened a new pull request, #42816: [WIP][SPARK-45022][SQL] Provide context for dataset API errors

2023-09-05 Thread via GitHub


MaxGekk opened a new pull request, #42816:
URL: https://github.com/apache/spark/pull/42816

   ### What changes were proposed in this pull request?
   This PR captures the Dataset APIs used by the user code, together with the 
call site in the user code, and provides better error messages.
   
   E.g. consider the following Spark app `SimpleApp.scala`:
   ```scala
  1  import org.apache.spark.sql.SparkSession
  2  import org.apache.spark.sql.functions._
  3
  4  object SimpleApp {
  5def main(args: Array[String]) {
  6  val spark = SparkSession.builder.appName("Simple 
Application").config("spark.sql.ansi.enabled", true).getOrCreate()
  7  import spark.implicits._
  8
  9  val c = col("a") / col("b")
 10
 11  Seq((1, 0)).toDF("a", "b").select(c).show()
 12
 13  spark.stop()
 14}
 15  }
   ``` 
   
   After this PR, the error message contains the error context (which Spark 
Dataset API was called, and from where in the user code) in the following form:
   ```
   Exception in thread "main" org.apache.spark.SparkArithmeticException: 
[DIVIDE_BY_ZERO] Division by zero. Use `try_divide` to tolerate divisor being 0 
and return NULL instead. If necessary set "spark.sql.ansi.enabled" to "false" 
to bypass this error.
   == Dataset ==
   "div" was called from SimpleApp$.main(SimpleApp.scala:9)
   
at 
org.apache.spark.sql.errors.QueryExecutionErrors$.divideByZeroError(QueryExecutionErrors.scala:201)
at 
org.apache.spark.sql.catalyst.expressions.DivModLike.eval(arithmetic.scala:672)
   ...
   ```
   which is similar to the context already provided in the case of SQL queries:
   ```
   org.apache.spark.SparkArithmeticException: [DIVIDE_BY_ZERO] Division by 
zero. Use `try_divide` to tolerate divisor being 0 and return NULL instead. If 
necessary set "spark.sql.ansi.enabled" to "false" to bypass this error.
   == SQL(line 1, position 1) ==
   a / b
   ^
   
at 
org.apache.spark.sql.errors.QueryExecutionErrors$.divideByZeroError(QueryExecutionErrors.scala:201)
at 
org.apache.spark.sql.errors.QueryExecutionErrors.divideByZeroError(QueryExecutionErrors.scala)
   ...
   ```
   
   Please note that the stack trace in `spark-shell` doesn't contain 
meaningful elements:
   ```
   scala> Thread.currentThread().getStackTrace.foreach(println)
   java.base/java.lang.Thread.getStackTrace(Thread.java:1602)
   $line15.$read$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw.<init>(<console>:23)
   $line15.$read$$iw$$iw$$iw$$iw$$iw$$iw$$iw.<init>(<console>:27)
   $line15.$read$$iw$$iw$$iw$$iw$$iw$$iw.<init>(<console>:29)
   $line15.$read$$iw$$iw$$iw$$iw$$iw.<init>(<console>:31)
   $line15.$read$$iw$$iw$$iw$$iw.<init>(<console>:33)
   $line15.$read$$iw$$iw$$iw.<init>(<console>:35)
   $line15.$read$$iw$$iw.<init>(<console>:37)
   $line15.$read$$iw.<init>(<console>:39)
   $line15.$read.<init>(<console>:41)
   $line15.$read$.<init>(<console>:45)
   $line15.$read$.<clinit>()
   $line15.$eval$.$print$lzycompute(<console>:7)
   $line15.$eval$.$print(<console>:6)
   $line15.$eval.$print()
   java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native 
Method)
   
java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
   
java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
   ...
   ```
   so this change doesn't help with that use case.
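
   For illustration, a minimal sketch of deriving a user-code call site from 
the stack (assumption: naive package filtering; the PR's actual capture logic 
may differ):
   ```scala
   // Report the first stack frame that is not Spark, JDK, or Scala internals --
   // roughly how a context like "SimpleApp$.main(SimpleApp.scala:9)" can be built.
   def firstUserFrame(): String =
     Thread.currentThread().getStackTrace
       .find { f =>
         val c = f.getClassName
         !c.startsWith("java.") && !c.startsWith("jdk.") &&
         !c.startsWith("scala.") && !c.startsWith("org.apache.spark.")
       }
       .map(f => s"${f.getClassName}.${f.getMethodName}(${f.getFileName}:${f.getLineNumber})")
       .getOrElse("<unknown>")
   ```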
   
   ### Why are the changes needed?
   To provide more user-friendly errors.
   
   
   ### Does this PR introduce _any_ user-facing change?
   Yes.
   
   ### How was this patch tested?
   Added new UTs to `QueryExecutionAnsiErrorsSuite`.
   
   ### Was this patch authored or co-authored using generative AI tooling?
   No.
   




