[jira] [Updated] (SPARK-45755) Push down limit through Dataset.isEmpty()

2023-10-31 Thread Yuming Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45755?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuming Wang updated SPARK-45755:

Description: 
Pushing down LocalLimit cannot optimize the distinct case.

{code:scala}
  def isEmpty: Boolean = withAction("isEmpty",
    withTypedPlan { LocalLimit(Literal(1), select().logicalPlan) }.queryExecution) { plan =>
    plan.executeTake(1).isEmpty
  }
{code}
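For illustration, a minimal spark-shell sketch (not from the ticket) of the distinct case: with the implementation above, the limit sits directly above the Deduplicate/Aggregate node and is not pushed below it, so the full deduplication still runs before a single row is taken.

{code:scala}
// Hypothetical repro sketch, assuming a spark-shell session (`spark` in scope).
// With the isEmpty above, the plan for a distinct Dataset is roughly:
//   LocalLimit 1
//   +- Deduplicate / Aggregate   <- the limit is not pushed below this node
//      +- Range ...
val ds = spark.range(0, 1000000L).selectExpr("id % 10 AS k").distinct()
println(ds.isEmpty)  // still deduplicates the full input before taking one row
{code}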


> Push down limit through Dataset.isEmpty()
> -
>
> Key: SPARK-45755
> URL: https://issues.apache.org/jira/browse/SPARK-45755
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Yuming Wang
>Priority: Major
>
> Pushing down LocalLimit cannot optimize the distinct case.
> {code:scala}
>   def isEmpty: Boolean = withAction("isEmpty",
>     withTypedPlan { LocalLimit(Literal(1), select().logicalPlan) }.queryExecution) { plan =>
>     plan.executeTake(1).isEmpty
>   }
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-45755) Push down limit through Dataset.isEmpty()

2023-10-31 Thread Yuming Wang (Jira)
Yuming Wang created SPARK-45755:
---

 Summary: Push down limit through Dataset.isEmpty()
 Key: SPARK-45755
 URL: https://issues.apache.org/jira/browse/SPARK-45755
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 4.0.0
Reporter: Yuming Wang






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-45754) Support `spark.deploy.appIdPattern`

2023-10-31 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45754?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-45754:
---
Labels: pull-request-available  (was: )

> Support `spark.deploy.appIdPattern`
> ---
>
> Key: SPARK-45754
> URL: https://issues.apache.org/jira/browse/SPARK-45754
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 4.0.0
>Reporter: Dongjoon Hyun
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-45754) Support `spark.deploy.appIdPattern`

2023-10-31 Thread Dongjoon Hyun (Jira)
Dongjoon Hyun created SPARK-45754:
-

 Summary: Support `spark.deploy.appIdPattern`
 Key: SPARK-45754
 URL: https://issues.apache.org/jira/browse/SPARK-45754
 Project: Spark
  Issue Type: Improvement
  Components: Spark Core
Affects Versions: 4.0.0
Reporter: Dongjoon Hyun






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-45753) Support `spark.deploy.driverIdPattern`

2023-10-31 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45753?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-45753:
---
Labels: pull-request-available  (was: )

> Support `spark.deploy.driverIdPattern`
> --
>
> Key: SPARK-45753
> URL: https://issues.apache.org/jira/browse/SPARK-45753
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 4.0.0
>Reporter: Dongjoon Hyun
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-45753) Support `spark.deploy.driverIdPattern`

2023-10-31 Thread Dongjoon Hyun (Jira)
Dongjoon Hyun created SPARK-45753:
-

 Summary: Support `spark.deploy.driverIdPattern`
 Key: SPARK-45753
 URL: https://issues.apache.org/jira/browse/SPARK-45753
 Project: Spark
  Issue Type: Improvement
  Components: Spark Core
Affects Versions: 4.0.0
Reporter: Dongjoon Hyun






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-45752) Unreferenced CTE should all be checked by CheckAnalysis0

2023-10-31 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45752?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-45752:
---
Labels: pull-request-available  (was: )

> Unreferenced CTE should all be checked by CheckAnalysis0
> 
>
> Key: SPARK-45752
> URL: https://issues.apache.org/jira/browse/SPARK-45752
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Rui Wang
>Assignee: Rui Wang
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-45752) Unreferenced CTE should all be checked by CheckAnalysis0

2023-10-31 Thread Rui Wang (Jira)
Rui Wang created SPARK-45752:


 Summary: Unreferenced CTE should all be checked by CheckAnalysis0
 Key: SPARK-45752
 URL: https://issues.apache.org/jira/browse/SPARK-45752
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 4.0.0
Reporter: Rui Wang
Assignee: Rui Wang






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-45734) Upgrade commons-io to 2.15.0

2023-10-31 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45734?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-45734:
---
Labels: pull-request-available  (was: )

> Upgrade commons-io to 2.15.0
> 
>
> Key: SPARK-45734
> URL: https://issues.apache.org/jira/browse/SPARK-45734
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 4.0.0
>Reporter: Yang Jie
>Assignee: Yang Jie
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> https://commons.apache.org/proper/commons-io/changes-report.html#a2.15.0



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-45734) Upgrade commons-io to 2.15.0

2023-10-31 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45734?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-45734.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 43592
[https://github.com/apache/spark/pull/43592]

> Upgrade commons-io to 2.15.0
> 
>
> Key: SPARK-45734
> URL: https://issues.apache.org/jira/browse/SPARK-45734
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 4.0.0
>Reporter: Yang Jie
>Assignee: Yang Jie
>Priority: Major
> Fix For: 4.0.0
>
>
> https://commons.apache.org/proper/commons-io/changes-report.html#a2.15.0



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-45734) Upgrade commons-io to 2.15.0

2023-10-31 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45734?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-45734:
-

Assignee: Yang Jie

> Upgrade commons-io to 2.15.0
> 
>
> Key: SPARK-45734
> URL: https://issues.apache.org/jira/browse/SPARK-45734
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 4.0.0
>Reporter: Yang Jie
>Assignee: Yang Jie
>Priority: Major
>
> https://commons.apache.org/proper/commons-io/changes-report.html#a2.15.0



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-45644) After upgrading to Spark 3.4.1 and 3.5.0 we receive RuntimeException "scala.Some is not a valid external type for schema of array"

2023-10-31 Thread Adi Wehrli (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-45644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17781589#comment-17781589
 ] 

Adi Wehrli commented on SPARK-45644:


Dear [~bersprockets]

That's very kind of you. Thanks a lot.

Best regards,
Adi

> After upgrading to Spark 3.4.1 and 3.5.0 we receive RuntimeException 
> "scala.Some is not a valid external type for schema of array"
> --
>
> Key: SPARK-45644
> URL: https://issues.apache.org/jira/browse/SPARK-45644
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core, SQL
>Affects Versions: 3.4.1, 3.5.0
>Reporter: Adi Wehrli
>Priority: Major
>
> I do not really know if this is a bug, but I am at the end of my knowledge.
> A Spark job ran successfully with Spark 3.2.x and 3.3.x.
> But after upgrading to 3.4.1 (as well as to 3.5.0), running the same job 
> with the same data now always fails with:
> {code}
> scala.Some is not a valid external type for schema of array
> {code}
> The corresponding stacktrace is:
> {code}
> 2023-10-24T06:28:50.932 level=ERROR logger=org.apache.spark.executor.Executor 
> msg="Exception in task 0.0 in stage 0.0 (TID 0)" thread="Executor task launch 
> worker for task 0.0 in stage 0.0 (TID 0)"
> java.lang.RuntimeException: scala.Some is not a valid external type for 
> schema of array
>   at 
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.MapObjects_10$(Unknown
>  Source) ~[?:?]
>   at 
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.ExternalMapToCatalyst_1$(Unknown
>  Source) ~[?:?]
>   at 
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.createNamedStruct_14_3$(Unknown
>  Source) ~[?:?]
>   at 
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.If_12$(Unknown
>  Source) ~[?:?]
>   at 
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.apply(Unknown
>  Source) ~[?:?]
>   at 
> org.apache.spark.sql.execution.ObjectOperator$.$anonfun$serializeObjectToRow$1(objects.scala:165)
>  ~[spark-sql_2.12-3.5.0.jar:3.5.0]
>   at 
> org.apache.spark.sql.execution.AppendColumnsWithObjectExec.$anonfun$doExecute$15(objects.scala:380)
>  ~[spark-sql_2.12-3.5.0.jar:3.5.0]
>   at scala.collection.Iterator$$anon$10.next(Iterator.scala:461) 
> ~[scala-library-2.12.15.jar:?]
>   at scala.collection.Iterator$$anon$10.next(Iterator.scala:461) 
> ~[scala-library-2.12.15.jar:?]
>   at 
> org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:169)
>  ~[spark-core_2.12-3.5.0.jar:3.5.0]
>   at 
> org.apache.spark.shuffle.ShuffleWriteProcessor.write(ShuffleWriteProcessor.scala:59)
>  ~[spark-core_2.12-3.5.0.jar:3.5.0]
>   at 
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:104) 
> ~[spark-core_2.12-3.5.0.jar:3.5.0]
>   at 
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:54) 
> ~[spark-core_2.12-3.5.0.jar:3.5.0]
>   at 
> org.apache.spark.TaskContext.runTaskWithListeners(TaskContext.scala:161) 
> ~[spark-core_2.12-3.5.0.jar:3.5.0]
>   at org.apache.spark.scheduler.Task.run(Task.scala:141) 
> ~[spark-core_2.12-3.5.0.jar:3.5.0]
>   at 
> org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$4(Executor.scala:620)
>  ~[spark-core_2.12-3.5.0.jar:3.5.0]
>   at 
> org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally(SparkErrorUtils.scala:64)
>  ~[spark-common-utils_2.12-3.5.0.jar:3.5.0]
>   at 
> org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally$(SparkErrorUtils.scala:61)
>  ~[spark-common-utils_2.12-3.5.0.jar:3.5.0]
>   at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:94) 
> ~[spark-core_2.12-3.5.0.jar:3.5.0]
>   at 
> org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:623) 
> [spark-core_2.12-3.5.0.jar:3.5.0]
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
>  [?:?]
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
>  [?:?]
>   at java.lang.Thread.run(Thread.java:834) [?:?]
> 2023-10-24T06:28:50.932 level=ERROR logger=org.apache.spark.executor.Executor 
> msg="Exception in task 1.0 in stage 0.0 (TID 1)" thread="Executor task launch 
> worker for task 1.0 in stage 0.0 (TID 1)"
> java.lang.RuntimeException: scala.Some is not a valid external type for 
> schema of array
>   at 
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.MapObjects_10$(Unknown
>  Source) ~[?:?]
>   at 
> 

[jira] [Updated] (SPARK-45751) The default value of ‘spark.executor.logs.rolling.maxRetainedFiles' on the official website is incorrect

2023-10-31 Thread chenyu (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45751?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

chenyu updated SPARK-45751:
---
Attachment: the value on the website.png

> The default value of ‘spark.executor.logs.rolling.maxRetainedFiles' on the 
> official website is incorrect
> 
>
> Key: SPARK-45751
> URL: https://issues.apache.org/jira/browse/SPARK-45751
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core, UI
>Affects Versions: 3.5.0
>Reporter: chenyu
>Priority: Trivial
> Attachments: the default value.png, the value on the website.png
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-45751) The default value of ‘spark.executor.logs.rolling.maxRetainedFiles' on the official website is incorrect

2023-10-31 Thread chenyu (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45751?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

chenyu updated SPARK-45751:
---
Attachment: the default value.png

> The default value of ‘spark.executor.logs.rolling.maxRetainedFiles' on the 
> official website is incorrect
> 
>
> Key: SPARK-45751
> URL: https://issues.apache.org/jira/browse/SPARK-45751
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core, UI
>Affects Versions: 3.5.0
>Reporter: chenyu
>Priority: Trivial
> Attachments: the default value.png, the value on the website.png
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-45751) The default value of ‘spark.executor.logs.rolling.maxRetainedFiles' on the official website is incorrect

2023-10-31 Thread chenyu (Jira)
chenyu created SPARK-45751:
--

 Summary: The default value of 
‘spark.executor.logs.rolling.maxRetainedFiles' on the official website is 
incorrect
 Key: SPARK-45751
 URL: https://issues.apache.org/jira/browse/SPARK-45751
 Project: Spark
  Issue Type: Improvement
  Components: Spark Core, UI
Affects Versions: 3.5.0
Reporter: chenyu






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-45748) Add a `fromSQL` functionality for Literals

2023-10-31 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45748?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-45748:
---
Labels: pull-request-available  (was: )

> Add a `fromSQL` functionality for Literals
> --
>
> Key: SPARK-45748
> URL: https://issues.apache.org/jira/browse/SPARK-45748
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Xinyi Yu
>Priority: Major
>  Labels: pull-request-available
>
> Add a `fromSQL` helper function for `Literal`s so that together with .sql it 
> serializes and deserializes the Literal values.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-45750) The param 'spark.shuffle.io.preferDirectBufs' is useless

2023-10-31 Thread chenyu (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45750?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

chenyu resolved SPARK-45750.

Resolution: Won't Fix

> The param 'spark.shuffle.io.preferDirectBufs' is useless
> 
>
> Key: SPARK-45750
> URL: https://issues.apache.org/jira/browse/SPARK-45750
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core, UI
>Affects Versions: 3.5.0
>Reporter: chenyu
>Priority: Trivial
>
> There is no place to use this parameter.
> We should delete the corresponding configuration.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Closed] (SPARK-45750) The param 'spark.shuffle.io.preferDirectBufs' is useless

2023-10-31 Thread chenyu (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45750?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

chenyu closed SPARK-45750.
--

> The param 'spark.shuffle.io.preferDirectBufs' is useless
> 
>
> Key: SPARK-45750
> URL: https://issues.apache.org/jira/browse/SPARK-45750
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core, UI
>Affects Versions: 3.5.0
>Reporter: chenyu
>Priority: Trivial
>
> There is no place to use this parameter.
> We should delete the corresponding configuration.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-45750) The param 'spark.shuffle.io.preferDirectBufs' is useless

2023-10-31 Thread chenyu (Jira)
chenyu created SPARK-45750:
--

 Summary: The param 'spark.shuffle.io.preferDirectBufs' is useless
 Key: SPARK-45750
 URL: https://issues.apache.org/jira/browse/SPARK-45750
 Project: Spark
  Issue Type: Improvement
  Components: Spark Core, UI
Affects Versions: 3.5.0
Reporter: chenyu


There is no place to use this parameter.

We should delete the corresponding configuration.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-45749) Fix Spark History Server to sort `Duration` column properly

2023-10-31 Thread Kent Yao (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45749?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kent Yao reassigned SPARK-45749:


Assignee: Dongjoon Hyun

> Fix Spark History Server to sort `Duration` column properly
> ---
>
> Key: SPARK-45749
> URL: https://issues.apache.org/jira/browse/SPARK-45749
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core, Web UI
>Affects Versions: 3.2.0, 3.3.2, 3.4.1, 3.5.0, 4.0.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-45749) Fix Spark History Server to sort `Duration` column properly

2023-10-31 Thread Kent Yao (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45749?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kent Yao resolved SPARK-45749.
--
Fix Version/s: 3.3.4
   3.5.1
   4.0.0
   3.4.2
   Resolution: Fixed

Issue resolved by pull request 43613
[https://github.com/apache/spark/pull/43613]

> Fix Spark History Server to sort `Duration` column properly
> ---
>
> Key: SPARK-45749
> URL: https://issues.apache.org/jira/browse/SPARK-45749
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core, Web UI
>Affects Versions: 3.2.0, 3.3.2, 3.4.1, 3.5.0, 4.0.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.3.4, 3.5.1, 4.0.0, 3.4.2
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-45749) Fix Spark History Server to sort `Duration` column properly

2023-10-31 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45749?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-45749:
---
Labels: pull-request-available  (was: )

> Fix Spark History Server to sort `Duration` column properly
> ---
>
> Key: SPARK-45749
> URL: https://issues.apache.org/jira/browse/SPARK-45749
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core, Web UI
>Affects Versions: 3.2.0, 3.3.2, 3.4.1, 3.5.0, 4.0.0
>Reporter: Dongjoon Hyun
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-45749) Fix Spark History Server to sort `Duration` column properly

2023-10-31 Thread Dongjoon Hyun (Jira)
Dongjoon Hyun created SPARK-45749:
-

 Summary: Fix Spark History Server to sort `Duration` column 
properly
 Key: SPARK-45749
 URL: https://issues.apache.org/jira/browse/SPARK-45749
 Project: Spark
  Issue Type: Bug
  Components: Spark Core, Web UI
Affects Versions: 3.5.0, 3.4.1, 3.3.2, 3.2.0, 4.0.0
Reporter: Dongjoon Hyun






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-45748) Add a `fromSQL` functionality for Literals

2023-10-31 Thread Xinyi Yu (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45748?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xinyi Yu updated SPARK-45748:
-
Issue Type: Improvement  (was: Bug)

> Add a `fromSQL` functionality for Literals
> --
>
> Key: SPARK-45748
> URL: https://issues.apache.org/jira/browse/SPARK-45748
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Xinyi Yu
>Priority: Major
>
> Add a `fromSQL` helper function for `Literal`s so that together with .sql it 
> serializes and deserializes the Literal values.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-45748) Add a `fromSQL` functionality for Literals

2023-10-31 Thread Xinyi Yu (Jira)
Xinyi Yu created SPARK-45748:


 Summary: Add a `fromSQL` functionality for Literals
 Key: SPARK-45748
 URL: https://issues.apache.org/jira/browse/SPARK-45748
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 4.0.0
Reporter: Xinyi Yu


Add a `fromSQL` helper function for `Literal`s so that together with .sql it 
serializes and deserializes the Literal values.
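A minimal sketch of the intended round trip, assuming a Spark Catalyst classpath; {{Literal.sql}} is existing API, while {{Literal.fromSQL}} is only the proposal here and does not exist yet:
{code:scala}
import org.apache.spark.sql.catalyst.expressions.Literal

// Existing API: serialize a literal value to its SQL text.
val lit = Literal(42)
val sqlText = lit.sql   // "42"

// Proposed in this ticket (hypothetical, not yet implemented): parse the text
// back so that the pair round-trips the value.
// val back = Literal.fromSQL(sqlText)
// assert(back == lit)
{code}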



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-45654) Add Python data source write API

2023-10-31 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45654?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-45654.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 43516
[https://github.com/apache/spark/pull/43516]

> Add Python data source write API
> 
>
> Key: SPARK-45654
> URL: https://issues.apache.org/jira/browse/SPARK-45654
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 4.0.0
>Reporter: Allison Wang
>Assignee: Allison Wang
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> Add Python data source write API in datasource.py 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-45654) Add Python data source write API

2023-10-31 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45654?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-45654:


Assignee: Allison Wang

> Add Python data source write API
> 
>
> Key: SPARK-45654
> URL: https://issues.apache.org/jira/browse/SPARK-45654
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 4.0.0
>Reporter: Allison Wang
>Assignee: Allison Wang
>Priority: Major
>  Labels: pull-request-available
>
> Add Python data source write API in datasource.py 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-45747) Support session window aggregation in state reader

2023-10-31 Thread Chaoqin Li (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45747?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chaoqin Li updated SPARK-45747:
---
Summary: Support session window aggregation in state reader  (was: Support 
session window operator in state reader)

> Support session window aggregation in state reader
> --
>
> Key: SPARK-45747
> URL: https://issues.apache.org/jira/browse/SPARK-45747
> Project: Spark
>  Issue Type: Improvement
>  Components: Structured Streaming
>Affects Versions: 3.5.0
>Reporter: Chaoqin Li
>Priority: Major
>
> We are introducing a state reader in SPARK-45511, but the session window 
> operator is currently not supported because numColPrefixKey is unknown. We can 
> read the operator state metadata introduced in SPARK-45558 to determine the 
> number of prefix columns and load the session window state correctly.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-45747) Support session window operator in state reader

2023-10-31 Thread Chaoqin Li (Jira)
Chaoqin Li created SPARK-45747:
--

 Summary: Support session window operator in state reader
 Key: SPARK-45747
 URL: https://issues.apache.org/jira/browse/SPARK-45747
 Project: Spark
  Issue Type: Improvement
  Components: Structured Streaming
Affects Versions: 3.5.0
Reporter: Chaoqin Li


We are introducing a state reader in SPARK-45511, but the session window 
operator is currently not supported because numColPrefixKey is unknown. We can 
read the operator state metadata introduced in SPARK-45558 to determine the 
number of prefix columns and load the session window state correctly.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-45741) Upgrade Netty to 4.1.100.Final

2023-10-31 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45741?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-45741:
-

Assignee: Dongjoon Hyun

> Upgrade Netty to 4.1.100.Final
> --
>
> Key: SPARK-45741
> URL: https://issues.apache.org/jira/browse/SPARK-45741
> Project: Spark
>  Issue Type: Bug
>  Components: Build
>Affects Versions: 4.0.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-45741) Upgrade Netty to 4.1.100.Final

2023-10-31 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45741?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-45741.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 43605
[https://github.com/apache/spark/pull/43605]

> Upgrade Netty to 4.1.100.Final
> --
>
> Key: SPARK-45741
> URL: https://issues.apache.org/jira/browse/SPARK-45741
> Project: Spark
>  Issue Type: Bug
>  Components: Build
>Affects Versions: 4.0.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-45746) Return specific error messages if UDTF 'analyze' method accepts or returns wrong values

2023-10-31 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45746?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-45746:
---
Labels: pull-request-available  (was: )

> Return specific error messages if UDTF 'analyze' method accepts or returns 
> wrong values
> ---
>
> Key: SPARK-45746
> URL: https://issues.apache.org/jira/browse/SPARK-45746
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 4.0.0
>Reporter: Daniel
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-20075) Support classifier, packaging in Maven coordinates

2023-10-31 Thread Gera Shegalov (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-20075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17781550#comment-17781550
 ] 

Gera Shegalov commented on SPARK-20075:
---

This would be a great feature that can help spark-rapids plugin users who 
require a non-default classifier such as cuda12.

> Support classifier, packaging in Maven coordinates
> --
>
> Key: SPARK-20075
> URL: https://issues.apache.org/jira/browse/SPARK-20075
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Shell, Spark Submit
>Affects Versions: 2.1.0
>Reporter: Sean R. Owen
>Priority: Minor
>  Labels: bulk-closed
>
> Currently, it's possible to add dependencies to an app using its Maven 
> coordinates on the command line: {{group:artifact:version}}. However, really 
> Maven coordinates are 5-dimensional: 
> {{group:artifact:packaging:classifier:version}}. In some rare but real cases 
> it's important to be able to specify the classifier. And while we're at it 
> why not try to support packaging?
> I have a WIP PR that I'll post soon.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-45746) Return specific error messages if UDTF 'analyze' method accepts or returns wrong values

2023-10-31 Thread Daniel (Jira)
Daniel created SPARK-45746:
--

 Summary: Return specific error messages if UDTF 'analyze' method 
accepts or returns wrong values
 Key: SPARK-45746
 URL: https://issues.apache.org/jira/browse/SPARK-45746
 Project: Spark
  Issue Type: Sub-task
  Components: PySpark
Affects Versions: 4.0.0
Reporter: Daniel






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-45739) Catch IOException instead of EOFException alone for faulthandler

2023-10-31 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45739?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-45739:


Assignee: Hyukjin Kwon

> Catch IOException instead of EOFException alone for faulthandler
> 
>
> Key: SPARK-45739
> URL: https://issues.apache.org/jira/browse/SPARK-45739
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark
>Affects Versions: 4.0.0
>Reporter: Hyukjin Kwon
>Assignee: Hyukjin Kwon
>Priority: Major
>  Labels: pull-request-available
>
> {{spark.python.worker.faulthandler.enabled}} can report fatal errors such as 
> segfaults via the fault handler. Exceptions such as {{java.net.SocketException: 
> Connection reset}} can happen because the worker dies unexpectedly. We should 
> catch all IO exceptions there, not EOFException alone.
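A generic sketch of the change in spirit (illustrative only, not the actual Spark patch): widening the handler from {{EOFException}} to its parent {{IOException}} also covers {{SocketException}}:
{code:scala}
import java.io.{DataInputStream, IOException}

// Illustrative helper (hypothetical name): catching IOException covers both
// EOFException and java.net.SocketException ("Connection reset"), since both
// extend IOException.
def readExitCode(in: DataInputStream): Option[Int] =
  try Some(in.readInt())
  catch {
    case _: IOException => None // previously only EOFException was caught here
  }
{code}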



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-45739) Catch IOException instead of EOFException alone for faulthandler

2023-10-31 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45739?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-45739.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 43600
[https://github.com/apache/spark/pull/43600]

> Catch IOException instead of EOFException alone for faulthandler
> 
>
> Key: SPARK-45739
> URL: https://issues.apache.org/jira/browse/SPARK-45739
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark
>Affects Versions: 4.0.0
>Reporter: Hyukjin Kwon
>Assignee: Hyukjin Kwon
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> {{spark.python.worker.faulthandler.enabled}} can report fatal errors such as 
> segfaults via the fault handler. Exceptions such as {{java.net.SocketException: 
> Connection reset}} can happen because the worker dies unexpectedly. We should 
> catch all IO exceptions there, not EOFException alone.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-45745) Extremely slow execution of sum of columns in Spark 3.4.1

2023-10-31 Thread Javier (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45745?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Javier updated SPARK-45745:
---
Description: 
We are in the process of upgrading some pySpark jobs from Spark 3.1.2 to Spark 
3.4.1 and some code that was running fine is now basically never ending even 
for small dataframes.

We have simplified the problematic piece of code and the minimum pySpark 
example below shows the issue:
{code:java}
n_cols = 50
data = [{f"col{i}": i for i in range(n_cols)} for _ in range(5)]
df_data = sql_context.createDataFrame(data)

df_data = df_data.withColumn(
"col_sum", sum([F.col(f"col{i}") for i in range(n_cols)])
)
df_data.show(10, False) {code}
Basically, this code with Spark 3.1.2 runs fine but with 3.4.1 the computation 
time seems to explode when the value of `n_cols` is bigger than about 25 
columns. A colleague suggested that it could be related to the limit of 22 
elements in a tuple in Scala 2.13 
(https://www.scala-lang.org/api/current/scala/Tuple22.html), since the 25 
columns are suspiciously close to this. Is there any known defect in the 
logical plan optimization in 3.4.1? Or is this kind of operation (summing 
multiple columns) supposed to be implemented differently?

  was:
We are in the process of upgrading some pySpark jobs from Spark 3.1.2 to Spark 
3.4.1 and some code that was running fine is now basically never ending even 
for small dataframes.

We have simplified the problematic piece of code and the minimum pySpark 
example below shows the issue:
{code:java}
n_cols = 50
data = [{f"col{i}": i for i in range(n_cols)} for _ in range(5)]
df_data = sql_context.createDataFrame(data)

df_data = df_data.withColumn(
"col_sum", sum([F.col(f"col{i}") for i in range(n_cols)])
)
df_data.show(10, False) {code}
Basically, this code with Spark 3.1.2 runs fine but with 3.4.1 the computation 
time seems to explode when the value of `n_cols` is bigger than about 25 
columns. A colleague suggested that it could be related to the limit of 22 
elements in a tuple in Scala 2.13, since the 25 columns are suspiciously close 
to this. Is there any known defect in the logical plan optimization in 3.4.1? 
Or is this kind of operations (sum of multiple columns) supposed to be 
implemented differently?


> Extremely slow execution of sum of columns in Spark 3.4.1
> -
>
> Key: SPARK-45745
> URL: https://issues.apache.org/jira/browse/SPARK-45745
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 3.4.1
>Reporter: Javier
>Priority: Major
>
> We are in the process of upgrading some pySpark jobs from Spark 3.1.2 to 
> Spark 3.4.1 and some code that was running fine is now basically never ending 
> even for small dataframes.
> We have simplified the problematic piece of code and the minimum pySpark 
> example below shows the issue:
> {code:java}
> n_cols = 50
> data = [{f"col{i}": i for i in range(n_cols)} for _ in range(5)]
> df_data = sql_context.createDataFrame(data)
> df_data = df_data.withColumn(
> "col_sum", sum([F.col(f"col{i}") for i in range(n_cols)])
> )
> df_data.show(10, False) {code}
> Basically, this code with Spark 3.1.2 runs fine but with 3.4.1 the 
> computation time seems to explode when the value of `n_cols` is bigger than 
> about 25 columns. A colleague suggested that it could be related to the limit 
> of 22 elements in a tuple in Scala 2.13 
> (https://www.scala-lang.org/api/current/scala/Tuple22.html), since the 25 
> columns are suspiciously close to this. Is there any known defect in the 
> logical plan optimization in 3.4.1? Or is this kind of operation (summing 
> multiple columns) supposed to be implemented differently?



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-45745) Extremely slow execution of sum of columns in Spark 3.4.1

2023-10-31 Thread Javier (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-45745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17781539#comment-17781539
 ] 

Javier commented on SPARK-45745:


I originally posted a question on Stack Overflow asking for feedback on this 
([https://stackoverflow.com/questions/77391731/extremely-slow-execution-in-spark-3-4-1-when-computing-the-sum-of-pyspark-datafr])
 and a user there pointed me to a never-ending UT problem reported here: 
https://issues.apache.org/jira/browse/SPARK-43972. It is for the same Spark 
version, but I honestly don't know whether it is related.

> Extremely slow execution of sum of columns in Spark 3.4.1
> -
>
> Key: SPARK-45745
> URL: https://issues.apache.org/jira/browse/SPARK-45745
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 3.4.1
>Reporter: Javier
>Priority: Major
>
> We are in the process of upgrading some pySpark jobs from Spark 3.1.2 to 
> Spark 3.4.1 and some code that was running fine is now basically never ending 
> even for small dataframes.
> We have simplified the problematic piece of code and the minimum pySpark 
> example below shows the issue:
> {code:java}
> n_cols = 50
> data = [{f"col{i}": i for i in range(n_cols)} for _ in range(5)]
> df_data = sql_context.createDataFrame(data)
> df_data = df_data.withColumn(
> "col_sum", sum([F.col(f"col{i}") for i in range(n_cols)])
> )
> df_data.show(10, False) {code}
> Basically, this code with Spark 3.1.2 runs fine but with 3.4.1 the 
> computation time seems to explode when the value of `n_cols` is bigger than 
> about 25 columns. A colleague suggested that it could be related to the limit 
> of 22 elements in a tuple in Scala 2.13, since the 25 columns are 
> suspiciously close to this. Is there any known defect in the logical plan 
> optimization in 3.4.1? Or is this kind of operation (summing multiple 
> columns) supposed to be implemented differently?



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-45745) Extremely slow execution of sum of columns in Spark 3.4.1

2023-10-31 Thread Javier (Jira)
Javier created SPARK-45745:
--

 Summary: Extremely slow execution of sum of columns in Spark 3.4.1
 Key: SPARK-45745
 URL: https://issues.apache.org/jira/browse/SPARK-45745
 Project: Spark
  Issue Type: Bug
  Components: PySpark
Affects Versions: 3.4.1
Reporter: Javier


We are in the process of upgrading some pySpark jobs from Spark 3.1.2 to Spark 
3.4.1 and some code that was running fine is now basically never ending even 
for small dataframes.

We have simplified the problematic piece of code and the minimum pySpark 
example below shows the issue:
{code:java}
n_cols = 50
data = [{f"col{i}": i for i in range(n_cols)} for _ in range(5)]
df_data = sql_context.createDataFrame(data)

df_data = df_data.withColumn(
"col_sum", sum([F.col(f"col{i}") for i in range(n_cols)])
)
df_data.show(10, False) {code}
Basically, this code with Spark 3.1.2 runs fine but with 3.4.1 the computation 
time seems to explode when the value of `n_cols` is bigger than about 25 
columns. A colleague suggested that it could be related to the limit of 22 
elements in a tuple in Scala 2.13, since the 25 columns are suspiciously close 
to this. Is there any known defect in the logical plan optimization in 3.4.1? 
Or is this kind of operation (summing multiple columns) supposed to be 
implemented differently?



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-45644) After upgrading to Spark 3.4.1 and 3.5.0 we receive RuntimeException "scala.Some is not a valid external type for schema of array"

2023-10-31 Thread Bruce Robbins (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-45644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17781531#comment-17781531
 ] 

Bruce Robbins commented on SPARK-45644:
---

I will look into it and try to submit a fix. If I can't, I will ping someone 
who can.

> After upgrading to Spark 3.4.1 and 3.5.0 we receive RuntimeException 
> "scala.Some is not a valid external type for schema of array"
> --
>
> Key: SPARK-45644
> URL: https://issues.apache.org/jira/browse/SPARK-45644
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core, SQL
>Affects Versions: 3.4.1, 3.5.0
>Reporter: Adi Wehrli
>Priority: Major
>
> I do not really know if this is a bug, but I am at the end of my knowledge.
> A Spark job ran successfully with Spark 3.2.x and 3.3.x.
> But after upgrading to 3.4.1 (as well as to 3.5.0), running the same job 
> with the same data now always fails with:
> {code}
> scala.Some is not a valid external type for schema of array
> {code}
> The corresponding stacktrace is:
> {code}
> 2023-10-24T06:28:50.932 level=ERROR logger=org.apache.spark.executor.Executor 
> msg="Exception in task 0.0 in stage 0.0 (TID 0)" thread="Executor task launch 
> worker for task 0.0 in stage 0.0 (TID 0)"
> java.lang.RuntimeException: scala.Some is not a valid external type for 
> schema of array
>   at 
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.MapObjects_10$(Unknown
>  Source) ~[?:?]
>   at 
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.ExternalMapToCatalyst_1$(Unknown
>  Source) ~[?:?]
>   at 
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.createNamedStruct_14_3$(Unknown
>  Source) ~[?:?]
>   at 
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.If_12$(Unknown
>  Source) ~[?:?]
>   at 
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.apply(Unknown
>  Source) ~[?:?]
>   at 
> org.apache.spark.sql.execution.ObjectOperator$.$anonfun$serializeObjectToRow$1(objects.scala:165)
>  ~[spark-sql_2.12-3.5.0.jar:3.5.0]
>   at 
> org.apache.spark.sql.execution.AppendColumnsWithObjectExec.$anonfun$doExecute$15(objects.scala:380)
>  ~[spark-sql_2.12-3.5.0.jar:3.5.0]
>   at scala.collection.Iterator$$anon$10.next(Iterator.scala:461) 
> ~[scala-library-2.12.15.jar:?]
>   at scala.collection.Iterator$$anon$10.next(Iterator.scala:461) 
> ~[scala-library-2.12.15.jar:?]
>   at 
> org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:169)
>  ~[spark-core_2.12-3.5.0.jar:3.5.0]
>   at 
> org.apache.spark.shuffle.ShuffleWriteProcessor.write(ShuffleWriteProcessor.scala:59)
>  ~[spark-core_2.12-3.5.0.jar:3.5.0]
>   at 
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:104) 
> ~[spark-core_2.12-3.5.0.jar:3.5.0]
>   at 
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:54) 
> ~[spark-core_2.12-3.5.0.jar:3.5.0]
>   at 
> org.apache.spark.TaskContext.runTaskWithListeners(TaskContext.scala:161) 
> ~[spark-core_2.12-3.5.0.jar:3.5.0]
>   at org.apache.spark.scheduler.Task.run(Task.scala:141) 
> ~[spark-core_2.12-3.5.0.jar:3.5.0]
>   at 
> org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$4(Executor.scala:620)
>  ~[spark-core_2.12-3.5.0.jar:3.5.0]
>   at 
> org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally(SparkErrorUtils.scala:64)
>  ~[spark-common-utils_2.12-3.5.0.jar:3.5.0]
>   at 
> org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally$(SparkErrorUtils.scala:61)
>  ~[spark-common-utils_2.12-3.5.0.jar:3.5.0]
>   at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:94) 
> ~[spark-core_2.12-3.5.0.jar:3.5.0]
>   at 
> org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:623) 
> [spark-core_2.12-3.5.0.jar:3.5.0]
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
>  [?:?]
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
>  [?:?]
>   at java.lang.Thread.run(Thread.java:834) [?:?]
> 2023-10-24T06:28:50.932 level=ERROR logger=org.apache.spark.executor.Executor 
> msg="Exception in task 1.0 in stage 0.0 (TID 1)" thread="Executor task launch 
> worker for task 1.0 in stage 0.0 (TID 1)"
> java.lang.RuntimeException: scala.Some is not a valid external type for 
> schema of array
>   at 
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.MapObjects_10$(Unknown
>  Source) ~[?:?]
>   at 
> 

[jira] [Commented] (SPARK-45644) After upgrading to Spark 3.4.1 and 3.5.0 we receive RuntimeException "scala.Some is not a valid external type for schema of array"

2023-10-31 Thread Adi Wehrli (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-45644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17781526#comment-17781526
 ] 

Adi Wehrli commented on SPARK-45644:


Good evening, [~bersprockets]

Thanks for your reproduction. So, what does this mean now? Will we have a 
bugfix for this? Or do we have to migrate something somehow?

Kind regards, 
Adi

> After upgrading to Spark 3.4.1 and 3.5.0 we receive RuntimeException 
> "scala.Some is not a valid external type for schema of array"
> --
>
> Key: SPARK-45644
> URL: https://issues.apache.org/jira/browse/SPARK-45644
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core, SQL
>Affects Versions: 3.4.1, 3.5.0
>Reporter: Adi Wehrli
>Priority: Major
>
> I do not really know if this is a bug, but I am at the end of my knowledge.
> A Spark job ran successfully with Spark 3.2.x and 3.3.x.
> But after upgrading to 3.4.1 (as well as to 3.5.0), running the same job 
> with the same data now always fails with:
> {code}
> scala.Some is not a valid external type for schema of array
> {code}
> The corresponding stacktrace is:
> {code}
> 2023-10-24T06:28:50.932 level=ERROR logger=org.apache.spark.executor.Executor 
> msg="Exception in task 0.0 in stage 0.0 (TID 0)" thread="Executor task launch 
> worker for task 0.0 in stage 0.0 (TID 0)"
> java.lang.RuntimeException: scala.Some is not a valid external type for 
> schema of array
>   at 
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.MapObjects_10$(Unknown
>  Source) ~[?:?]
>   at 
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.ExternalMapToCatalyst_1$(Unknown
>  Source) ~[?:?]
>   at 
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.createNamedStruct_14_3$(Unknown
>  Source) ~[?:?]
>   at 
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.If_12$(Unknown
>  Source) ~[?:?]
>   at 
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.apply(Unknown
>  Source) ~[?:?]
>   at 
> org.apache.spark.sql.execution.ObjectOperator$.$anonfun$serializeObjectToRow$1(objects.scala:165)
>  ~[spark-sql_2.12-3.5.0.jar:3.5.0]
>   at 
> org.apache.spark.sql.execution.AppendColumnsWithObjectExec.$anonfun$doExecute$15(objects.scala:380)
>  ~[spark-sql_2.12-3.5.0.jar:3.5.0]
>   at scala.collection.Iterator$$anon$10.next(Iterator.scala:461) 
> ~[scala-library-2.12.15.jar:?]
>   at scala.collection.Iterator$$anon$10.next(Iterator.scala:461) 
> ~[scala-library-2.12.15.jar:?]
>   at 
> org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:169)
>  ~[spark-core_2.12-3.5.0.jar:3.5.0]
>   at 
> org.apache.spark.shuffle.ShuffleWriteProcessor.write(ShuffleWriteProcessor.scala:59)
>  ~[spark-core_2.12-3.5.0.jar:3.5.0]
>   at 
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:104) 
> ~[spark-core_2.12-3.5.0.jar:3.5.0]
>   at 
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:54) 
> ~[spark-core_2.12-3.5.0.jar:3.5.0]
>   at 
> org.apache.spark.TaskContext.runTaskWithListeners(TaskContext.scala:161) 
> ~[spark-core_2.12-3.5.0.jar:3.5.0]
>   at org.apache.spark.scheduler.Task.run(Task.scala:141) 
> ~[spark-core_2.12-3.5.0.jar:3.5.0]
>   at 
> org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$4(Executor.scala:620)
>  ~[spark-core_2.12-3.5.0.jar:3.5.0]
>   at 
> org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally(SparkErrorUtils.scala:64)
>  ~[spark-common-utils_2.12-3.5.0.jar:3.5.0]
>   at 
> org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally$(SparkErrorUtils.scala:61)
>  ~[spark-common-utils_2.12-3.5.0.jar:3.5.0]
>   at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:94) 
> ~[spark-core_2.12-3.5.0.jar:3.5.0]
>   at 
> org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:623) 
> [spark-core_2.12-3.5.0.jar:3.5.0]
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
>  [?:?]
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
>  [?:?]
>   at java.lang.Thread.run(Thread.java:834) [?:?]
> 2023-10-24T06:28:50.932 level=ERROR logger=org.apache.spark.executor.Executor 
> msg="Exception in task 1.0 in stage 0.0 (TID 1)" thread="Executor task launch 
> worker for task 1.0 in stage 0.0 (TID 1)"
> java.lang.RuntimeException: scala.Some is not a valid external type for 
> schema of array
>   at 
> 

[jira] [Comment Edited] (SPARK-45644) After upgrading to Spark 3.4.1 and 3.5.0 we receive RuntimeException "scala.Some is not a valid external type for schema of array"

2023-10-31 Thread Adi Wehrli (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-45644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17781526#comment-17781526
 ] 

Adi Wehrli edited comment on SPARK-45644 at 10/31/23 9:38 PM:
--

Good evening, [~bersprockets]

Thanks a lot, that you could reproduce this. So, what does this mean now? Will 
we have a bugfix for this? Or do we have to migrate something somehow?

Kind regards, 
Adi


was (Author: JIRAUSER302746):
Good evening, [~bersprockets]

Thanks for your reproduction. So, what does this mean now? Will we have a 
bugfix for this? Or do we have to migrate something somehow?

Kind regards, 
Adi

> After upgrading to Spark 3.4.1 and 3.5.0 we receive RuntimeException 
> "scala.Some is not a valid external type for schema of array"
> --
>
> Key: SPARK-45644
> URL: https://issues.apache.org/jira/browse/SPARK-45644
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core, SQL
>Affects Versions: 3.4.1, 3.5.0
>Reporter: Adi Wehrli
>Priority: Major
>
> I do not really know if this is a bug, but I am at the end of my knowledge.
> A Spark job ran successfully with Spark 3.2.x and 3.3.x.
> But after upgrading to 3.4.1 (as well as to 3.5.0), running the same job 
> with the same data now always fails with:
> {code}
> scala.Some is not a valid external type for schema of array
> {code}
> The corresponding stacktrace is:
> {code}
> 2023-10-24T06:28:50.932 level=ERROR logger=org.apache.spark.executor.Executor 
> msg="Exception in task 0.0 in stage 0.0 (TID 0)" thread="Executor task launch 
> worker for task 0.0 in stage 0.0 (TID 0)"
> java.lang.RuntimeException: scala.Some is not a valid external type for 
> schema of array
>   at 
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.MapObjects_10$(Unknown
>  Source) ~[?:?]
>   at 
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.ExternalMapToCatalyst_1$(Unknown
>  Source) ~[?:?]
>   at 
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.createNamedStruct_14_3$(Unknown
>  Source) ~[?:?]
>   at 
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.If_12$(Unknown
>  Source) ~[?:?]
>   at 
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.apply(Unknown
>  Source) ~[?:?]
>   at 
> org.apache.spark.sql.execution.ObjectOperator$.$anonfun$serializeObjectToRow$1(objects.scala:165)
>  ~[spark-sql_2.12-3.5.0.jar:3.5.0]
>   at 
> org.apache.spark.sql.execution.AppendColumnsWithObjectExec.$anonfun$doExecute$15(objects.scala:380)
>  ~[spark-sql_2.12-3.5.0.jar:3.5.0]
>   at scala.collection.Iterator$$anon$10.next(Iterator.scala:461) 
> ~[scala-library-2.12.15.jar:?]
>   at scala.collection.Iterator$$anon$10.next(Iterator.scala:461) 
> ~[scala-library-2.12.15.jar:?]
>   at 
> org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:169)
>  ~[spark-core_2.12-3.5.0.jar:3.5.0]
>   at 
> org.apache.spark.shuffle.ShuffleWriteProcessor.write(ShuffleWriteProcessor.scala:59)
>  ~[spark-core_2.12-3.5.0.jar:3.5.0]
>   at 
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:104) 
> ~[spark-core_2.12-3.5.0.jar:3.5.0]
>   at 
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:54) 
> ~[spark-core_2.12-3.5.0.jar:3.5.0]
>   at 
> org.apache.spark.TaskContext.runTaskWithListeners(TaskContext.scala:161) 
> ~[spark-core_2.12-3.5.0.jar:3.5.0]
>   at org.apache.spark.scheduler.Task.run(Task.scala:141) 
> ~[spark-core_2.12-3.5.0.jar:3.5.0]
>   at 
> org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$4(Executor.scala:620)
>  ~[spark-core_2.12-3.5.0.jar:3.5.0]
>   at 
> org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally(SparkErrorUtils.scala:64)
>  ~[spark-common-utils_2.12-3.5.0.jar:3.5.0]
>   at 
> org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally$(SparkErrorUtils.scala:61)
>  ~[spark-common-utils_2.12-3.5.0.jar:3.5.0]
>   at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:94) 
> ~[spark-core_2.12-3.5.0.jar:3.5.0]
>   at 
> org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:623) 
> [spark-core_2.12-3.5.0.jar:3.5.0]
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
>  [?:?]
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
>  [?:?]
>   at java.lang.Thread.run(Thread.java:834) [?:?]
> 2023-10-24T06:28:50.932 level=ERROR logger=org.apache.spark.executor.Executor 
> msg="Exception in task 1.0 in stage 0.0 (TID 

[jira] [Assigned] (SPARK-45744) Switch `spark.history.store.serializer` to use `PROTOBUF` by default

2023-10-31 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45744?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-45744:
-

Assignee: Dongjoon Hyun

> Switch `spark.history.store.serializer` to use `PROTOBUF` by default
> 
>
> Key: SPARK-45744
> URL: https://issues.apache.org/jira/browse/SPARK-45744
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core
>Affects Versions: 4.0.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-45699) Fix "Widening conversion from `TypeA` to `TypeB` is deprecated because it loses precision"

2023-10-31 Thread Hannah Amundson (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-45699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17781505#comment-17781505
 ] 

Hannah Amundson commented on SPARK-45699:
-

[~LuciferYang] Do you have any suggestions for tickets that involve distributed-systems work?

> Fix "Widening conversion from `TypeA` to `TypeB` is deprecated because it 
> loses precision"
> --
>
> Key: SPARK-45699
> URL: https://issues.apache.org/jira/browse/SPARK-45699
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core, SQL
>Affects Versions: 4.0.0
>Reporter: Yang Jie
>Priority: Major
>
> {code:java}
> error] 
> /Users/yangjie01/SourceCode/git/spark-mine-sbt/core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala:1199:67:
>  Widening conversion from Long to Double is deprecated because it loses 
> precision. Write `.toDouble` instead. [quickfixable]
> [error] Applicable -Wconf / @nowarn filters for this fatal warning: msg=<part of the message>, cat=deprecation, 
> site=org.apache.spark.scheduler.TaskSetManager.checkSpeculatableTasks.threshold
> [error]       val threshold = max(speculationMultiplier * medianDuration, 
> minTimeToSpeculation)
> [error]                                                                   ^
> [error] 
> /Users/yangjie01/SourceCode/git/spark-mine-sbt/core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala:1207:60:
>  Widening conversion from Long to Double is deprecated because it loses 
> precision. Write `.toDouble` instead. [quickfixable]
> [error] Applicable -Wconf / @nowarn filters for this fatal warning: msg=<part of the message>, cat=deprecation, 
> site=org.apache.spark.scheduler.TaskSetManager.checkSpeculatableTasks
> [error]       foundTasks = checkAndSubmitSpeculatableTasks(timeMs, threshold, 
> customizedThreshold = true)
> [error]                                                            ^
> [error] 
> /Users/yangjie01/SourceCode/git/spark-mine-sbt/connector/connect/common/src/main/scala/org/apache/spark/sql/connect/client/arrow/ArrowVectorReader.scala:137:48:
>  Widening conversion from Int to Float is deprecated because it loses 
> precision. Write `.toFloat` instead. [quickfixable]
> [error] Applicable -Wconf / @nowarn filters for this fatal warning: msg=<part of the message>, cat=deprecation, 
> site=org.apache.spark.sql.connect.client.arrow.IntVectorReader.getFloat
> [error]   override def getFloat(i: Int): Float = getInt(i)
> [error]                                                ^
> [error] 
> /Users/yangjie01/SourceCode/git/spark-mine-sbt/connector/connect/common/src/main/scala/org/apache/spark/sql/connect/client/arrow/ArrowVectorReader.scala:146:49:
>  Widening conversion from Long to Float is deprecated because it loses 
> precision. Write `.toFloat` instead. [quickfixable]
> [error] Applicable -Wconf / @nowarn filters for this fatal warning: msg=<part of the message>, cat=deprecation, 
> site=org.apache.spark.sql.connect.client.arrow.BigIntVectorReader.getFloat
> [error]   override def getFloat(i: Int): Float = getLong(i)
> [error]                                                 ^
> [error] 
> /Users/yangjie01/SourceCode/git/spark-mine-sbt/connector/connect/common/src/main/scala/org/apache/spark/sql/connect/client/arrow/ArrowVectorReader.scala:147:51:
>  Widening conversion from Long to Double is deprecated because it loses 
> precision. Write `.toDouble` instead. [quickfixable]
> [error] Applicable -Wconf / @nowarn filters for this fatal warning: msg=<part of the message>, cat=deprecation, 
> site=org.apache.spark.sql.connect.client.arrow.BigIntVectorReader.getDouble
> [error]   override def getDouble(i: Int): Double = getLong(i)
> [error]                                                   ^ {code}
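> For illustration, a minimal sketch of the fix pattern the compiler suggests. The 
> `LongReader` trait below is hypothetical and only mirrors the shape of the flagged 
> call sites; it is not the committed patch.
> {code:scala}
> trait LongReader {
>   def getLong(i: Int): Long
>
>   // Before (deprecated implicit widening): def getDouble(i: Int): Double = getLong(i)
>   // After: make the lossy conversion explicit, as the warning recommends.
>   def getDouble(i: Int): Double = getLong(i).toDouble
> }
> {code}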



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-45744) Switch `spark.history.store.serializer` to use `PROTOBUF` by default

2023-10-31 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45744?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-45744:
---
Labels: pull-request-available  (was: )

> Switch `spark.history.store.serializer` to use `PROTOBUF` by default
> 
>
> Key: SPARK-45744
> URL: https://issues.apache.org/jira/browse/SPARK-45744
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core
>Affects Versions: 4.0.0
>Reporter: Dongjoon Hyun
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-45744) Switch `spark.history.store.serializer` to use `PROTOBUF` by default

2023-10-31 Thread Dongjoon Hyun (Jira)
Dongjoon Hyun created SPARK-45744:
-

 Summary: Switch `spark.history.store.serializer` to use `PROTOBUF` 
by default
 Key: SPARK-45744
 URL: https://issues.apache.org/jira/browse/SPARK-45744
 Project: Spark
  Issue Type: Sub-task
  Components: Spark Core
Affects Versions: 4.0.0
Reporter: Dongjoon Hyun






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-45743) Upgrade dropwizard metrics 4.2.21

2023-10-31 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45743?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-45743:
---
Labels: pull-request-available  (was: )

> Upgrade dropwizard metrics 4.2.21
> -
>
> Key: SPARK-45743
> URL: https://issues.apache.org/jira/browse/SPARK-45743
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 4.0.0
>Reporter: Yang Jie
>Priority: Major
>  Labels: pull-request-available
>
> [https://github.com/dropwizard/metrics/releases/tag/v4.2.21]
> [https://github.com/dropwizard/metrics/releases/tag/v4.2.20]
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-45644) After upgrading to Spark 3.4.1 and 3.5.0 we receive RuntimeException "scala.Some is not a valid external type for schema of array"

2023-10-31 Thread Bruce Robbins (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-45644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17781494#comment-17781494
 ] 

Bruce Robbins commented on SPARK-45644:
---

OK, I can reproduce this and will take a look. I will also try to get my reproduction 
example down to a minimal case and post it here later.

> After upgrading to Spark 3.4.1 and 3.5.0 we receive RuntimeException 
> "scala.Some is not a valid external type for schema of array"
> --
>
> Key: SPARK-45644
> URL: https://issues.apache.org/jira/browse/SPARK-45644
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core, SQL
>Affects Versions: 3.4.1, 3.5.0
>Reporter: Adi Wehrli
>Priority: Major
>
> I do not really know if this is a bug, but I am at the end with my knowledge.
> A Spark job ran successfully with Spark 3.2.x and 3.3.x. 
> But after upgrading to 3.4.1 (as well as with 3.5.0) running the same job 
> with the same data the following always occurs now:
> {code}
> scala.Some is not a valid external type for schema of array
> {code}
> The corresponding stacktrace is:
> {code}
> 2023-10-24T06:28:50.932 level=ERROR logger=org.apache.spark.executor.Executor 
> msg="Exception in task 0.0 in stage 0.0 (TID 0)" thread="Executor task launch 
> worker for task 0.0 in stage 0.0 (TID 0)"
> java.lang.RuntimeException: scala.Some is not a valid external type for 
> schema of array
>   at 
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.MapObjects_10$(Unknown
>  Source) ~[?:?]
>   at 
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.ExternalMapToCatalyst_1$(Unknown
>  Source) ~[?:?]
>   at 
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.createNamedStruct_14_3$(Unknown
>  Source) ~[?:?]
>   at 
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.If_12$(Unknown
>  Source) ~[?:?]
>   at 
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.apply(Unknown
>  Source) ~[?:?]
>   at 
> org.apache.spark.sql.execution.ObjectOperator$.$anonfun$serializeObjectToRow$1(objects.scala:165)
>  ~[spark-sql_2.12-3.5.0.jar:3.5.0]
>   at 
> org.apache.spark.sql.execution.AppendColumnsWithObjectExec.$anonfun$doExecute$15(objects.scala:380)
>  ~[spark-sql_2.12-3.5.0.jar:3.5.0]
>   at scala.collection.Iterator$$anon$10.next(Iterator.scala:461) 
> ~[scala-library-2.12.15.jar:?]
>   at scala.collection.Iterator$$anon$10.next(Iterator.scala:461) 
> ~[scala-library-2.12.15.jar:?]
>   at 
> org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:169)
>  ~[spark-core_2.12-3.5.0.jar:3.5.0]
>   at 
> org.apache.spark.shuffle.ShuffleWriteProcessor.write(ShuffleWriteProcessor.scala:59)
>  ~[spark-core_2.12-3.5.0.jar:3.5.0]
>   at 
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:104) 
> ~[spark-core_2.12-3.5.0.jar:3.5.0]
>   at 
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:54) 
> ~[spark-core_2.12-3.5.0.jar:3.5.0]
>   at 
> org.apache.spark.TaskContext.runTaskWithListeners(TaskContext.scala:161) 
> ~[spark-core_2.12-3.5.0.jar:3.5.0]
>   at org.apache.spark.scheduler.Task.run(Task.scala:141) 
> ~[spark-core_2.12-3.5.0.jar:3.5.0]
>   at 
> org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$4(Executor.scala:620)
>  ~[spark-core_2.12-3.5.0.jar:3.5.0]
>   at 
> org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally(SparkErrorUtils.scala:64)
>  ~[spark-common-utils_2.12-3.5.0.jar:3.5.0]
>   at 
> org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally$(SparkErrorUtils.scala:61)
>  ~[spark-common-utils_2.12-3.5.0.jar:3.5.0]
>   at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:94) 
> ~[spark-core_2.12-3.5.0.jar:3.5.0]
>   at 
> org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:623) 
> [spark-core_2.12-3.5.0.jar:3.5.0]
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
>  [?:?]
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
>  [?:?]
>   at java.lang.Thread.run(Thread.java:834) [?:?]
> 2023-10-24T06:28:50.932 level=ERROR logger=org.apache.spark.executor.Executor 
> msg="Exception in task 1.0 in stage 0.0 (TID 1)" thread="Executor task launch 
> worker for task 1.0 in stage 0.0 (TID 1)"
> java.lang.RuntimeException: scala.Some is not a valid external type for 
> schema of array
>   at 
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.MapObjects_10$(Unknown
>  Source) ~[?:?]
>   at 
> 

[jira] [Created] (SPARK-45743) Upgrade dropwizard metrics 4.2.21

2023-10-31 Thread Yang Jie (Jira)
Yang Jie created SPARK-45743:


 Summary: Upgrade dropwizard metrics 4.2.21
 Key: SPARK-45743
 URL: https://issues.apache.org/jira/browse/SPARK-45743
 Project: Spark
  Issue Type: Improvement
  Components: Build
Affects Versions: 4.0.0
Reporter: Yang Jie


[https://github.com/dropwizard/metrics/releases/tag/v4.2.21]

[https://github.com/dropwizard/metrics/releases/tag/v4.2.20]

 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-44896) Consider adding information os_prio, cpu, elapsed, tid, nid, etc., from the jstack tool

2023-10-31 Thread Hannah Amundson (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-44896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17781489#comment-17781489
 ] 

Hannah Amundson commented on SPARK-44896:
-

Hello everyone (and [~yao]),

I am a graduate student at the University of Texas (Computer Science). I have a 
project in my Distributed Systems course to contribute to an open source 
distributed project. Would it be okay if I worked on this ticket?

 

Thanks for your help,

Hannah

> Consider adding information os_prio, cpu, elapsed, tid, nid, etc.,  from the 
> jstack tool
> 
>
> Key: SPARK-44896
> URL: https://issues.apache.org/jira/browse/SPARK-44896
> Project: Spark
>  Issue Type: Sub-task
>  Components: Web UI
>Affects Versions: 4.0.0
>Reporter: Kent Yao
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-45190) XML: StructType schema issue in pyspark connect

2023-10-31 Thread Hannah Amundson (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-45190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17781487#comment-17781487
 ] 

Hannah Amundson commented on SPARK-45190:
-

Hello everyone (and [~sandip.agarwala] ),

I am a graduate student at the University of Texas (Computer Science). I have a 
project in my Distributed Systems course to contribute to an open source 
distributed project. Would it be okay if I worked on this ticket?

 

Thanks for your help,

Hannah

> XML: StructType schema issue in pyspark connect
> ---
>
> Key: SPARK-45190
> URL: https://issues.apache.org/jira/browse/SPARK-45190
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, PySpark
>Affects Versions: 4.0.0
>Reporter: Sandip Agarwala
>Priority: Major
>
> The following PR added support for from_xml to pyspark.
> https://github.com/apache/spark/pull/42938
>  
> However, passing a StructType schema results in a schema parse error with PySpark 
> Connect. 
> Filing a Jira to track this issue.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-45699) Fix "Widening conversion from `TypeA` to `TypeB` is deprecated because it loses precision"

2023-10-31 Thread Hannah Amundson (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-45699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17781488#comment-17781488
 ] 

Hannah Amundson commented on SPARK-45699:
-

Hello everyone (and [~LuciferYang]),

I am a graduate student at the University of Texas (Computer Science). I have a 
project in my Distributed Systems course to contribute to an open source 
distributed project. Would it be okay if I worked on this ticket?

 

Thanks for your help,

Hannah

> Fix "Widening conversion from `TypeA` to `TypeB` is deprecated because it 
> loses precision"
> --
>
> Key: SPARK-45699
> URL: https://issues.apache.org/jira/browse/SPARK-45699
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core, SQL
>Affects Versions: 4.0.0
>Reporter: Yang Jie
>Priority: Major
>
> {code:java}
> error] 
> /Users/yangjie01/SourceCode/git/spark-mine-sbt/core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala:1199:67:
>  Widening conversion from Long to Double is deprecated because it loses 
> precision. Write `.toDouble` instead. [quickfixable]
> [error] Applicable -Wconf / @nowarn filters for this fatal warning: msg= of the message>, cat=deprecation, 
> site=org.apache.spark.scheduler.TaskSetManager.checkSpeculatableTasks.threshold
> [error]       val threshold = max(speculationMultiplier * medianDuration, 
> minTimeToSpeculation)
> [error]                                                                   ^
> [error] 
> /Users/yangjie01/SourceCode/git/spark-mine-sbt/core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala:1207:60:
>  Widening conversion from Long to Double is deprecated because it loses 
> precision. Write `.toDouble` instead. [quickfixable]
> [error] Applicable -Wconf / @nowarn filters for this fatal warning: msg= of the message>, cat=deprecation, 
> site=org.apache.spark.scheduler.TaskSetManager.checkSpeculatableTasks
> [error]       foundTasks = checkAndSubmitSpeculatableTasks(timeMs, threshold, 
> customizedThreshold = true)
> [error]                                                            ^
> [error] 
> /Users/yangjie01/SourceCode/git/spark-mine-sbt/connector/connect/common/src/main/scala/org/apache/spark/sql/connect/client/arrow/ArrowVectorReader.scala:137:48:
>  Widening conversion from Int to Float is deprecated because it loses 
> precision. Write `.toFloat` instead. [quickfixable]
> [error] Applicable -Wconf / @nowarn filters for this fatal warning: msg= of the message>, cat=deprecation, 
> site=org.apache.spark.sql.connect.client.arrow.IntVectorReader.getFloat
> [error]   override def getFloat(i: Int): Float = getInt(i)
> [error]                                                ^
> [error] 
> /Users/yangjie01/SourceCode/git/spark-mine-sbt/connector/connect/common/src/main/scala/org/apache/spark/sql/connect/client/arrow/ArrowVectorReader.scala:146:49:
>  Widening conversion from Long to Float is deprecated because it loses 
> precision. Write `.toFloat` instead. [quickfixable]
> [error] Applicable -Wconf / @nowarn filters for this fatal warning: msg= of the message>, cat=deprecation, 
> site=org.apache.spark.sql.connect.client.arrow.BigIntVectorReader.getFloat
> [error]   override def getFloat(i: Int): Float = getLong(i)
> [error]                                                 ^
> [error] 
> /Users/yangjie01/SourceCode/git/spark-mine-sbt/connector/connect/common/src/main/scala/org/apache/spark/sql/connect/client/arrow/ArrowVectorReader.scala:147:51:
>  Widening conversion from Long to Double is deprecated because it loses 
> precision. Write `.toDouble` instead. [quickfixable]
> [error] Applicable -Wconf / @nowarn filters for this fatal warning: msg= of the message>, cat=deprecation, 
> site=org.apache.spark.sql.connect.client.arrow.BigIntVectorReader.getDouble
> [error]   override def getDouble(i: Int): Double = getLong(i)
> [error]                                                   ^ {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-38473) Use error classes in org.apache.spark.scheduler

2023-10-31 Thread Hannah Amundson (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-38473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17781486#comment-17781486
 ] 

Hannah Amundson commented on SPARK-38473:
-

Hello,

I am a graduate student at the University of Texas (Computer Science). I have a 
project in my Distributed Systems course to contribute to an open source 
distributed project. Would it be okay if I worked on this ticket?

 

Thanks for your help,

Hannah

> Use error classes in org.apache.spark.scheduler
> ---
>
> Key: SPARK-38473
> URL: https://issues.apache.org/jira/browse/SPARK-38473
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core
>Affects Versions: 3.3.0
>Reporter: Bo Zhang
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-38473) Use error classes in org.apache.spark.scheduler

2023-10-31 Thread Hannah Amundson (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-38473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17781486#comment-17781486
 ] 

Hannah Amundson edited comment on SPARK-38473 at 10/31/23 7:08 PM:
---

Hello everyone (and [~bozhang]),

I am a graduate student at the University of Texas (Computer Science). I have a 
project in my Distributed Systems course to contribute to an open source 
distributed project. Would it be okay if I worked on this ticket?

 

Thanks for your help,

Hannah


was (Author: hannahkamundson):
Hello,

I am a graduate student at the University of Texas (Computer Science). I have a 
project in my Distributed Systems course to contribute to an open source 
distributed project. Would it be okay if I worked on this ticket?

 

Thanks for your help,

Hannah

> Use error classes in org.apache.spark.scheduler
> ---
>
> Key: SPARK-38473
> URL: https://issues.apache.org/jira/browse/SPARK-38473
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core
>Affects Versions: 3.3.0
>Reporter: Bo Zhang
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-45742) Introduce an implicit function for Scala Array to wrap into `immutable.ArraySeq`.

2023-10-31 Thread Yang Jie (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45742?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yang Jie updated SPARK-45742:
-
Summary: Introduce an implicit function for Scala Array to wrap into 
`immutable.ArraySeq`.  (was: Introduce an implicit method for Scala Array to 
wrap into `immutable.ArraySeq`.)

> Introduce an implicit function for Scala Array to wrap into 
> `immutable.ArraySeq`.
> -
>
> Key: SPARK-45742
> URL: https://issues.apache.org/jira/browse/SPARK-45742
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, MLlib, Spark Core
>Affects Versions: 4.0.0
>Reporter: Yang Jie
>Priority: Major
>
> Currently, we need to use `immutable.ArraySeq.unsafeWrapArray(array)` to wrap 
> an Array into an `immutable.ArraySeq`, which makes the code look bloated.
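> A minimal sketch of what such an implicit could look like; the names 
> `ArrayImplicits` and `toImmutableArraySeq` are placeholders, not necessarily the 
> actual implementation:
> {code:scala}
> import scala.collection.immutable
>
> object ArrayImplicits {
>   implicit class SparkArrayOps[T](private val xs: Array[T]) extends AnyVal {
>     /** Wraps the array into an immutable.ArraySeq without copying it. */
>     def toImmutableArraySeq: immutable.ArraySeq[T] =
>       immutable.ArraySeq.unsafeWrapArray(xs)
>   }
> }
>
> // Usage: import ArrayImplicits._; Array(1, 2, 3).toImmutableArraySeq
> {code}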



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-45719) Upgrade AWS SDK to v2 for Kubernetes integration tests

2023-10-31 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45719?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-45719:
---
Labels: pull-request-available  (was: )

> Upgrade AWS SDK to v2 for Kubernetes integration tests
> --
>
> Key: SPARK-45719
> URL: https://issues.apache.org/jira/browse/SPARK-45719
> Project: Spark
>  Issue Type: Sub-task
>  Components: Kubernetes, Spark Core
>Affects Versions: 3.5.0
>Reporter: Lantao Jin
>Priority: Major
>  Labels: pull-request-available
>
> Sub-task of [SPARK-44124|https://issues.apache.org/jira/browse/SPARK-44124]. 
> In this issue, we will upgrade the AWS SDK used by the credentials providers, AWS clients, 
> and the related Kubernetes integration tests.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-45741) Upgrade Netty to 4.1.100.Final

2023-10-31 Thread Dongjoon Hyun (Jira)
Dongjoon Hyun created SPARK-45741:
-

 Summary: Upgrade Netty to 4.1.100.Final
 Key: SPARK-45741
 URL: https://issues.apache.org/jira/browse/SPARK-45741
 Project: Spark
  Issue Type: Bug
  Components: Build
Affects Versions: 4.0.0
Reporter: Dongjoon Hyun






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-45741) Upgrade Netty to 4.1.100.Final

2023-10-31 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45741?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-45741:
---
Labels: pull-request-available  (was: )

> Upgrade Netty to 4.1.100.Final
> --
>
> Key: SPARK-45741
> URL: https://issues.apache.org/jira/browse/SPARK-45741
> Project: Spark
>  Issue Type: Bug
>  Components: Build
>Affects Versions: 4.0.0
>Reporter: Dongjoon Hyun
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-45172) Upgrade commons-compress.version to 1.24.0

2023-10-31 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45172?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-45172:
--
Summary: Upgrade commons-compress.version to 1.24.0  (was: Upgrade 
commons-compress.version from 1.23.0 to 1.24.0)

> Upgrade commons-compress.version to 1.24.0
> --
>
> Key: SPARK-45172
> URL: https://issues.apache.org/jira/browse/SPARK-45172
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 4.0.0
>Reporter: Hyukjin Kwon
>Assignee: Hyukjin Kwon
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-45282) Join loses records for cached datasets

2023-10-31 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45282?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-45282:
--
Target Version/s: 3.4.2, 3.5.1  (was: 3.4.2)

> Join loses records for cached datasets
> --
>
> Key: SPARK-45282
> URL: https://issues.apache.org/jira/browse/SPARK-45282
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.4.1, 3.5.0
> Environment: spark 3.4.1 on apache hadoop 3.3.6 or kubernetes 1.26 or 
> databricks 13.3
>Reporter: koert kuipers
>Priority: Blocker
>  Labels: CorrectnessBug, correctness
>
> We observed this issue on Spark 3.4.1, and it is also present on 3.5.0. It is 
> not present on Spark 3.3.1.
> It only shows up in a distributed environment; I cannot replicate it in a unit test. 
> However, I did get it to show up on a Hadoop cluster, on Kubernetes, and on 
> Databricks 13.3.
> The issue is that records are dropped when two cached dataframes are joined. 
> It seems that in Spark 3.4.1 some Exchanges are dropped from the query plan as an 
> optimization, while in Spark 3.3.1 these Exchanges are still present. It seems 
> to be an issue with AQE when canChangeCachedPlanOutputPartitioning=true.
> To reproduce on a distributed cluster, these settings are needed:
> {code:java}
> spark.sql.adaptive.advisoryPartitionSizeInBytes 33554432
> spark.sql.adaptive.coalescePartitions.parallelismFirst false
> spark.sql.adaptive.enabled true
> spark.sql.optimizer.canChangeCachedPlanOutputPartitioning true {code}
> Scala code to reproduce:
> {code:java}
> import java.util.UUID
> import org.apache.spark.sql.functions.col
> import spark.implicits._
> val data = (1 to 100).toDS().map(i => 
> UUID.randomUUID().toString).persist()
> val left = data.map(k => (k, 1))
> val right = data.map(k => (k, k)) // if i change this to k => (k, 1) it works!
> println("number of left " + left.count())
> println("number of right " + right.count())
> println("number of (left join right) " +
>   left.toDF("key", "value1").join(right.toDF("key", "value2"), "key").count()
> )
> val left1 = left
>   .toDF("key", "value1")
>   .repartition(col("key")) // comment out this line to make it work
>   .persist()
> println("number of left1 " + left1.count())
> val right1 = right
>   .toDF("key", "value2")
>   .repartition(col("key")) // comment out this line to make it work
>   .persist()
> println("number of right1 " + right1.count())
> println("number of (left1 join right1) " +  left1.join(right1, 
> "key").count()) // this gives incorrect result{code}
> this produces the following output:
> {code:java}
> number of left 100
> number of right 100
> number of (left join right) 100
> number of left1 100
> number of right1 100
> number of (left1 join right1) 859531 {code}
> note that the last number (the incorrect one) actually varies depending on 
> settings and cluster size etc.
>  
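> Until a fix lands, a possible mitigation (an assumption based on the settings listed 
> above, not a verified fix) is to keep AQE from changing the output partitioning of 
> cached plans:
> {code:scala}
> // Revert the setting implicated above; this trades the optimization for correctness.
> spark.conf.set("spark.sql.optimizer.canChangeCachedPlanOutputPartitioning", "false")
> {code}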



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-45282) Join loses records for cached datasets

2023-10-31 Thread Dongjoon Hyun (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-45282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17781464#comment-17781464
 ] 

Dongjoon Hyun commented on SPARK-45282:
---

Thank you for sharing, [~koert].

> Join loses records for cached datasets
> --
>
> Key: SPARK-45282
> URL: https://issues.apache.org/jira/browse/SPARK-45282
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.4.1, 3.5.0
> Environment: spark 3.4.1 on apache hadoop 3.3.6 or kubernetes 1.26 or 
> databricks 13.3
>Reporter: koert kuipers
>Priority: Blocker
>  Labels: CorrectnessBug, correctness
>
> we observed this issue on spark 3.4.1 but it is also present on 3.5.0. it is 
> not present on spark 3.3.1.
> it only shows up in distributed environment. i cannot replicate in unit test. 
> however i did get it to show up on hadoop cluster, kubernetes, and on 
> databricks 13.3
> the issue is that records are dropped when two cached dataframes are joined. 
> it seems in spark 3.4.1 in queryplan some Exchanges are dropped as an 
> optimization while in spark 3.3.1 these Exhanges are still present. it seems 
> to be an issue with AQE with canChangeCachedPlanOutputPartitioning=true.
> to reproduce on distributed cluster these settings needed are:
> {code:java}
> spark.sql.adaptive.advisoryPartitionSizeInBytes 33554432
> spark.sql.adaptive.coalescePartitions.parallelismFirst false
> spark.sql.adaptive.enabled true
> spark.sql.optimizer.canChangeCachedPlanOutputPartitioning true {code}
> code using scala to reproduce is:
> {code:java}
> import java.util.UUID
> import org.apache.spark.sql.functions.col
> import spark.implicits._
> val data = (1 to 100).toDS().map(i => 
> UUID.randomUUID().toString).persist()
> val left = data.map(k => (k, 1))
> val right = data.map(k => (k, k)) // if i change this to k => (k, 1) it works!
> println("number of left " + left.count())
> println("number of right " + right.count())
> println("number of (left join right) " +
>   left.toDF("key", "value1").join(right.toDF("key", "value2"), "key").count()
> )
> val left1 = left
>   .toDF("key", "value1")
>   .repartition(col("key")) // comment out this line to make it work
>   .persist()
> println("number of left1 " + left1.count())
> val right1 = right
>   .toDF("key", "value2")
>   .repartition(col("key")) // comment out this line to make it work
>   .persist()
> println("number of right1 " + right1.count())
> println("number of (left1 join right1) " +  left1.join(right1, 
> "key").count()) // this gives incorrect result{code}
> this produces the following output:
> {code:java}
> number of left 100
> number of right 100
> number of (left join right) 100
> number of left1 100
> number of right1 100
> number of (left1 join right1) 859531 {code}
> note that the last number (the incorrect one) actually varies depending on 
> settings and cluster size etc.
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-45737) Remove unnecessary `.toArray[InternalRow]` in `SparkPlan#executeTake` function.

2023-10-31 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45737?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-45737:
-

Assignee: Yang Jie

> Remove unnecessary `.toArray[InternalRow]` in `SparkPlan#executeTake` 
> function.
> ---
>
> Key: SPARK-45737
> URL: https://issues.apache.org/jira/browse/SPARK-45737
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Yang Jie
>Assignee: Yang Jie
>Priority: Major
>  Labels: pull-request-available
>
> {code:java}
> if (takeFromEnd) {
>   while (buf.length < n && i < res.length) {
> val rows = decodeUnsafeRows(res(i)._2)
> if (n - buf.length >= res(i)._1) {
>   buf.prependAll(rows.toArray[InternalRow])
> } else {
>   val dropUntil = res(i)._1 - (n - buf.length)
>   // Same as Iterator.drop but this only takes a long.
>   var j: Long = 0L
>   while (j < dropUntil) { rows.next(); j += 1L}
>   buf.prependAll(rows.toArray[InternalRow])
> }
> i += 1
>   }
> } else {
>   while (buf.length < n && i < res.length) {
> val rows = decodeUnsafeRows(res(i)._2)
> if (n - buf.length >= res(i)._1) {
>   buf ++= rows.toArray[InternalRow]
> } else {
>   buf ++= rows.take(n - buf.length).toArray[InternalRow]
> }
> i += 1
>   }
> } {code}
> In the above code, the parameters of the `mutable.Buffer#prependAll` and 
> `mutable.Growable#++=` functions are typed as `IterableOnce`, and `rows` is an 
> `Iterator[InternalRow]`, which extends `IterableOnce`, so there is no 
> need to convert it to an `Array[InternalRow]` first.
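> As a tiny self-contained illustration of the point (using plain Ints rather than 
> InternalRow; this is only a sketch, not the merged change):
> {code:scala}
> import scala.collection.mutable
>
> val buf = mutable.ListBuffer(1, 2, 3)
> val rows: Iterator[Int] = Iterator(4, 5, 6)
> buf ++= rows                 // an Iterator is an IterableOnce, no .toArray needed
> buf.prependAll(Iterator(0))  // likewise for prependAll
> {code}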
>  
>  
>  
>  
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-45737) Remove unnecessary `.toArray[InternalRow]` in `SparkPlan#executeTake` function.

2023-10-31 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45737?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-45737.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 43599
[https://github.com/apache/spark/pull/43599]

> Remove unnecessary `.toArray[InternalRow]` in `SparkPlan#executeTake` 
> function.
> ---
>
> Key: SPARK-45737
> URL: https://issues.apache.org/jira/browse/SPARK-45737
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Yang Jie
>Assignee: Yang Jie
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> {code:java}
> if (takeFromEnd) {
>   while (buf.length < n && i < res.length) {
> val rows = decodeUnsafeRows(res(i)._2)
> if (n - buf.length >= res(i)._1) {
>   buf.prependAll(rows.toArray[InternalRow])
> } else {
>   val dropUntil = res(i)._1 - (n - buf.length)
>   // Same as Iterator.drop but this only takes a long.
>   var j: Long = 0L
>   while (j < dropUntil) { rows.next(); j += 1L}
>   buf.prependAll(rows.toArray[InternalRow])
> }
> i += 1
>   }
> } else {
>   while (buf.length < n && i < res.length) {
> val rows = decodeUnsafeRows(res(i)._2)
> if (n - buf.length >= res(i)._1) {
>   buf ++= rows.toArray[InternalRow]
> } else {
>   buf ++= rows.take(n - buf.length).toArray[InternalRow]
> }
> i += 1
>   }
> } {code}
> In the above code, the input parameters of `mutable.Buffer#prependAll` and 
> `mutable.Growable#++=` functions are `IterableOnce`, and the type of rows is 
> `Iterator[InternalRow]`, which inherits from `IterableOnce`, so there is no 
> need to cast to an array of InternalRow anymore.
>  
>  
>  
>  
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-45700) Fix `The outer reference in this type test cannot be checked at run time`

2023-10-31 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45700?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-45700.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 43582
[https://github.com/apache/spark/pull/43582]

> Fix `The outer reference in this type test cannot be checked at run time`
> -
>
> Key: SPARK-45700
> URL: https://issues.apache.org/jira/browse/SPARK-45700
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core, SQL
>Affects Versions: 4.0.0
>Reporter: Yang Jie
>Assignee: Yang Jie
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> {code:java}
> [error] 
> /Users/yangjie01/SourceCode/git/spark-mine-sbt/sql/core/src/test/scala/org/apache/spark/sql/SQLQueryTestSuite.scala:324:12:
>  The outer reference in this type test cannot be checked at run time.
> [error] Applicable -Wconf / @nowarn filters for this fatal warning: msg=<part of the message>, cat=unchecked, 
> site=org.apache.spark.sql.SQLQueryTestSuite.createScalaTestCase
> [error]       case udfTestCase: UDFTest
> [error]            ^
> [error] 
> /Users/yangjie01/SourceCode/git/spark-mine-sbt/sql/core/src/test/scala/org/apache/spark/sql/SQLQueryTestSuite.scala:506:12:
>  The outer reference in this type test cannot be checked at run time.
> [error] Applicable -Wconf / @nowarn filters for this fatal warning: msg=<part of the message>, cat=unchecked, 
> site=org.apache.spark.sql.SQLQueryTestSuite.runQueries
> [error]       case udfTestCase: UDFTest =>
> [error]            ^
> [error] 
> /Users/yangjie01/SourceCode/git/spark-mine-sbt/sql/core/src/test/scala/org/apache/spark/sql/SQLQueryTestSuite.scala:508:12:
>  The outer reference in this type test cannot be checked at run time.
> [error] Applicable -Wconf / @nowarn filters for this fatal warning: msg=<part of the message>, cat=unchecked, 
> site=org.apache.spark.sql.SQLQueryTestSuite.runQueries
> [error]       case udtfTestCase: UDTFSetTest =>
> [error]            ^
> [error] 
> /Users/yangjie01/SourceCode/git/spark-mine-sbt/sql/core/src/test/scala/org/apache/spark/sql/SQLQueryTestSuite.scala:514:13:
>  The outer reference in this type test cannot be checked at run time.
> [error] Applicable -Wconf / @nowarn filters for this fatal warning: msg=<part of the message>, cat=unchecked, 
> site=org.apache.spark.sql.SQLQueryTestSuite.runQueries
> [error]       case _: PgSQLTest =>
> [error]             ^
> [error] 
> /Users/yangjie01/SourceCode/git/spark-mine-sbt/sql/core/src/test/scala/org/apache/spark/sql/SQLQueryTestSuite.scala:522:13:
>  The outer reference in this type test cannot be checked at run time.
> [error] Applicable -Wconf / @nowarn filters for this fatal warning: msg=<part of the message>, cat=unchecked, 
> site=org.apache.spark.sql.SQLQueryTestSuite.runQueries
> [error]       case _: AnsiTest =>
> [error]             ^
> [error] 
> /Users/yangjie01/SourceCode/git/spark-mine-sbt/sql/core/src/test/scala/org/apache/spark/sql/SQLQueryTestSuite.scala:524:13:
>  The outer reference in this type test cannot be checked at run time.
> [error] Applicable -Wconf / @nowarn filters for this fatal warning: msg=<part of the message>, cat=unchecked, 
> site=org.apache.spark.sql.SQLQueryTestSuite.runQueries
> [error]       case _: TimestampNTZTest =>
> [error]             ^
> [error] 
> /Users/yangjie01/SourceCode/git/spark-mine-sbt/sql/core/src/test/scala/org/apache/spark/sql/SQLQueryTestSuite.scala:584:12:
>  The outer reference in this type test cannot be checked at run time.
> [error] Applicable -Wconf / @nowarn filters for this fatal warning: msg=<part of the message>, cat=unchecked, 
> site=org.apache.spark.sql.SQLQueryTestSuite.runQueries.clue
> [error]       case udfTestCase: UDFTest
> [error]            ^
> [error] 
> /Users/yangjie01/SourceCode/git/spark-mine-sbt/sql/core/src/test/scala/org/apache/spark/sql/SQLQueryTestSuite.scala:596:12:
>  The outer reference in this type test cannot be checked at run time.
> [error] Applicable -Wconf / @nowarn filters for this fatal warning: msg=<part of the message>, cat=unchecked, 
> site=org.apache.spark.sql.SQLQueryTestSuite.runQueries.clue
> [error]       case udtfTestCase: UDTFSetTest
> [error]            ^ {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-45702) Fix `the type test for pattern TypeA cannot be checked at runtime`

2023-10-31 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45702?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-45702:
-

Assignee: Yang Jie

> Fix `the type test for pattern TypeA cannot be checked at runtime`
> --
>
> Key: SPARK-45702
> URL: https://issues.apache.org/jira/browse/SPARK-45702
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core
>Affects Versions: 4.0.0
>Reporter: Yang Jie
>Assignee: Yang Jie
>Priority: Major
>
> {code:java}
> [error] 
> /Users/yangjie01/SourceCode/git/spark-mine-sbt/core/src/main/scala/org/apache/spark/rdd/OrderedRDDFunctions.scala:100:21:
>  the type test for pattern org.apache.spark.RangePartitioner[K,V] cannot be 
> checked at runtime because it has type parameters eliminated by erasure
> [error] Applicable -Wconf / @nowarn filters for this fatal warning: msg=<part of the message>, cat=unchecked, 
> site=org.apache.spark.rdd.OrderedRDDFunctions.filterByRange.rddToFilter
> [error]       case Some(rp: RangePartitioner[K, V]) =>
> [error]  {code}
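> A minimal sketch of the usual way to address this kind of erasure warning (an 
> assumption, not necessarily the committed fix): match on the erased shape with 
> wildcards, since the type parameters cannot be checked at runtime anyway. The 
> `firstMapSize` example below is hypothetical and only illustrates the pattern.
> {code:scala}
> def firstMapSize(p: Option[Any]): Option[Int] = p match {
>   // Before (unchecked warning): case Some(m: Map[String, Int]) => Some(m.size)
>   // After: test only the erased shape; element types cannot be checked at runtime.
>   case Some(m: Map[_, _]) => Some(m.size)
>   case _                  => None
> }
> {code}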



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-45702) Fix `the type test for pattern TypeA cannot be checked at runtime`

2023-10-31 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45702?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-45702.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 43582
[https://github.com/apache/spark/pull/43582]

> Fix `the type test for pattern TypeA cannot be checked at runtime`
> --
>
> Key: SPARK-45702
> URL: https://issues.apache.org/jira/browse/SPARK-45702
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core
>Affects Versions: 4.0.0
>Reporter: Yang Jie
>Assignee: Yang Jie
>Priority: Major
> Fix For: 4.0.0
>
>
> {code:java}
> [error] 
> /Users/yangjie01/SourceCode/git/spark-mine-sbt/core/src/main/scala/org/apache/spark/rdd/OrderedRDDFunctions.scala:100:21:
>  the type test for pattern org.apache.spark.RangePartitioner[K,V] cannot be 
> checked at runtime because it has type parameters eliminated by erasure
> [error] Applicable -Wconf / @nowarn filters for this fatal warning: msg= of the message>, cat=unchecked, 
> site=org.apache.spark.rdd.OrderedRDDFunctions.filterByRange.rddToFilter
> [error]       case Some(rp: RangePartitioner[K, V]) =>
> [error]  {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-45703) Fix `abstract type TypeA in type pattern Some[TypeA] is unchecked since it is eliminated by erasure`

2023-10-31 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45703?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-45703.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 43582
[https://github.com/apache/spark/pull/43582]

> Fix `abstract type TypeA in type pattern Some[TypeA] is unchecked since it is 
> eliminated by erasure`
> 
>
> Key: SPARK-45703
> URL: https://issues.apache.org/jira/browse/SPARK-45703
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Yang Jie
>Assignee: Yang Jie
>Priority: Major
> Fix For: 4.0.0
>
>
> {code:java}
> [error] 
> /Users/yangjie01/SourceCode/git/spark-mine-sbt/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/CatalystTypeConverters.scala:105:19:
>  abstract type ScalaInputType in type pattern Some[ScalaInputType] is 
> unchecked since it is eliminated by erasure
> [error] Applicable -Wconf / @nowarn filters for this fatal warning: msg=<part of the message>, cat=unchecked, 
> site=org.apache.spark.sql.catalyst.CatalystTypeConverters.CatalystTypeConverter.toCatalyst
> [error]         case opt: Some[ScalaInputType] => toCatalystImpl(opt.get)
> [error]                   ^ {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-45700) Fix `The outer reference in this type test cannot be checked at run time`

2023-10-31 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45700?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-45700:
-

Assignee: Yang Jie

> Fix `The outer reference in this type test cannot be checked at run time`
> -
>
> Key: SPARK-45700
> URL: https://issues.apache.org/jira/browse/SPARK-45700
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core, SQL
>Affects Versions: 4.0.0
>Reporter: Yang Jie
>Assignee: Yang Jie
>Priority: Major
>  Labels: pull-request-available
>
> {code:java}
> [error] 
> /Users/yangjie01/SourceCode/git/spark-mine-sbt/sql/core/src/test/scala/org/apache/spark/sql/SQLQueryTestSuite.scala:324:12:
>  The outer reference in this type test cannot be checked at run time.
> [error] Applicable -Wconf / @nowarn filters for this fatal warning: msg= of the message>, cat=unchecked, 
> site=org.apache.spark.sql.SQLQueryTestSuite.createScalaTestCase
> [error]       case udfTestCase: UDFTest
> [error]            ^
> [error] 
> /Users/yangjie01/SourceCode/git/spark-mine-sbt/sql/core/src/test/scala/org/apache/spark/sql/SQLQueryTestSuite.scala:506:12:
>  The outer reference in this type test cannot be checked at run time.
> [error] Applicable -Wconf / @nowarn filters for this fatal warning: msg= of the message>, cat=unchecked, 
> site=org.apache.spark.sql.SQLQueryTestSuite.runQueries
> [error]       case udfTestCase: UDFTest =>
> [error]            ^
> [error] 
> /Users/yangjie01/SourceCode/git/spark-mine-sbt/sql/core/src/test/scala/org/apache/spark/sql/SQLQueryTestSuite.scala:508:12:
>  The outer reference in this type test cannot be checked at run time.
> [error] Applicable -Wconf / @nowarn filters for this fatal warning: msg= of the message>, cat=unchecked, 
> site=org.apache.spark.sql.SQLQueryTestSuite.runQueries
> [error]       case udtfTestCase: UDTFSetTest =>
> [error]            ^
> [error] 
> /Users/yangjie01/SourceCode/git/spark-mine-sbt/sql/core/src/test/scala/org/apache/spark/sql/SQLQueryTestSuite.scala:514:13:
>  The outer reference in this type test cannot be checked at run time.
> [error] Applicable -Wconf / @nowarn filters for this fatal warning: msg= of the message>, cat=unchecked, 
> site=org.apache.spark.sql.SQLQueryTestSuite.runQueries
> [error]       case _: PgSQLTest =>
> [error]             ^
> [error] 
> /Users/yangjie01/SourceCode/git/spark-mine-sbt/sql/core/src/test/scala/org/apache/spark/sql/SQLQueryTestSuite.scala:522:13:
>  The outer reference in this type test cannot be checked at run time.
> [error] Applicable -Wconf / @nowarn filters for this fatal warning: msg= of the message>, cat=unchecked, 
> site=org.apache.spark.sql.SQLQueryTestSuite.runQueries
> [error]       case _: AnsiTest =>
> [error]             ^
> [error] 
> /Users/yangjie01/SourceCode/git/spark-mine-sbt/sql/core/src/test/scala/org/apache/spark/sql/SQLQueryTestSuite.scala:524:13:
>  The outer reference in this type test cannot be checked at run time.
> [error] Applicable -Wconf / @nowarn filters for this fatal warning: msg= of the message>, cat=unchecked, 
> site=org.apache.spark.sql.SQLQueryTestSuite.runQueries
> [error]       case _: TimestampNTZTest =>
> [error]             ^
> [error] 
> /Users/yangjie01/SourceCode/git/spark-mine-sbt/sql/core/src/test/scala/org/apache/spark/sql/SQLQueryTestSuite.scala:584:12:
>  The outer reference in this type test cannot be checked at run time.
> [error] Applicable -Wconf / @nowarn filters for this fatal warning: msg= of the message>, cat=unchecked, 
> site=org.apache.spark.sql.SQLQueryTestSuite.runQueries.clue
> [error]       case udfTestCase: UDFTest
> [error]            ^
> [error] 
> /Users/yangjie01/SourceCode/git/spark-mine-sbt/sql/core/src/test/scala/org/apache/spark/sql/SQLQueryTestSuite.scala:596:12:
>  The outer reference in this type test cannot be checked at run time.
> [error] Applicable -Wconf / @nowarn filters for this fatal warning: msg= of the message>, cat=unchecked, 
> site=org.apache.spark.sql.SQLQueryTestSuite.runQueries.clue
> [error]       case udtfTestCase: UDTFSetTest
> [error]            ^ {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-45683) Fix `method any2stringadd in object Predef is deprecated`

2023-10-31 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45683?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-45683:
-

Assignee: Yang Jie

> Fix `method any2stringadd in object Predef is deprecated`
> -
>
> Key: SPARK-45683
> URL: https://issues.apache.org/jira/browse/SPARK-45683
> Project: Spark
>  Issue Type: Sub-task
>  Components: Build, Spark Core
>Affects Versions: 4.0.0
>Reporter: Yang Jie
>Assignee: Yang Jie
>Priority: Major
>  Labels: pull-request-available
>
> {code:java}
> [warn] 
> /Users/yangjie01/SourceCode/git/spark-mine-sbt/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Expression.scala:720:17:
>  method any2stringadd in object Predef is deprecated (since 2.13.0): Implicit 
> injection of + is deprecated. Convert to String to call +
> [warn] Applicable -Wconf / @nowarn filters for this warning: msg=<part of the message>, cat=deprecation, 
> site=org.apache.spark.sql.catalyst.expressions.BinaryExpression.nullSafeCodeGen.nullSafeEval,
>  origin=scala.Predef.any2stringadd, version=2.13.0
> [warn]         leftGen.code + ctx.nullSafeExec(left.nullable, leftGen.isNull) 
> {
> [warn]                 ^ {code}
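> A minimal sketch of the fix pattern the warning suggests (an assumption, not the 
> merged change): make sure the left-hand side is already a String before calling `+`, 
> or use string interpolation instead.
> {code:scala}
> val code: Any = "boolean isNull = false;"
>
> // Before (deprecated, relies on Predef.any2stringadd):
> //   val block = code + "\nint value = -1;"
> // After: convert explicitly, or interpolate.
> val block  = code.toString + "\nint value = -1;"
> val block2 = s"$code\nint value = -1;"
> {code}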



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-45683) Fix `method any2stringadd in object Predef is deprecated`

2023-10-31 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45683?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-45683.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 43580
[https://github.com/apache/spark/pull/43580]

> Fix `method any2stringadd in object Predef is deprecated`
> -
>
> Key: SPARK-45683
> URL: https://issues.apache.org/jira/browse/SPARK-45683
> Project: Spark
>  Issue Type: Sub-task
>  Components: Build, Spark Core
>Affects Versions: 4.0.0
>Reporter: Yang Jie
>Assignee: Yang Jie
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> {code:java}
> [warn] 
> /Users/yangjie01/SourceCode/git/spark-mine-sbt/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Expression.scala:720:17:
>  method any2stringadd in object Predef is deprecated (since 2.13.0): Implicit 
> injection of + is deprecated. Convert to String to call +
> [warn] Applicable -Wconf / @nowarn filters for this warning: msg= message>, cat=deprecation, 
> site=org.apache.spark.sql.catalyst.expressions.BinaryExpression.nullSafeCodeGen.nullSafeEval,
>  origin=scala.Predef.any2stringadd, version=2.13.0
> [warn]         leftGen.code + ctx.nullSafeExec(left.nullable, leftGen.isNull) 
> {
> [warn]                 ^ {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-45725) remove the non-default IN subquery runtime filter

2023-10-31 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45725?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-45725.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 43585
[https://github.com/apache/spark/pull/43585]

> remove the non-default IN subquery runtime filter
> -
>
> Key: SPARK-45725
> URL: https://issues.apache.org/jira/browse/SPARK-45725
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Wenchen Fan
>Assignee: Wenchen Fan
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-45687) Fix `Passing an explicit array value to a Scala varargs method is deprecated`

2023-10-31 Thread Yang Jie (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-45687?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17781420#comment-17781420
 ] 

Yang Jie commented on SPARK-45687:
--

Thanks [~ivoson] 

> Fix `Passing an explicit array value to a Scala varargs method is deprecated`
> -
>
> Key: SPARK-45687
> URL: https://issues.apache.org/jira/browse/SPARK-45687
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core, SQL
>Affects Versions: 4.0.0
>Reporter: Yang Jie
>Priority: Major
>
> Passing an explicit array value to a Scala varargs method is deprecated 
> (since 2.13.0) and will result in a defensive copy; Use the more efficient 
> non-copying ArraySeq.unsafeWrapArray or an explicit toIndexedSeq call
>  
> {code:java}
> [warn] 
> /Users/yangjie01/SourceCode/git/spark-mine-sbt/sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/AggregationQuerySuite.scala:945:21:
>  Passing an explicit array value to a Scala varargs method is deprecated 
> (since 2.13.0) and will result in a defensive copy; Use the more efficient 
> non-copying ArraySeq.unsafeWrapArray or an explicit toIndexedSeq call
> [warn] Applicable -Wconf / @nowarn filters for this warning: msg=<part of the message>, cat=deprecation, 
> site=org.apache.spark.sql.hive.execution.AggregationQuerySuite, version=2.13.0
> [warn]         df.agg(udaf(allColumns: _*)),
> [warn]                     ^
> [warn] 
> /Users/yangjie01/SourceCode/git/spark-mine-sbt/sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/ObjectHashAggregateSuite.scala:156:48:
>  Passing an explicit array value to a Scala varargs method is deprecated 
> (since 2.13.0) and will result in a defensive copy; Use the more efficient 
> non-copying ArraySeq.unsafeWrapArray or an explicit toIndexedSeq call
> [warn] Applicable -Wconf / @nowarn filters for this warning: msg=<part of the message>, cat=deprecation, 
> site=org.apache.spark.sql.hive.execution.ObjectHashAggregateSuite, 
> version=2.13.0
> [warn]         df.agg(aggFunctions.head, aggFunctions.tail: _*),
> [warn]                                                ^
> [warn] 
> /Users/yangjie01/SourceCode/git/spark-mine-sbt/sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/ObjectHashAggregateSuite.scala:161:76:
>  Passing an explicit array value to a Scala varargs method is deprecated 
> (since 2.13.0) and will result in a defensive copy; Use the more efficient 
> non-copying ArraySeq.unsafeWrapArray or an explicit toIndexedSeq call
> [warn] Applicable -Wconf / @nowarn filters for this warning: msg=<part of the message>, cat=deprecation, 
> site=org.apache.spark.sql.hive.execution.ObjectHashAggregateSuite, 
> version=2.13.0
> [warn]         df.groupBy($"id" % 4 as "mod").agg(aggFunctions.head, 
> aggFunctions.tail: _*),
> [warn]                                                                        
>     ^
> [warn] 
> /Users/yangjie01/SourceCode/git/spark-mine-sbt/sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/ObjectHashAggregateSuite.scala:171:50:
>  Passing an explicit array value to a Scala varargs method is deprecated 
> (since 2.13.0) and will result in a defensive copy; Use the more efficient 
> non-copying ArraySeq.unsafeWrapArray or an explicit toIndexedSeq call
> [warn] Applicable -Wconf / @nowarn filters for this warning: msg=<part of the message>, cat=deprecation, 
> site=org.apache.spark.sql.hive.execution.ObjectHashAggregateSuite, 
> version=2.13.0
> [warn]           df.agg(aggFunctions.head, aggFunctions.tail: _*),
> [warn]  {code}
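A minimal sketch of the two replacements the warning suggests, using a toy
varargs method; `total` and `values` are illustrative names, not taken from the
Spark code:

{code:scala}
import scala.collection.immutable.ArraySeq

def total(xs: Int*): Int = xs.sum
val values = Array(1, 2, 3)

total(values: _*)                            // deprecated since 2.13: makes a defensive copy
total(ArraySeq.unsafeWrapArray(values): _*)  // non-copying wrapper, as the warning recommends
total(values.toIndexedSeq: _*)               // explicit copy, also warning-free
{code}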



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-45687) Fix `Passing an explicit array value to a Scala varargs method is deprecated`

2023-10-31 Thread Tengfei Huang (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-45687?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17781419#comment-17781419
 ] 

Tengfei Huang commented on SPARK-45687:
---

I will work on this.

> Fix `Passing an explicit array value to a Scala varargs method is deprecated`
> -
>
> Key: SPARK-45687
> URL: https://issues.apache.org/jira/browse/SPARK-45687
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core, SQL
>Affects Versions: 4.0.0
>Reporter: Yang Jie
>Priority: Major
>
> Passing an explicit array value to a Scala varargs method is deprecated 
> (since 2.13.0) and will result in a defensive copy; Use the more efficient 
> non-copying ArraySeq.unsafeWrapArray or an explicit toIndexedSeq call
>  
> {code:java}
> [warn] 
> /Users/yangjie01/SourceCode/git/spark-mine-sbt/sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/AggregationQuerySuite.scala:945:21:
>  Passing an explicit array value to a Scala varargs method is deprecated 
> (since 2.13.0) and will result in a defensive copy; Use the more efficient 
> non-copying ArraySeq.unsafeWrapArray or an explicit toIndexedSeq call
> [warn] Applicable -Wconf / @nowarn filters for this warning: msg=<part of the message>, cat=deprecation, 
> site=org.apache.spark.sql.hive.execution.AggregationQuerySuite, version=2.13.0
> [warn]         df.agg(udaf(allColumns: _*)),
> [warn]                     ^
> [warn] 
> /Users/yangjie01/SourceCode/git/spark-mine-sbt/sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/ObjectHashAggregateSuite.scala:156:48:
>  Passing an explicit array value to a Scala varargs method is deprecated 
> (since 2.13.0) and will result in a defensive copy; Use the more efficient 
> non-copying ArraySeq.unsafeWrapArray or an explicit toIndexedSeq call
> [warn] Applicable -Wconf / @nowarn filters for this warning: msg=<part of the message>, cat=deprecation, 
> site=org.apache.spark.sql.hive.execution.ObjectHashAggregateSuite, 
> version=2.13.0
> [warn]         df.agg(aggFunctions.head, aggFunctions.tail: _*),
> [warn]                                                ^
> [warn] 
> /Users/yangjie01/SourceCode/git/spark-mine-sbt/sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/ObjectHashAggregateSuite.scala:161:76:
>  Passing an explicit array value to a Scala varargs method is deprecated 
> (since 2.13.0) and will result in a defensive copy; Use the more efficient 
> non-copying ArraySeq.unsafeWrapArray or an explicit toIndexedSeq call
> [warn] Applicable -Wconf / @nowarn filters for this warning: msg=<part of the message>, cat=deprecation, 
> site=org.apache.spark.sql.hive.execution.ObjectHashAggregateSuite, 
> version=2.13.0
> [warn]         df.groupBy($"id" % 4 as "mod").agg(aggFunctions.head, 
> aggFunctions.tail: _*),
> [warn]                                                                        
>     ^
> [warn] 
> /Users/yangjie01/SourceCode/git/spark-mine-sbt/sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/ObjectHashAggregateSuite.scala:171:50:
>  Passing an explicit array value to a Scala varargs method is deprecated 
> (since 2.13.0) and will result in a defensive copy; Use the more efficient 
> non-copying ArraySeq.unsafeWrapArray or an explicit toIndexedSeq call
> [warn] Applicable -Wconf / @nowarn filters for this warning: msg=<part of the message>, cat=deprecation, 
> site=org.apache.spark.sql.hive.execution.ObjectHashAggregateSuite, 
> version=2.13.0
> [warn]           df.agg(aggFunctions.head, aggFunctions.tail: _*),
> [warn]  {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-41533) GRPC Errors on the client should be cleaned up

2023-10-31 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41533?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-41533:
---
Labels: pull-request-available  (was: )

> GRPC Errors on the client should be cleaned up
> --
>
> Key: SPARK-41533
> URL: https://issues.apache.org/jira/browse/SPARK-41533
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Martin Grund
>Assignee: Martin Grund
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>
> When the server throws an exception, we report a very deep stack trace that is 
> not helpful to the user. 
> We need to separate the cause from the user-visible exception and wrap the 
> error in a custom exception instead of surfacing the RPCError from gRPC.
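A rough sketch of the idea under stated assumptions: the `execute` helper and
the exception class below are illustrative, not the actual Spark Connect API.

{code:scala}
import io.grpc.StatusRuntimeException

// Hypothetical user-facing exception that hides the raw gRPC error but keeps it as the cause.
class SparkConnectClientException(message: String, cause: Throwable)
  extends RuntimeException(message, cause)

def execute[T](call: => T): T =
  try call
  catch {
    case e: StatusRuntimeException =>
      // Surface a short, readable message; keep the full RPC error as the cause for debugging.
      val msg = Option(e.getStatus.getDescription).getOrElse(e.getMessage)
      throw new SparkConnectClientException(msg, e)
  }
{code}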



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-45694) Fix `method signum in trait ScalaNumberProxy is deprecated`

2023-10-31 Thread Tengfei Huang (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-45694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17781374#comment-17781374
 ] 

Tengfei Huang commented on SPARK-45694:
---

Sure, I will include SPARK-45695 (Fix `method force in trait View is deprecated`) 
in the same PR.

> Fix `method signum in trait ScalaNumberProxy is deprecated`
> ---
>
> Key: SPARK-45694
> URL: https://issues.apache.org/jira/browse/SPARK-45694
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Yang Jie
>Priority: Minor
>
> {code:java}
> [warn] 
> /Users/yangjie01/SourceCode/git/spark-mine-sbt/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/EquivalentExpressions.scala:194:25:
>  method signum in trait ScalaNumberProxy is deprecated (since 2.13.0): use 
> `sign` method instead
> [warn] Applicable -Wconf / @nowarn filters for this warning: msg=<part of the message>, cat=deprecation, 
> site=org.apache.spark.sql.catalyst.expressions.EquivalentExpressions.updateExprTree.uc,
>  origin=scala.runtime.ScalaNumberProxy.signum, version=2.13.0
> [warn]       val uc = useCount.signum
> [warn]   {code}
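The replacement the warning points to is mechanical; a minimal sketch with an
illustrative value:

{code:scala}
val useCount = -3
val old = useCount.signum  // ScalaNumberProxy.signum, deprecated since 2.13.0
val uc  = useCount.sign    // drop-in replacement suggested by the warning
{code}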



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-45368) Remove scala2.12 compatibility logic for DoubleType, FloatType, Decimal

2023-10-31 Thread Sean R. Owen (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45368?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean R. Owen resolved SPARK-45368.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 43456
[https://github.com/apache/spark/pull/43456]

> Remove scala2.12 compatibility logic for DoubleType, FloatType, Decimal
> ---
>
> Key: SPARK-45368
> URL: https://issues.apache.org/jira/browse/SPARK-45368
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: BingKun Pan
>Assignee: tangjiafu
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-45368) Remove scala2.12 compatibility logic for DoubleType, FloatType, Decimal

2023-10-31 Thread Sean R. Owen (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45368?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean R. Owen reassigned SPARK-45368:


Assignee: tangjiafu

> Remove scala2.12 compatibility logic for DoubleType, FloatType, Decimal
> ---
>
> Key: SPARK-45368
> URL: https://issues.apache.org/jira/browse/SPARK-45368
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: BingKun Pan
>Assignee: tangjiafu
>Priority: Minor
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-45644) After upgrading to Spark 3.4.1 and 3.5.0 we receive RuntimeException "scala.Some is not a valid external type for schema of array"

2023-10-31 Thread Adi Wehrli (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-45644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17781348#comment-17781348
 ] 

Adi Wehrli commented on SPARK-45644:


But I really cannot say which job statement causes this problem. I am not sure, 
but I suspect it could have something to do with 
{{org.apache.spark.sql.kafka010.consumer.KafkaDataConsumer}} (from 
{{org.apache.spark:spark-sql-kafka-0-10_2.12:3.5.0}}) and the like.

> After upgrading to Spark 3.4.1 and 3.5.0 we receive RuntimeException 
> "scala.Some is not a valid external type for schema of array"
> --
>
> Key: SPARK-45644
> URL: https://issues.apache.org/jira/browse/SPARK-45644
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core, SQL
>Affects Versions: 3.4.1, 3.5.0
>Reporter: Adi Wehrli
>Priority: Major
>
> I do not really know if this is a bug, but I am at the end of my knowledge.
> A Spark job ran successfully with Spark 3.2.x and 3.3.x. 
> But after upgrading to 3.4.1 (and likewise to 3.5.0), running the same job 
> with the same data, the following now always occurs:
> {code}
> scala.Some is not a valid external type for schema of array
> {code}
> The corresponding stacktrace is:
> {code}
> 2023-10-24T06:28:50.932 level=ERROR logger=org.apache.spark.executor.Executor 
> msg="Exception in task 0.0 in stage 0.0 (TID 0)" thread="Executor task launch 
> worker for task 0.0 in stage 0.0 (TID 0)"
> java.lang.RuntimeException: scala.Some is not a valid external type for 
> schema of array
>   at 
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.MapObjects_10$(Unknown
>  Source) ~[?:?]
>   at 
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.ExternalMapToCatalyst_1$(Unknown
>  Source) ~[?:?]
>   at 
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.createNamedStruct_14_3$(Unknown
>  Source) ~[?:?]
>   at 
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.If_12$(Unknown
>  Source) ~[?:?]
>   at 
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.apply(Unknown
>  Source) ~[?:?]
>   at 
> org.apache.spark.sql.execution.ObjectOperator$.$anonfun$serializeObjectToRow$1(objects.scala:165)
>  ~[spark-sql_2.12-3.5.0.jar:3.5.0]
>   at 
> org.apache.spark.sql.execution.AppendColumnsWithObjectExec.$anonfun$doExecute$15(objects.scala:380)
>  ~[spark-sql_2.12-3.5.0.jar:3.5.0]
>   at scala.collection.Iterator$$anon$10.next(Iterator.scala:461) 
> ~[scala-library-2.12.15.jar:?]
>   at scala.collection.Iterator$$anon$10.next(Iterator.scala:461) 
> ~[scala-library-2.12.15.jar:?]
>   at 
> org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:169)
>  ~[spark-core_2.12-3.5.0.jar:3.5.0]
>   at 
> org.apache.spark.shuffle.ShuffleWriteProcessor.write(ShuffleWriteProcessor.scala:59)
>  ~[spark-core_2.12-3.5.0.jar:3.5.0]
>   at 
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:104) 
> ~[spark-core_2.12-3.5.0.jar:3.5.0]
>   at 
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:54) 
> ~[spark-core_2.12-3.5.0.jar:3.5.0]
>   at 
> org.apache.spark.TaskContext.runTaskWithListeners(TaskContext.scala:161) 
> ~[spark-core_2.12-3.5.0.jar:3.5.0]
>   at org.apache.spark.scheduler.Task.run(Task.scala:141) 
> ~[spark-core_2.12-3.5.0.jar:3.5.0]
>   at 
> org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$4(Executor.scala:620)
>  ~[spark-core_2.12-3.5.0.jar:3.5.0]
>   at 
> org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally(SparkErrorUtils.scala:64)
>  ~[spark-common-utils_2.12-3.5.0.jar:3.5.0]
>   at 
> org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally$(SparkErrorUtils.scala:61)
>  ~[spark-common-utils_2.12-3.5.0.jar:3.5.0]
>   at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:94) 
> ~[spark-core_2.12-3.5.0.jar:3.5.0]
>   at 
> org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:623) 
> [spark-core_2.12-3.5.0.jar:3.5.0]
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
>  [?:?]
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
>  [?:?]
>   at java.lang.Thread.run(Thread.java:834) [?:?]
> 2023-10-24T06:28:50.932 level=ERROR logger=org.apache.spark.executor.Executor 
> msg="Exception in task 1.0 in stage 0.0 (TID 1)" thread="Executor task launch 
> worker for task 1.0 in stage 0.0 (TID 1)"
> java.lang.RuntimeException: scala.Some is not a valid external type for 
> schema of array
>   at 
> 

[jira] [Updated] (SPARK-45740) Relax the node prefix of SparkPlanGraphCluster

2023-10-31 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45740?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-45740:
---
Labels: pull-request-available  (was: )

> Relax the node prefix of SparkPlanGraphCluster
> --
>
> Key: SPARK-45740
> URL: https://issues.apache.org/jira/browse/SPARK-45740
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: XiDuo You
>Priority: Minor
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-45740) Relax the node prefix of SparkPlanGraphCluster

2023-10-31 Thread XiDuo You (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45740?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

XiDuo You updated SPARK-45740:
--
Summary: Relax the node prefix of SparkPlanGraphCluster  (was: Release the 
node prefix of SparkPlanGraphCluster)

> Relax the node prefix of SparkPlanGraphCluster
> --
>
> Key: SPARK-45740
> URL: https://issues.apache.org/jira/browse/SPARK-45740
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: XiDuo You
>Priority: Minor
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-45740) Release the node prefix of SparkPlanGraphCluster

2023-10-31 Thread XiDuo You (Jira)
XiDuo You created SPARK-45740:
-

 Summary: Release the node prefix of SparkPlanGraphCluster
 Key: SPARK-45740
 URL: https://issues.apache.org/jira/browse/SPARK-45740
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 4.0.0
Reporter: XiDuo You






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-45732) Upgrade commons-text to 1.11.0

2023-10-31 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45732?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-45732.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 43590
[https://github.com/apache/spark/pull/43590]

> Upgrade commons-text to 1.11.0
> --
>
> Key: SPARK-45732
> URL: https://issues.apache.org/jira/browse/SPARK-45732
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 4.0.0
>Reporter: BingKun Pan
>Assignee: BingKun Pan
>Priority: Trivial
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-45732) Upgrade commons-text to 1.11.0

2023-10-31 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45732?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-45732:


Assignee: BingKun Pan

> Upgrade commons-text to 1.11.0
> --
>
> Key: SPARK-45732
> URL: https://issues.apache.org/jira/browse/SPARK-45732
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 4.0.0
>Reporter: BingKun Pan
>Assignee: BingKun Pan
>Priority: Trivial
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-45738) client will wait forever if session in spark connect server is evicted

2023-10-31 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45738?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-45738:
---
Labels: pull-request-available  (was: )

> client will wait forever if session in spark connect server is evicted
> --
>
> Key: SPARK-45738
> URL: https://issues.apache.org/jira/browse/SPARK-45738
> Project: Spark
>  Issue Type: Bug
>  Components: Connect
>Affects Versions: 3.5.0
>Reporter: xie shuiahu
>Priority: Critical
>  Labels: pull-request-available
>
> Step1. start a spark connect server
> Step2. submit a spark job which will run long
> {code:java}
> from pyspark.sql import SparkSession
> spark = SparkSession.builder.remote(f"sc://HOST:PORT/;user_id=job").create()
> spark.sql("A SQL will run longer than creating 100 sessions").show() {code}
>  
> Step3. create more than 100 sessions
> Tips: Run concurrently with step2
> {code:java}
> for i in range(0, 200):
>     spark = 
> SparkSession.builder.remote(f"sc://HOST:PORT/;user_id={i}").create()
>     spark.sql("show databases") {code}
>  
> *When the python code in step3 is executed, the session created in step2 will 
> be evicted, and the client will wait forever*
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-45739) Catch IOException instead of EOFException alone for faulthandler

2023-10-31 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45739?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-45739:
---
Labels: pull-request-available  (was: )

> Catch IOException instead of EOFException alone for faulthandler
> 
>
> Key: SPARK-45739
> URL: https://issues.apache.org/jira/browse/SPARK-45739
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark
>Affects Versions: 4.0.0
>Reporter: Hyukjin Kwon
>Priority: Major
>  Labels: pull-request-available
>
> {{spark.python.worker.faulthandler.enabled}} makes it possible to describe fatal 
> errors such as segfaults. Exceptions such as {{java.net.SocketException: 
> Connection reset}} can happen because the worker died unexpectedly. We had 
> better catch all IO exceptions there.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-45739) Catch IOException instead of EOFException alone for faulthandler

2023-10-31 Thread Hyukjin Kwon (Jira)
Hyukjin Kwon created SPARK-45739:


 Summary: Catch IOException instead of EOFException alone for 
faulthandler
 Key: SPARK-45739
 URL: https://issues.apache.org/jira/browse/SPARK-45739
 Project: Spark
  Issue Type: Improvement
  Components: PySpark
Affects Versions: 4.0.0
Reporter: Hyukjin Kwon


{{spark.python.worker.faulthandler.enabled}} makes it possible to describe fatal 
errors such as segfaults. Exceptions such as {{java.net.SocketException: 
Connection reset}} can happen because the worker died unexpectedly. We had 
better catch all IO exceptions there.
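On the JVM side this amounts to widening the catch clause. A minimal hedged
sketch; the `readWorkerOutput` helper is hypothetical and only simulates the
failure, it is not the actual PythonRunner code:

{code:scala}
import java.io.IOException

def readWorkerOutput(): Unit = throw new java.net.SocketException("Connection reset")

try readWorkerOutput()
catch {
  // Previously only EOFException was handled; catching IOException also covers
  // SocketException("Connection reset") raised when the worker dies unexpectedly.
  case e: IOException => println(s"Python worker exited unexpectedly: ${e.getMessage}")
}
{code}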



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-45738) client will wait forever if session in spark connect server is evicted

2023-10-31 Thread xie shuiahu (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45738?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

xie shuiahu updated SPARK-45738:

Description: 
Step1. start a spark connect server

Step2. submit a spark job which will run long
{code:java}
from pyspark.sql import SparkSession
spark = SparkSession.builder.remote(f"sc://HOST:PORT/;user_id=job").create()
spark.sql("A SQL will run longer than creating 100 sessions").show() {code}
 

Step3. create more than 100 sessions

Tips: Run concurrently with step2
{code:java}
for i in range(0, 200):
    spark = SparkSession.builder.remote(f"sc://HOST:PORT/;user_id={i}").create()
    spark.sql("show databases") {code}
 

*When the python code in step3 is executed, the session created in step2 will 
be evicted, and the client will wait forever*

 

  was:
Step1. start a spark connect server


Step2. submit a spark job which will run long
{code:java}
from pyspark.sql import SparkSession
spark = SparkSession.builder.remote(f"sc://HOST:PORT/;user_id=job").create()
spark.sql("A SQL will run longer than creating 100 sessions").show() {code}
 

Step3. create more than 100 sessions

Tips: Run concurrently with step2
{code:java}
for i in range(0, 200):
    spark = SparkSession.builder.remote(f"sc://HOST:PORT/;user_id={i}").create()
    spark.sql("show databases") {code}
 

*When the python code in step3 is executed, the session created in step2 will 
be evicted, and the client will wait forever*

The server will log exception like this:


> client will wait forever if session in spark connect server is evicted
> --
>
> Key: SPARK-45738
> URL: https://issues.apache.org/jira/browse/SPARK-45738
> Project: Spark
>  Issue Type: Bug
>  Components: Connect
>Affects Versions: 3.5.0
>Reporter: xie shuiahu
>Priority: Critical
>
> Step1. start a spark connect server
> Step2. submit a spark job which will run long
> {code:java}
> from pyspark.sql import SparkSession
> spark = SparkSession.builder.remote(f"sc://HOST:PORT/;user_id=job").create()
> spark.sql("A SQL will run longer than creating 100 sessions").show() {code}
>  
> Step3. create more than 100 sessions
> Tips: Run concurrently with step2
> {code:java}
> for i in range(0, 200):
>     spark = 
> SparkSession.builder.remote(f"sc://HOST:PORT/;user_id={i}").create()
>     spark.sql("show databases") {code}
>  
> *When the python code in step3 is executed, the session created in step2 will 
> be evicted, and the client will wait forever*
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-45738) client will wait forever if session in spark connect server is evicted

2023-10-31 Thread xie shuiahu (Jira)
xie shuiahu created SPARK-45738:
---

 Summary: client will wait forever if session in spark connect 
server is evicted
 Key: SPARK-45738
 URL: https://issues.apache.org/jira/browse/SPARK-45738
 Project: Spark
  Issue Type: Bug
  Components: Connect
Affects Versions: 3.5.0
Reporter: xie shuiahu


Step1. start a spark connect server


Step2. submit a spark job which will run long
{code:java}
from pyspark.sql import SparkSession
spark = SparkSession.builder.remote(f"sc://HOST:PORT/;user_id=job").create()
spark.sql("A SQL will run longer than creating 100 sessions").show() {code}
 

Step3. create more than 100 sessions

Tips: Run concurrently with step2
{code:java}
for i in range(0, 200):
    spark = SparkSession.builder.remote(f"sc://HOST:PORT/;user_id={i}").create()
    spark.sql("show databases") {code}
 

*When the python code in step3 is executed, the session created in step2 will 
be evicted, and the client will wait forever*

The server will log exception like this:



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-45644) After upgrading to Spark 3.4.1 and 3.5.0 we receive RuntimeException "scala.Some is not a valid external type for schema of array"

2023-10-31 Thread Adi Wehrli (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-45644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17781276#comment-17781276
 ] 

Adi Wehrli commented on SPARK-45644:


Thanks for this information, [~bersprockets]. 

I could now extract the {{MapObjects_10}} method for both versions:

h4. Spark 3.5.0
{code:java}
private ArrayData MapObjects_10(InternalRow i) {
  scala.Option value_2284 = null;
  if (!isNull_ExternalMapToCatalyst_value_lambda_variable_42) {
if 
(value_ExternalMapToCatalyst_value_lambda_variable_42.getClass().isArray() || 
value_ExternalMapToCatalyst_value_lambda_variable_42 instanceof 
scala.collection.Seq || value_ExternalMapToCatalyst_value_lambda_variable_42 
instanceof scala.collection.immutable.Set || 
value_ExternalMapToCatalyst_value_lambda_variable_42 instanceof java.util.List) 
{
  value_2284 = (scala.Option) 
value_ExternalMapToCatalyst_value_lambda_variable_42;
} else {
  throw new 
RuntimeException(value_ExternalMapToCatalyst_value_lambda_variable_42.getClass().getName()
 + ((java.lang.String) references[212] /* errMsg */));
}
  }
  final boolean isNull_1963 = 
isNull_ExternalMapToCatalyst_value_lambda_variable_42 || value_2284.isEmpty();
scala.collection.Seq value_2283 = isNull_1963 ? null :
  (scala.collection.Seq) value_2284.get();
  ArrayData value_2282 = null;

  if (!isNull_1963) {

int dataLength_10 = value_2283.size();

UTF8String[] convertedArray_10 = null;
convertedArray_10 = new UTF8String[dataLength_10];


int loopIndex_10 = 0;
scala.collection.Iterator it_10 = value_2283.toIterator();
while (loopIndex_10 < dataLength_10) {
  value_MapObject_lambda_variable_43 = (java.lang.Object) (it_10.next());
  isNull_MapObject_lambda_variable_43 = value_MapObject_lambda_variable_43 
== null;

  resultIsNull_127 = false;
  if (!resultIsNull_127) {
java.lang.String value_2286 = null;
if (!isNull_MapObject_lambda_variable_43) {
  if (value_MapObject_lambda_variable_43 instanceof java.lang.String) {
value_2286 = (java.lang.String) value_MapObject_lambda_variable_43;
  } else {
throw new 
RuntimeException(value_MapObject_lambda_variable_43.getClass().getName() + 
((java.lang.String) references[213] /* errMsg */));
  }
}
resultIsNull_127 = isNull_MapObject_lambda_variable_43;
mutableStateArray_0[121] = value_2286;
  }

  boolean isNull_1965 = resultIsNull_127;
  UTF8String value_2285 = null;
  if (!resultIsNull_127) {
value_2285 = 
org.apache.spark.unsafe.types.UTF8String.fromString(mutableStateArray_0[121]);
  }
  if (isNull_1965) {
convertedArray_10[loopIndex_10] = null;
  } else {
convertedArray_10[loopIndex_10] = value_2285;
  }

  loopIndex_10 += 1;
}

value_2282 = new 
org.apache.spark.sql.catalyst.util.GenericArrayData(convertedArray_10);
  }
  globalIsNull_320 = isNull_1963;
  return value_2282;
}
{code}

h4. Spark 3.3.3:
{code:java}
private scala.collection.Seq MapObjects_10(InternalRow i) {
  scala.collection.Seq value_1083 = null;

  if (!isNull_CatalystToExternalMap_value_lambda_variable_21) {

int dataLength_11 = 
value_CatalystToExternalMap_value_lambda_variable_21.numElements();

scala.collection.mutable.Builder collectionBuilder_10 = 
scala.collection.Seq$.MODULE$.newBuilder();
collectionBuilder_10.sizeHint(dataLength_11);


int loopIndex_11 = 0;

while (loopIndex_11 < dataLength_11) {
  value_MapObject_lambda_variable_22 = (UTF8String) 
(value_CatalystToExternalMap_value_lambda_variable_21.getUTF8String(loopIndex_11));
  isNull_MapObject_lambda_variable_22 = 
value_CatalystToExternalMap_value_lambda_variable_21.isNullAt(loopIndex_11);

  boolean isNull_957 = true;
  java.lang.String value_1084 = null;
  if (!isNull_MapObject_lambda_variable_22) {
isNull_957 = false;
if (!isNull_957) {

  Object funcResult_121 = null;
  funcResult_121 = value_MapObject_lambda_variable_22.toString();
  value_1084 = (java.lang.String) funcResult_121;

}
  }
  if (isNull_957) {
collectionBuilder_10.$plus$eq(null);
  } else {
collectionBuilder_10.$plus$eq(value_1084);
  }

  loopIndex_11 += 1;
}

value_1083 = (scala.collection.Seq) collectionBuilder_10.result();
  }
  globalIsNull_81 = isNull_CatalystToExternalMap_value_lambda_variable_21;
  return value_1083;
}
{code}


> After upgrading to Spark 3.4.1 and 3.5.0 we receive RuntimeException 
> "scala.Some is not a valid external type for schema of array"
> --
>
> Key: SPARK-45644
> URL: https://issues.apache.org/jira/browse/SPARK-45644
> Project: Spark
>  

[jira] [Assigned] (SPARK-45737) Remove unnecessary `.toArray[InternalRow]` in `SparkPlan#executeTake` function.

2023-10-31 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45737?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot reassigned SPARK-45737:
--

Assignee: Apache Spark

> Remove unnecessary `.toArray[InternalRow]` in `SparkPlan#executeTake` 
> function.
> ---
>
> Key: SPARK-45737
> URL: https://issues.apache.org/jira/browse/SPARK-45737
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Yang Jie
>Assignee: Apache Spark
>Priority: Major
>  Labels: pull-request-available
>
> {code:java}
> if (takeFromEnd) {
>   while (buf.length < n && i < res.length) {
> val rows = decodeUnsafeRows(res(i)._2)
> if (n - buf.length >= res(i)._1) {
>   buf.prependAll(rows.toArray[InternalRow])
> } else {
>   val dropUntil = res(i)._1 - (n - buf.length)
>   // Same as Iterator.drop but this only takes a long.
>   var j: Long = 0L
>   while (j < dropUntil) { rows.next(); j += 1L}
>   buf.prependAll(rows.toArray[InternalRow])
> }
> i += 1
>   }
> } else {
>   while (buf.length < n && i < res.length) {
> val rows = decodeUnsafeRows(res(i)._2)
> if (n - buf.length >= res(i)._1) {
>   buf ++= rows.toArray[InternalRow]
> } else {
>   buf ++= rows.take(n - buf.length).toArray[InternalRow]
> }
> i += 1
>   }
> } {code}
> In the above code, the parameters of the `mutable.Buffer#prependAll` and 
> `mutable.Growable#++=` functions are typed as `IterableOnce`, and `rows` is an 
> `Iterator[InternalRow]`, which extends `IterableOnce`, so there is no need to 
> convert it to an array of `InternalRow` first.
>  
>  
>  
>  
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-45737) Remove unnecessary `.toArray[InternalRow]` in `SparkPlan#executeTake` function.

2023-10-31 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45737?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot reassigned SPARK-45737:
--

Assignee: (was: Apache Spark)

> Remove unnecessary `.toArray[InternalRow]` in `SparkPlan#executeTake` 
> function.
> ---
>
> Key: SPARK-45737
> URL: https://issues.apache.org/jira/browse/SPARK-45737
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Yang Jie
>Priority: Major
>  Labels: pull-request-available
>
> {code:java}
> if (takeFromEnd) {
>   while (buf.length < n && i < res.length) {
> val rows = decodeUnsafeRows(res(i)._2)
> if (n - buf.length >= res(i)._1) {
>   buf.prependAll(rows.toArray[InternalRow])
> } else {
>   val dropUntil = res(i)._1 - (n - buf.length)
>   // Same as Iterator.drop but this only takes a long.
>   var j: Long = 0L
>   while (j < dropUntil) { rows.next(); j += 1L}
>   buf.prependAll(rows.toArray[InternalRow])
> }
> i += 1
>   }
> } else {
>   while (buf.length < n && i < res.length) {
> val rows = decodeUnsafeRows(res(i)._2)
> if (n - buf.length >= res(i)._1) {
>   buf ++= rows.toArray[InternalRow]
> } else {
>   buf ++= rows.take(n - buf.length).toArray[InternalRow]
> }
> i += 1
>   }
> } {code}
> In the above code, the parameters of the `mutable.Buffer#prependAll` and 
> `mutable.Growable#++=` functions are typed as `IterableOnce`, and `rows` is an 
> `Iterator[InternalRow]`, which extends `IterableOnce`, so there is no need to 
> convert it to an array of `InternalRow` first.
>  
>  
>  
>  
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-45737) Remove unnecessary `.toArray[InternalRow]` in `SparkPlan#executeTake` function.

2023-10-31 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45737?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-45737:
---
Labels: pull-request-available  (was: )

> Remove unnecessary `.toArray[InternalRow]` in `SparkPlan#executeTake` 
> function.
> ---
>
> Key: SPARK-45737
> URL: https://issues.apache.org/jira/browse/SPARK-45737
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Yang Jie
>Priority: Major
>  Labels: pull-request-available
>
> {code:java}
> if (takeFromEnd) {
>   while (buf.length < n && i < res.length) {
> val rows = decodeUnsafeRows(res(i)._2)
> if (n - buf.length >= res(i)._1) {
>   buf.prependAll(rows.toArray[InternalRow])
> } else {
>   val dropUntil = res(i)._1 - (n - buf.length)
>   // Same as Iterator.drop but this only takes a long.
>   var j: Long = 0L
>   while (j < dropUntil) { rows.next(); j += 1L}
>   buf.prependAll(rows.toArray[InternalRow])
> }
> i += 1
>   }
> } else {
>   while (buf.length < n && i < res.length) {
> val rows = decodeUnsafeRows(res(i)._2)
> if (n - buf.length >= res(i)._1) {
>   buf ++= rows.toArray[InternalRow]
> } else {
>   buf ++= rows.take(n - buf.length).toArray[InternalRow]
> }
> i += 1
>   }
> } {code}
> In the above code, the parameters of the `mutable.Buffer#prependAll` and 
> `mutable.Growable#++=` functions are typed as `IterableOnce`, and `rows` is an 
> `Iterator[InternalRow]`, which extends `IterableOnce`, so there is no need to 
> convert it to an array of `InternalRow` first.
>  
>  
>  
>  
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-45737) Remove unnecessary `.toArray[InternalRow]` in `SparkPlan#executeTake` function.

2023-10-31 Thread Yang Jie (Jira)
Yang Jie created SPARK-45737:


 Summary: Remove unnecessary `.toArray[InternalRow]` in 
`SparkPlan#executeTake` function.
 Key: SPARK-45737
 URL: https://issues.apache.org/jira/browse/SPARK-45737
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 4.0.0
Reporter: Yang Jie


{code:java}
if (takeFromEnd) {
  while (buf.length < n && i < res.length) {
val rows = decodeUnsafeRows(res(i)._2)
if (n - buf.length >= res(i)._1) {
  buf.prependAll(rows.toArray[InternalRow])
} else {
  val dropUntil = res(i)._1 - (n - buf.length)
  // Same as Iterator.drop but this only takes a long.
  var j: Long = 0L
  while (j < dropUntil) { rows.next(); j += 1L}
  buf.prependAll(rows.toArray[InternalRow])
}
i += 1
  }
} else {
  while (buf.length < n && i < res.length) {
val rows = decodeUnsafeRows(res(i)._2)
if (n - buf.length >= res(i)._1) {
  buf ++= rows.toArray[InternalRow]
} else {
  buf ++= rows.take(n - buf.length).toArray[InternalRow]
}
i += 1
  }
} {code}
In the above code, the parameters of the `mutable.Buffer#prependAll` and 
`mutable.Growable#++=` functions are typed as `IterableOnce`, and `rows` is an 
`Iterator[InternalRow]`, which extends `IterableOnce`, so there is no need to 
convert it to an array of `InternalRow` first.
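A tiny self-contained illustration of why the conversion is unnecessary; plain
`Int` values stand in for `InternalRow` here:

{code:scala}
import scala.collection.mutable.ArrayBuffer

val buf = ArrayBuffer.empty[Int]
val rows: Iterator[Int] = Iterator(1, 2, 3)

buf ++= rows                 // Iterator extends IterableOnce, so no .toArray is needed
buf.prependAll(Iterator(0))  // the same holds for prependAll
// buf now contains ArrayBuffer(0, 1, 2, 3)
{code}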
 
 
 
 
 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-45735) Reenable CatalogTests without Spark Connect

2023-10-31 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45735?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-45735:


Assignee: Hyukjin Kwon

> Reenable CatalogTests without Spark Connect
> ---
>
> Key: SPARK-45735
> URL: https://issues.apache.org/jira/browse/SPARK-45735
> Project: Spark
>  Issue Type: New Feature
>  Components: PySpark, Tests
>Affects Versions: 3.4.1, 3.5.0, 4.0.0
>Reporter: Hyukjin Kwon
>Assignee: Hyukjin Kwon
>Priority: Major
>  Labels: pull-request-available
>
> https://issues.apache.org/jira/browse/SPARK-41707 mistakenly caused the 
> original tests to be skipped.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-45735) Reenable CatalogTests without Spark Connect

2023-10-31 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45735?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-45735.
--
Fix Version/s: 3.5.1
   4.0.0
   3.4.2
   Resolution: Fixed

Issue resolved by pull request 43595
[https://github.com/apache/spark/pull/43595]

> Reenable CatalogTests without Spark Connect
> ---
>
> Key: SPARK-45735
> URL: https://issues.apache.org/jira/browse/SPARK-45735
> Project: Spark
>  Issue Type: New Feature
>  Components: PySpark, Tests
>Affects Versions: 3.4.1, 3.5.0, 4.0.0
>Reporter: Hyukjin Kwon
>Assignee: Hyukjin Kwon
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.5.1, 4.0.0, 3.4.2
>
>
> https://issues.apache.org/jira/browse/SPARK-41707 mistakenly caused the 
> original tests to be skipped.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-45736) Use \s+ as separator when testing Kafka source or network source

2023-10-31 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45736?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-45736:
---
Labels: pull-request-available  (was: )

> Use \s+ as separator when testing Kafka source or network source
> 
>
> Key: SPARK-45736
> URL: https://issues.apache.org/jira/browse/SPARK-45736
> Project: Spark
>  Issue Type: Improvement
>  Components: Examples
>Affects Versions: 3.5.0
>Reporter: Deng Ziming
>Priority: Minor
>  Labels: pull-request-available
>
> When the test data comes from Kafka or a network source, it is possible that we 
> generate redundant whitespace.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-45701) Clean up the deprecated API usage related to `SetOps`

2023-10-31 Thread Yang Jie (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45701?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yang Jie reassigned SPARK-45701:


Assignee: Yang Jie

> Clean up the deprecated API usage related to `SetOps`
> -
>
> Key: SPARK-45701
> URL: https://issues.apache.org/jira/browse/SPARK-45701
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core, SQL
>Affects Versions: 4.0.0
>Reporter: Yang Jie
>Assignee: Yang Jie
>Priority: Major
>  Labels: pull-request-available
>
> * method - in trait SetOps is deprecated (since 2.13.0)
>  * method -- in trait SetOps is deprecated (since 2.13.0)
>  * method + in trait SetOps is deprecated (since 2.13.0)
>  * method retain in trait SetOps is deprecated (since 2.13.0)
>  
> {code:java}
> [warn] 
> /Users/yangjie01/SourceCode/git/spark-mine-sbt/core/src/main/scala/org/apache/spark/storage/BlockReplicationPolicy.scala:70:32:
>  method + in trait SetOps is deprecated (since 2.13.0): Consider requiring an 
> immutable Set or fall back to Set.union
> [warn] Applicable -Wconf / @nowarn filters for this warning: msg=<part of the message>, cat=deprecation, 
> site=org.apache.spark.storage.BlockReplicationUtils.getSampleIds.indices.$anonfun,
>  origin=scala.collection.SetOps.+, version=2.13.0
> [warn]       if (set.contains(t)) set + i else set + t
> [warn]                                ^ {code}
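A small sketch of the two alternatives the warning suggests for the deprecated
`+` on a mutable `Set`; the values are illustrative only:

{code:scala}
val set = scala.collection.mutable.Set(1, 2, 3)

// set + 4                          // SetOps.+ is deprecated since 2.13.0
val unioned     = set.union(Set(4)) // fall back to Set.union (returns a new collection)
val asImmutable = set.toSet + 4     // or require an immutable Set, where + is not deprecated
{code}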



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-45701) Clean up the deprecated API usage related to `SetOps`

2023-10-31 Thread Yang Jie (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45701?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yang Jie resolved SPARK-45701.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 43575
[https://github.com/apache/spark/pull/43575]

> Clean up the deprecated API usage related to `SetOps`
> -
>
> Key: SPARK-45701
> URL: https://issues.apache.org/jira/browse/SPARK-45701
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core, SQL
>Affects Versions: 4.0.0
>Reporter: Yang Jie
>Assignee: Yang Jie
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> * method - in trait SetOps is deprecated (since 2.13.0)
>  * method -- in trait SetOps is deprecated (since 2.13.0)
>  * method + in trait SetOps is deprecated (since 2.13.0)
>  * method retain in trait SetOps is deprecated (since 2.13.0)
>  
> {code:java}
> [warn] 
> /Users/yangjie01/SourceCode/git/spark-mine-sbt/core/src/main/scala/org/apache/spark/storage/BlockReplicationPolicy.scala:70:32:
>  method + in trait SetOps is deprecated (since 2.13.0): Consider requiring an 
> immutable Set or fall back to Set.union
> [warn] Applicable -Wconf / @nowarn filters for this warning: msg=<part of the message>, cat=deprecation, 
> site=org.apache.spark.storage.BlockReplicationUtils.getSampleIds.indices.$anonfun,
>  origin=scala.collection.SetOps.+, version=2.13.0
> [warn]       if (set.contains(t)) set + i else set + t
> [warn]                                ^ {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-45736) Use \s+ as separator when testing Kafka source or network source

2023-10-31 Thread Deng Ziming (Jira)
Deng Ziming created SPARK-45736:
---

 Summary: Use \s+ as separator when testing Kafka source or network 
source
 Key: SPARK-45736
 URL: https://issues.apache.org/jira/browse/SPARK-45736
 Project: Spark
  Issue Type: Improvement
  Components: Examples
Affects Versions: 3.5.0
Reporter: Deng Ziming


When the test data comes from Kafka or a network source, it is possible that we 
generate redundant whitespace.
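A quick illustration of the difference in plain Scala; the input string is made
up for the example:

{code:scala}
val line = "spark   connect\tstreaming"  // redundant whitespace, as may arrive from Kafka or a socket

line.split(" ")     // Array("spark", "", "", "connect\tstreaming"): empty tokens, tab not split
line.split("\\s+")  // Array("spark", "connect", "streaming")
{code}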



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org