[jira] [Updated] (SPARK-45755) Push down limit through Dataset.isEmpty()
[ https://issues.apache.org/jira/browse/SPARK-45755?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yuming Wang updated SPARK-45755: Description: Pushing down LocalLimit cannot optimize the distinct case.
{code:scala}
def isEmpty: Boolean = withAction("isEmpty",
  withTypedPlan {
    LocalLimit(Literal(1), select().logicalPlan)
  }.queryExecution) { plan =>
  plan.executeTake(1).isEmpty
}
{code}
> Push down limit through Dataset.isEmpty() > - > > Key: SPARK-45755 > URL: https://issues.apache.org/jira/browse/SPARK-45755 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 4.0.0 >Reporter: Yuming Wang >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
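The rationale behind the requested optimization can be illustrated outside Spark: when only emptiness matters, it is safe to apply a limit *before* a distinct, because distinct(xs) is empty iff xs is empty. The following is a plain-Python toy model; `local_limit` and `distinct` are hypothetical stand-ins for the corresponding Spark operators, not Spark APIs.

```python
def local_limit(rows, n):
    """Take at most n rows, like Spark's LocalLimit."""
    return rows[:n]

def distinct(rows):
    """Deduplicate while preserving order, like Dataset.distinct()."""
    seen = []
    for r in rows:
        if r not in seen:
            seen.append(r)
    return seen

def is_empty_naive(rows):
    # Limit on top of distinct: distinct still scans all rows.
    return len(local_limit(distinct(rows), 1)) == 0

def is_empty_pushed(rows):
    # Limit pushed below distinct: at most one row reaches distinct.
    return len(distinct(local_limit(rows, 1))) == 0

# Both strategies agree on emptiness for every input.
for rows in ([], [1, 1, 2], [3]):
    assert is_empty_naive(rows) == is_empty_pushed(rows)
```

The pushed variant only touches one input row, which is the saving the issue asks for.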
[jira] [Created] (SPARK-45755) Push down limit through Dataset.isEmpty()
Yuming Wang created SPARK-45755: --- Summary: Push down limit through Dataset.isEmpty() Key: SPARK-45755 URL: https://issues.apache.org/jira/browse/SPARK-45755 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 4.0.0 Reporter: Yuming Wang
[jira] [Updated] (SPARK-45754) Support `spark.deploy.appIdPattern`
[ https://issues.apache.org/jira/browse/SPARK-45754?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-45754: --- Labels: pull-request-available (was: ) > Support `spark.deploy.appIdPattern` > --- > > Key: SPARK-45754 > URL: https://issues.apache.org/jira/browse/SPARK-45754 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: Dongjoon Hyun >Priority: Major > Labels: pull-request-available >
[jira] [Created] (SPARK-45754) Support `spark.deploy.appIdPattern`
Dongjoon Hyun created SPARK-45754: - Summary: Support `spark.deploy.appIdPattern` Key: SPARK-45754 URL: https://issues.apache.org/jira/browse/SPARK-45754 Project: Spark Issue Type: Improvement Components: Spark Core Affects Versions: 4.0.0 Reporter: Dongjoon Hyun
[jira] [Updated] (SPARK-45753) Support `spark.deploy.driverIdPattern`
[ https://issues.apache.org/jira/browse/SPARK-45753?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-45753: --- Labels: pull-request-available (was: ) > Support `spark.deploy.driverIdPattern` > -- > > Key: SPARK-45753 > URL: https://issues.apache.org/jira/browse/SPARK-45753 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: Dongjoon Hyun >Priority: Major > Labels: pull-request-available >
[jira] [Created] (SPARK-45753) Support `spark.deploy.driverIdPattern`
Dongjoon Hyun created SPARK-45753: - Summary: Support `spark.deploy.driverIdPattern` Key: SPARK-45753 URL: https://issues.apache.org/jira/browse/SPARK-45753 Project: Spark Issue Type: Improvement Components: Spark Core Affects Versions: 4.0.0 Reporter: Dongjoon Hyun
[jira] [Updated] (SPARK-45752) Unreferenced CTE should all be checked by CheckAnalysis0
[ https://issues.apache.org/jira/browse/SPARK-45752?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-45752: --- Labels: pull-request-available (was: ) > Unreferenced CTE should all be checked by CheckAnalysis0 > > > Key: SPARK-45752 > URL: https://issues.apache.org/jira/browse/SPARK-45752 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 4.0.0 >Reporter: Rui Wang >Assignee: Rui Wang >Priority: Major > Labels: pull-request-available >
[jira] [Created] (SPARK-45752) Unreferenced CTE should all be checked by CheckAnalysis0
Rui Wang created SPARK-45752: Summary: Unreferenced CTE should all be checked by CheckAnalysis0 Key: SPARK-45752 URL: https://issues.apache.org/jira/browse/SPARK-45752 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 4.0.0 Reporter: Rui Wang Assignee: Rui Wang
[jira] [Updated] (SPARK-45734) Upgrade commons-io to 2.15.0
[ https://issues.apache.org/jira/browse/SPARK-45734?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-45734: --- Labels: pull-request-available (was: ) > Upgrade commons-io to 2.15.0 > > > Key: SPARK-45734 > URL: https://issues.apache.org/jira/browse/SPARK-45734 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 4.0.0 >Reporter: Yang Jie >Assignee: Yang Jie >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > https://commons.apache.org/proper/commons-io/changes-report.html#a2.15.0
[jira] [Resolved] (SPARK-45734) Upgrade commons-io to 2.15.0
[ https://issues.apache.org/jira/browse/SPARK-45734?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-45734. --- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 43592 [https://github.com/apache/spark/pull/43592] > Upgrade commons-io to 2.15.0 > > > Key: SPARK-45734 > URL: https://issues.apache.org/jira/browse/SPARK-45734 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 4.0.0 >Reporter: Yang Jie >Assignee: Yang Jie >Priority: Major > Fix For: 4.0.0 > > > https://commons.apache.org/proper/commons-io/changes-report.html#a2.15.0
[jira] [Assigned] (SPARK-45734) Upgrade commons-io to 2.15.0
[ https://issues.apache.org/jira/browse/SPARK-45734?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-45734: - Assignee: Yang Jie > Upgrade commons-io to 2.15.0 > > > Key: SPARK-45734 > URL: https://issues.apache.org/jira/browse/SPARK-45734 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 4.0.0 >Reporter: Yang Jie >Assignee: Yang Jie >Priority: Major > > https://commons.apache.org/proper/commons-io/changes-report.html#a2.15.0
[jira] [Commented] (SPARK-45644) After upgrading to Spark 3.4.1 and 3.5.0 we receive RuntimeException "scala.Some is not a valid external type for schema of array"
[ https://issues.apache.org/jira/browse/SPARK-45644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17781589#comment-17781589 ] Adi Wehrli commented on SPARK-45644: Dear [~bersprockets] That's very kind of you. Thanks a lot. Best regards, Adi > After upgrading to Spark 3.4.1 and 3.5.0 we receive RuntimeException > "scala.Some is not a valid external type for schema of array" > -- > > Key: SPARK-45644 > URL: https://issues.apache.org/jira/browse/SPARK-45644 > Project: Spark > Issue Type: Bug > Components: Spark Core, SQL >Affects Versions: 3.4.1, 3.5.0 >Reporter: Adi Wehrli >Priority: Major > > I do not really know if this is a bug, but I am at the end with my knowledge. > A Spark job ran successfully with Spark 3.2.x and 3.3.x. > But after upgrading to 3.4.1 (as well as with 3.5.0) running the same job > with the same data the following always occurs now: > {code} > scala.Some is not a valid external type for schema of array > {code} > The corresponding stacktrace is: > {code} > 2023-10-24T06:28:50.932 level=ERROR logger=org.apache.spark.executor.Executor > msg="Exception in task 0.0 in stage 0.0 (TID 0)" thread="Executor task launch > worker for task 0.0 in stage 0.0 (TID 0)" > java.lang.RuntimeException: scala.Some is not a valid external type for > schema of array > at > org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.MapObjects_10$(Unknown > Source) ~[?:?] > at > org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.ExternalMapToCatalyst_1$(Unknown > Source) ~[?:?] > at > org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.createNamedStruct_14_3$(Unknown > Source) ~[?:?] > at > org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.If_12$(Unknown > Source) ~[?:?] > at > org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.apply(Unknown > Source) ~[?:?] 
> at > org.apache.spark.sql.execution.ObjectOperator$.$anonfun$serializeObjectToRow$1(objects.scala:165) > ~[spark-sql_2.12-3.5.0.jar:3.5.0] > at > org.apache.spark.sql.execution.AppendColumnsWithObjectExec.$anonfun$doExecute$15(objects.scala:380) > ~[spark-sql_2.12-3.5.0.jar:3.5.0] > at scala.collection.Iterator$$anon$10.next(Iterator.scala:461) > ~[scala-library-2.12.15.jar:?] > at scala.collection.Iterator$$anon$10.next(Iterator.scala:461) > ~[scala-library-2.12.15.jar:?] > at > org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:169) > ~[spark-core_2.12-3.5.0.jar:3.5.0] > at > org.apache.spark.shuffle.ShuffleWriteProcessor.write(ShuffleWriteProcessor.scala:59) > ~[spark-core_2.12-3.5.0.jar:3.5.0] > at > org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:104) > ~[spark-core_2.12-3.5.0.jar:3.5.0] > at > org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:54) > ~[spark-core_2.12-3.5.0.jar:3.5.0] > at > org.apache.spark.TaskContext.runTaskWithListeners(TaskContext.scala:161) > ~[spark-core_2.12-3.5.0.jar:3.5.0] > at org.apache.spark.scheduler.Task.run(Task.scala:141) > ~[spark-core_2.12-3.5.0.jar:3.5.0] > at > org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$4(Executor.scala:620) > ~[spark-core_2.12-3.5.0.jar:3.5.0] > at > org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally(SparkErrorUtils.scala:64) > ~[spark-common-utils_2.12-3.5.0.jar:3.5.0] > at > org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally$(SparkErrorUtils.scala:61) > ~[spark-common-utils_2.12-3.5.0.jar:3.5.0] > at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:94) > ~[spark-core_2.12-3.5.0.jar:3.5.0] > at > org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:623) > [spark-core_2.12-3.5.0.jar:3.5.0] > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) > [?:?] 
> at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) > [?:?] > at java.lang.Thread.run(Thread.java:834) [?:?] > 2023-10-24T06:28:50.932 level=ERROR logger=org.apache.spark.executor.Executor > msg="Exception in task 1.0 in stage 0.0 (TID 1)" thread="Executor task launch > worker for task 1.0 in stage 0.0 (TID 1)" > java.lang.RuntimeException: scala.Some is not a valid external type for > schema of array > at > org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.MapObjects_10$(Unknown > Source) ~[?:?] > at >
[jira] [Updated] (SPARK-45751) The default value of ‘spark.executor.logs.rolling.maxRetainedFiles' on the official website is incorrect
[ https://issues.apache.org/jira/browse/SPARK-45751?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] chenyu updated SPARK-45751: --- Attachment: the value on the website.png > The default value of ‘spark.executor.logs.rolling.maxRetainedFiles' on the > official website is incorrect > > > Key: SPARK-45751 > URL: https://issues.apache.org/jira/browse/SPARK-45751 > Project: Spark > Issue Type: Improvement > Components: Spark Core, UI >Affects Versions: 3.5.0 >Reporter: chenyu >Priority: Trivial > Attachments: the default value.png, the value on the website.png > >
[jira] [Updated] (SPARK-45751) The default value of ‘spark.executor.logs.rolling.maxRetainedFiles' on the official website is incorrect
[ https://issues.apache.org/jira/browse/SPARK-45751?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] chenyu updated SPARK-45751: --- Attachment: the default value.png > The default value of ‘spark.executor.logs.rolling.maxRetainedFiles' on the > official website is incorrect > > > Key: SPARK-45751 > URL: https://issues.apache.org/jira/browse/SPARK-45751 > Project: Spark > Issue Type: Improvement > Components: Spark Core, UI >Affects Versions: 3.5.0 >Reporter: chenyu >Priority: Trivial > Attachments: the default value.png, the value on the website.png > >
[jira] [Created] (SPARK-45751) The default value of ‘spark.executor.logs.rolling.maxRetainedFiles' on the official website is incorrect
chenyu created SPARK-45751: -- Summary: The default value of ‘spark.executor.logs.rolling.maxRetainedFiles' on the official website is incorrect Key: SPARK-45751 URL: https://issues.apache.org/jira/browse/SPARK-45751 Project: Spark Issue Type: Improvement Components: Spark Core, UI Affects Versions: 3.5.0 Reporter: chenyu
[jira] [Updated] (SPARK-45748) Add a `fromSQL` functionality for Literals
[ https://issues.apache.org/jira/browse/SPARK-45748?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-45748: --- Labels: pull-request-available (was: ) > Add a `fromSQL` functionality for Literals > -- > > Key: SPARK-45748 > URL: https://issues.apache.org/jira/browse/SPARK-45748 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.0.0 >Reporter: Xinyi Yu >Priority: Major > Labels: pull-request-available > > Add a `fromSQL` helper function for `Literal`s so that, together with `.sql`, Literal values can be serialized and deserialized.
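The round-trip goal described above (serialize a literal with `.sql`, deserialize it with `fromSQL`) can be sketched in plain Python. This is a minimal illustrative model, not Spark's Catalyst implementation; `to_sql` and `from_sql` are hypothetical names covering only a few value types.

```python
def to_sql(value):
    """Render a Python value as a SQL literal string (toy model of .sql)."""
    if isinstance(value, bool):          # check bool before int: bool is an int subclass
        return "TRUE" if value else "FALSE"
    if isinstance(value, (int, float)):
        return str(value)
    if isinstance(value, str):
        return "'" + value.replace("'", "''") + "'"   # escape embedded quotes
    raise TypeError(f"unsupported literal: {value!r}")

def from_sql(text):
    """Parse a SQL literal string back to a Python value (toy model of fromSQL)."""
    if text == "TRUE":
        return True
    if text == "FALSE":
        return False
    if text.startswith("'") and text.endswith("'"):
        return text[1:-1].replace("''", "'")
    try:
        return int(text)
    except ValueError:
        return float(text)

# Round trip: from_sql(to_sql(v)) recovers v.
for v in (42, 3.5, True, "O'Brien"):
    assert from_sql(to_sql(v)) == v
```

The real feature would have to cover the full range of Catalyst literal types (dates, intervals, binary, etc.), which this sketch deliberately omits.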
[jira] [Resolved] (SPARK-45750) The param 'spark.shuffle.io.preferDirectBufs' is useless
[ https://issues.apache.org/jira/browse/SPARK-45750?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] chenyu resolved SPARK-45750. Resolution: Won't Fix > The param 'spark.shuffle.io.preferDirectBufs' is useless > > > Key: SPARK-45750 > URL: https://issues.apache.org/jira/browse/SPARK-45750 > Project: Spark > Issue Type: Improvement > Components: Spark Core, UI >Affects Versions: 3.5.0 >Reporter: chenyu >Priority: Trivial > > There is no place to use this parameter. > We should delete the corresponding configuration.
[jira] [Closed] (SPARK-45750) The param 'spark.shuffle.io.preferDirectBufs' is useless
[ https://issues.apache.org/jira/browse/SPARK-45750?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] chenyu closed SPARK-45750. -- > The param 'spark.shuffle.io.preferDirectBufs' is useless > > > Key: SPARK-45750 > URL: https://issues.apache.org/jira/browse/SPARK-45750 > Project: Spark > Issue Type: Improvement > Components: Spark Core, UI >Affects Versions: 3.5.0 >Reporter: chenyu >Priority: Trivial > > There is no place to use this parameter. > We should delete the corresponding configuration.
[jira] [Created] (SPARK-45750) The param 'spark.shuffle.io.preferDirectBufs' is useless
chenyu created SPARK-45750: -- Summary: The param 'spark.shuffle.io.preferDirectBufs' is useless Key: SPARK-45750 URL: https://issues.apache.org/jira/browse/SPARK-45750 Project: Spark Issue Type: Improvement Components: Spark Core, UI Affects Versions: 3.5.0 Reporter: chenyu There is no place to use this parameter. We should delete the corresponding configuration.
[jira] [Assigned] (SPARK-45749) Fix Spark History Server to sort `Duration` column properly
[ https://issues.apache.org/jira/browse/SPARK-45749?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kent Yao reassigned SPARK-45749: Assignee: Dongjoon Hyun > Fix Spark History Server to sort `Duration` column properly > --- > > Key: SPARK-45749 > URL: https://issues.apache.org/jira/browse/SPARK-45749 > Project: Spark > Issue Type: Bug > Components: Spark Core, Web UI >Affects Versions: 3.2.0, 3.3.2, 3.4.1, 3.5.0, 4.0.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Major > Labels: pull-request-available >
[jira] [Resolved] (SPARK-45749) Fix Spark History Server to sort `Duration` column properly
[ https://issues.apache.org/jira/browse/SPARK-45749?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kent Yao resolved SPARK-45749. -- Fix Version/s: 3.3.4 3.5.1 4.0.0 3.4.2 Resolution: Fixed Issue resolved by pull request 43613 [https://github.com/apache/spark/pull/43613] > Fix Spark History Server to sort `Duration` column properly > --- > > Key: SPARK-45749 > URL: https://issues.apache.org/jira/browse/SPARK-45749 > Project: Spark > Issue Type: Bug > Components: Spark Core, Web UI >Affects Versions: 3.2.0, 3.3.2, 3.4.1, 3.5.0, 4.0.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Major > Labels: pull-request-available > Fix For: 3.3.4, 3.5.1, 4.0.0, 3.4.2 > >
[jira] [Updated] (SPARK-45749) Fix Spark History Server to sort `Duration` column properly
[ https://issues.apache.org/jira/browse/SPARK-45749?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-45749: --- Labels: pull-request-available (was: ) > Fix Spark History Server to sort `Duration` column properly > --- > > Key: SPARK-45749 > URL: https://issues.apache.org/jira/browse/SPARK-45749 > Project: Spark > Issue Type: Bug > Components: Spark Core, Web UI >Affects Versions: 3.2.0, 3.3.2, 3.4.1, 3.5.0, 4.0.0 >Reporter: Dongjoon Hyun >Priority: Major > Labels: pull-request-available >
[jira] [Created] (SPARK-45749) Fix Spark History Server to sort `Duration` column properly
Dongjoon Hyun created SPARK-45749: - Summary: Fix Spark History Server to sort `Duration` column properly Key: SPARK-45749 URL: https://issues.apache.org/jira/browse/SPARK-45749 Project: Spark Issue Type: Bug Components: Spark Core, Web UI Affects Versions: 3.5.0, 3.4.1, 3.3.2, 3.2.0, 4.0.0 Reporter: Dongjoon Hyun
[jira] [Updated] (SPARK-45748) Add a `fromSQL` functionality for Literals
[ https://issues.apache.org/jira/browse/SPARK-45748?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xinyi Yu updated SPARK-45748: - Issue Type: Improvement (was: Bug) > Add a `fromSQL` functionality for Literals > -- > > Key: SPARK-45748 > URL: https://issues.apache.org/jira/browse/SPARK-45748 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.0.0 >Reporter: Xinyi Yu >Priority: Major > > Add a `fromSQL` helper function for `Literal`s so that, together with `.sql`, Literal values can be serialized and deserialized.
[jira] [Created] (SPARK-45748) Add a `fromSQL` functionality for Literals
Xinyi Yu created SPARK-45748: Summary: Add a `fromSQL` functionality for Literals Key: SPARK-45748 URL: https://issues.apache.org/jira/browse/SPARK-45748 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 4.0.0 Reporter: Xinyi Yu Add a `fromSQL` helper function for `Literal`s so that, together with `.sql`, Literal values can be serialized and deserialized.
[jira] [Resolved] (SPARK-45654) Add Python data source write API
[ https://issues.apache.org/jira/browse/SPARK-45654?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-45654. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 43516 [https://github.com/apache/spark/pull/43516] > Add Python data source write API > > > Key: SPARK-45654 > URL: https://issues.apache.org/jira/browse/SPARK-45654 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 4.0.0 >Reporter: Allison Wang >Assignee: Allison Wang >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > Add Python data source write API in datasource.py
[jira] [Assigned] (SPARK-45654) Add Python data source write API
[ https://issues.apache.org/jira/browse/SPARK-45654?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-45654: Assignee: Allison Wang > Add Python data source write API > > > Key: SPARK-45654 > URL: https://issues.apache.org/jira/browse/SPARK-45654 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 4.0.0 >Reporter: Allison Wang >Assignee: Allison Wang >Priority: Major > Labels: pull-request-available > > Add Python data source write API in datasource.py
[jira] [Updated] (SPARK-45747) Support session window aggregation in state reader
[ https://issues.apache.org/jira/browse/SPARK-45747?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chaoqin Li updated SPARK-45747: --- Summary: Support session window aggregation in state reader (was: Support session window operator in state reader) > Support session window aggregation in state reader > -- > > Key: SPARK-45747 > URL: https://issues.apache.org/jira/browse/SPARK-45747 > Project: Spark > Issue Type: Improvement > Components: Structured Streaming >Affects Versions: 3.5.0 >Reporter: Chaoqin Li >Priority: Major > > We are introducing a state reader in SPARK-45511, but the session window > operator is currently not supported because numColPrefixKey is unknown. We can read > the operator state metadata introduced in SPARK-45558 to determine the number > of prefix columns and load session window state correctly.
[jira] [Created] (SPARK-45747) Support session window operator in state reader
Chaoqin Li created SPARK-45747: -- Summary: Support session window operator in state reader Key: SPARK-45747 URL: https://issues.apache.org/jira/browse/SPARK-45747 Project: Spark Issue Type: Improvement Components: Structured Streaming Affects Versions: 3.5.0 Reporter: Chaoqin Li We are introducing a state reader in SPARK-45511, but the session window operator is currently not supported because numColPrefixKey is unknown. We can read the operator state metadata introduced in SPARK-45558 to determine the number of prefix columns and load session window state correctly.
[jira] [Assigned] (SPARK-45741) Upgrade Netty to 4.1.100.Final
[ https://issues.apache.org/jira/browse/SPARK-45741?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-45741: - Assignee: Dongjoon Hyun > Upgrade Netty to 4.1.100.Final > -- > > Key: SPARK-45741 > URL: https://issues.apache.org/jira/browse/SPARK-45741 > Project: Spark > Issue Type: Bug > Components: Build >Affects Versions: 4.0.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Major > Labels: pull-request-available >
[jira] [Resolved] (SPARK-45741) Upgrade Netty to 4.1.100.Final
[ https://issues.apache.org/jira/browse/SPARK-45741?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-45741. --- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 43605 [https://github.com/apache/spark/pull/43605] > Upgrade Netty to 4.1.100.Final > -- > > Key: SPARK-45741 > URL: https://issues.apache.org/jira/browse/SPARK-45741 > Project: Spark > Issue Type: Bug > Components: Build >Affects Versions: 4.0.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > >
[jira] [Updated] (SPARK-45746) Return specific error messages if UDTF 'analyze' method accepts or returns wrong values
[ https://issues.apache.org/jira/browse/SPARK-45746?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-45746: --- Labels: pull-request-available (was: ) > Return specific error messages if UDTF 'analyze' method accepts or returns > wrong values > --- > > Key: SPARK-45746 > URL: https://issues.apache.org/jira/browse/SPARK-45746 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 4.0.0 >Reporter: Daniel >Priority: Major > Labels: pull-request-available >
[jira] [Commented] (SPARK-20075) Support classifier, packaging in Maven coordinates
[ https://issues.apache.org/jira/browse/SPARK-20075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17781550#comment-17781550 ] Gera Shegalov commented on SPARK-20075: --- This would be a great feature that can help spark-rapids plugin users who require a non-default classifier such as cuda12. > Support classifier, packaging in Maven coordinates > -- > > Key: SPARK-20075 > URL: https://issues.apache.org/jira/browse/SPARK-20075 > Project: Spark > Issue Type: Improvement > Components: Spark Shell, Spark Submit >Affects Versions: 2.1.0 >Reporter: Sean R. Owen >Priority: Minor > Labels: bulk-closed > > Currently, it's possible to add dependencies to an app using its Maven > coordinates on the command line: {{group:artifact:version}}. However, really > Maven coordinates are 5-dimensional: > {{group:artifact:packaging:classifier:version}}. In some rare but real cases > it's important to be able to specify the classifier. And while we're at it > why not try to support packaging? > I have a WIP PR that I'll post soon. -- This message was sent by Atlassian Jira (v8.20.10#820010)
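The 5-dimensional coordinate format described above follows the standard Maven convention group:artifact[:packaging[:classifier]]:version, i.e. 3, 4, or 5 colon-separated parts. A minimal parser sketch in Python (`parse_coordinate` is a hypothetical helper, not Spark's actual implementation; the "jar" default packaging follows Maven's convention):

```python
def parse_coordinate(coord):
    """Split a Maven coordinate into its five dimensions."""
    parts = coord.split(":")
    if len(parts) == 3:                       # group:artifact:version
        group, artifact, version = parts
        packaging, classifier = "jar", None   # Maven's default packaging
    elif len(parts) == 4:                     # group:artifact:packaging:version
        group, artifact, packaging, version = parts
        classifier = None
    elif len(parts) == 5:                     # group:artifact:packaging:classifier:version
        group, artifact, packaging, classifier, version = parts
    else:
        raise ValueError(f"bad coordinate: {coord}")
    return {"group": group, "artifact": artifact, "packaging": packaging,
            "classifier": classifier, "version": version}

# A cuda12-classified artifact (artifact name and version illustrative only):
c = parse_coordinate("com.nvidia:rapids-4-spark_2.12:jar:cuda12:23.10.0")
assert c["classifier"] == "cuda12" and c["version"] == "23.10.0"
```

Today's --packages flag accepts only the 3-part form, which is why a classifier like cuda12 cannot currently be requested.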
[jira] [Created] (SPARK-45746) Return specific error messages if UDTF 'analyze' method accepts or returns wrong values
Daniel created SPARK-45746: -- Summary: Return specific error messages if UDTF 'analyze' method accepts or returns wrong values Key: SPARK-45746 URL: https://issues.apache.org/jira/browse/SPARK-45746 Project: Spark Issue Type: Sub-task Components: PySpark Affects Versions: 4.0.0 Reporter: Daniel
[jira] [Assigned] (SPARK-45739) Catch IOException instead of EOFException alone for faulthandler
[ https://issues.apache.org/jira/browse/SPARK-45739?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-45739: Assignee: Hyukjin Kwon > Catch IOException instead of EOFException alone for faulthandler > > > Key: SPARK-45739 > URL: https://issues.apache.org/jira/browse/SPARK-45739 > Project: Spark > Issue Type: Improvement > Components: PySpark >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Major > Labels: pull-request-available > > {{spark.python.worker.faulthandler.enabled}} can describe fatal errors such > as segfaults. Exceptions such as {{java.net.SocketException: > Connection reset}} can happen because the worker died unexpectedly. We should > catch all IO exceptions there.
[jira] [Resolved] (SPARK-45739) Catch IOException instead of EOFException alone for faulthandler
[ https://issues.apache.org/jira/browse/SPARK-45739?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-45739. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 43600 [https://github.com/apache/spark/pull/43600] > Catch IOException instead of EOFException alone for faulthandler > > > Key: SPARK-45739 > URL: https://issues.apache.org/jira/browse/SPARK-45739 > Project: Spark > Issue Type: Improvement > Components: PySpark >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > {{spark.python.worker.faulthandler.enabled}} can describe fatal errors such as a segfault. Exceptions such as {{java.net.SocketException: Connection reset}} can happen because the worker died unexpectedly. We should catch all IOExceptions there, not just EOFException. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
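The fix relies on the JVM exception hierarchy: `java.io.EOFException` and `java.net.SocketException` both extend `java.io.IOException`, so catching the base class covers an abrupt worker death (connection reset) as well as a clean EOF. A toy Python model of that hierarchy (illustrative classes only, not the actual Spark code):

```python
# Toy stand-ins mirroring the Java hierarchy the fix depends on.
class IOException(Exception): ...
class EOFException(IOException): ...      # java.io.EOFException extends IOException
class SocketException(IOException): ...   # java.net.SocketException extends IOException

def handle_narrow(exc: Exception) -> str:
    """Catching only EOFException lets sibling IOExceptions escape."""
    try:
        raise exc
    except EOFException:
        return "handled"
    except Exception:
        return "escaped"

def handle_broad(exc: Exception) -> str:
    """Catching the base IOException covers EOF and connection resets alike."""
    try:
        raise exc
    except IOException:
        return "handled"

assert handle_narrow(EOFException()) == "handled"
assert handle_narrow(SocketException("Connection reset")) == "escaped"
assert handle_broad(SocketException("Connection reset")) == "handled"
```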
[jira] [Updated] (SPARK-45745) Extremely slow execution of sum of columns in Spark 3.4.1
[ https://issues.apache.org/jira/browse/SPARK-45745?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Javier updated SPARK-45745: --- Description: We are in the process of upgrading some pySpark jobs from Spark 3.1.2 to Spark 3.4.1 and some code that was running fine is now basically never ending even for small dataframes. We have simplified the problematic piece of code and the minimum pySpark example below shows the issue: {code:java} n_cols = 50 data = [{f"col{i}": i for i in range(n_cols)} for _ in range(5)] df_data = sql_context.createDataFrame(data) df_data = df_data.withColumn( "col_sum", sum([F.col(f"col{i}") for i in range(n_cols)]) ) df_data.show(10, False) {code} Basically, this code with Spark 3.1.2 runs fine but with 3.4.1 the computation time seems to explode when the value of `n_cols` is bigger than about 25 columns. A colleague suggested that it could be related to the limit of 22 elements in a tuple in Scala 2.13 (https://www.scala-lang.org/api/current/scala/Tuple22.html), since the 25 columns are suspiciously close to this. Is there any known defect in the logical plan optimization in 3.4.1? Or is this kind of operations (sum of multiple columns) supposed to be implemented differently? was: We are in the process of upgrading some pySpark jobs from Spark 3.1.2 to Spark 3.4.1 and some code that was running fine is now basically never ending even for small dataframes. We have simplified the problematic piece of code and the minimum pySpark example below shows the issue: {code:java} n_cols = 50 data = [{f"col{i}": i for i in range(n_cols)} for _ in range(5)] df_data = sql_context.createDataFrame(data) df_data = df_data.withColumn( "col_sum", sum([F.col(f"col{i}") for i in range(n_cols)]) ) df_data.show(10, False) {code} Basically, this code with Spark 3.1.2 runs fine but with 3.4.1 the computation time seems to explode when the value of `n_cols` is bigger than about 25 columns. 
A colleague suggested that it could be related to the limit of 22 elements in a tuple in Scala 2.13, since the 25 columns are suspiciously close to this. Is there any known defect in the logical plan optimization in 3.4.1? Or is this kind of operations (sum of multiple columns) supposed to be implemented differently? > Extremely slow execution of sum of columns in Spark 3.4.1 > - > > Key: SPARK-45745 > URL: https://issues.apache.org/jira/browse/SPARK-45745 > Project: Spark > Issue Type: Bug > Components: PySpark >Affects Versions: 3.4.1 >Reporter: Javier >Priority: Major > > We are in the process of upgrading some pySpark jobs from Spark 3.1.2 to > Spark 3.4.1 and some code that was running fine is now basically never ending > even for small dataframes. > We have simplified the problematic piece of code and the minimum pySpark > example below shows the issue: > {code:java} > n_cols = 50 > data = [{f"col{i}": i for i in range(n_cols)} for _ in range(5)] > df_data = sql_context.createDataFrame(data) > df_data = df_data.withColumn( > "col_sum", sum([F.col(f"col{i}") for i in range(n_cols)]) > ) > df_data.show(10, False) {code} > Basically, this code with Spark 3.1.2 runs fine but with 3.4.1 the > computation time seems to explode when the value of `n_cols` is bigger than > about 25 columns. A colleague suggested that it could be related to the limit > of 22 elements in a tuple in Scala 2.13 > (https://www.scala-lang.org/api/current/scala/Tuple22.html), since the 25 > columns are suspiciously close to this. Is there any known defect in the > logical plan optimization in 3.4.1? Or is this kind of operations (sum of > multiple columns) supposed to be implemented differently? -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
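One plausible contributor (an assumption, not a confirmed diagnosis of the 3.4.1 regression): Python's builtin `sum` folds the list into a left-nested chain of binary `+` calls, so the Catalyst expression tree it produces has depth proportional to `n_cols`. A pure-Python sketch with a toy `Col` class (not PySpark) showing the linear depth growth:

```python
class Col:
    """Toy stand-in for a Spark Column: '+' builds an expression tree node."""
    def __init__(self, name, left=None, right=None):
        self.name, self.left, self.right = name, left, right
    def __add__(self, other):
        return Col("+", self, other)
    def __radd__(self, other):
        # Lets builtin sum() start from its default 0 without adding a node.
        return self if other == 0 else Col("+", Col(str(other)), self)

def depth(c):
    """Height of the expression tree."""
    if c.left is None:
        return 1
    return 1 + max(depth(c.left), depth(c.right))

# sum() produces ((((col0 + col1) + col2) + ...) + col49): a left-deep chain.
expr = sum(Col(f"col{i}") for i in range(50))
assert depth(expr) == 50
```

Any optimizer rule that repeatedly rewrites such a tree does work that grows with this depth, which would be consistent with the cost exploding as `n_cols` increases.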
[jira] [Commented] (SPARK-45745) Extremely slow execution of sum of columns in Spark 3.4.1
[ https://issues.apache.org/jira/browse/SPARK-45745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17781539#comment-17781539 ] Javier commented on SPARK-45745: I originally posted a question on StackOverflow asking for feedback on this ([https://stackoverflow.com/questions/77391731/extremely-slow-execution-in-spark-3-4-1-when-computing-the-sum-of-pyspark-datafr]) and a user there pointed me to a never-ending unit test reported in https://issues.apache.org/jira/browse/SPARK-43972. It is for the same Spark version, but I honestly don't know whether it is related. > Extremely slow execution of sum of columns in Spark 3.4.1 > - > > Key: SPARK-45745 > URL: https://issues.apache.org/jira/browse/SPARK-45745 > Project: Spark > Issue Type: Bug > Components: PySpark >Affects Versions: 3.4.1 >Reporter: Javier >Priority: Major > > We are in the process of upgrading some pySpark jobs from Spark 3.1.2 to > Spark 3.4.1 and some code that was running fine is now basically never ending > even for small dataframes. > We have simplified the problematic piece of code and the minimum pySpark > example below shows the issue: > {code:java} > n_cols = 50 > data = [{f"col{i}": i for i in range(n_cols)} for _ in range(5)] > df_data = sql_context.createDataFrame(data) > df_data = df_data.withColumn( > "col_sum", sum([F.col(f"col{i}") for i in range(n_cols)]) > ) > df_data.show(10, False) {code} > Basically, this code with Spark 3.1.2 runs fine but with 3.4.1 the > computation time seems to explode when the value of `n_cols` is bigger than > about 25 columns. A colleague suggested that it could be related to the limit > of 22 elements in a tuple in Scala 2.13, since the 25 columns are > suspiciously close to this. Is there any known defect in the logical plan > optimization in 3.4.1? Or is this kind of operation (sum of multiple > columns) supposed to be implemented differently? 
[jira] [Created] (SPARK-45745) Extremely slow execution of sum of columns in Spark 3.4.1
Javier created SPARK-45745: -- Summary: Extremely slow execution of sum of columns in Spark 3.4.1 Key: SPARK-45745 URL: https://issues.apache.org/jira/browse/SPARK-45745 Project: Spark Issue Type: Bug Components: PySpark Affects Versions: 3.4.1 Reporter: Javier We are in the process of upgrading some pySpark jobs from Spark 3.1.2 to Spark 3.4.1 and some code that was running fine is now basically never ending even for small dataframes. We have simplified the problematic piece of code and the minimum pySpark example below shows the issue: {code:java} n_cols = 50 data = [{f"col{i}": i for i in range(n_cols)} for _ in range(5)] df_data = sql_context.createDataFrame(data) df_data = df_data.withColumn( "col_sum", sum([F.col(f"col{i}") for i in range(n_cols)]) ) df_data.show(10, False) {code} Basically, this code with Spark 3.1.2 runs fine but with 3.4.1 the computation time seems to explode when the value of `n_cols` is bigger than about 25 columns. A colleague suggested that it could be related to the limit of 22 elements in a tuple in Scala 2.13, since the 25 columns are suspiciously close to this. Is there any known defect in the logical plan optimization in 3.4.1? Or is this kind of operations (sum of multiple columns) supposed to be implemented differently? -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-45644) After upgrading to Spark 3.4.1 and 3.5.0 we receive RuntimeException "scala.Some is not a valid external type for schema of array"
[ https://issues.apache.org/jira/browse/SPARK-45644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17781531#comment-17781531 ] Bruce Robbins commented on SPARK-45644: --- I will look into it and try to submit a fix. If I can't, I will ping someone who can. > After upgrading to Spark 3.4.1 and 3.5.0 we receive RuntimeException > "scala.Some is not a valid external type for schema of array" > -- > > Key: SPARK-45644 > URL: https://issues.apache.org/jira/browse/SPARK-45644 > Project: Spark > Issue Type: Bug > Components: Spark Core, SQL >Affects Versions: 3.4.1, 3.5.0 >Reporter: Adi Wehrli >Priority: Major > > I do not really know if this is a bug, but I am at the end with my knowledge. > A Spark job ran successfully with Spark 3.2.x and 3.3.x. > But after upgrading to 3.4.1 (as well as with 3.5.0) running the same job > with the same data the following always occurs now: > {code} > scala.Some is not a valid external type for schema of array > {code} > The corresponding stacktrace is: > {code} > 2023-10-24T06:28:50.932 level=ERROR logger=org.apache.spark.executor.Executor > msg="Exception in task 0.0 in stage 0.0 (TID 0)" thread="Executor task launch > worker for task 0.0 in stage 0.0 (TID 0)" > java.lang.RuntimeException: scala.Some is not a valid external type for > schema of array > at > org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.MapObjects_10$(Unknown > Source) ~[?:?] > at > org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.ExternalMapToCatalyst_1$(Unknown > Source) ~[?:?] > at > org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.createNamedStruct_14_3$(Unknown > Source) ~[?:?] > at > org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.If_12$(Unknown > Source) ~[?:?] > at > org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.apply(Unknown > Source) ~[?:?] 
> at > org.apache.spark.sql.execution.ObjectOperator$.$anonfun$serializeObjectToRow$1(objects.scala:165) > ~[spark-sql_2.12-3.5.0.jar:3.5.0] > at > org.apache.spark.sql.execution.AppendColumnsWithObjectExec.$anonfun$doExecute$15(objects.scala:380) > ~[spark-sql_2.12-3.5.0.jar:3.5.0] > at scala.collection.Iterator$$anon$10.next(Iterator.scala:461) > ~[scala-library-2.12.15.jar:?] > at scala.collection.Iterator$$anon$10.next(Iterator.scala:461) > ~[scala-library-2.12.15.jar:?] > at > org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:169) > ~[spark-core_2.12-3.5.0.jar:3.5.0] > at > org.apache.spark.shuffle.ShuffleWriteProcessor.write(ShuffleWriteProcessor.scala:59) > ~[spark-core_2.12-3.5.0.jar:3.5.0] > at > org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:104) > ~[spark-core_2.12-3.5.0.jar:3.5.0] > at > org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:54) > ~[spark-core_2.12-3.5.0.jar:3.5.0] > at > org.apache.spark.TaskContext.runTaskWithListeners(TaskContext.scala:161) > ~[spark-core_2.12-3.5.0.jar:3.5.0] > at org.apache.spark.scheduler.Task.run(Task.scala:141) > ~[spark-core_2.12-3.5.0.jar:3.5.0] > at > org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$4(Executor.scala:620) > ~[spark-core_2.12-3.5.0.jar:3.5.0] > at > org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally(SparkErrorUtils.scala:64) > ~[spark-common-utils_2.12-3.5.0.jar:3.5.0] > at > org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally$(SparkErrorUtils.scala:61) > ~[spark-common-utils_2.12-3.5.0.jar:3.5.0] > at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:94) > ~[spark-core_2.12-3.5.0.jar:3.5.0] > at > org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:623) > [spark-core_2.12-3.5.0.jar:3.5.0] > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) > [?:?] 
> at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) > [?:?] > at java.lang.Thread.run(Thread.java:834) [?:?] > 2023-10-24T06:28:50.932 level=ERROR logger=org.apache.spark.executor.Executor > msg="Exception in task 1.0 in stage 0.0 (TID 1)" thread="Executor task launch > worker for task 1.0 in stage 0.0 (TID 1)" > java.lang.RuntimeException: scala.Some is not a valid external type for > schema of array > at > org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.MapObjects_10$(Unknown > Source) ~[?:?] > at >
[jira] [Commented] (SPARK-45644) After upgrading to Spark 3.4.1 and 3.5.0 we receive RuntimeException "scala.Some is not a valid external type for schema of array"
[ https://issues.apache.org/jira/browse/SPARK-45644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17781526#comment-17781526 ] Adi Wehrli commented on SPARK-45644: Good evening, [~bersprockets] Thanks for your reproduction. So, what does this mean now? Will we have a bugfix for this? Or do we have to migrate something somehow? Kind regards, Adi > After upgrading to Spark 3.4.1 and 3.5.0 we receive RuntimeException > "scala.Some is not a valid external type for schema of array" > -- > > Key: SPARK-45644 > URL: https://issues.apache.org/jira/browse/SPARK-45644 > Project: Spark > Issue Type: Bug > Components: Spark Core, SQL >Affects Versions: 3.4.1, 3.5.0 >Reporter: Adi Wehrli >Priority: Major > > I do not really know if this is a bug, but I am at the end with my knowledge. > A Spark job ran successfully with Spark 3.2.x and 3.3.x. > But after upgrading to 3.4.1 (as well as with 3.5.0) running the same job > with the same data the following always occurs now: > {code} > scala.Some is not a valid external type for schema of array > {code} > The corresponding stacktrace is: > {code} > 2023-10-24T06:28:50.932 level=ERROR logger=org.apache.spark.executor.Executor > msg="Exception in task 0.0 in stage 0.0 (TID 0)" thread="Executor task launch > worker for task 0.0 in stage 0.0 (TID 0)" > java.lang.RuntimeException: scala.Some is not a valid external type for > schema of array > at > org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.MapObjects_10$(Unknown > Source) ~[?:?] > at > org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.ExternalMapToCatalyst_1$(Unknown > Source) ~[?:?] > at > org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.createNamedStruct_14_3$(Unknown > Source) ~[?:?] > at > org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.If_12$(Unknown > Source) ~[?:?] 
[jira] [Comment Edited] (SPARK-45644) After upgrading to Spark 3.4.1 and 3.5.0 we receive RuntimeException "scala.Some is not a valid external type for schema of array"
[ https://issues.apache.org/jira/browse/SPARK-45644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17781526#comment-17781526 ] Adi Wehrli edited comment on SPARK-45644 at 10/31/23 9:38 PM: -- Good evening, [~bersprockets] Thanks a lot, that you could reproduce this. So, what does this mean now? Will we have a bugfix for this? Or do we have to migrate something somehow? Kind regards, Adi was (Author: JIRAUSER302746): Good evening, [~bersprockets] Thanks for your reproduction. So, what does this mean now? Will we have a bugfix for this? Or do we have to migrate something somehow? Kind regards, Adi > After upgrading to Spark 3.4.1 and 3.5.0 we receive RuntimeException > "scala.Some is not a valid external type for schema of array" > -- > > Key: SPARK-45644 > URL: https://issues.apache.org/jira/browse/SPARK-45644 > Project: Spark > Issue Type: Bug > Components: Spark Core, SQL >Affects Versions: 3.4.1, 3.5.0 >Reporter: Adi Wehrli >Priority: Major > > I do not really know if this is a bug, but I am at the end with my knowledge. > A Spark job ran successfully with Spark 3.2.x and 3.3.x. > But after upgrading to 3.4.1 (as well as with 3.5.0) running the same job > with the same data the following always occurs now: > {code} > scala.Some is not a valid external type for schema of array > {code} > The corresponding stacktrace is: > {code} > 2023-10-24T06:28:50.932 level=ERROR logger=org.apache.spark.executor.Executor > msg="Exception in task 0.0 in stage 0.0 (TID 0)" thread="Executor task launch > worker for task 0.0 in stage 0.0 (TID 0)" > java.lang.RuntimeException: scala.Some is not a valid external type for > schema of array > at > org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.MapObjects_10$(Unknown > Source) ~[?:?] > at > org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.ExternalMapToCatalyst_1$(Unknown > Source) ~[?:?] 
[jira] [Assigned] (SPARK-45744) Switch `spark.history.store.serializer` to use `PROTOBUF` by default
[ https://issues.apache.org/jira/browse/SPARK-45744?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-45744: - Assignee: Dongjoon Hyun > Switch `spark.history.store.serializer` to use `PROTOBUF` by default > > > Key: SPARK-45744 > URL: https://issues.apache.org/jira/browse/SPARK-45744 > Project: Spark > Issue Type: Sub-task > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
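For context, this serializer only matters when the History Server persists its listing data to local disk. A hedged `spark-defaults.conf` sketch (key names as documented for the history server; the path value is made up, and the exact defaults should be checked against the monitoring docs for your version):

```
# spark-defaults.conf (sketch)
spark.history.store.path        /var/spark/history-store   # enables the local disk store
# Pin the serializer explicitly if you need stability across the default change:
spark.history.store.serializer  JSON        # previous default
# spark.history.store.serializer PROTOBUF   # proposed new default per this ticket
```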
[jira] [Commented] (SPARK-45699) Fix "Widening conversion from `TypeA` to `TypeB` is deprecated because it loses precision"
[ https://issues.apache.org/jira/browse/SPARK-45699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17781505#comment-17781505 ] Hannah Amundson commented on SPARK-45699: - [~LuciferYang] Do you have any suggestions for tickets that are distributed? > Fix "Widening conversion from `TypeA` to `TypeB` is deprecated because it > loses precision" > -- > > Key: SPARK-45699 > URL: https://issues.apache.org/jira/browse/SPARK-45699 > Project: Spark > Issue Type: Sub-task > Components: Spark Core, SQL >Affects Versions: 4.0.0 >Reporter: Yang Jie >Priority: Major > > {code:java} > error] > /Users/yangjie01/SourceCode/git/spark-mine-sbt/core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala:1199:67: > Widening conversion from Long to Double is deprecated because it loses > precision. Write `.toDouble` instead. [quickfixable] > [error] Applicable -Wconf / @nowarn filters for this fatal warning: msg= of the message>, cat=deprecation, > site=org.apache.spark.scheduler.TaskSetManager.checkSpeculatableTasks.threshold > [error] val threshold = max(speculationMultiplier * medianDuration, > minTimeToSpeculation) > [error] ^ > [error] > /Users/yangjie01/SourceCode/git/spark-mine-sbt/core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala:1207:60: > Widening conversion from Long to Double is deprecated because it loses > precision. Write `.toDouble` instead. [quickfixable] > [error] Applicable -Wconf / @nowarn filters for this fatal warning: msg= of the message>, cat=deprecation, > site=org.apache.spark.scheduler.TaskSetManager.checkSpeculatableTasks > [error] foundTasks = checkAndSubmitSpeculatableTasks(timeMs, threshold, > customizedThreshold = true) > [error] ^ > [error] > /Users/yangjie01/SourceCode/git/spark-mine-sbt/connector/connect/common/src/main/scala/org/apache/spark/sql/connect/client/arrow/ArrowVectorReader.scala:137:48: > Widening conversion from Int to Float is deprecated because it loses > precision. 
Write `.toFloat` instead. [quickfixable] > [error] Applicable -Wconf / @nowarn filters for this fatal warning: msg= of the message>, cat=deprecation, > site=org.apache.spark.sql.connect.client.arrow.IntVectorReader.getFloat > [error] override def getFloat(i: Int): Float = getInt(i) > [error] ^ > [error] > /Users/yangjie01/SourceCode/git/spark-mine-sbt/connector/connect/common/src/main/scala/org/apache/spark/sql/connect/client/arrow/ArrowVectorReader.scala:146:49: > Widening conversion from Long to Float is deprecated because it loses > precision. Write `.toFloat` instead. [quickfixable] > [error] Applicable -Wconf / @nowarn filters for this fatal warning: msg= of the message>, cat=deprecation, > site=org.apache.spark.sql.connect.client.arrow.BigIntVectorReader.getFloat > [error] override def getFloat(i: Int): Float = getLong(i) > [error] ^ > [error] > /Users/yangjie01/SourceCode/git/spark-mine-sbt/connector/connect/common/src/main/scala/org/apache/spark/sql/connect/client/arrow/ArrowVectorReader.scala:147:51: > Widening conversion from Long to Double is deprecated because it loses > precision. Write `.toDouble` instead. [quickfixable] > [error] Applicable -Wconf / @nowarn filters for this fatal warning: msg= of the message>, cat=deprecation, > site=org.apache.spark.sql.connect.client.arrow.BigIntVectorReader.getDouble > [error] override def getDouble(i: Int): Double = getLong(i) > [error] ^ {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
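The deprecation exists because the widening really does lose information: an IEEE-754 double has a 53-bit significand, so not every 64-bit long survives a Long-to-Double conversion, and a 32-bit float's 24-bit significand cannot even hold every Int. The same arithmetic can be checked from Python (whose `float` is a C double):

```python
import struct

# 2**53 is the edge of the range where every integer fits exactly in a double.
exact = 2**53
assert float(exact) == exact              # representable
assert float(exact + 1) == float(exact)   # 2**53 + 1 rounds away: precision lost

def to_f32(x: int) -> float:
    """Round-trip an integer through a 32-bit IEEE-754 float."""
    return struct.unpack("f", struct.pack("f", float(x)))[0]

# The Int-to-Float case from the warning: 2**24 + 1 has no float32 representation.
assert to_f32(2**24 + 1) == 2**24
```

This is why the Scala compiler now asks for an explicit `.toDouble`/`.toFloat`: the conversion is still allowed, but the loss must be acknowledged in the source.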
[jira] [Updated] (SPARK-45744) Switch `spark.history.store.serializer` to use `PROTOBUF` by default
[ https://issues.apache.org/jira/browse/SPARK-45744?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-45744: --- Labels: pull-request-available (was: ) > Switch `spark.history.store.serializer` to use `PROTOBUF` by default > > > Key: SPARK-45744 > URL: https://issues.apache.org/jira/browse/SPARK-45744 > Project: Spark > Issue Type: Sub-task > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: Dongjoon Hyun >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-45744) Switch `spark.history.store.serializer` to use `PROTOBUF` by default
Dongjoon Hyun created SPARK-45744: - Summary: Switch `spark.history.store.serializer` to use `PROTOBUF` by default Key: SPARK-45744 URL: https://issues.apache.org/jira/browse/SPARK-45744 Project: Spark Issue Type: Sub-task Components: Spark Core Affects Versions: 4.0.0 Reporter: Dongjoon Hyun -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-45743) Upgrade dropwizard metrics 4.2.21
[ https://issues.apache.org/jira/browse/SPARK-45743?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-45743: --- Labels: pull-request-available (was: ) > Upgrade dropwizard metrics 4.2.21 > - > > Key: SPARK-45743 > URL: https://issues.apache.org/jira/browse/SPARK-45743 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 4.0.0 >Reporter: Yang Jie >Priority: Major > Labels: pull-request-available > > [https://github.com/dropwizard/metrics/releases/tag/v4.2.21] > [https://github.com/dropwizard/metrics/releases/tag/v4.2.20] > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-45644) After upgrading to Spark 3.4.1 and 3.5.0 we receive RuntimeException "scala.Some is not a valid external type for schema of array"
[ https://issues.apache.org/jira/browse/SPARK-45644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17781494#comment-17781494 ] Bruce Robbins commented on SPARK-45644: --- OK, I can reproduce. I will take a look. I will also try to get my reproduction example down to a minimal case and will post here later. > After upgrading to Spark 3.4.1 and 3.5.0 we receive RuntimeException > "scala.Some is not a valid external type for schema of array" > -- > > Key: SPARK-45644 > URL: https://issues.apache.org/jira/browse/SPARK-45644 > Project: Spark > Issue Type: Bug > Components: Spark Core, SQL >Affects Versions: 3.4.1, 3.5.0 >Reporter: Adi Wehrli >Priority: Major > > I do not really know if this is a bug, but I am at the end with my knowledge. > A Spark job ran successfully with Spark 3.2.x and 3.3.x. > But after upgrading to 3.4.1 (as well as with 3.5.0) running the same job > with the same data the following always occurs now: > {code} > scala.Some is not a valid external type for schema of array > {code} > The corresponding stacktrace is: > {code} > 2023-10-24T06:28:50.932 level=ERROR logger=org.apache.spark.executor.Executor > msg="Exception in task 0.0 in stage 0.0 (TID 0)" thread="Executor task launch > worker for task 0.0 in stage 0.0 (TID 0)" > java.lang.RuntimeException: scala.Some is not a valid external type for > schema of array > at > org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.MapObjects_10$(Unknown > Source) ~[?:?] > at > org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.ExternalMapToCatalyst_1$(Unknown > Source) ~[?:?] > at > org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.createNamedStruct_14_3$(Unknown > Source) ~[?:?] > at > org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.If_12$(Unknown > Source) ~[?:?] 
> at > org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.apply(Unknown > Source) ~[?:?] > at > org.apache.spark.sql.execution.ObjectOperator$.$anonfun$serializeObjectToRow$1(objects.scala:165) > ~[spark-sql_2.12-3.5.0.jar:3.5.0] > at > org.apache.spark.sql.execution.AppendColumnsWithObjectExec.$anonfun$doExecute$15(objects.scala:380) > ~[spark-sql_2.12-3.5.0.jar:3.5.0] > at scala.collection.Iterator$$anon$10.next(Iterator.scala:461) > ~[scala-library-2.12.15.jar:?] > at scala.collection.Iterator$$anon$10.next(Iterator.scala:461) > ~[scala-library-2.12.15.jar:?] > at > org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:169) > ~[spark-core_2.12-3.5.0.jar:3.5.0] > at > org.apache.spark.shuffle.ShuffleWriteProcessor.write(ShuffleWriteProcessor.scala:59) > ~[spark-core_2.12-3.5.0.jar:3.5.0] > at > org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:104) > ~[spark-core_2.12-3.5.0.jar:3.5.0] > at > org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:54) > ~[spark-core_2.12-3.5.0.jar:3.5.0] > at > org.apache.spark.TaskContext.runTaskWithListeners(TaskContext.scala:161) > ~[spark-core_2.12-3.5.0.jar:3.5.0] > at org.apache.spark.scheduler.Task.run(Task.scala:141) > ~[spark-core_2.12-3.5.0.jar:3.5.0] > at > org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$4(Executor.scala:620) > ~[spark-core_2.12-3.5.0.jar:3.5.0] > at > org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally(SparkErrorUtils.scala:64) > ~[spark-common-utils_2.12-3.5.0.jar:3.5.0] > at > org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally$(SparkErrorUtils.scala:61) > ~[spark-common-utils_2.12-3.5.0.jar:3.5.0] > at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:94) > ~[spark-core_2.12-3.5.0.jar:3.5.0] > at > org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:623) > [spark-core_2.12-3.5.0.jar:3.5.0] > at > 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) > [?:?] > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) > [?:?] > at java.lang.Thread.run(Thread.java:834) [?:?] > 2023-10-24T06:28:50.932 level=ERROR logger=org.apache.spark.executor.Executor > msg="Exception in task 1.0 in stage 0.0 (TID 1)" thread="Executor task launch > worker for task 1.0 in stage 0.0 (TID 1)" > java.lang.RuntimeException: scala.Some is not a valid external type for > schema of array > at > org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.MapObjects_10$(Unknown > Source) ~[?:?] > at >
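The error quoted above is Spark's external-type check rejecting a `scala.Some` where the serializer expects a bare sequence for an `ArrayType` field. The reporter's actual job is not shown in the ticket, so the following is only a hypothetical minimal sketch of one code shape known to produce this class of error (wrapping an array column's value in `Some(...)` inside an external `Row`); whether the reporter's pipeline hits the same path through the 3.4.x serializer is exactly what is being investigated here:

```scala
import java.util.Arrays

import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.types.{ArrayType, IntegerType, StructField, StructType}

object SomeExternalTypeSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[1]").appName("repro-sketch").getOrCreate()

    // The schema declares a plain array...
    val schema = StructType(Seq(StructField("xs", ArrayType(IntegerType))))
    // ...but the external Row carries an Option wrapping the sequence.
    val rows = Arrays.asList(Row(Some(Seq(1, 2, 3))))

    // Evaluating this should fail at runtime with an error of the form
    // "scala.Some is not a valid external type for schema of array<int>".
    spark.createDataFrame(rows, schema).collect()

    spark.stop()
  }
}
```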
[jira] [Created] (SPARK-45743) Upgrade dropwizard metrics 4.2.21
Yang Jie created SPARK-45743: Summary: Upgrade dropwizard metrics 4.2.21 Key: SPARK-45743 URL: https://issues.apache.org/jira/browse/SPARK-45743 Project: Spark Issue Type: Improvement Components: Build Affects Versions: 4.0.0 Reporter: Yang Jie [https://github.com/dropwizard/metrics/releases/tag/v4.2.21] [https://github.com/dropwizard/metrics/releases/tag/v4.2.20] -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-44896) Consider adding information os_prio, cpu, elapsed, tid, nid, etc., from the jstack tool
[ https://issues.apache.org/jira/browse/SPARK-44896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17781489#comment-17781489 ] Hannah Amundson commented on SPARK-44896: - Hello everyone (and [~yao]), I am a graduate student at the University of Texas (Computer Science). I have a project in my Distributed Systems course to contribute to an open source distributed project. Would it be okay if I worked on this ticket? Thanks for your help, Hannah > Consider adding information os_prio, cpu, elapsed, tid, nid, etc., from the > jstack tool > > > Key: SPARK-44896 > URL: https://issues.apache.org/jira/browse/SPARK-44896 > Project: Spark > Issue Type: Sub-task > Components: Web UI >Affects Versions: 4.0.0 >Reporter: Kent Yao >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-45190) XML: StructType schema issue in pyspark connect
[ https://issues.apache.org/jira/browse/SPARK-45190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17781487#comment-17781487 ] Hannah Amundson commented on SPARK-45190: - Hello everyone (and [~sandip.agarwala] ), I am a graduate student at the University of Texas (Computer Science). I have a project in my Distributed Systems course to contribute to an open source distributed project. Would it be okay if I worked on this ticket? Thanks for your help, Hannah > XML: StructType schema issue in pyspark connect > --- > > Key: SPARK-45190 > URL: https://issues.apache.org/jira/browse/SPARK-45190 > Project: Spark > Issue Type: Sub-task > Components: Connect, PySpark >Affects Versions: 4.0.0 >Reporter: Sandip Agarwala >Priority: Major > > The following PR added support for from_xml to pyspark. > https://github.com/apache/spark/pull/42938 > > However, StructType schema is resulting in schema parse error for pyspark > connect. > Filing a Jira to track this issue. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-45699) Fix "Widening conversion from `TypeA` to `TypeB` is deprecated because it loses precision"
[ https://issues.apache.org/jira/browse/SPARK-45699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17781488#comment-17781488 ] Hannah Amundson commented on SPARK-45699: - Hello everyone (and [~LuciferYang]), I am a graduate student at the University of Texas (Computer Science). I have a project in my Distributed Systems course to contribute to an open source distributed project. Would it be okay if I worked on this ticket? Thanks for your help, Hannah > Fix "Widening conversion from `TypeA` to `TypeB` is deprecated because it > loses precision" > -- > > Key: SPARK-45699 > URL: https://issues.apache.org/jira/browse/SPARK-45699 > Project: Spark > Issue Type: Sub-task > Components: Spark Core, SQL >Affects Versions: 4.0.0 >Reporter: Yang Jie >Priority: Major > > {code:java} > [error] > /Users/yangjie01/SourceCode/git/spark-mine-sbt/core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala:1199:67: > Widening conversion from Long to Double is deprecated because it loses > precision. Write `.toDouble` instead. [quickfixable] > [error] Applicable -Wconf / @nowarn filters for this fatal warning: msg=<part of the message>, cat=deprecation, > site=org.apache.spark.scheduler.TaskSetManager.checkSpeculatableTasks.threshold > [error] val threshold = max(speculationMultiplier * medianDuration, > minTimeToSpeculation) > [error] ^ > [error] > /Users/yangjie01/SourceCode/git/spark-mine-sbt/core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala:1207:60: > Widening conversion from Long to Double is deprecated because it loses > precision. Write `.toDouble` instead. 
[quickfixable] > [error] Applicable -Wconf / @nowarn filters for this fatal warning: msg=<part of the message>, cat=deprecation, > site=org.apache.spark.scheduler.TaskSetManager.checkSpeculatableTasks > [error] foundTasks = checkAndSubmitSpeculatableTasks(timeMs, threshold, > customizedThreshold = true) > [error] ^ > [error] > /Users/yangjie01/SourceCode/git/spark-mine-sbt/connector/connect/common/src/main/scala/org/apache/spark/sql/connect/client/arrow/ArrowVectorReader.scala:137:48: > Widening conversion from Int to Float is deprecated because it loses > precision. Write `.toFloat` instead. [quickfixable] > [error] Applicable -Wconf / @nowarn filters for this fatal warning: msg=<part of the message>, cat=deprecation, > site=org.apache.spark.sql.connect.client.arrow.IntVectorReader.getFloat > [error] override def getFloat(i: Int): Float = getInt(i) > [error] ^ > [error] > /Users/yangjie01/SourceCode/git/spark-mine-sbt/connector/connect/common/src/main/scala/org/apache/spark/sql/connect/client/arrow/ArrowVectorReader.scala:146:49: > Widening conversion from Long to Float is deprecated because it loses > precision. Write `.toFloat` instead. [quickfixable] > [error] Applicable -Wconf / @nowarn filters for this fatal warning: msg=<part of the message>, cat=deprecation, > site=org.apache.spark.sql.connect.client.arrow.BigIntVectorReader.getFloat > [error] override def getFloat(i: Int): Float = getLong(i) > [error] ^ > [error] > /Users/yangjie01/SourceCode/git/spark-mine-sbt/connector/connect/common/src/main/scala/org/apache/spark/sql/connect/client/arrow/ArrowVectorReader.scala:147:51: > Widening conversion from Long to Double is deprecated because it loses > precision. Write `.toDouble` instead. 
[quickfixable] > [error] Applicable -Wconf / @nowarn filters for this fatal warning: msg=<part of the message>, cat=deprecation, > site=org.apache.spark.sql.connect.client.arrow.BigIntVectorReader.getDouble > [error] override def getDouble(i: Int): Double = getLong(i) > [error] ^ {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
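For reference, the warnings this ticket fixes come from recent Scala 2.13 compilers, which deprecate implicit lossy widenings (Long to Double or Float, Int to Float). Below is a small standalone sketch (names loosely echo the quoted `TaskSetManager` code but are illustrative only) of the pattern and the explicit-conversion fix:

```scala
object WideningSketch {
  val speculationMultiplier: Double = 1.5
  val medianDuration: Long = 1000L

  // Deprecated (and fatal under -Wconf with fatal warnings): the Long operand
  // is silently widened to Double, which can lose precision for large values.
  // val threshold = math.max(speculationMultiplier * medianDuration, 100L)

  // Fixed: make the lossy conversion explicit with .toDouble.
  val threshold: Double = math.max(speculationMultiplier * medianDuration.toDouble, 100L.toDouble)
}
```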
[jira] [Commented] (SPARK-38473) Use error classes in org.apache.spark.scheduler
[ https://issues.apache.org/jira/browse/SPARK-38473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17781486#comment-17781486 ] Hannah Amundson commented on SPARK-38473: - Hello, I am a graduate student at the University of Texas (Computer Science). I have a project in my Distributed Systems course to contribute to an open source distributed project. Would it be okay if I worked on this ticket? Thanks for your help, Hannah > Use error classes in org.apache.spark.scheduler > --- > > Key: SPARK-38473 > URL: https://issues.apache.org/jira/browse/SPARK-38473 > Project: Spark > Issue Type: Sub-task > Components: Spark Core >Affects Versions: 3.3.0 >Reporter: Bo Zhang >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-38473) Use error classes in org.apache.spark.scheduler
[ https://issues.apache.org/jira/browse/SPARK-38473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17781486#comment-17781486 ] Hannah Amundson edited comment on SPARK-38473 at 10/31/23 7:08 PM: --- Hello everyone (and [~bozhang]), I am a graduate student at the University of Texas (Computer Science). I have a project in my Distributed Systems course to contribute to an open source distributed project. Would it be okay if I worked on this ticket? Thanks for your help, Hannah was (Author: hannahkamundson): Hello, I am a graduate student at the University of Texas (Computer Science). I have a project in my Distributed Systems course to contribute to an open source distributed project. Would it be okay if I worked on this ticket? Thanks for your help, Hannah > Use error classes in org.apache.spark.scheduler > --- > > Key: SPARK-38473 > URL: https://issues.apache.org/jira/browse/SPARK-38473 > Project: Spark > Issue Type: Sub-task > Components: Spark Core >Affects Versions: 3.3.0 >Reporter: Bo Zhang >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-45742) Introduce an implicit function for Scala Array to wrap into `immutable.ArraySeq`.
[ https://issues.apache.org/jira/browse/SPARK-45742?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yang Jie updated SPARK-45742: - Summary: Introduce an implicit function for Scala Array to wrap into `immutable.ArraySeq`. (was: Introduce an implicit method for Scala Array to wrap into `immutable.ArraySeq`.) > Introduce an implicit function for Scala Array to wrap into > `immutable.ArraySeq`. > - > > Key: SPARK-45742 > URL: https://issues.apache.org/jira/browse/SPARK-45742 > Project: Spark > Issue Type: Sub-task > Components: Connect, MLlib, Spark Core >Affects Versions: 4.0.0 >Reporter: Yang Jie >Priority: Major > > Currently, we need to use `immutable.ArraySeq.unsafeWrapArray(array)` to wrap > an Array into an `immutable.ArraySeq`, which makes the code look bloated. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
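A minimal sketch of what such an implicit could look like (the object and method names here are assumptions for illustration, not necessarily the API Spark adds): an extension that wraps the array without copying via `unsafeWrapArray`.

```scala
import scala.collection.immutable

object ArrayImplicitsSketch {
  // Value class, so the wrapper itself allocates nothing at call sites.
  implicit class SparkArrayOps[T](private val xs: Array[T]) extends AnyVal {
    // Zero-copy wrap; safe as long as the caller no longer mutates xs.
    def toImmutableArraySeq: immutable.ArraySeq[T] =
      immutable.ArraySeq.unsafeWrapArray(xs)
  }
}
```

Call sites would then read `array.toImmutableArraySeq` instead of the verbose `immutable.ArraySeq.unsafeWrapArray(array)`.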
[jira] [Updated] (SPARK-45719) Upgrade AWS SDK to v2 for Kubernetes integration tests
[ https://issues.apache.org/jira/browse/SPARK-45719?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-45719: --- Labels: pull-request-available (was: ) > Upgrade AWS SDK to v2 for Kubernetes integration tests > -- > > Key: SPARK-45719 > URL: https://issues.apache.org/jira/browse/SPARK-45719 > Project: Spark > Issue Type: Sub-task > Components: Kubernetes, Spark Core >Affects Versions: 3.5.0 >Reporter: Lantao Jin >Priority: Major > Labels: pull-request-available > > Sub-task of [SPARK-44124|https://issues.apache.org/jira/browse/SPARK-44124]. > In this issue, we will upgrade AWS SDK in Credentials providers, AWS clients > and related Kubernetes integration tests. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-45741) Upgrade Netty to 4.1.100.Final
Dongjoon Hyun created SPARK-45741: - Summary: Upgrade Netty to 4.1.100.Final Key: SPARK-45741 URL: https://issues.apache.org/jira/browse/SPARK-45741 Project: Spark Issue Type: Bug Components: Build Affects Versions: 4.0.0 Reporter: Dongjoon Hyun -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-45741) Upgrade Netty to 4.1.100.Final
[ https://issues.apache.org/jira/browse/SPARK-45741?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-45741: --- Labels: pull-request-available (was: ) > Upgrade Netty to 4.1.100.Final > -- > > Key: SPARK-45741 > URL: https://issues.apache.org/jira/browse/SPARK-45741 > Project: Spark > Issue Type: Bug > Components: Build >Affects Versions: 4.0.0 >Reporter: Dongjoon Hyun >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-45172) Upgrade commons-compress.version to 1.24.0
[ https://issues.apache.org/jira/browse/SPARK-45172?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-45172: -- Summary: Upgrade commons-compress.version to 1.24.0 (was: Upgrade commons-compress.version from 1.23.0 to 1.24.0) > Upgrade commons-compress.version to 1.24.0 > -- > > Key: SPARK-45172 > URL: https://issues.apache.org/jira/browse/SPARK-45172 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Minor > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-45282) Join loses records for cached datasets
[ https://issues.apache.org/jira/browse/SPARK-45282?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-45282: -- Target Version/s: 3.4.2, 3.5.1 (was: 3.4.2) > Join loses records for cached datasets > -- > > Key: SPARK-45282 > URL: https://issues.apache.org/jira/browse/SPARK-45282 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.4.1, 3.5.0 > Environment: spark 3.4.1 on apache hadoop 3.3.6 or kubernetes 1.26 or > databricks 13.3 >Reporter: koert kuipers >Priority: Blocker > Labels: CorrectnessBug, correctness > > we observed this issue on spark 3.4.1 but it is also present on 3.5.0. it is > not present on spark 3.3.1. > it only shows up in distributed environment. i cannot replicate in unit test. > however i did get it to show up on hadoop cluster, kubernetes, and on > databricks 13.3 > the issue is that records are dropped when two cached dataframes are joined. > it seems in spark 3.4.1 in queryplan some Exchanges are dropped as an > optimization while in spark 3.3.1 these Exchanges are still present. it seems > to be an issue with AQE with canChangeCachedPlanOutputPartitioning=true. > to reproduce on distributed cluster these settings needed are: > {code:java} > spark.sql.adaptive.advisoryPartitionSizeInBytes 33554432 > spark.sql.adaptive.coalescePartitions.parallelismFirst false > spark.sql.adaptive.enabled true > spark.sql.optimizer.canChangeCachedPlanOutputPartitioning true {code} > code using scala to reproduce is: > {code:java} > import java.util.UUID > import org.apache.spark.sql.functions.col > import spark.implicits._ > val data = (1 to 100).toDS().map(i => > UUID.randomUUID().toString).persist() > val left = data.map(k => (k, 1)) > val right = data.map(k => (k, k)) // if i change this to k => (k, 1) it works! 
> println("number of left " + left.count()) > println("number of right " + right.count()) > println("number of (left join right) " + > left.toDF("key", "value1").join(right.toDF("key", "value2"), "key").count() > ) > val left1 = left > .toDF("key", "value1") > .repartition(col("key")) // comment out this line to make it work > .persist() > println("number of left1 " + left1.count()) > val right1 = right > .toDF("key", "value2") > .repartition(col("key")) // comment out this line to make it work > .persist() > println("number of right1 " + right1.count()) > println("number of (left1 join right1) " + left1.join(right1, > "key").count()) // this gives incorrect result{code} > this produces the following output: > {code:java} > number of left 100 > number of right 100 > number of (left join right) 100 > number of left1 100 > number of right1 100 > number of (left1 join right1) 859531 {code} > note that the last number (the incorrect one) actually varies depending on > settings and cluster size etc. > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-45282) Join loses records for cached datasets
[ https://issues.apache.org/jira/browse/SPARK-45282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17781464#comment-17781464 ] Dongjoon Hyun commented on SPARK-45282: --- Thank you for sharing, [~koert]. > Join loses records for cached datasets > -- > > Key: SPARK-45282 > URL: https://issues.apache.org/jira/browse/SPARK-45282 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.4.1, 3.5.0 > Environment: spark 3.4.1 on apache hadoop 3.3.6 or kubernetes 1.26 or > databricks 13.3 >Reporter: koert kuipers >Priority: Blocker > Labels: CorrectnessBug, correctness > > we observed this issue on spark 3.4.1 but it is also present on 3.5.0. it is > not present on spark 3.3.1. > it only shows up in distributed environment. i cannot replicate in unit test. > however i did get it to show up on hadoop cluster, kubernetes, and on > databricks 13.3 > the issue is that records are dropped when two cached dataframes are joined. > it seems in spark 3.4.1 in queryplan some Exchanges are dropped as an > optimization while in spark 3.3.1 these Exchanges are still present. it seems > to be an issue with AQE with canChangeCachedPlanOutputPartitioning=true. > to reproduce on distributed cluster these settings needed are: > {code:java} > spark.sql.adaptive.advisoryPartitionSizeInBytes 33554432 > spark.sql.adaptive.coalescePartitions.parallelismFirst false > spark.sql.adaptive.enabled true > spark.sql.optimizer.canChangeCachedPlanOutputPartitioning true {code} > code using scala to reproduce is: > {code:java} > import java.util.UUID > import org.apache.spark.sql.functions.col > import spark.implicits._ > val data = (1 to 100).toDS().map(i => > UUID.randomUUID().toString).persist() > val left = data.map(k => (k, 1)) > val right = data.map(k => (k, k)) // if i change this to k => (k, 1) it works! 
> println("number of left " + left.count()) > println("number of right " + right.count()) > println("number of (left join right) " + > left.toDF("key", "value1").join(right.toDF("key", "value2"), "key").count() > ) > val left1 = left > .toDF("key", "value1") > .repartition(col("key")) // comment out this line to make it work > .persist() > println("number of left1 " + left1.count()) > val right1 = right > .toDF("key", "value2") > .repartition(col("key")) // comment out this line to make it work > .persist() > println("number of right1 " + right1.count()) > println("number of (left1 join right1) " + left1.join(right1, > "key").count()) // this gives incorrect result{code} > this produces the following output: > {code:java} > number of left 100 > number of right 100 > number of (left join right) 100 > number of left1 100 > number of right1 100 > number of (left1 join right1) 859531 {code} > note that the last number (the incorrect one) actually varies depending on > settings and cluster size etc. > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-45737) Remove unnecessary `.toArray[InternalRow]` in `SparkPlan#executeTake` function.
[ https://issues.apache.org/jira/browse/SPARK-45737?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-45737: - Assignee: Yang Jie > Remove unnecessary `.toArray[InternalRow]` in `SparkPlan#executeTake` > function. > --- > > Key: SPARK-45737 > URL: https://issues.apache.org/jira/browse/SPARK-45737 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.0.0 >Reporter: Yang Jie >Assignee: Yang Jie >Priority: Major > Labels: pull-request-available > > {code:java} > if (takeFromEnd) { > while (buf.length < n && i < res.length) { > val rows = decodeUnsafeRows(res(i)._2) > if (n - buf.length >= res(i)._1) { > buf.prependAll(rows.toArray[InternalRow]) > } else { > val dropUntil = res(i)._1 - (n - buf.length) > // Same as Iterator.drop but this only takes a long. > var j: Long = 0L > while (j < dropUntil) { rows.next(); j += 1L} > buf.prependAll(rows.toArray[InternalRow]) > } > i += 1 > } > } else { > while (buf.length < n && i < res.length) { > val rows = decodeUnsafeRows(res(i)._2) > if (n - buf.length >= res(i)._1) { > buf ++= rows.toArray[InternalRow] > } else { > buf ++= rows.take(n - buf.length).toArray[InternalRow] > } > i += 1 > } > } {code} > In the above code, the input parameters of `mutable.Buffer#prependAll` and > `mutable.Growable#++=` functions are `IterableOnce`, and the type of rows is > `Iterator[InternalRow]`, which inherits from `IterableOnce`, so there is no > need to cast to an array of InternalRow anymore. > > > > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-45737) Remove unnecessary `.toArray[InternalRow]` in `SparkPlan#executeTake` function.
[ https://issues.apache.org/jira/browse/SPARK-45737?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-45737. --- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 43599 [https://github.com/apache/spark/pull/43599] > Remove unnecessary `.toArray[InternalRow]` in `SparkPlan#executeTake` > function. > --- > > Key: SPARK-45737 > URL: https://issues.apache.org/jira/browse/SPARK-45737 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.0.0 >Reporter: Yang Jie >Assignee: Yang Jie >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > {code:java} > if (takeFromEnd) { > while (buf.length < n && i < res.length) { > val rows = decodeUnsafeRows(res(i)._2) > if (n - buf.length >= res(i)._1) { > buf.prependAll(rows.toArray[InternalRow]) > } else { > val dropUntil = res(i)._1 - (n - buf.length) > // Same as Iterator.drop but this only takes a long. > var j: Long = 0L > while (j < dropUntil) { rows.next(); j += 1L} > buf.prependAll(rows.toArray[InternalRow]) > } > i += 1 > } > } else { > while (buf.length < n && i < res.length) { > val rows = decodeUnsafeRows(res(i)._2) > if (n - buf.length >= res(i)._1) { > buf ++= rows.toArray[InternalRow] > } else { > buf ++= rows.take(n - buf.length).toArray[InternalRow] > } > i += 1 > } > } {code} > In the above code, the input parameters of `mutable.Buffer#prependAll` and > `mutable.Growable#++=` functions are `IterableOnce`, and the type of rows is > `Iterator[InternalRow]`, which inherits from `IterableOnce`, so there is no > need to cast to an array of InternalRow anymore. > > > > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
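The simplification in this ticket rests on a Scala 2.13 collections fact: `Iterator` extends `IterableOnce`, and both `mutable.Growable#++=` and `mutable.Buffer#prependAll` accept `IterableOnce`, so an iterator can be consumed directly. A standalone sketch:

```scala
import scala.collection.mutable

object IterableOnceSketch {
  // ++= (from Growable) consumes the iterator directly; no .toArray needed.
  val appended: mutable.ArrayBuffer[Int] = {
    val buf = mutable.ArrayBuffer(1, 2)
    buf ++= Iterator(3, 4)
    buf
  }

  // prependAll (from Buffer) likewise takes IterableOnce.
  val prepended: mutable.ArrayBuffer[Int] = {
    val buf = mutable.ArrayBuffer(3, 4)
    buf.prependAll(Iterator(1, 2))
    buf
  }
}
```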
[jira] [Resolved] (SPARK-45700) Fix `The outer reference in this type test cannot be checked at run time`
[ https://issues.apache.org/jira/browse/SPARK-45700?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-45700. --- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 43582 [https://github.com/apache/spark/pull/43582] > Fix `The outer reference in this type test cannot be checked at run time` > - > > Key: SPARK-45700 > URL: https://issues.apache.org/jira/browse/SPARK-45700 > Project: Spark > Issue Type: Sub-task > Components: Spark Core, SQL >Affects Versions: 4.0.0 >Reporter: Yang Jie >Assignee: Yang Jie >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > {code:java} > [error] > /Users/yangjie01/SourceCode/git/spark-mine-sbt/sql/core/src/test/scala/org/apache/spark/sql/SQLQueryTestSuite.scala:324:12: > The outer reference in this type test cannot be checked at run time. > [error] Applicable -Wconf / @nowarn filters for this fatal warning: msg=<part of the message>, cat=unchecked, > site=org.apache.spark.sql.SQLQueryTestSuite.createScalaTestCase > [error] case udfTestCase: UDFTest > [error] ^ > [error] > /Users/yangjie01/SourceCode/git/spark-mine-sbt/sql/core/src/test/scala/org/apache/spark/sql/SQLQueryTestSuite.scala:506:12: > The outer reference in this type test cannot be checked at run time. > [error] Applicable -Wconf / @nowarn filters for this fatal warning: msg=<part of the message>, cat=unchecked, > site=org.apache.spark.sql.SQLQueryTestSuite.runQueries > [error] case udfTestCase: UDFTest => > [error] ^ > [error] > /Users/yangjie01/SourceCode/git/spark-mine-sbt/sql/core/src/test/scala/org/apache/spark/sql/SQLQueryTestSuite.scala:508:12: > The outer reference in this type test cannot be checked at run time. 
> [error] Applicable -Wconf / @nowarn filters for this fatal warning: msg=<part of the message>, cat=unchecked, > site=org.apache.spark.sql.SQLQueryTestSuite.runQueries > [error] case udtfTestCase: UDTFSetTest => > [error] ^ > [error] > /Users/yangjie01/SourceCode/git/spark-mine-sbt/sql/core/src/test/scala/org/apache/spark/sql/SQLQueryTestSuite.scala:514:13: > The outer reference in this type test cannot be checked at run time. > [error] Applicable -Wconf / @nowarn filters for this fatal warning: msg=<part of the message>, cat=unchecked, > site=org.apache.spark.sql.SQLQueryTestSuite.runQueries > [error] case _: PgSQLTest => > [error] ^ > [error] > /Users/yangjie01/SourceCode/git/spark-mine-sbt/sql/core/src/test/scala/org/apache/spark/sql/SQLQueryTestSuite.scala:522:13: > The outer reference in this type test cannot be checked at run time. > [error] Applicable -Wconf / @nowarn filters for this fatal warning: msg=<part of the message>, cat=unchecked, > site=org.apache.spark.sql.SQLQueryTestSuite.runQueries > [error] case _: AnsiTest => > [error] ^ > [error] > /Users/yangjie01/SourceCode/git/spark-mine-sbt/sql/core/src/test/scala/org/apache/spark/sql/SQLQueryTestSuite.scala:524:13: > The outer reference in this type test cannot be checked at run time. > [error] Applicable -Wconf / @nowarn filters for this fatal warning: msg=<part of the message>, cat=unchecked, > site=org.apache.spark.sql.SQLQueryTestSuite.runQueries > [error] case _: TimestampNTZTest => > [error] ^ > [error] > /Users/yangjie01/SourceCode/git/spark-mine-sbt/sql/core/src/test/scala/org/apache/spark/sql/SQLQueryTestSuite.scala:584:12: > The outer reference in this type test cannot be checked at run time. 
> [error] Applicable -Wconf / @nowarn filters for this fatal warning: msg=<part of the message>, cat=unchecked, > site=org.apache.spark.sql.SQLQueryTestSuite.runQueries.clue > [error] case udfTestCase: UDFTest > [error] ^ > [error] > /Users/yangjie01/SourceCode/git/spark-mine-sbt/sql/core/src/test/scala/org/apache/spark/sql/SQLQueryTestSuite.scala:596:12: > The outer reference in this type test cannot be checked at run time. > [error] Applicable -Wconf / @nowarn filters for this fatal warning: msg=<part of the message>, cat=unchecked, > site=org.apache.spark.sql.SQLQueryTestSuite.runQueries.clue > [error] case udtfTestCase: UDTFSetTest > [error] ^ {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-45702) Fix `the type test for pattern TypeA cannot be checked at runtime`
[ https://issues.apache.org/jira/browse/SPARK-45702?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-45702: - Assignee: Yang Jie > Fix `the type test for pattern TypeA cannot be checked at runtime` > -- > > Key: SPARK-45702 > URL: https://issues.apache.org/jira/browse/SPARK-45702 > Project: Spark > Issue Type: Sub-task > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: Yang Jie >Assignee: Yang Jie >Priority: Major > > {code:java} > [error] > /Users/yangjie01/SourceCode/git/spark-mine-sbt/core/src/main/scala/org/apache/spark/rdd/OrderedRDDFunctions.scala:100:21: > the type test for pattern org.apache.spark.RangePartitioner[K,V] cannot be > checked at runtime because it has type parameters eliminated by erasure > [error] Applicable -Wconf / @nowarn filters for this fatal warning: msg=<part of the message>, cat=unchecked, > site=org.apache.spark.rdd.OrderedRDDFunctions.filterByRange.rddToFilter > [error] case Some(rp: RangePartitioner[K, V]) => > [error] {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-45702) Fix `the type test for pattern TypeA cannot be checked at runtime`
[ https://issues.apache.org/jira/browse/SPARK-45702?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-45702. --- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 43582 [https://github.com/apache/spark/pull/43582] > Fix `the type test for pattern TypeA cannot be checked at runtime` > -- > > Key: SPARK-45702 > URL: https://issues.apache.org/jira/browse/SPARK-45702 > Project: Spark > Issue Type: Sub-task > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: Yang Jie >Assignee: Yang Jie >Priority: Major > Fix For: 4.0.0 > > > {code:java} > [error] > /Users/yangjie01/SourceCode/git/spark-mine-sbt/core/src/main/scala/org/apache/spark/rdd/OrderedRDDFunctions.scala:100:21: > the type test for pattern org.apache.spark.RangePartitioner[K,V] cannot be > checked at runtime because it has type parameters eliminated by erasure > [error] Applicable -Wconf / @nowarn filters for this fatal warning: msg=<part of the message>, cat=unchecked, > site=org.apache.spark.rdd.OrderedRDDFunctions.filterByRange.rddToFilter > [error] case Some(rp: RangePartitioner[K, V]) => > [error] {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
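The warning arises because JVM erasure removes the `K, V` arguments of `RangePartitioner[K, V]` at runtime, so the pattern can only test the class, not the type parameters. Below is a self-contained illustration of the warning class and one idiomatic fix, annotating the erased type argument with `@unchecked` (shown here on a plain `List`; this mirrors the quoted match but is not necessarily the exact change Spark adopted):

```scala
object ErasureSketch {
  def describe(opt: Option[Any]): String = opt match {
    // Without @unchecked this pattern is flagged: Int is erased, so only
    // "is it a List" can actually be checked at runtime.
    case Some(xs: List[Int @unchecked]) => s"list of ${xs.length}"
    case _ => "something else"
  }
}
```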
[jira] [Resolved] (SPARK-45703) Fix `abstract type TypeA in type pattern Some[TypeA] is unchecked since it is eliminated by erasure`
[ https://issues.apache.org/jira/browse/SPARK-45703?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-45703. --- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 43582 [https://github.com/apache/spark/pull/43582] > Fix `abstract type TypeA in type pattern Some[TypeA] is unchecked since it is > eliminated by erasure` > > > Key: SPARK-45703 > URL: https://issues.apache.org/jira/browse/SPARK-45703 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Yang Jie >Assignee: Yang Jie >Priority: Major > Fix For: 4.0.0 > > > {code:java} > [error] > /Users/yangjie01/SourceCode/git/spark-mine-sbt/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/CatalystTypeConverters.scala:105:19: > abstract type ScalaInputType in type pattern Some[ScalaInputType] is > unchecked since it is eliminated by erasure > [error] Applicable -Wconf / @nowarn filters for this fatal warning: msg=<part of the message>, cat=unchecked, > site=org.apache.spark.sql.catalyst.CatalystTypeConverters.CatalystTypeConverter.toCatalyst > [error] case opt: Some[ScalaInputType] => toCatalystImpl(opt.get) > [error] ^ {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-45700) Fix `The outer reference in this type test cannot be checked at run time`
[ https://issues.apache.org/jira/browse/SPARK-45700?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-45700: - Assignee: Yang Jie > Fix `The outer reference in this type test cannot be checked at run time` > - > > Key: SPARK-45700 > URL: https://issues.apache.org/jira/browse/SPARK-45700 > Project: Spark > Issue Type: Sub-task > Components: Spark Core, SQL >Affects Versions: 4.0.0 >Reporter: Yang Jie >Assignee: Yang Jie >Priority: Major > Labels: pull-request-available > > {code:java} > [error] > /Users/yangjie01/SourceCode/git/spark-mine-sbt/sql/core/src/test/scala/org/apache/spark/sql/SQLQueryTestSuite.scala:324:12: > The outer reference in this type test cannot be checked at run time. > [error] Applicable -Wconf / @nowarn filters for this fatal warning: msg= of the message>, cat=unchecked, > site=org.apache.spark.sql.SQLQueryTestSuite.createScalaTestCase > [error] case udfTestCase: UDFTest > [error] ^ > [error] > /Users/yangjie01/SourceCode/git/spark-mine-sbt/sql/core/src/test/scala/org/apache/spark/sql/SQLQueryTestSuite.scala:506:12: > The outer reference in this type test cannot be checked at run time. > [error] Applicable -Wconf / @nowarn filters for this fatal warning: msg= of the message>, cat=unchecked, > site=org.apache.spark.sql.SQLQueryTestSuite.runQueries > [error] case udfTestCase: UDFTest => > [error] ^ > [error] > /Users/yangjie01/SourceCode/git/spark-mine-sbt/sql/core/src/test/scala/org/apache/spark/sql/SQLQueryTestSuite.scala:508:12: > The outer reference in this type test cannot be checked at run time. 
> [error] Applicable -Wconf / @nowarn filters for this fatal warning: msg= of the message>, cat=unchecked, > site=org.apache.spark.sql.SQLQueryTestSuite.runQueries > [error] case udtfTestCase: UDTFSetTest => > [error] ^ > [error] > /Users/yangjie01/SourceCode/git/spark-mine-sbt/sql/core/src/test/scala/org/apache/spark/sql/SQLQueryTestSuite.scala:514:13: > The outer reference in this type test cannot be checked at run time. > [error] Applicable -Wconf / @nowarn filters for this fatal warning: msg= of the message>, cat=unchecked, > site=org.apache.spark.sql.SQLQueryTestSuite.runQueries > [error] case _: PgSQLTest => > [error] ^ > [error] > /Users/yangjie01/SourceCode/git/spark-mine-sbt/sql/core/src/test/scala/org/apache/spark/sql/SQLQueryTestSuite.scala:522:13: > The outer reference in this type test cannot be checked at run time. > [error] Applicable -Wconf / @nowarn filters for this fatal warning: msg= of the message>, cat=unchecked, > site=org.apache.spark.sql.SQLQueryTestSuite.runQueries > [error] case _: AnsiTest => > [error] ^ > [error] > /Users/yangjie01/SourceCode/git/spark-mine-sbt/sql/core/src/test/scala/org/apache/spark/sql/SQLQueryTestSuite.scala:524:13: > The outer reference in this type test cannot be checked at run time. > [error] Applicable -Wconf / @nowarn filters for this fatal warning: msg= of the message>, cat=unchecked, > site=org.apache.spark.sql.SQLQueryTestSuite.runQueries > [error] case _: TimestampNTZTest => > [error] ^ > [error] > /Users/yangjie01/SourceCode/git/spark-mine-sbt/sql/core/src/test/scala/org/apache/spark/sql/SQLQueryTestSuite.scala:584:12: > The outer reference in this type test cannot be checked at run time. 
> [error] Applicable -Wconf / @nowarn filters for this fatal warning: msg= of the message>, cat=unchecked, > site=org.apache.spark.sql.SQLQueryTestSuite.runQueries.clue > [error] case udfTestCase: UDFTest > [error] ^ > [error] > /Users/yangjie01/SourceCode/git/spark-mine-sbt/sql/core/src/test/scala/org/apache/spark/sql/SQLQueryTestSuite.scala:596:12: > The outer reference in this type test cannot be checked at run time. > [error] Applicable -Wconf / @nowarn filters for this fatal warning: msg= of the message>, cat=unchecked, > site=org.apache.spark.sql.SQLQueryTestSuite.runQueries.clue > [error] case udtfTestCase: UDTFSetTest > [error] ^ {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
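The "outer reference cannot be checked" warnings above arise when a pattern type is nested inside a *class*, so the value being tested carries a hidden outer-instance reference that a runtime type test cannot verify. One common remedy, sketched below with illustrative names (this is generic Scala, not necessarily the exact fix taken in SPARK-45700), is to hoist the trait into an `object`, which has no outer instance:

{code:scala}
// Object nesting removes the unverifiable outer reference.
object TestTags {
  trait UDFTest
}

class Suite {
  import TestTags.UDFTest
  def classify(testCase: Any): String = testCase match {
    case _: UDFTest => "udf" // checkable: no outer reference involved
    case _          => "plain"
  }
}
{code}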
[jira] [Assigned] (SPARK-45683) Fix `method any2stringadd in object Predef is deprecated`
[ https://issues.apache.org/jira/browse/SPARK-45683?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-45683: - Assignee: Yang Jie > Fix `method any2stringadd in object Predef is deprecated` > - > > Key: SPARK-45683 > URL: https://issues.apache.org/jira/browse/SPARK-45683 > Project: Spark > Issue Type: Sub-task > Components: Build, Spark Core >Affects Versions: 4.0.0 >Reporter: Yang Jie >Assignee: Yang Jie >Priority: Major > Labels: pull-request-available > > {code:java} > [warn] > /Users/yangjie01/SourceCode/git/spark-mine-sbt/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Expression.scala:720:17: > method any2stringadd in object Predef is deprecated (since 2.13.0): Implicit > injection of + is deprecated. Convert to String to call + > [warn] Applicable -Wconf / @nowarn filters for this warning: msg= message>, cat=deprecation, > site=org.apache.spark.sql.catalyst.expressions.BinaryExpression.nullSafeCodeGen.nullSafeEval, > origin=scala.Predef.any2stringadd, version=2.13.0 > [warn] leftGen.code + ctx.nullSafeExec(left.nullable, leftGen.isNull) > { > [warn] ^ {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-45683) Fix `method any2stringadd in object Predef is deprecated`
[ https://issues.apache.org/jira/browse/SPARK-45683?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-45683. --- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 43580 [https://github.com/apache/spark/pull/43580] > Fix `method any2stringadd in object Predef is deprecated` > - > > Key: SPARK-45683 > URL: https://issues.apache.org/jira/browse/SPARK-45683 > Project: Spark > Issue Type: Sub-task > Components: Build, Spark Core >Affects Versions: 4.0.0 >Reporter: Yang Jie >Assignee: Yang Jie >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > {code:java} > [warn] > /Users/yangjie01/SourceCode/git/spark-mine-sbt/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Expression.scala:720:17: > method any2stringadd in object Predef is deprecated (since 2.13.0): Implicit > injection of + is deprecated. Convert to String to call + > [warn] Applicable -Wconf / @nowarn filters for this warning: msg= message>, cat=deprecation, > site=org.apache.spark.sql.catalyst.expressions.BinaryExpression.nullSafeCodeGen.nullSafeEval, > origin=scala.Predef.any2stringadd, version=2.13.0 > [warn] leftGen.code + ctx.nullSafeExec(left.nullable, leftGen.isNull) > { > [warn] ^ {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
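The `any2stringadd` deprecation fires when `+` is applied to a non-`String` receiver with a `String` argument, relying on an implicit conversion removed in future Scala versions. A minimal sketch of the problem and two equivalent fixes (`Code` is a hypothetical stand-in for Spark's generated-code wrapper type):

{code:scala}
final case class Code(body: String)

val leftGenCode = Code("boolean leftNull = false;")

// Deprecated since 2.13: `leftGenCode + "..."` goes through
// Predef.any2stringadd. Converting explicitly, or interpolating,
// produces the same string without the implicit:
val explicit     = leftGenCode.toString + " // null-safe exec"
val interpolated = s"$leftGenCode // null-safe exec"
{code}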
[jira] [Resolved] (SPARK-45725) remove the non-default IN subquery runtime filter
[ https://issues.apache.org/jira/browse/SPARK-45725?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-45725. --- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 43585 [https://github.com/apache/spark/pull/43585] > remove the non-default IN subquery runtime filter > - > > Key: SPARK-45725 > URL: https://issues.apache.org/jira/browse/SPARK-45725 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.0.0 >Reporter: Wenchen Fan >Assignee: Wenchen Fan >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-45687) Fix `Passing an explicit array value to a Scala varargs method is deprecated`
[ https://issues.apache.org/jira/browse/SPARK-45687?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17781420#comment-17781420 ] Yang Jie commented on SPARK-45687: -- Thanks [~ivoson] > Fix `Passing an explicit array value to a Scala varargs method is deprecated` > - > > Key: SPARK-45687 > URL: https://issues.apache.org/jira/browse/SPARK-45687 > Project: Spark > Issue Type: Sub-task > Components: Spark Core, SQL >Affects Versions: 4.0.0 >Reporter: Yang Jie >Priority: Major > > Passing an explicit array value to a Scala varargs method is deprecated > (since 2.13.0) and will result in a defensive copy; Use the more efficient > non-copying ArraySeq.unsafeWrapArray or an explicit toIndexedSeq call > > {code:java} > [warn] > /Users/yangjie01/SourceCode/git/spark-mine-sbt/sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/AggregationQuerySuite.scala:945:21: > Passing an explicit array value to a Scala varargs method is deprecated > (since 2.13.0) and will result in a defensive copy; Use the more efficient > non-copying ArraySeq.unsafeWrapArray or an explicit toIndexedSeq call > [warn] Applicable -Wconf / @nowarn filters for this warning: msg= message>, cat=deprecation, > site=org.apache.spark.sql.hive.execution.AggregationQuerySuite, version=2.13.0 > [warn] df.agg(udaf(allColumns: _*)), > [warn] ^ > [warn] > /Users/yangjie01/SourceCode/git/spark-mine-sbt/sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/ObjectHashAggregateSuite.scala:156:48: > Passing an explicit array value to a Scala varargs method is deprecated > (since 2.13.0) and will result in a defensive copy; Use the more efficient > non-copying ArraySeq.unsafeWrapArray or an explicit toIndexedSeq call > [warn] Applicable -Wconf / @nowarn filters for this warning: msg= message>, cat=deprecation, > site=org.apache.spark.sql.hive.execution.ObjectHashAggregateSuite, > version=2.13.0 > [warn] df.agg(aggFunctions.head, aggFunctions.tail: _*), > [warn] ^ > [warn] > 
/Users/yangjie01/SourceCode/git/spark-mine-sbt/sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/ObjectHashAggregateSuite.scala:161:76: > Passing an explicit array value to a Scala varargs method is deprecated > (since 2.13.0) and will result in a defensive copy; Use the more efficient > non-copying ArraySeq.unsafeWrapArray or an explicit toIndexedSeq call > [warn] Applicable -Wconf / @nowarn filters for this warning: msg= message>, cat=deprecation, > site=org.apache.spark.sql.hive.execution.ObjectHashAggregateSuite, > version=2.13.0 > [warn] df.groupBy($"id" % 4 as "mod").agg(aggFunctions.head, > aggFunctions.tail: _*), > [warn] > ^ > [warn] > /Users/yangjie01/SourceCode/git/spark-mine-sbt/sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/ObjectHashAggregateSuite.scala:171:50: > Passing an explicit array value to a Scala varargs method is deprecated > (since 2.13.0) and will result in a defensive copy; Use the more efficient > non-copying ArraySeq.unsafeWrapArray or an explicit toIndexedSeq call > [warn] Applicable -Wconf / @nowarn filters for this warning: msg= message>, cat=deprecation, > site=org.apache.spark.sql.hive.execution.ObjectHashAggregateSuite, > version=2.13.0 > [warn] df.agg(aggFunctions.head, aggFunctions.tail: _*), > [warn] {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-45687) Fix `Passing an explicit array value to a Scala varargs method is deprecated`
[ https://issues.apache.org/jira/browse/SPARK-45687?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17781419#comment-17781419 ] Tengfei Huang commented on SPARK-45687: --- I will work on this. > Fix `Passing an explicit array value to a Scala varargs method is deprecated` > - > > Key: SPARK-45687 > URL: https://issues.apache.org/jira/browse/SPARK-45687 > Project: Spark > Issue Type: Sub-task > Components: Spark Core, SQL >Affects Versions: 4.0.0 >Reporter: Yang Jie >Priority: Major > > Passing an explicit array value to a Scala varargs method is deprecated > (since 2.13.0) and will result in a defensive copy; Use the more efficient > non-copying ArraySeq.unsafeWrapArray or an explicit toIndexedSeq call > > {code:java} > [warn] > /Users/yangjie01/SourceCode/git/spark-mine-sbt/sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/AggregationQuerySuite.scala:945:21: > Passing an explicit array value to a Scala varargs method is deprecated > (since 2.13.0) and will result in a defensive copy; Use the more efficient > non-copying ArraySeq.unsafeWrapArray or an explicit toIndexedSeq call > [warn] Applicable -Wconf / @nowarn filters for this warning: msg= message>, cat=deprecation, > site=org.apache.spark.sql.hive.execution.AggregationQuerySuite, version=2.13.0 > [warn] df.agg(udaf(allColumns: _*)), > [warn] ^ > [warn] > /Users/yangjie01/SourceCode/git/spark-mine-sbt/sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/ObjectHashAggregateSuite.scala:156:48: > Passing an explicit array value to a Scala varargs method is deprecated > (since 2.13.0) and will result in a defensive copy; Use the more efficient > non-copying ArraySeq.unsafeWrapArray or an explicit toIndexedSeq call > [warn] Applicable -Wconf / @nowarn filters for this warning: msg= message>, cat=deprecation, > site=org.apache.spark.sql.hive.execution.ObjectHashAggregateSuite, > version=2.13.0 > [warn] df.agg(aggFunctions.head, aggFunctions.tail: _*), > [warn] ^ > [warn] > 
/Users/yangjie01/SourceCode/git/spark-mine-sbt/sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/ObjectHashAggregateSuite.scala:161:76: > Passing an explicit array value to a Scala varargs method is deprecated > (since 2.13.0) and will result in a defensive copy; Use the more efficient > non-copying ArraySeq.unsafeWrapArray or an explicit toIndexedSeq call > [warn] Applicable -Wconf / @nowarn filters for this warning: msg= message>, cat=deprecation, > site=org.apache.spark.sql.hive.execution.ObjectHashAggregateSuite, > version=2.13.0 > [warn] df.groupBy($"id" % 4 as "mod").agg(aggFunctions.head, > aggFunctions.tail: _*), > [warn] > ^ > [warn] > /Users/yangjie01/SourceCode/git/spark-mine-sbt/sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/ObjectHashAggregateSuite.scala:171:50: > Passing an explicit array value to a Scala varargs method is deprecated > (since 2.13.0) and will result in a defensive copy; Use the more efficient > non-copying ArraySeq.unsafeWrapArray or an explicit toIndexedSeq call > [warn] Applicable -Wconf / @nowarn filters for this warning: msg= message>, cat=deprecation, > site=org.apache.spark.sql.hive.execution.ObjectHashAggregateSuite, > version=2.13.0 > [warn] df.agg(aggFunctions.head, aggFunctions.tail: _*), > [warn] {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
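The two alternatives named in the warning message can be sketched as follows; `aggAll` is a hypothetical varargs method standing in for `Dataset.agg`:

{code:scala}
import scala.collection.immutable.ArraySeq

def aggAll(exprs: String*): String = exprs.mkString(", ")

val allColumns = Array("a", "b", "c")

// Deprecated (triggers a defensive copy): aggAll(allColumns: _*)
// Non-copying wrapper recommended by the message:
val viaWrap = aggAll(ArraySeq.unsafeWrapArray(allColumns): _*)
// Or an explicit copy, when later mutation of the array must not leak:
val viaCopy = aggAll(allColumns.toIndexedSeq: _*)
{code}

Note that `unsafeWrapArray` shares the underlying array, so it is only safe when the array is not mutated afterwards.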
[jira] [Updated] (SPARK-41533) GRPC Errors on the client should be cleaned up
[ https://issues.apache.org/jira/browse/SPARK-41533?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-41533: --- Labels: pull-request-available (was: ) > GRPC Errors on the client should be cleaned up > -- > > Key: SPARK-41533 > URL: https://issues.apache.org/jira/browse/SPARK-41533 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Martin Grund >Assignee: Martin Grund >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0 > > > When the server throws an exception we report a very deep stack trace that is > not helpful for the user. > We need to separate the cause from the user visible exception and wrap the > error into custom exception instead of publishing the RPCError from GRPC -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-45694) Fix `method signum in trait ScalaNumberProxy is deprecated`
[ https://issues.apache.org/jira/browse/SPARK-45694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17781374#comment-17781374 ] Tengfei Huang commented on SPARK-45694: --- sure, will include [SPARK-45695] Fix `method force in trait View is deprecated` - ASF JIRA (apache.org) in one PR. > Fix `method signum in trait ScalaNumberProxy is deprecated` > --- > > Key: SPARK-45694 > URL: https://issues.apache.org/jira/browse/SPARK-45694 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Yang Jie >Priority: Minor > > {code:java} > [warn] > /Users/yangjie01/SourceCode/git/spark-mine-sbt/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/EquivalentExpressions.scala:194:25: > method signum in trait ScalaNumberProxy is deprecated (since 2.13.0): use > `sign` method instead > [warn] Applicable -Wconf / @nowarn filters for this warning: msg=<part of the message>, cat=deprecation, > site=org.apache.spark.sql.catalyst.expressions.EquivalentExpressions.updateExprTree.uc, > origin=scala.runtime.ScalaNumberProxy.signum, version=2.13.0 > [warn] val uc = useCount.signum > [warn] {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
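This one is a one-line rename suggested directly by the warning, sketched here with an illustrative value:

{code:scala}
val useCount = 3

val old     = useCount.signum // deprecated since 2.13.0
val updated = useCount.sign   // drop-in replacement, same result for Int
{code}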
[jira] [Resolved] (SPARK-45368) Remove scala2.12 compatibility logic for DoubleType, FloatType, Decimal
[ https://issues.apache.org/jira/browse/SPARK-45368?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean R. Owen resolved SPARK-45368. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 43456 [https://github.com/apache/spark/pull/43456] > Remove scala2.12 compatibility logic for DoubleType, FloatType, Decimal > --- > > Key: SPARK-45368 > URL: https://issues.apache.org/jira/browse/SPARK-45368 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: BingKun Pan >Assignee: tangjiafu >Priority: Minor > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-45368) Remove scala2.12 compatibility logic for DoubleType, FloatType, Decimal
[ https://issues.apache.org/jira/browse/SPARK-45368?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean R. Owen reassigned SPARK-45368: Assignee: tangjiafu > Remove scala2.12 compatibility logic for DoubleType, FloatType, Decimal > --- > > Key: SPARK-45368 > URL: https://issues.apache.org/jira/browse/SPARK-45368 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: BingKun Pan >Assignee: tangjiafu >Priority: Minor > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-45644) After upgrading to Spark 3.4.1 and 3.5.0 we receive RuntimeException "scala.Some is not a valid external type for schema of array"
[ https://issues.apache.org/jira/browse/SPARK-45644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17781348#comment-17781348 ] Adi Wehrli commented on SPARK-45644: But I can really not say which job statement causes this problem. I'm not sure but I suppose that it could have something to do with {{org.apache.spark.sql.kafka010.consumer.KafkaDataConsumer}} (from {{org.apache.spark:spark-sql-kafka-0-10_2.12:3.5.0}}) and the likes. > After upgrading to Spark 3.4.1 and 3.5.0 we receive RuntimeException > "scala.Some is not a valid external type for schema of array" > -- > > Key: SPARK-45644 > URL: https://issues.apache.org/jira/browse/SPARK-45644 > Project: Spark > Issue Type: Bug > Components: Spark Core, SQL >Affects Versions: 3.4.1, 3.5.0 >Reporter: Adi Wehrli >Priority: Major > > I do not really know if this is a bug, but I am at the end with my knowledge. > A Spark job ran successfully with Spark 3.2.x and 3.3.x. > But after upgrading to 3.4.1 (as well as with 3.5.0) running the same job > with the same data the following always occurs now: > {code} > scala.Some is not a valid external type for schema of array > {code} > The corresponding stacktrace is: > {code} > 2023-10-24T06:28:50.932 level=ERROR logger=org.apache.spark.executor.Executor > msg="Exception in task 0.0 in stage 0.0 (TID 0)" thread="Executor task launch > worker for task 0.0 in stage 0.0 (TID 0)" > java.lang.RuntimeException: scala.Some is not a valid external type for > schema of array > at > org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.MapObjects_10$(Unknown > Source) ~[?:?] > at > org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.ExternalMapToCatalyst_1$(Unknown > Source) ~[?:?] > at > org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.createNamedStruct_14_3$(Unknown > Source) ~[?:?] 
> at > org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.If_12$(Unknown > Source) ~[?:?] > at > org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.apply(Unknown > Source) ~[?:?] > at > org.apache.spark.sql.execution.ObjectOperator$.$anonfun$serializeObjectToRow$1(objects.scala:165) > ~[spark-sql_2.12-3.5.0.jar:3.5.0] > at > org.apache.spark.sql.execution.AppendColumnsWithObjectExec.$anonfun$doExecute$15(objects.scala:380) > ~[spark-sql_2.12-3.5.0.jar:3.5.0] > at scala.collection.Iterator$$anon$10.next(Iterator.scala:461) > ~[scala-library-2.12.15.jar:?] > at scala.collection.Iterator$$anon$10.next(Iterator.scala:461) > ~[scala-library-2.12.15.jar:?] > at > org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:169) > ~[spark-core_2.12-3.5.0.jar:3.5.0] > at > org.apache.spark.shuffle.ShuffleWriteProcessor.write(ShuffleWriteProcessor.scala:59) > ~[spark-core_2.12-3.5.0.jar:3.5.0] > at > org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:104) > ~[spark-core_2.12-3.5.0.jar:3.5.0] > at > org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:54) > ~[spark-core_2.12-3.5.0.jar:3.5.0] > at > org.apache.spark.TaskContext.runTaskWithListeners(TaskContext.scala:161) > ~[spark-core_2.12-3.5.0.jar:3.5.0] > at org.apache.spark.scheduler.Task.run(Task.scala:141) > ~[spark-core_2.12-3.5.0.jar:3.5.0] > at > org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$4(Executor.scala:620) > ~[spark-core_2.12-3.5.0.jar:3.5.0] > at > org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally(SparkErrorUtils.scala:64) > ~[spark-common-utils_2.12-3.5.0.jar:3.5.0] > at > org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally$(SparkErrorUtils.scala:61) > ~[spark-common-utils_2.12-3.5.0.jar:3.5.0] > at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:94) > ~[spark-core_2.12-3.5.0.jar:3.5.0] > at > 
org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:623) > [spark-core_2.12-3.5.0.jar:3.5.0] > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) > [?:?] > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) > [?:?] > at java.lang.Thread.run(Thread.java:834) [?:?] > 2023-10-24T06:28:50.932 level=ERROR logger=org.apache.spark.executor.Executor > msg="Exception in task 1.0 in stage 0.0 (TID 1)" thread="Executor task launch > worker for task 1.0 in stage 0.0 (TID 1)" > java.lang.RuntimeException: scala.Some is not a valid external type for > schema of array > at >
[jira] [Updated] (SPARK-45740) Relax the node prefix of SparkPlanGraphCluster
[ https://issues.apache.org/jira/browse/SPARK-45740?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-45740: --- Labels: pull-request-available (was: ) > Relax the node prefix of SparkPlanGraphCluster > -- > > Key: SPARK-45740 > URL: https://issues.apache.org/jira/browse/SPARK-45740 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.0.0 >Reporter: XiDuo You >Priority: Minor > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-45740) Relax the node prefix of SparkPlanGraphCluster
[ https://issues.apache.org/jira/browse/SPARK-45740?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] XiDuo You updated SPARK-45740: -- Summary: Relax the node prefix of SparkPlanGraphCluster (was: Release the node prefix of SparkPlanGraphCluster) > Relax the node prefix of SparkPlanGraphCluster > -- > > Key: SPARK-45740 > URL: https://issues.apache.org/jira/browse/SPARK-45740 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.0.0 >Reporter: XiDuo You >Priority: Minor > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-45740) Release the node prefix of SparkPlanGraphCluster
XiDuo You created SPARK-45740: - Summary: Release the node prefix of SparkPlanGraphCluster Key: SPARK-45740 URL: https://issues.apache.org/jira/browse/SPARK-45740 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 4.0.0 Reporter: XiDuo You -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-45732) Upgrade commons-text to 1.11.0
[ https://issues.apache.org/jira/browse/SPARK-45732?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-45732. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 43590 [https://github.com/apache/spark/pull/43590] > Upgrade commons-text to 1.11.0 > -- > > Key: SPARK-45732 > URL: https://issues.apache.org/jira/browse/SPARK-45732 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 4.0.0 >Reporter: BingKun Pan >Assignee: BingKun Pan >Priority: Trivial > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-45732) Upgrade commons-text to 1.11.0
[ https://issues.apache.org/jira/browse/SPARK-45732?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-45732: Assignee: BingKun Pan > Upgrade commons-text to 1.11.0 > -- > > Key: SPARK-45732 > URL: https://issues.apache.org/jira/browse/SPARK-45732 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 4.0.0 >Reporter: BingKun Pan >Assignee: BingKun Pan >Priority: Trivial > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-45738) client will wait forever if session in spark connect server is evicted
[ https://issues.apache.org/jira/browse/SPARK-45738?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-45738: --- Labels: pull-request-available (was: ) > client will wait forever if session in spark connect server is evicted > -- > > Key: SPARK-45738 > URL: https://issues.apache.org/jira/browse/SPARK-45738 > Project: Spark > Issue Type: Bug > Components: Connect >Affects Versions: 3.5.0 >Reporter: xie shuiahu >Priority: Critical > Labels: pull-request-available > > Step1. start a spark connect server > Step2. submit a spark job which will run long > {code:java} > from pyspark.sql import SparkSession > spark = SparkSession.builder.remote(f"sc://HOST:PORT/;user_id=job").create() > spark.sql("A SQL will run longer than creating 100 sessions").show() {code} > > Step3. create more than 100 sessions > Tips: Run concurrently with step2 > {code:java} > for i in range(0, 200): > spark = > SparkSession.builder.remote(f"sc://HOST:PORT/;user_id={i}").create() > spark.sql("show databases") {code} > > *When the python code in step3 is executed, the session created in step2 will > be evicted, and the client will wait forever* > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-45739) Catch IOException instead of EOFException alone for faulthandler
[ https://issues.apache.org/jira/browse/SPARK-45739?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-45739: --- Labels: pull-request-available (was: ) > Catch IOException instead of EOFException alone for faulthandler > > > Key: SPARK-45739 > URL: https://issues.apache.org/jira/browse/SPARK-45739 > Project: Spark > Issue Type: Improvement > Components: PySpark >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Priority: Major > Labels: pull-request-available > > {{spark.python.worker.faulthandler.enabled}} can describe fatal errors > such as segfaults via the fault handler. Exceptions such as {{java.net.SocketException: > Connection reset}} can happen because the worker unexpectedly dies. We should > catch all IO exceptions there. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-45739) Catch IOException instead of EOFException alone for faulthandler
Hyukjin Kwon created SPARK-45739: Summary: Catch IOException instead of EOFException alone for faulthandler Key: SPARK-45739 URL: https://issues.apache.org/jira/browse/SPARK-45739 Project: Spark Issue Type: Improvement Components: PySpark Affects Versions: 4.0.0 Reporter: Hyukjin Kwon {{spark.python.worker.faulthandler.enabled}} can describe fatal errors such as segfaults via the fault handler. Exceptions such as {{java.net.SocketException: Connection reset}} can happen because the worker unexpectedly dies. We should catch all IO exceptions there. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
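The proposed widening of the catch can be sketched as follows. The handler body is illustrative only; Spark's actual fault-handler plumbing differs:

{code:scala}
import java.io.{EOFException, IOException}

def readWorkerOutput(read: () => Int): Option[Int] =
  try Some(read())
  catch {
    // Before: only EOFException was caught, so a SocketException
    // ("Connection reset") from a dead worker escaped the handler.
    // Catching IOException covers both, since EOFException and
    // SocketException are subclasses of it.
    case _: IOException => None
  }
{code}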
[jira] [Updated] (SPARK-45738) client will wait forever if session in spark connect server is evicted
[ https://issues.apache.org/jira/browse/SPARK-45738?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] xie shuiahu updated SPARK-45738: Description: Step1. start a spark connect server Step2. submit a spark job which will run long {code:java} from pyspark.sql import SparkSession spark = SparkSession.builder.remote(f"sc://HOST:PORT/;user_id=job").create() spark.sql("A SQL will run longer than creating 100 sessions").show() {code} Step3. create more than 100 sessions Tips: Run concurrently with step2 {code:java} for i in range(0, 200): spark = SparkSession.builder.remote(f"sc://HOST:PORT/;user_id={i}").create() spark.sql("show databases") {code} *When the python code in step3 is executed, the session created in step2 will be evicted, and the client will wait forever* was: Step1. start a spark connect server Step2. submit a spark job which will run long {code:java} from pyspark.sql import SparkSession spark = SparkSession.builder.remote(f"sc://HOST:PORT/;user_id=job").create() spark.sql("A SQL will run longer than creating 100 sessions").show() {code} Step3. create more than 100 sessions Tips: Run concurrently with step2 {code:java} for i in range(0, 200): spark = SparkSession.builder.remote(f"sc://HOST:PORT/;user_id={i}").create() spark.sql("show databases") {code} *When the python code in step3 is executed, the session created in step2 will be evicted, and the client will wait forever* The server will log exception like this: > client will wait forever if session in spark connect server is evicted > -- > > Key: SPARK-45738 > URL: https://issues.apache.org/jira/browse/SPARK-45738 > Project: Spark > Issue Type: Bug > Components: Connect >Affects Versions: 3.5.0 >Reporter: xie shuiahu >Priority: Critical > > Step1. start a spark connect server > Step2. 
submit a spark job which will run long > {code:java} > from pyspark.sql import SparkSession > spark = SparkSession.builder.remote(f"sc://HOST:PORT/;user_id=job").create() > spark.sql("A SQL will run longer than creating 100 sessions").show() {code} > > Step3. create more than 100 sessions > Tips: Run concurrently with step2 > {code:java} > for i in range(0, 200): > spark = > SparkSession.builder.remote(f"sc://HOST:PORT/;user_id={i}").create() > spark.sql("show databases") {code} > > *When the python code in step3 is executed, the session created in step2 will > be evicted, and the client will wait forever* > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-45738) client will wait forever if session in spark connect server is evicted
xie shuiahu created SPARK-45738: --- Summary: client will wait forever if session in spark connect server is evicted Key: SPARK-45738 URL: https://issues.apache.org/jira/browse/SPARK-45738 Project: Spark Issue Type: Bug Components: Connect Affects Versions: 3.5.0 Reporter: xie shuiahu Step1. start a spark connect server Step2. submit a spark job which will run long {code:java} from pyspark.sql import SparkSession spark = SparkSession.builder.remote(f"sc://HOST:PORT/;user_id=job").create() spark.sql("A SQL will run longer than creating 100 sessions").show() {code} Step3. create more than 100 sessions Tips: Run concurrently with step2 {code:java} for i in range(0, 200): spark = SparkSession.builder.remote(f"sc://HOST:PORT/;user_id={i}").create() spark.sql("show databases") {code} *When the python code in step3 is executed, the session created in step2 will be evicted, and the client will wait forever* The server will log exception like this: -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
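One plausible reading of the reproduction steps (an assumption on my part, not confirmed by the ticket) is an LRU-style session cache capped at 100 entries on the server: creating session number 101 evicts the oldest entry even while its query is still running, and the evicted client then waits forever for a response that will never come. A toy model of that eviction:

```python
from collections import OrderedDict

# Toy model (assumption): the Spark Connect server keeps at most `capacity`
# live sessions in an LRU cache; adding one more evicts the oldest entry
# regardless of whether it still has a running query.
class SessionCache:
    def __init__(self, capacity=100):
        self.capacity = capacity
        self.sessions = OrderedDict()

    def add(self, session_id):
        evicted = None
        if len(self.sessions) >= self.capacity:
            evicted, _ = self.sessions.popitem(last=False)  # evict oldest
        self.sessions[session_id] = "live"
        return evicted

cache = SessionCache(capacity=100)
cache.add("job")  # the long-running session from step 2
evicted = [cache.add(str(i)) for i in range(200)]  # step 3: 200 new sessions
assert "job" in evicted            # the long-running session got evicted
assert len(cache.sessions) == 100  # cache stays at its cap
```

The evicted "job" session's client never receives a completion or an error, which matches the observed hang.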
[jira] [Commented] (SPARK-45644) After upgrading to Spark 3.4.1 and 3.5.0 we receive RuntimeException "scala.Some is not a valid external type for schema of array"
[ https://issues.apache.org/jira/browse/SPARK-45644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17781276#comment-17781276 ] Adi Wehrli commented on SPARK-45644: Thanks for this information, [~bersprockets]. I could now cut the {{MapObjects_10}} method for either version: h4. Spark 3.5.0 {code:java} private ArrayData MapObjects_10(InternalRow i) { scala.Option value_2284 = null; if (!isNull_ExternalMapToCatalyst_value_lambda_variable_42) { if (value_ExternalMapToCatalyst_value_lambda_variable_42.getClass().isArray() || value_ExternalMapToCatalyst_value_lambda_variable_42 instanceof scala.collection.Seq || value_ExternalMapToCatalyst_value_lambda_variable_42 instanceof scala.collection.immutable.Set || value_ExternalMapToCatalyst_value_lambda_variable_42 instanceof java.util.List) { value_2284 = (scala.Option) value_ExternalMapToCatalyst_value_lambda_variable_42; } else { throw new RuntimeException(value_ExternalMapToCatalyst_value_lambda_variable_42.getClass().getName() + ((java.lang.String) references[212] /* errMsg */)); } } final boolean isNull_1963 = isNull_ExternalMapToCatalyst_value_lambda_variable_42 || value_2284.isEmpty(); scala.collection.Seq value_2283 = isNull_1963 ? 
null : (scala.collection.Seq) value_2284.get(); ArrayData value_2282 = null; if (!isNull_1963) { int dataLength_10 = value_2283.size(); UTF8String[] convertedArray_10 = null; convertedArray_10 = new UTF8String[dataLength_10]; int loopIndex_10 = 0; scala.collection.Iterator it_10 = value_2283.toIterator(); while (loopIndex_10 < dataLength_10) { value_MapObject_lambda_variable_43 = (java.lang.Object) (it_10.next()); isNull_MapObject_lambda_variable_43 = value_MapObject_lambda_variable_43 == null; resultIsNull_127 = false; if (!resultIsNull_127) { java.lang.String value_2286 = null; if (!isNull_MapObject_lambda_variable_43) { if (value_MapObject_lambda_variable_43 instanceof java.lang.String) { value_2286 = (java.lang.String) value_MapObject_lambda_variable_43; } else { throw new RuntimeException(value_MapObject_lambda_variable_43.getClass().getName() + ((java.lang.String) references[213] /* errMsg */)); } } resultIsNull_127 = isNull_MapObject_lambda_variable_43; mutableStateArray_0[121] = value_2286; } boolean isNull_1965 = resultIsNull_127; UTF8String value_2285 = null; if (!resultIsNull_127) { value_2285 = org.apache.spark.unsafe.types.UTF8String.fromString(mutableStateArray_0[121]); } if (isNull_1965) { convertedArray_10[loopIndex_10] = null; } else { convertedArray_10[loopIndex_10] = value_2285; } loopIndex_10 += 1; } value_2282 = new org.apache.spark.sql.catalyst.util.GenericArrayData(convertedArray_10); } globalIsNull_320 = isNull_1963; return value_2282; } {code} h4. 
Spark 3.3.3: {code:java} private scala.collection.Seq MapObjects_10(InternalRow i) { scala.collection.Seq value_1083 = null; if (!isNull_CatalystToExternalMap_value_lambda_variable_21) { int dataLength_11 = value_CatalystToExternalMap_value_lambda_variable_21.numElements(); scala.collection.mutable.Builder collectionBuilder_10 = scala.collection.Seq$.MODULE$.newBuilder(); collectionBuilder_10.sizeHint(dataLength_11); int loopIndex_11 = 0; while (loopIndex_11 < dataLength_11) { value_MapObject_lambda_variable_22 = (UTF8String) (value_CatalystToExternalMap_value_lambda_variable_21.getUTF8String(loopIndex_11)); isNull_MapObject_lambda_variable_22 = value_CatalystToExternalMap_value_lambda_variable_21.isNullAt(loopIndex_11); boolean isNull_957 = true; java.lang.String value_1084 = null; if (!isNull_MapObject_lambda_variable_22) { isNull_957 = false; if (!isNull_957) { Object funcResult_121 = null; funcResult_121 = value_MapObject_lambda_variable_22.toString(); value_1084 = (java.lang.String) funcResult_121; } } if (isNull_957) { collectionBuilder_10.$plus$eq(null); } else { collectionBuilder_10.$plus$eq(value_1084); } loopIndex_11 += 1; } value_1083 = (scala.collection.Seq) collectionBuilder_10.result(); } globalIsNull_81 = isNull_CatalystToExternalMap_value_lambda_variable_21; return value_1083; } {code} > After upgrading to Spark 3.4.1 and 3.5.0 we receive RuntimeException > "scala.Some is not a valid external type for schema of array" > -- > > Key: SPARK-45644 > URL: https://issues.apache.org/jira/browse/SPARK-45644 > Project: Spark >
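The behavioral difference is visible in the 3.5.0 snippet above: the generated code now checks that the external value is array-like (a Java array, {{scala.collection.Seq}}, {{scala.collection.immutable.Set}}, or {{java.util.List}}) before casting, so a {{scala.Some}} that is not unwrapped first hits the {{RuntimeException}} branch. A rough Python sketch of that validation step (all names here are illustrative, not Spark's actual serializer code):

```python
# Sketch of the 3.5.0 external-type check: only array-like values pass;
# an Option/Some wrapper that was not unwrapped first raises, mirroring
# "scala.Some is not a valid external type for schema of array".
def validate_external_array(value):
    if isinstance(value, (list, tuple, set)):
        return list(value)
    raise RuntimeError(
        f"{type(value).__name__} is not a valid external type for schema of array")

class Some:  # stand-in for scala.Some
    def __init__(self, v):
        self.v = v

assert validate_external_array([1, 2]) == [1, 2]
try:
    validate_external_array(Some([1, 2]))
    raise AssertionError("expected RuntimeError")
except RuntimeError as e:
    assert "not a valid external type" in str(e)
```

In the 3.3.3 codegen shown above there is no such type check on the value, which is consistent with the upgrade surfacing this error.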
[jira] [Assigned] (SPARK-45737) Remove unnecessary `.toArray[InternalRow]` in `SparkPlan#executeTake` function.
[ https://issues.apache.org/jira/browse/SPARK-45737?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot reassigned SPARK-45737: -- Assignee: Apache Spark > Remove unnecessary `.toArray[InternalRow]` in `SparkPlan#executeTake` > function. > --- > > Key: SPARK-45737 > URL: https://issues.apache.org/jira/browse/SPARK-45737 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.0.0 >Reporter: Yang Jie >Assignee: Apache Spark >Priority: Major > Labels: pull-request-available > > {code:java} > if (takeFromEnd) { > while (buf.length < n && i < res.length) { > val rows = decodeUnsafeRows(res(i)._2) > if (n - buf.length >= res(i)._1) { > buf.prependAll(rows.toArray[InternalRow]) > } else { > val dropUntil = res(i)._1 - (n - buf.length) > // Same as Iterator.drop but this only takes a long. > var j: Long = 0L > while (j < dropUntil) { rows.next(); j += 1L} > buf.prependAll(rows.toArray[InternalRow]) > } > i += 1 > } > } else { > while (buf.length < n && i < res.length) { > val rows = decodeUnsafeRows(res(i)._2) > if (n - buf.length >= res(i)._1) { > buf ++= rows.toArray[InternalRow] > } else { > buf ++= rows.take(n - buf.length).toArray[InternalRow] > } > i += 1 > } > } {code} > In the above code, the input parameters of `mutable.Buffer#prependAll` and > `mutable.Growable#++=` functions are `IterableOnce`, and the type of rows is > `Iterator[InternalRow]`, which inherits from `IterableOnce`, so there is no > need to cast to an array of InternalRow anymore. > > > > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-45737) Remove unnecessary `.toArray[InternalRow]` in `SparkPlan#executeTake` function.
[ https://issues.apache.org/jira/browse/SPARK-45737?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot reassigned SPARK-45737: -- Assignee: (was: Apache Spark) > Remove unnecessary `.toArray[InternalRow]` in `SparkPlan#executeTake` > function. > --- > > Key: SPARK-45737 > URL: https://issues.apache.org/jira/browse/SPARK-45737 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.0.0 >Reporter: Yang Jie >Priority: Major > Labels: pull-request-available > > {code:java} > if (takeFromEnd) { > while (buf.length < n && i < res.length) { > val rows = decodeUnsafeRows(res(i)._2) > if (n - buf.length >= res(i)._1) { > buf.prependAll(rows.toArray[InternalRow]) > } else { > val dropUntil = res(i)._1 - (n - buf.length) > // Same as Iterator.drop but this only takes a long. > var j: Long = 0L > while (j < dropUntil) { rows.next(); j += 1L} > buf.prependAll(rows.toArray[InternalRow]) > } > i += 1 > } > } else { > while (buf.length < n && i < res.length) { > val rows = decodeUnsafeRows(res(i)._2) > if (n - buf.length >= res(i)._1) { > buf ++= rows.toArray[InternalRow] > } else { > buf ++= rows.take(n - buf.length).toArray[InternalRow] > } > i += 1 > } > } {code} > In the above code, the input parameters of `mutable.Buffer#prependAll` and > `mutable.Growable#++=` functions are `IterableOnce`, and the type of rows is > `Iterator[InternalRow]`, which inherits from `IterableOnce`, so there is no > need to cast to an array of InternalRow anymore. > > > > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-45737) Remove unnecessary `.toArray[InternalRow]` in `SparkPlan#executeTake` function.
[ https://issues.apache.org/jira/browse/SPARK-45737?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-45737: --- Labels: pull-request-available (was: ) > Remove unnecessary `.toArray[InternalRow]` in `SparkPlan#executeTake` > function. > --- > > Key: SPARK-45737 > URL: https://issues.apache.org/jira/browse/SPARK-45737 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.0.0 >Reporter: Yang Jie >Priority: Major > Labels: pull-request-available > > {code:java} > if (takeFromEnd) { > while (buf.length < n && i < res.length) { > val rows = decodeUnsafeRows(res(i)._2) > if (n - buf.length >= res(i)._1) { > buf.prependAll(rows.toArray[InternalRow]) > } else { > val dropUntil = res(i)._1 - (n - buf.length) > // Same as Iterator.drop but this only takes a long. > var j: Long = 0L > while (j < dropUntil) { rows.next(); j += 1L} > buf.prependAll(rows.toArray[InternalRow]) > } > i += 1 > } > } else { > while (buf.length < n && i < res.length) { > val rows = decodeUnsafeRows(res(i)._2) > if (n - buf.length >= res(i)._1) { > buf ++= rows.toArray[InternalRow] > } else { > buf ++= rows.take(n - buf.length).toArray[InternalRow] > } > i += 1 > } > } {code} > In the above code, the input parameters of `mutable.Buffer#prependAll` and > `mutable.Growable#++=` functions are `IterableOnce`, and the type of rows is > `Iterator[InternalRow]`, which inherits from `IterableOnce`, so there is no > need to cast to an array of InternalRow anymore. > > > > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-45737) Remove unnecessary `.toArray[InternalRow]` in `SparkPlan#executeTake` function.
Yang Jie created SPARK-45737: Summary: Remove unnecessary `.toArray[InternalRow]` in `SparkPlan#executeTake` function. Key: SPARK-45737 URL: https://issues.apache.org/jira/browse/SPARK-45737 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 4.0.0 Reporter: Yang Jie {code:java} if (takeFromEnd) { while (buf.length < n && i < res.length) { val rows = decodeUnsafeRows(res(i)._2) if (n - buf.length >= res(i)._1) { buf.prependAll(rows.toArray[InternalRow]) } else { val dropUntil = res(i)._1 - (n - buf.length) // Same as Iterator.drop but this only takes a long. var j: Long = 0L while (j < dropUntil) { rows.next(); j += 1L} buf.prependAll(rows.toArray[InternalRow]) } i += 1 } } else { while (buf.length < n && i < res.length) { val rows = decodeUnsafeRows(res(i)._2) if (n - buf.length >= res(i)._1) { buf ++= rows.toArray[InternalRow] } else { buf ++= rows.take(n - buf.length).toArray[InternalRow] } i += 1 } } {code} In the above code, the input parameters of `mutable.Buffer#prependAll` and `mutable.Growable#++=` functions are `IterableOnce`, and the type of rows is `Iterator[InternalRow]`, which inherits from `IterableOnce`, so there is no need to cast to an array of InternalRow anymore. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
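The rationale generalizes beyond Scala: any sink that accepts an iterable can consume an iterator directly, so materializing it into an array first only adds an allocation and a full copy. A small Python analogue of the {{executeTake}} buffering loop (a simplified sketch, not the real implementation):

```python
# Simplified analogue of SparkPlan#executeTake buffering: `partitions`
# yields lazy per-partition row iterators; we append rows directly from
# each iterator without materializing it into a list first.
def take_rows(partitions, n):
    buf = []
    for rows in partitions:
        if len(buf) >= n:
            break
        for row in rows:  # no list(rows) needed: consume the iterator lazily
            buf.append(row)
            if len(buf) >= n:
                break
    return buf

assert take_rows([iter([1, 2, 3]), iter([4, 5])], 4) == [1, 2, 3, 4]
```

This mirrors the Scala argument: `++=` takes an `IterableOnce`, and `Iterator[InternalRow]` already is one, so the `.toArray[InternalRow]` call is pure overhead.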
[jira] [Assigned] (SPARK-45735) Reenable CatalogTests without Spark Connect
[ https://issues.apache.org/jira/browse/SPARK-45735?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-45735: Assignee: Hyukjin Kwon > Reenable CatalogTests without Spark Connect > --- > > Key: SPARK-45735 > URL: https://issues.apache.org/jira/browse/SPARK-45735 > Project: Spark > Issue Type: New Feature > Components: PySpark, Tests >Affects Versions: 3.4.1, 3.5.0, 4.0.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Major > Labels: pull-request-available > > https://issues.apache.org/jira/browse/SPARK-41707 mistakenly made the > original tests skipped. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-45735) Reenable CatalogTests without Spark Connect
[ https://issues.apache.org/jira/browse/SPARK-45735?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-45735. -- Fix Version/s: 3.5.1 4.0.0 3.4.2 Resolution: Fixed Issue resolved by pull request 43595 [https://github.com/apache/spark/pull/43595] > Reenable CatalogTests without Spark Connect > --- > > Key: SPARK-45735 > URL: https://issues.apache.org/jira/browse/SPARK-45735 > Project: Spark > Issue Type: New Feature > Components: PySpark, Tests >Affects Versions: 3.4.1, 3.5.0, 4.0.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Major > Labels: pull-request-available > Fix For: 3.5.1, 4.0.0, 3.4.2 > > > https://issues.apache.org/jira/browse/SPARK-41707 mistakenly made the > original tests skipped. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-45736) Use \s+ as separator when testing Kafka source or network source
[ https://issues.apache.org/jira/browse/SPARK-45736?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-45736: --- Labels: pull-request-available (was: ) > Use \s+ as separator when testing Kafka source or network source > > > Key: SPARK-45736 > URL: https://issues.apache.org/jira/browse/SPARK-45736 > Project: Spark > Issue Type: Improvement > Components: Examples >Affects Versions: 3.5.0 >Reporter: Deng Ziming >Priority: Minor > Labels: pull-request-available > > When test data comes from Kafka or a network source, it's possible that we > generate redundant blanks. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
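The motivation for SPARK-45736 is visible with plain string splitting: a single-space separator yields empty tokens whenever the input contains consecutive blanks or tabs, while the {{\s+}} regex collapses any run of whitespace. A quick Python illustration:

```python
import re

# Input with redundant blanks and a tab, as might arrive from a Kafka or
# network test source.
line = "hello   world\tspark"

naive = line.split(" ")                   # single-space split
robust = re.split(r"\s+", line.strip())   # \s+ split, as the ticket proposes

assert "" in naive                        # redundant blanks -> empty "words"
assert robust == ["hello", "world", "spark"]
```

For a word-count example, the empty tokens from the naive split would be counted as words, which is exactly the noise the {{\s+}} separator removes.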
[jira] [Assigned] (SPARK-45701) Clean up the deprecated API usage related to `SetOps`
[ https://issues.apache.org/jira/browse/SPARK-45701?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yang Jie reassigned SPARK-45701: Assignee: Yang Jie > Clean up the deprecated API usage related to `SetOps` > - > > Key: SPARK-45701 > URL: https://issues.apache.org/jira/browse/SPARK-45701 > Project: Spark > Issue Type: Sub-task > Components: Spark Core, SQL >Affects Versions: 4.0.0 >Reporter: Yang Jie >Assignee: Yang Jie >Priority: Major > Labels: pull-request-available > > * method - in trait SetOps is deprecated (since 2.13.0) > * method -- in trait SetOps is deprecated (since 2.13.0) > * method + in trait SetOps is deprecated (since 2.13.0) > * method retain in trait SetOps is deprecated (since 2.13.0) > > {code:java} > [warn] > /Users/yangjie01/SourceCode/git/spark-mine-sbt/core/src/main/scala/org/apache/spark/storage/BlockReplicationPolicy.scala:70:32: > method + in trait SetOps is deprecated (since 2.13.0): Consider requiring an > immutable Set or fall back to Set.union > [warn] Applicable -Wconf / @nowarn filters for this warning: msg= message>, cat=deprecation, > site=org.apache.spark.storage.BlockReplicationUtils.getSampleIds.indices.$anonfun, > origin=scala.collection.SetOps.+, version=2.13.0 > [warn] if (set.contains(t)) set + i else set + t > [warn] ^ {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-45701) Clean up the deprecated API usage related to `SetOps`
[ https://issues.apache.org/jira/browse/SPARK-45701?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yang Jie resolved SPARK-45701. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 43575 [https://github.com/apache/spark/pull/43575] > Clean up the deprecated API usage related to `SetOps` > - > > Key: SPARK-45701 > URL: https://issues.apache.org/jira/browse/SPARK-45701 > Project: Spark > Issue Type: Sub-task > Components: Spark Core, SQL >Affects Versions: 4.0.0 >Reporter: Yang Jie >Assignee: Yang Jie >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > * method - in trait SetOps is deprecated (since 2.13.0) > * method -- in trait SetOps is deprecated (since 2.13.0) > * method + in trait SetOps is deprecated (since 2.13.0) > * method retain in trait SetOps is deprecated (since 2.13.0) > > {code:java} > [warn] > /Users/yangjie01/SourceCode/git/spark-mine-sbt/core/src/main/scala/org/apache/spark/storage/BlockReplicationPolicy.scala:70:32: > method + in trait SetOps is deprecated (since 2.13.0): Consider requiring an > immutable Set or fall back to Set.union > [warn] Applicable -Wconf / @nowarn filters for this warning: msg= message>, cat=deprecation, > site=org.apache.spark.storage.BlockReplicationUtils.getSampleIds.indices.$anonfun, > origin=scala.collection.SetOps.+, version=2.13.0 > [warn] if (set.contains(t)) set + i else set + t > [warn] ^ {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
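For the {{BlockReplicationPolicy}} warning quoted above, the compiler's suggestion is to avoid the deprecated {{+}} on a possibly-mutable {{SetOps}} and use a non-mutating union on an immutable set instead. A Python analogue of the same pattern (the function name is illustrative, not the actual Spark method):

```python
# Analogue of "if (set.contains(t)) set + i else set + t" rewritten as a
# non-mutating union on an immutable set, mirroring the suggested fix of
# requiring an immutable Set / falling back to union.
def pick_sample_id(ids, t, i):
    return ids | ({i} if t in ids else {t})

s = frozenset({1, 2})
assert pick_sample_id(s, 2, 9) == frozenset({1, 2, 9})  # t already present -> add i
assert pick_sample_id(s, 3, 9) == frozenset({1, 2, 3})  # t absent -> add t
assert s == frozenset({1, 2})                           # original set unchanged
```

The union form makes the non-mutation explicit, which is why the deprecated element-wise {{+}} was removed from the mutable-set API in Scala 2.13.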
[jira] [Created] (SPARK-45736) Use \s+ as separator when testing Kafka source or network source
Deng Ziming created SPARK-45736: --- Summary: Use \s+ as separator when testing Kafka source or network source Key: SPARK-45736 URL: https://issues.apache.org/jira/browse/SPARK-45736 Project: Spark Issue Type: Improvement Components: Examples Affects Versions: 3.5.0 Reporter: Deng Ziming When test data comes from Kafka or a network source, it's possible that we generate redundant blanks. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org