[jira] [Closed] (SPARK-36593) [Deprecated] Support the Volcano Job API
[ https://issues.apache.org/jira/browse/SPARK-36593?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun closed SPARK-36593.

> [Deprecated] Support the Volcano Job API
>
> Key: SPARK-36593
> URL: https://issues.apache.org/jira/browse/SPARK-36593
> Project: Spark
> Issue Type: Sub-task
> Components: Kubernetes
> Affects Versions: 3.3.0
> Reporter: Holden Karau
> Priority: Major

--
This message was sent by Atlassian Jira
(v8.20.1#820001)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-36060) Support backing off dynamic allocation increases if resources are "stuck"
[ https://issues.apache.org/jira/browse/SPARK-36060?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-36060.

Resolution: Fixed

> Support backing off dynamic allocation increases if resources are "stuck"
>
> Key: SPARK-36060
> URL: https://issues.apache.org/jira/browse/SPARK-36060
> Project: Spark
> Issue Type: Sub-task
> Components: Kubernetes
> Affects Versions: 3.2.0
> Reporter: Holden Karau
> Priority: Major
>
> In an over-subscribed environment we may enter a situation where our requests for more pods are not going to be fulfilled. Adding more requests for more pods will not help and may slow down the scheduler. We should detect this situation and hold off on increasing pod requests until the scheduler allocates more pods to us. We have a limited version of this in the Kubernetes scheduler backend itself, but it would be better to plumb this all the way through to the dynamic allocation logic.
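The back-off idea the ticket describes can be sketched in plain Python. This is a hypothetical model, not Spark's actual executor-allocation code; the class and method names are illustrative:

```python
# Sketch of the described behavior: stop escalating pod requests while
# earlier requests are still unfulfilled, and back off exponentially
# while the scheduler looks "stuck".

class PodRequestBackoff:
    def __init__(self, initial_wait_s: float = 1.0, max_wait_s: float = 60.0):
        self.initial_wait_s = initial_wait_s
        self.max_wait_s = max_wait_s
        self.wait_s = initial_wait_s
        self.outstanding = 0  # pods requested but not yet allocated

    def on_request(self, n: int) -> None:
        self.outstanding += n

    def on_allocated(self, n: int) -> None:
        # The scheduler granted n pods: requests are moving, reset backoff.
        self.outstanding = max(0, self.outstanding - n)
        self.wait_s = self.initial_wait_s

    def should_request_more(self) -> bool:
        # Only escalate when nothing is stuck in the request queue.
        return self.outstanding == 0

    def on_stuck(self) -> float:
        # Resources look stuck: double the wait, capped at max_wait_s.
        self.wait_s = min(self.wait_s * 2, self.max_wait_s)
        return self.wait_s
```

The key design point from the ticket is the gate in `should_request_more`: the dynamic allocation side would consult it before issuing further pod requests, rather than flooding the scheduler.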
[jira] [Reopened] (SPARK-36060) Support backing off dynamic allocation increases if resources are "stuck"
[ https://issues.apache.org/jira/browse/SPARK-36060?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reopened SPARK-36060:

> Support backing off dynamic allocation increases if resources are "stuck"
>
> Key: SPARK-36060
> URL: https://issues.apache.org/jira/browse/SPARK-36060
> Project: Spark
> Issue Type: Sub-task
> Components: Kubernetes
> Affects Versions: 3.2.0
> Reporter: Holden Karau
> Priority: Major
>
> In an over-subscribed environment we may enter a situation where our requests for more pods are not going to be fulfilled. Adding more requests for more pods will not help and may slow down the scheduler. We should detect this situation and hold off on increasing pod requests until the scheduler allocates more pods to us. We have a limited version of this in the Kubernetes scheduler backend itself, but it would be better to plumb this all the way through to the dynamic allocation logic.
[jira] [Commented] (SPARK-38524) [TEST] Change disable queue to capability limit way
[ https://issues.apache.org/jira/browse/SPARK-38524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17505209#comment-17505209 ] Dongjoon Hyun commented on SPARK-38524:

[~yikunkero], FYI, here is a tip:
- Remove `[TEST]` from the JIRA title.
- Add `Tests` to `Component/s:`.

> [TEST] Change disable queue to capability limit way
>
> Key: SPARK-38524
> URL: https://issues.apache.org/jira/browse/SPARK-38524
> Project: Spark
> Issue Type: Sub-task
> Components: Kubernetes
> Affects Versions: 3.3.0
> Reporter: Yikun Jiang
> Priority: Major
>
> As described at [https://volcano.sh/en/docs/queue/]:
> - weight is a soft constraint.
> - capability is a hard constraint.
> We had better use capability to keep things simple and avoid being influenced by other queues.
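As a concrete illustration of the hard-constraint approach referenced above, a Volcano Queue can declare a `capability`. The queue name and resource amounts below are illustrative, and the `apiVersion` assumes a Volcano release serving the v1beta1 scheduling API:

```yaml
# Illustrative Volcano Queue with a hard capability limit: jobs in this
# queue can never consume more than 2 CPUs / 4Gi in total, regardless of
# how idle the rest of the cluster is (unlike a soft "weight").
apiVersion: scheduling.volcano.sh/v1beta1
kind: Queue
metadata:
  name: capped-queue   # illustrative name
spec:
  capability:
    cpu: "2"
    memory: 4Gi
```

Because the cap is absolute, a test that parks a job in such a queue is not influenced by what other queues are doing, which is the simplification the ticket is after.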
[jira] [Commented] (SPARK-38525) [TEST] Check resource after resource creation
[ https://issues.apache.org/jira/browse/SPARK-38525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17505210#comment-17505210 ] Dongjoon Hyun commented on SPARK-38525:

FYI, here is a tip:
- Remove `[TEST]` from the JIRA title.
- Add `Tests` to `Component/s:`.

> [TEST] Check resource after resource creation
>
> Key: SPARK-38525
> URL: https://issues.apache.org/jira/browse/SPARK-38525
> Project: Spark
> Issue Type: Sub-task
> Components: Kubernetes, Tests
> Affects Versions: 3.3.0
> Reporter: Yikun Jiang
> Priority: Major
[jira] [Closed] (SPARK-38135) Introduce `spark.kubernetes.job` scheduling related configurations
[ https://issues.apache.org/jira/browse/SPARK-38135?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun closed SPARK-38135.

> Introduce `spark.kubernetes.job` scheduling related configurations
>
> Key: SPARK-38135
> URL: https://issues.apache.org/jira/browse/SPARK-38135
> Project: Spark
> Issue Type: Sub-task
> Components: Kubernetes
> Affects Versions: 3.3.0
> Reporter: Yikun Jiang
> Priority: Major
>
> spark.kubernetes.job.minCPU: the minimum CPU resources for running the job
> spark.kubernetes.job.minMemory: the minimum memory resources for running the job
> spark.kubernetes.job.minMember: the minimum number of pods for running the job
> spark.kubernetes.job.priorityClassName: the priority of the running job
> spark.kubernetes.job.queue: the queue to which the running job belongs
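Had the configurations proposed in this ticket been adopted under these exact names, a submission might have looked like the sketch below. All values are illustrative; this ticket was closed, and SPARK-38513 later moved custom-scheduler options under a `spark.kubernetes.scheduler.NAME` prefix, so do not expect these keys in any release:

```
# Hypothetical usage of the proposed (never-released) job-scheduling keys.
spark-submit \
  --master k8s://https://<api-server>:6443 \
  --deploy-mode cluster \
  --conf spark.kubernetes.job.queue=default \
  --conf spark.kubernetes.job.minCPU=4 \
  --conf spark.kubernetes.job.minMemory=2048 \
  --conf spark.kubernetes.job.minMember=3 \
  --conf spark.kubernetes.job.priorityClassName=high-priority \
  local:///opt/spark/examples/src/main/python/pi.py
```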
[jira] [Updated] (SPARK-38511) Remove priorityClassName propagation in favor of explicit settings
[ https://issues.apache.org/jira/browse/SPARK-38511?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-38511:

Parent: SPARK-36057
Issue Type: Sub-task (was: Improvement)

> Remove priorityClassName propagation in favor of explicit settings
>
> Key: SPARK-38511
> URL: https://issues.apache.org/jira/browse/SPARK-38511
> Project: Spark
> Issue Type: Sub-task
> Components: Kubernetes
> Affects Versions: 3.3.0
> Reporter: Dongjoon Hyun
> Assignee: Dongjoon Hyun
> Priority: Major
> Fix For: 3.3.0
[jira] [Updated] (SPARK-38513) Move custom scheduler-specific configs to under `spark.kubernetes.scheduler.NAME` prefix
[ https://issues.apache.org/jira/browse/SPARK-38513?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-38513:

Parent: SPARK-36057
Issue Type: Sub-task (was: Improvement)

> Move custom scheduler-specific configs to under `spark.kubernetes.scheduler.NAME` prefix
>
> Key: SPARK-38513
> URL: https://issues.apache.org/jira/browse/SPARK-38513
> Project: Spark
> Issue Type: Sub-task
> Components: Kubernetes
> Affects Versions: 3.3.0
> Reporter: Dongjoon Hyun
> Assignee: Dongjoon Hyun
> Priority: Major
> Fix For: 3.3.0
[jira] [Updated] (SPARK-38527) Set the minimum Volcano version
[ https://issues.apache.org/jira/browse/SPARK-38527?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-38527:

Parent: SPARK-36057
Issue Type: Sub-task (was: Documentation)

> Set the minimum Volcano version
>
> Key: SPARK-38527
> URL: https://issues.apache.org/jira/browse/SPARK-38527
> Project: Spark
> Issue Type: Sub-task
> Components: Documentation, Kubernetes
> Affects Versions: 3.3.0
> Reporter: Dongjoon Hyun
> Assignee: Dongjoon Hyun
> Priority: Major
> Fix For: 3.3.0
[jira] [Assigned] (SPARK-38533) DS V2 aggregate push-down supports project with alias
[ https://issues.apache.org/jira/browse/SPARK-38533?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-38533:

Assignee: Apache Spark

> DS V2 aggregate push-down supports project with alias
>
> Key: SPARK-38533
> URL: https://issues.apache.org/jira/browse/SPARK-38533
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 3.3.0
> Reporter: jiaan.geng
> Assignee: Apache Spark
> Priority: Major
[jira] [Commented] (SPARK-38533) DS V2 aggregate push-down supports project with alias
[ https://issues.apache.org/jira/browse/SPARK-38533?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17505206#comment-17505206 ] Apache Spark commented on SPARK-38533:

User 'beliefer' has created a pull request for this issue: https://github.com/apache/spark/pull/35823

> DS V2 aggregate push-down supports project with alias
>
> Key: SPARK-38533
> URL: https://issues.apache.org/jira/browse/SPARK-38533
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 3.3.0
> Reporter: jiaan.geng
> Priority: Major
[jira] [Assigned] (SPARK-38533) DS V2 aggregate push-down supports project with alias
[ https://issues.apache.org/jira/browse/SPARK-38533?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-38533:

Assignee: (was: Apache Spark)

> DS V2 aggregate push-down supports project with alias
>
> Key: SPARK-38533
> URL: https://issues.apache.org/jira/browse/SPARK-38533
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 3.3.0
> Reporter: jiaan.geng
> Priority: Major
[jira] [Updated] (SPARK-38533) DS V2 aggregate push-down supports project with alias
[ https://issues.apache.org/jira/browse/SPARK-38533?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] jiaan.geng updated SPARK-38533:

Summary: DS V2 aggregate push-down supports project with alias (was: Aggregate push-down supports project with alias)

> DS V2 aggregate push-down supports project with alias
>
> Key: SPARK-38533
> URL: https://issues.apache.org/jira/browse/SPARK-38533
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 3.3.0
> Reporter: jiaan.geng
> Priority: Major
[jira] [Created] (SPARK-38533) Aggregate push-down supports project with alias
jiaan.geng created SPARK-38533:

Summary: Aggregate push-down supports project with alias
Key: SPARK-38533
URL: https://issues.apache.org/jira/browse/SPARK-38533
Project: Spark
Issue Type: Improvement
Components: SQL
Affects Versions: 3.3.0
Reporter: jiaan.geng
[jira] [Created] (SPARK-38532) Add test case for invalid gapDuration of session window
nyingping created SPARK-38532:

Summary: Add test case for invalid gapDuration of session window
Key: SPARK-38532
URL: https://issues.apache.org/jira/browse/SPARK-38532
Project: Spark
Issue Type: Test
Components: Structured Streaming
Affects Versions: 3.2.1
Reporter: nyingping

Since dynamic gapDuration was added to the session window ([PR 33691|https://github.com/apache/spark/pull/33691]), users are allowed to supply an invalid gapDuration. However, for now, test cases only cover zero and negative gapDuration. I think it is necessary to add test cases for invalid gapDuration as well.
[jira] [Assigned] (SPARK-38526) fix misleading function alias name for RuntimeReplaceable
[ https://issues.apache.org/jira/browse/SPARK-38526?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-38526:

Assignee: Wenchen Fan

> fix misleading function alias name for RuntimeReplaceable
>
> Key: SPARK-38526
> URL: https://issues.apache.org/jira/browse/SPARK-38526
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 3.3.0
> Reporter: Wenchen Fan
> Assignee: Wenchen Fan
> Priority: Major
[jira] [Resolved] (SPARK-38526) fix misleading function alias name for RuntimeReplaceable
[ https://issues.apache.org/jira/browse/SPARK-38526?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-38526.

Fix Version/s: 3.3.0
Resolution: Fixed

Issue resolved by pull request 35821
[https://github.com/apache/spark/pull/35821]

> fix misleading function alias name for RuntimeReplaceable
>
> Key: SPARK-38526
> URL: https://issues.apache.org/jira/browse/SPARK-38526
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 3.3.0
> Reporter: Wenchen Fan
> Assignee: Wenchen Fan
> Priority: Major
> Fix For: 3.3.0
[jira] [Resolved] (SPARK-38516) Add log4j-core, log4j-api and log4j-slf4j-impl to classpath if active hadoop-provided
[ https://issues.apache.org/jira/browse/SPARK-38516?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-38516.

Fix Version/s: 3.3.0
Resolution: Fixed

Issue resolved by pull request 35811
[https://github.com/apache/spark/pull/35811]

> Add log4j-core, log4j-api and log4j-slf4j-impl to classpath if active hadoop-provided
>
> Key: SPARK-38516
> URL: https://issues.apache.org/jira/browse/SPARK-38516
> Project: Spark
> Issue Type: Bug
> Components: Build
> Affects Versions: 3.3.0
> Reporter: Yuming Wang
> Assignee: Yuming Wang
> Priority: Major
> Fix For: 3.3.0
>
> {noformat}
> Error: A JNI error has occurred, please check your installation and try again
> Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/logging/log4j/core/Filter
>   at java.lang.Class.getDeclaredMethods0(Native Method)
>   at java.lang.Class.privateGetDeclaredMethods(Class.java:2701)
>   at java.lang.Class.privateGetMethodRecursive(Class.java:3048)
>   at java.lang.Class.getMethod0(Class.java:3018)
>   at java.lang.Class.getMethod(Class.java:1784)
>   at sun.launcher.LauncherHelper.validateMainClass(LauncherHelper.java:544)
>   at sun.launcher.LauncherHelper.checkAndLoadMain(LauncherHelper.java:526)
> Caused by: java.lang.ClassNotFoundException: org.apache.logging.log4j.core.Filter
>   at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
>   at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
>   at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
>   at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
>   ... 7 more
> {noformat}
> {noformat}
> Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/logging/log4j/LogManager
>   at org.apache.spark.deploy.yarn.SparkRackResolver.<init>(SparkRackResolver.scala:42)
>   at org.apache.spark.deploy.yarn.SparkRackResolver$.get(SparkRackResolver.scala:114)
>   at org.apache.spark.scheduler.cluster.YarnScheduler.<init>(YarnScheduler.scala:31)
>   at org.apache.spark.scheduler.cluster.YarnClusterManager.createTaskScheduler(YarnClusterManager.scala:35)
>   at org.apache.spark.SparkContext$.org$apache$spark$SparkContext$$createTaskScheduler(SparkContext.scala:2985)
>   at org.apache.spark.SparkContext.<init>(SparkContext.scala:563)
>   at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2704)
>   at org.apache.spark.sql.SparkSession$Builder.$anonfun$getOrCreate$2(SparkSession.scala:953)
>   at scala.Option.getOrElse(Option.scala:189)
>   at org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:947)
>   at org.apache.spark.sql.hive.thriftserver.SparkSQLEnv$.init(SparkSQLEnv.scala:54)
>   at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.<init>(SparkSQLCLIDriver.scala:327)
>   at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver$.main(SparkSQLCLIDriver.scala:159)
>   at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.main(SparkSQLCLIDriver.scala)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
>   at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:958)
>   at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
>   at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
>   at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
>   at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1046)
>   at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1055)
>   at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
> Caused by: java.lang.ClassNotFoundException: org.apache.logging.log4j.LogManager
>   at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
>   at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
>   at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
>   at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
>   ... 26 more
> {noformat}
[jira] [Assigned] (SPARK-38516) Add log4j-core, log4j-api and log4j-slf4j-impl to classpath if active hadoop-provided
[ https://issues.apache.org/jira/browse/SPARK-38516?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-38516:

Assignee: Yuming Wang

> Add log4j-core, log4j-api and log4j-slf4j-impl to classpath if active hadoop-provided
>
> Key: SPARK-38516
> URL: https://issues.apache.org/jira/browse/SPARK-38516
> Project: Spark
> Issue Type: Bug
> Components: Build
> Affects Versions: 3.3.0
> Reporter: Yuming Wang
> Assignee: Yuming Wang
> Priority: Major
[jira] [Created] (SPARK-38531) "Prune unrequired child index" branch of ColumnPruning has wrong condition
Min Yang created SPARK-38531:

Summary: "Prune unrequired child index" branch of ColumnPruning has wrong condition
Key: SPARK-38531
URL: https://issues.apache.org/jira/browse/SPARK-38531
Project: Spark
Issue Type: Bug
Components: Optimizer
Affects Versions: 3.2.1
Reporter: Min Yang

The "prune unrequired references" branch has the condition:

{code:java}
case p @ Project(_, g: Generate) if p.references != g.outputSet =>
{code}

This is wrong, as generators like Inline will always enter this branch as long as the project does not use all of the generator output. Example:

input: col1: array<struct<a: struct<a: int>, b: int>>

Project(a.a as x) - Generate(Inline(col1), ..., a, b)

p.references is [a]; g.outputSet is [a, b].

This bug means we never enter the GeneratorNestedColumnAliasing branch below, and thus miss some optimization opportunities. The condition should be:

{code:java}
g.requiredChildOutput.exists(!p.references.contains(_))
{code}
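The difference between the two conditions can be made concrete with a toy model in plain Python, with sets standing in for Catalyst attribute sets (this is a sketch of the logic, not Catalyst code):

```python
# refs ~ p.references, gen_output ~ g.outputSet,
# required_child ~ g.requiredChildOutput.

def old_condition(refs, gen_output):
    # Fires whenever the project uses anything other than the exact
    # generator output set, e.g. whenever Inline output is partially used.
    return refs != gen_output

def fixed_condition(refs, required_child):
    # Fires only when some required child output is truly unreferenced,
    # i.e. there is actually something to prune.
    return any(col not in refs for col in required_child)

# The reported example: Project(a.a as x) over Generate(Inline(col1), ..., a, b)
refs = {"a"}               # p.references
gen_output = {"a", "b"}    # g.outputSet
required_child = set()     # no pass-through child columns are required

print(old_condition(refs, gen_output))        # True: branch wrongly taken
print(fixed_condition(refs, required_child))  # False: branch correctly skipped
```

With the old condition the prune branch swallows the plan and the later GeneratorNestedColumnAliasing rewrite never gets a chance to run; with the fixed condition the branch fires only when pruning has work to do.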
[jira] [Created] (SPARK-38530) GeneratorNestedColumnAliasing does not work correctly for some expressions
Min Yang created SPARK-38530:

Summary: GeneratorNestedColumnAliasing does not work correctly for some expressions
Key: SPARK-38530
URL: https://issues.apache.org/jira/browse/SPARK-38530
Project: Spark
Issue Type: Bug
Components: Optimizer
Affects Versions: 3.2.1
Reporter: Min Yang

[https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/NestedColumnAliasing.scala#L226]

The code to collect ExtractValue expressions is wrong. We should do it in a bottom-up way instead of only checking 2 levels. It can cause an incorrect result if the expression looks like ExtractValue(ExtractValue(some_other_expr)). An example that triggers the bug:

input: col1: array<struct<a: int, b: int>>

Project(ExtractValue(ExtractValue(CaseWhen([col.a == 1, col.b]), "a"), "a")) - Generate(Explode(col1))

We will try to incorrectly push down the whole expression into the input of the Explode; the CaseWhen then has array<...> as input, so we will get a wrong result.
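The "bottom-up" point can be illustrated with a toy expression tree in plain Python. Tuples stand in for Catalyst expressions, and `chain_root`/`safe_to_push` are illustrative names, not Spark APIs:

```python
# ("get", child, field) models ExtractValue; ("col", name) a generator
# output reference; ("case", *children) an arbitrary expression such as
# CASE WHEN. A field-access chain is only safe to push below the generator
# when, walked all the way down, it bottoms out on the generator's output
# column itself.

def chain_root(expr):
    # Walk the whole ExtractValue chain, not just two levels.
    while expr[0] == "get":
        expr = expr[1]
    return expr

def safe_to_push(expr):
    return chain_root(expr)[0] == "col"

# get(get(col, "a"), "a"): bottoms out on the column -> safe
ok = ("get", ("get", ("col", "c"), "a"), "a")

# get(get(CASE WHEN ..., "a"), "a"): bottoms out on an arbitrary
# expression, as in the report -> pushing it down changes its input type
bad = ("get", ("get", ("case", ("col", "c")), "a"), "a")

print(safe_to_push(ok))   # True
print(safe_to_push(bad))  # False
```

A two-level check cannot tell these cases apart, because both look like `get(get(...))` from the top; only recursing to the root of the chain does.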
[jira] [Created] (SPARK-38529) GeneratorNestedColumnAliasing works incorrectly for non-Explode generators
Min Yang created SPARK-38529:

Summary: GeneratorNestedColumnAliasing works incorrectly for non-Explode generators
Key: SPARK-38529
URL: https://issues.apache.org/jira/browse/SPARK-38529
Project: Spark
Issue Type: Bug
Components: Optimizer
Affects Versions: 3.2.1
Reporter: Min Yang

The Project(_, g: Generate) branch in GeneratorNestedColumnAliasing is only supposed to work for ExplodeBase generators, but we do not explicitly return for other types like Inline. Currently the bug is not triggered because there is another bug in the "prune unrequired child" branch of ColumnPruning, which makes other generators like Inline always take that branch even when it is not applicable. An easy example to show the bug:

Input: col2: array<struct<field1: struct<field1: int>, field2: int>>

Project(field1.field1 as ...) - Generate(Inline(col2), ..., field1, field2)

We will try to incorrectly push the .field1 access on field1 into the input of the Inline (col2).
[jira] [Commented] (SPARK-38528) NullPointerException when selecting a generator in a Stream of aggregate expressions
[ https://issues.apache.org/jira/browse/SPARK-38528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17505149#comment-17505149 ] Bruce Robbins commented on SPARK-38528:

This is a bug in {{ExtractGenerator}} in which an array ({{projectExprs}}) is updated from within a closure passed to a map operation (the array is external to the closure). If the sequence of expressions on which the map operation is called is a {{Stream}}, the map operation is evaluated lazily, so the array is not fully updated before the rule completes.

> NullPointerException when selecting a generator in a Stream of aggregate expressions
>
> Key: SPARK-38528
> URL: https://issues.apache.org/jira/browse/SPARK-38528
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 3.1.3, 3.2.1, 3.3.0
> Reporter: Bruce Robbins
> Priority: Major
>
> Assume this dataframe:
> {noformat}
> val df = Seq(1, 2, 3).toDF("v")
> {noformat}
> This works:
> {noformat}
> df.select(Seq(explode(array(min($"v"), max($"v"))), sum($"v")): _*).collect
> {noformat}
> However, this doesn't:
> {noformat}
> df.select(Stream(explode(array(min($"v"), max($"v"))), sum($"v")): _*).collect
> {noformat}
> It throws this error:
> {noformat}
> java.lang.NullPointerException
>   at org.apache.spark.sql.catalyst.analysis.Analyzer$GlobalAggregates$.$anonfun$containsAggregates$1(Analyzer.scala:2516)
>   at scala.collection.immutable.List.flatMap(List.scala:366)
>   at org.apache.spark.sql.catalyst.analysis.Analyzer$GlobalAggregates$.containsAggregates(Analyzer.scala:2515)
>   at org.apache.spark.sql.catalyst.analysis.Analyzer$GlobalAggregates$$anonfun$apply$31.applyOrElse(Analyzer.scala:2509)
>   at org.apache.spark.sql.catalyst.analysis.Analyzer$GlobalAggregates$$anonfun$apply$31.applyOrElse(Analyzer.scala:2508)
> {noformat}
> The only difference between the two queries is that the first one uses {{Seq}} to specify the varargs, whereas the second one uses {{Stream}}.
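The lazy-evaluation failure mode described in the comment above has a direct analogue in plain Python, with a generator expression standing in for Scala's Stream (a sketch of the mechanism, not Spark code):

```python
# A map over a lazy sequence that mutates external state does not run
# until the sequence is forced, so the external array is still empty
# when later code reads it -- exactly the projectExprs problem above.

collected = []

def record(x):
    collected.append(x)  # side effect inside the mapped function
    return x

lazy = (record(x) for x in [1, 2, 3])    # like Stream.map: nothing runs yet
assert collected == []                   # side effects have not happened

strict = [record(x) for x in [1, 2, 3]]  # like Seq.map: runs eagerly
assert collected == [1, 2, 3]

list(lazy)                               # forcing the lazy map runs it now
assert collected == [1, 2, 3, 1, 2, 3]
```

In the Spark rule, nothing ever forces the `Stream` before the rule returns, so `projectExprs` still contains nulls, which later surfaces as the `NullPointerException`.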
[jira] [Created] (SPARK-38528) NullPointerException when selecting a generator in a Stream of aggregate expressions
Bruce Robbins created SPARK-38528:

Summary: NullPointerException when selecting a generator in a Stream of aggregate expressions
Key: SPARK-38528
URL: https://issues.apache.org/jira/browse/SPARK-38528
Project: Spark
Issue Type: Bug
Components: SQL
Affects Versions: 3.2.1, 3.1.3, 3.3.0
Reporter: Bruce Robbins

Assume this dataframe:

{noformat}
val df = Seq(1, 2, 3).toDF("v")
{noformat}

This works:

{noformat}
df.select(Seq(explode(array(min($"v"), max($"v"))), sum($"v")): _*).collect
{noformat}

However, this doesn't:

{noformat}
df.select(Stream(explode(array(min($"v"), max($"v"))), sum($"v")): _*).collect
{noformat}

It throws this error:

{noformat}
java.lang.NullPointerException
  at org.apache.spark.sql.catalyst.analysis.Analyzer$GlobalAggregates$.$anonfun$containsAggregates$1(Analyzer.scala:2516)
  at scala.collection.immutable.List.flatMap(List.scala:366)
  at org.apache.spark.sql.catalyst.analysis.Analyzer$GlobalAggregates$.containsAggregates(Analyzer.scala:2515)
  at org.apache.spark.sql.catalyst.analysis.Analyzer$GlobalAggregates$$anonfun$apply$31.applyOrElse(Analyzer.scala:2509)
  at org.apache.spark.sql.catalyst.analysis.Analyzer$GlobalAggregates$$anonfun$apply$31.applyOrElse(Analyzer.scala:2508)
{noformat}

The only difference between the two queries is that the first one uses Seq to specify the varargs, whereas the second one uses Stream.
[jira] [Comment Edited] (SPARK-38483) Column name or alias as an attribute of the PySpark Column class
[ https://issues.apache.org/jira/browse/SPARK-38483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17504529#comment-17504529 ] Brian Schaefer edited comment on SPARK-38483 at 3/11/22, 9:57 PM: -- The column name does differ between the two when selecting a struct field. However I think it makes sense to return the name that the column _would_ take if it were selected. Seems like this should be fairly straightforward to handle: {code:python} >>> df = spark.createDataFrame([{"struct": {"outer_field": {"inner_field": >>> 1}}}]) >>> values = F.col("struct.outer_field.inner_field") >>> print(df.select(values).schema[0].name) inner_field >>> print(values._jc.toString()) struct.outer_field.inner_field >>> print(values._jc.toString().split(".")[-1]) inner_field{code} was (Author: JIRAUSER286367): The column name does differ between the two when selecting a struct field. However I think it makes sense to print out the name that the column _would_ take if it were selected. Seems like this should be fairly straightforward to handle: {code:python} >>> df = spark.createDataFrame([{"struct": {"outer_field": {"inner_field": >>> 1}}}]) >>> values = F.col("struct.outer_field.inner_field") >>> print(df.select(values).schema[0].name) inner_field >>> print(values._jc.toString()) struct.outer_field.inner_field >>> print(values._jc.toString().split(".")[-1]) inner_field{code} > Column name or alias as an attribute of the PySpark Column class > > > Key: SPARK-38483 > URL: https://issues.apache.org/jira/browse/SPARK-38483 > Project: Spark > Issue Type: New Feature > Components: PySpark >Affects Versions: 3.2.1 >Reporter: Brian Schaefer >Priority: Minor > Labels: starter > > Having the name of a column as an attribute of PySpark {{Column}} class > instances can enable some convenient patterns, for example: > Applying a function to a column and aliasing with the original name: > {code:java} > values = F.col("values") > # repeating the column name as an alias > 
distinct_values = F.array_distinct(values).alias("values") > # re-using the existing column name > distinct_values = F.array_distinct(values).alias(values._name){code} > Checking the column name inside a custom function and applying conditional > logic on the name: > {code:java} > def custom_function(col: Column) -> Column: > if col._name == "my_column": > return col.astype("int") > return col.astype("string"){code} > The proposal in this issue is to add a property {{Column.\_name}} that > obtains the name or alias of a column in a similar way as currently done in > the {{Column.\_\_repr\_\_}} method: > [https://github.com/apache/spark/blob/master/python/pyspark/sql/column.py#L1062.] > The choice of {{_name}} intentionally avoids collision with the existing > {{Column.name}} method, which is an alias for {{{}Column.alias{}}}. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
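The behavior requested above can be sketched in plain Python. The following is an illustrative, hypothetical helper (not PySpark's actual implementation, and `column_name` is an invented function): it prefers a trailing alias if the column's string form carries one, and otherwise returns the last dot-separated path segment, matching the `split(".")[-1]` approach shown in the edited comment.

```python
import re

def column_name(expr: str) -> str:
    """Name a column would take if selected (hypothetical sketch).

    If the expression string carries a trailing alias ("expr AS name"),
    prefer the alias; otherwise return the last dot-separated segment,
    so "struct.outer_field.inner_field" yields "inner_field".
    """
    match = re.search(r"\bAS\b\s+`?([^`]+?)`?\s*$", expr)
    if match:
        return match.group(1)
    return expr.split(".")[-1]
```

A real `Column._name` property would read the underlying Java column's `toString()` rather than take a string argument, but the name-extraction logic would be similar.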
[jira] [Assigned] (SPARK-38527) Set the minimum Volcano version
[ https://issues.apache.org/jira/browse/SPARK-38527?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-38527: - Assignee: Dongjoon Hyun > Set the minimum Volcano version > --- > > Key: SPARK-38527 > URL: https://issues.apache.org/jira/browse/SPARK-38527 > Project: Spark > Issue Type: Documentation > Components: Documentation, Kubernetes >Affects Versions: 3.3.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-38527) Set the minimum Volcano version
[ https://issues.apache.org/jira/browse/SPARK-38527?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-38527. --- Fix Version/s: 3.3.0 Resolution: Fixed Issue resolved by pull request 35822 [https://github.com/apache/spark/pull/35822] > Set the minimum Volcano version > --- > > Key: SPARK-38527 > URL: https://issues.apache.org/jira/browse/SPARK-38527 > Project: Spark > Issue Type: Documentation > Components: Documentation, Kubernetes >Affects Versions: 3.3.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Major > Fix For: 3.3.0 > > -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-38508) Volcano feature doesn't work on EKS graviton instances
[ https://issues.apache.org/jira/browse/SPARK-38508?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17505066#comment-17505066 ] Dongjoon Hyun commented on SPARK-38508: --- Due to this issue, I created SPARK-38527. Using the `latest` tag is not a good practice at all. > Volcano feature doesn't work on EKS graviton instances > -- > > Key: SPARK-38508 > URL: https://issues.apache.org/jira/browse/SPARK-38508 > Project: Spark > Issue Type: Sub-task > Components: Kubernetes >Affects Versions: 3.3.0 >Reporter: Dongjoon Hyun >Priority: Major >
[jira] [Commented] (SPARK-38527) Set the minimum Volcano version
[ https://issues.apache.org/jira/browse/SPARK-38527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17505055#comment-17505055 ] Apache Spark commented on SPARK-38527: -- User 'dongjoon-hyun' has created a pull request for this issue: https://github.com/apache/spark/pull/35822 > Set the minimum Volcano version > --- > > Key: SPARK-38527 > URL: https://issues.apache.org/jira/browse/SPARK-38527 > Project: Spark > Issue Type: Documentation > Components: Documentation, Kubernetes >Affects Versions: 3.3.0 >Reporter: Dongjoon Hyun >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-38527) Set the minimum Volcano version
[ https://issues.apache.org/jira/browse/SPARK-38527?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-38527: Assignee: (was: Apache Spark) > Set the minimum Volcano version > --- > > Key: SPARK-38527 > URL: https://issues.apache.org/jira/browse/SPARK-38527 > Project: Spark > Issue Type: Documentation > Components: Documentation, Kubernetes >Affects Versions: 3.3.0 >Reporter: Dongjoon Hyun >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-38527) Set the minimum Volcano version
[ https://issues.apache.org/jira/browse/SPARK-38527?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-38527: Assignee: Apache Spark > Set the minimum Volcano version > --- > > Key: SPARK-38527 > URL: https://issues.apache.org/jira/browse/SPARK-38527 > Project: Spark > Issue Type: Documentation > Components: Documentation, Kubernetes >Affects Versions: 3.3.0 >Reporter: Dongjoon Hyun >Assignee: Apache Spark >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-38527) Set the minimum Volcano version
Dongjoon Hyun created SPARK-38527: - Summary: Set the minimum Volcano version Key: SPARK-38527 URL: https://issues.apache.org/jira/browse/SPARK-38527 Project: Spark Issue Type: Documentation Components: Documentation, Kubernetes Affects Versions: 3.3.0 Reporter: Dongjoon Hyun
[jira] [Comment Edited] (SPARK-38507) DataFrame withColumn method not adding or replacing columns when alias is used
[ https://issues.apache.org/jira/browse/SPARK-38507?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17504981#comment-17504981 ] Alexandros Mavrommatis edited comment on SPARK-38507 at 3/11/22, 3:48 PM: -- [~dcoliversun] Hello again. You may check the signatures of the select() and withColumn() methods at the following links: * [https://spark.apache.org/docs/3.1.2/api/scala/org/apache/spark/sql/Dataset.html#select(col:String,cols:String*):org.apache.spark.sql.DataFrame] * [https://spark.apache.org/docs/3.1.2/api/scala/org/apache/spark/sql/Dataset.html#withColumn(colName:String,col:org.apache.spark.sql.Column):org.apache.spark.sql.DataFrame] As you can see, they have exactly the same signature, so it is pretty obvious to me that they expect the same input. Additionally, as I mentioned before, the exception from the withColumn method call was the following: {code:java} cannot resolve '`df.field3`' given input columns: [df.field3, df.field1, df.field2] {code} This clearly states that I am trying to access field "df.field3", which is actually part of the dataframe schema (it is clearly mentioned in the message). In any case, if you have any doubts, please feel free to contact the spark user email group.
> DataFrame withColumn method not adding or replacing columns when alias is used > -- > > Key: SPARK-38507 > URL: https://issues.apache.org/jira/browse/SPARK-38507 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.1.2 >Reporter: Alexandros Mavrommatis >Priority: Major > Labels: SQL, catalyst > > I have an input DataFrame *df* created as follows: > {code:java} > import spark.implicits._ > val df = List((5, 10), (6, 20)).toDF("field1", "field2").alias("df") {code} > When I execute either this command: > {code:java} > df.select("df.field2").show(2) {code} > or that one: > {code:java} > df.withColumn("df.field2", lit(0)).select("df.field2").show(2) {code} > I get the same result: > {code:java} > +--+ > |field2| > +--+ > | 10| > | 20| > +--+ {code} > Additionally, when I execute the following command: > {code:java} > df.withColumn("df.field3", lit(0)).select("df.field3").show(2){code} > I get this exception: > {code:java} > org.apache.spark.sql.AnalysisException: cannot resolve '`df.field3`' given > input columns: [df.field3, df.field1, df.field2]; 'Project ['df.field3] +- > Project [field1#7, field2#8, 0 AS df.field3#31] +- SubqueryAlias df > +- Project [_1#2 AS field1#7, _2#3 AS field2#8] +- LocalRelation > [_1#2, _2#3] at > org.apache.spark.sql.catalyst.analysis.package$AnalysisErrorAt.failAnalysis(package.scala:42) > at > org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$$nestedInanonfun$checkAnalysis$1$2.applyOrElse(CheckAnalysis.scala:155) > at > 
org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$$nestedInanonfun$checkAnalysis$1$2.applyOrElse(CheckAnalysis.scala:152) > at > org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformUp$2(TreeNode.scala:342) > at > org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:74) > at > org.apache.spark.sql.catalyst.trees.TreeNode.transformUp(TreeNode.scala:342) > at > org.apache.spark.sql.catalyst.plans.QueryPlan.$anonfun$transformExpressionsUp$1(QueryPlan.scala:104) > at > org.apache.spark.sql.catalyst.plans.QueryPlan.$anonfun$mapExpressions$1(QueryPlan.scala:116) > at > org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:74) > at > org.apache.spark.sql.catalyst.plans.QueryPlan.transformExpression$1(QueryPlan.scala:116) > at >
[jira] [Commented] (SPARK-38507) DataFrame withColumn method not adding or replacing columns when alias is used
[ https://issues.apache.org/jira/browse/SPARK-38507?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17504981#comment-17504981 ] Alexandros Mavrommatis commented on SPARK-38507: [~dcoliversun] Hello again. You may check the signature of select() and withColumn() methods in the following links: * [https://spark.apache.org/docs/3.1.2/api/scala/org/apache/spark/sql/Dataset.html#select(col:String,cols:String*):org.apache.spark.sql.DataFrame] * [https://spark.apache.org/docs/3.1.2/api/scala/org/apache/spark/sql/Dataset.html#withColumn(colName:String,col:org.apache.spark.sql.Column):org.apache.spark.sql.DataFrame] As you can see they have the exact same signature, so it is pretty obvious that they expect the same input. Additionally, as I mentioned before the exception of withColumn method call was the following: {code:java} cannot resolve '`df.field3`' given input columns: [df.field3, df.field1, df.field2] {code} This clearly states that I try to access field "df.field3" which is actually part of the dataframe schema (it is clearly mentioned in the message). In any case, if you have any doubts, please feel free to contact spark user email group. 
> DataFrame withColumn method not adding or replacing columns when alias is used > -- > > Key: SPARK-38507 > URL: https://issues.apache.org/jira/browse/SPARK-38507 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.1.2 >Reporter: Alexandros Mavrommatis >Priority: Major > Labels: SQL, catalyst > > I have an input DataFrame *df* created as follows: > {code:java} > import spark.implicits._ > val df = List((5, 10), (6, 20)).toDF("field1", "field2").alias("df") {code} > When I execute either this command: > {code:java} > df.select("df.field2").show(2) {code} > or that one: > {code:java} > df.withColumn("df.field2", lit(0)).select("df.field2").show(2) {code} > I get the same result: > {code:java} > +--+ > |field2| > +--+ > | 10| > | 20| > +--+ {code} > Additionally, when I execute the following command: > {code:java} > df.withColumn("df.field3", lit(0)).select("df.field3").show(2){code} > I get this exception: > {code:java} > org.apache.spark.sql.AnalysisException: cannot resolve '`df.field3`' given > input columns: [df.field3, df.field1, df.field2]; 'Project ['df.field3] +- > Project [field1#7, field2#8, 0 AS df.field3#31] +- SubqueryAlias df > +- Project [_1#2 AS field1#7, _2#3 AS field2#8] +- LocalRelation > [_1#2, _2#3] at > org.apache.spark.sql.catalyst.analysis.package$AnalysisErrorAt.failAnalysis(package.scala:42) > at > org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$$nestedInanonfun$checkAnalysis$1$2.applyOrElse(CheckAnalysis.scala:155) > at > org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$$nestedInanonfun$checkAnalysis$1$2.applyOrElse(CheckAnalysis.scala:152) > at > org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformUp$2(TreeNode.scala:342) > at > org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:74) > at > org.apache.spark.sql.catalyst.trees.TreeNode.transformUp(TreeNode.scala:342) > at > 
org.apache.spark.sql.catalyst.plans.QueryPlan.$anonfun$transformExpressionsUp$1(QueryPlan.scala:104) > at > org.apache.spark.sql.catalyst.plans.QueryPlan.$anonfun$mapExpressions$1(QueryPlan.scala:116) > at > org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:74) > at > org.apache.spark.sql.catalyst.plans.QueryPlan.transformExpression$1(QueryPlan.scala:116) > at > org.apache.spark.sql.catalyst.plans.QueryPlan.recursiveTransform$1(QueryPlan.scala:127) > at > org.apache.spark.sql.catalyst.plans.QueryPlan.$anonfun$mapExpressions$3(QueryPlan.scala:132) > at > scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:238) > at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62) > at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55) > at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49) at > scala.collection.TraversableLike.map(TraversableLike.scala:238) at > scala.collection.TraversableLike.map$(TraversableLike.scala:231) at > scala.collection.AbstractTraversable.map(Traversable.scala:108) at > org.apache.spark.sql.catalyst.plans.QueryPlan.recursiveTransform$1(QueryPlan.scala:132) > at > org.apache.spark.sql.catalyst.plans.QueryPlan.$anonfun$mapExpressions$4(QueryPlan.scala:137) > at > org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:244) > at > org.apache.spark.sql.catalyst.plans.QueryPlan.mapExpressions(QueryPlan.scala:137) > at >
[jira] [Commented] (SPARK-38507) DataFrame withColumn method not adding or replacing columns when alias is used
[ https://issues.apache.org/jira/browse/SPARK-38507?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17504971#comment-17504971 ] qian commented on SPARK-38507: -- [~amavrommatis] Method *select()* regards input argument like _xx.xx_ as {_}table.column{_}, which is by design. I don't agree that this is actually a bug. If you stick to your point, you could email to spark user email group about this case. :) > DataFrame withColumn method not adding or replacing columns when alias is used > -- > > Key: SPARK-38507 > URL: https://issues.apache.org/jira/browse/SPARK-38507 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.1.2 >Reporter: Alexandros Mavrommatis >Priority: Major > Labels: SQL, catalyst > > I have an input DataFrame *df* created as follows: > {code:java} > import spark.implicits._ > val df = List((5, 10), (6, 20)).toDF("field1", "field2").alias("df") {code} > When I execute either this command: > {code:java} > df.select("df.field2").show(2) {code} > or that one: > {code:java} > df.withColumn("df.field2", lit(0)).select("df.field2").show(2) {code} > I get the same result: > {code:java} > +--+ > |field2| > +--+ > | 10| > | 20| > +--+ {code} > Additionally, when I execute the following command: > {code:java} > df.withColumn("df.field3", lit(0)).select("df.field3").show(2){code} > I get this exception: > {code:java} > org.apache.spark.sql.AnalysisException: cannot resolve '`df.field3`' given > input columns: [df.field3, df.field1, df.field2]; 'Project ['df.field3] +- > Project [field1#7, field2#8, 0 AS df.field3#31] +- SubqueryAlias df > +- Project [_1#2 AS field1#7, _2#3 AS field2#8] +- LocalRelation > [_1#2, _2#3] at > org.apache.spark.sql.catalyst.analysis.package$AnalysisErrorAt.failAnalysis(package.scala:42) > at > org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$$nestedInanonfun$checkAnalysis$1$2.applyOrElse(CheckAnalysis.scala:155) > at > 
org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$$nestedInanonfun$checkAnalysis$1$2.applyOrElse(CheckAnalysis.scala:152) > at > org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformUp$2(TreeNode.scala:342) > at > org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:74) > at > org.apache.spark.sql.catalyst.trees.TreeNode.transformUp(TreeNode.scala:342) > at > org.apache.spark.sql.catalyst.plans.QueryPlan.$anonfun$transformExpressionsUp$1(QueryPlan.scala:104) > at > org.apache.spark.sql.catalyst.plans.QueryPlan.$anonfun$mapExpressions$1(QueryPlan.scala:116) > at > org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:74) > at > org.apache.spark.sql.catalyst.plans.QueryPlan.transformExpression$1(QueryPlan.scala:116) > at > org.apache.spark.sql.catalyst.plans.QueryPlan.recursiveTransform$1(QueryPlan.scala:127) > at > org.apache.spark.sql.catalyst.plans.QueryPlan.$anonfun$mapExpressions$3(QueryPlan.scala:132) > at > scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:238) > at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62) > at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55) > at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49) at > scala.collection.TraversableLike.map(TraversableLike.scala:238) at > scala.collection.TraversableLike.map$(TraversableLike.scala:231) at > scala.collection.AbstractTraversable.map(Traversable.scala:108) at > org.apache.spark.sql.catalyst.plans.QueryPlan.recursiveTransform$1(QueryPlan.scala:132) > at > org.apache.spark.sql.catalyst.plans.QueryPlan.$anonfun$mapExpressions$4(QueryPlan.scala:137) > at > org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:244) > at > org.apache.spark.sql.catalyst.plans.QueryPlan.mapExpressions(QueryPlan.scala:137) > at > org.apache.spark.sql.catalyst.plans.QueryPlan.transformExpressionsUp(QueryPlan.scala:104) > at > 
org.apache.spark.sql.catalyst.analysis.CheckAnalysis.$anonfun$checkAnalysis$1(CheckAnalysis.scala:152) > at > org.apache.spark.sql.catalyst.analysis.CheckAnalysis.$anonfun$checkAnalysis$1$adapted(CheckAnalysis.scala:93) > at > org.apache.spark.sql.catalyst.trees.TreeNode.foreachUp(TreeNode.scala:184) > at > org.apache.spark.sql.catalyst.analysis.CheckAnalysis.checkAnalysis(CheckAnalysis.scala:93) > at > org.apache.spark.sql.catalyst.analysis.CheckAnalysis.checkAnalysis$(CheckAnalysis.scala:90) > at > org.apache.spark.sql.catalyst.analysis.Analyzer.checkAnalysis(Analyzer.scala:155) > at > org.apache.spark.sql.catalyst.analysis.Analyzer.$anonfun$executeAndCheck$1(Analyzer.scala:176) > at
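The behavior qian describes also points at the standard workaround: `withColumn("df.field3", ...)` creates a column whose name literally contains a dot, and referencing it afterwards requires backtick quoting, e.g. `df.select("`df.field3`")`. A minimal quoting helper, sketched in plain Python (`quote_identifier` is a hypothetical name, and the doubling of embedded backticks is an assumption based on Spark SQL's usual quoted-identifier convention):

```python
def quote_identifier(name: str) -> str:
    """Backtick-quote a column name so embedded dots are treated as part
    of the name rather than as a table.column qualifier.

    Hypothetical helper; assumes Spark's convention that a literal
    backtick inside a quoted identifier is written as two backticks.
    """
    return "`" + name.replace("`", "``") + "`"
```

With such quoting, `select` resolves the literal column name instead of parsing the dot as an alias qualifier.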
[jira] [Commented] (SPARK-38526) fix misleading function alias name for RuntimeReplaceable
[ https://issues.apache.org/jira/browse/SPARK-38526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17504954#comment-17504954 ] Apache Spark commented on SPARK-38526: -- User 'cloud-fan' has created a pull request for this issue: https://github.com/apache/spark/pull/35821 > fix misleading function alias name for RuntimeReplaceable > - > > Key: SPARK-38526 > URL: https://issues.apache.org/jira/browse/SPARK-38526 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.3.0 >Reporter: Wenchen Fan >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-38526) fix misleading function alias name for RuntimeReplaceable
[ https://issues.apache.org/jira/browse/SPARK-38526?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-38526: Assignee: (was: Apache Spark) > fix misleading function alias name for RuntimeReplaceable > - > > Key: SPARK-38526 > URL: https://issues.apache.org/jira/browse/SPARK-38526 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.3.0 >Reporter: Wenchen Fan >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-38526) fix misleading function alias name for RuntimeReplaceable
[ https://issues.apache.org/jira/browse/SPARK-38526?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-38526: Assignee: Apache Spark > fix misleading function alias name for RuntimeReplaceable > - > > Key: SPARK-38526 > URL: https://issues.apache.org/jira/browse/SPARK-38526 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.3.0 >Reporter: Wenchen Fan >Assignee: Apache Spark >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-38526) fix misleading function alias name for RuntimeReplaceable
Wenchen Fan created SPARK-38526: --- Summary: fix misleading function alias name for RuntimeReplaceable Key: SPARK-38526 URL: https://issues.apache.org/jira/browse/SPARK-38526 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 3.3.0 Reporter: Wenchen Fan
[jira] [Assigned] (SPARK-38525) [TEST] Check resource after resource creation
[ https://issues.apache.org/jira/browse/SPARK-38525?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-38525: Assignee: (was: Apache Spark) > [TEST] Check resource after resource creation > - > > Key: SPARK-38525 > URL: https://issues.apache.org/jira/browse/SPARK-38525 > Project: Spark > Issue Type: Sub-task > Components: Kubernetes, Tests >Affects Versions: 3.3.0 >Reporter: Yikun Jiang >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-38525) [TEST] Check resource after resource creation
[ https://issues.apache.org/jira/browse/SPARK-38525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17504934#comment-17504934 ] Apache Spark commented on SPARK-38525: -- User 'Yikun' has created a pull request for this issue: https://github.com/apache/spark/pull/35820 > [TEST] Check resource after resource creation > - > > Key: SPARK-38525 > URL: https://issues.apache.org/jira/browse/SPARK-38525 > Project: Spark > Issue Type: Sub-task > Components: Kubernetes, Tests >Affects Versions: 3.3.0 >Reporter: Yikun Jiang >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-38525) [TEST] Check resource after resource creation
[ https://issues.apache.org/jira/browse/SPARK-38525?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-38525: Assignee: Apache Spark > [TEST] Check resource after resource creation > - > > Key: SPARK-38525 > URL: https://issues.apache.org/jira/browse/SPARK-38525 > Project: Spark > Issue Type: Sub-task > Components: Kubernetes, Tests >Affects Versions: 3.3.0 >Reporter: Yikun Jiang >Assignee: Apache Spark >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-38525) [TEST] Check resource after resource creation
Yikun Jiang created SPARK-38525: --- Summary: [TEST] Check resource after resource creation Key: SPARK-38525 URL: https://issues.apache.org/jira/browse/SPARK-38525 Project: Spark Issue Type: Sub-task Components: Kubernetes, Tests Affects Versions: 3.3.0 Reporter: Yikun Jiang
[jira] [Commented] (SPARK-38524) [TEST] Change disable queue to capability limit way
[ https://issues.apache.org/jira/browse/SPARK-38524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17504930#comment-17504930 ] Apache Spark commented on SPARK-38524: -- User 'Yikun' has created a pull request for this issue: https://github.com/apache/spark/pull/35819 > [TEST] Change disable queue to capability limit way > --- > > Key: SPARK-38524 > URL: https://issues.apache.org/jira/browse/SPARK-38524 > Project: Spark > Issue Type: Sub-task > Components: Kubernetes >Affects Versions: 3.3.0 >Reporter: Yikun Jiang >Priority: Major > > As described at [https://volcano.sh/en/docs/queue/]: > - weight is a soft constraint. > - capability is a hard constraint. > We should use capability to keep things simple and avoid being influenced by > other queues
[jira] [Assigned] (SPARK-38524) [TEST] Change disable queue to capability limit way
[ https://issues.apache.org/jira/browse/SPARK-38524?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-38524: Assignee: (was: Apache Spark) > [TEST] Change disable queue to capability limit way > --- > > Key: SPARK-38524 > URL: https://issues.apache.org/jira/browse/SPARK-38524 > Project: Spark > Issue Type: Sub-task > Components: Kubernetes >Affects Versions: 3.3.0 >Reporter: Yikun Jiang >Priority: Major > > As description from [https://volcano.sh/en/docs/queue/] > - weight is a soft constraint. > - capability is a hard constraint. > We better to use capability to make thing simple to avoid being influenced by > other queues -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-38524) [TEST] Change disable queue to capability limit way
[ https://issues.apache.org/jira/browse/SPARK-38524?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-38524: Assignee: Apache Spark > [TEST] Change disable queue to capability limit way > --- > > Key: SPARK-38524 > URL: https://issues.apache.org/jira/browse/SPARK-38524 > Project: Spark > Issue Type: Sub-task > Components: Kubernetes >Affects Versions: 3.3.0 >Reporter: Yikun Jiang >Assignee: Apache Spark >Priority: Major > > As description from [https://volcano.sh/en/docs/queue/] > - weight is a soft constraint. > - capability is a hard constraint. > We better to use capability to make thing simple to avoid being influenced by > other queues -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-38524) [TEST] Change disable queue to capability limit way
[ https://issues.apache.org/jira/browse/SPARK-38524?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yikun Jiang updated SPARK-38524: Summary: [TEST] Change disable queue to capability limit way (was: Change disable queue to capability limit way) > [TEST] Change disable queue to capability limit way > --- > > Key: SPARK-38524 > URL: https://issues.apache.org/jira/browse/SPARK-38524 > Project: Spark > Issue Type: Sub-task > Components: Kubernetes >Affects Versions: 3.3.0 >Reporter: Yikun Jiang >Priority: Major > > As description from [https://volcano.sh/en/docs/queue/] > - weight is a soft constraint. > - capability is a hard constraint. > We better to use capability to make thing simple to avoid being influenced by > other queues -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-38196) Refactor framework so that JDBC dialects can compile expressions their own way
[ https://issues.apache.org/jira/browse/SPARK-38196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17504927#comment-17504927 ] Apache Spark commented on SPARK-38196: -- User 'cloud-fan' has created a pull request for this issue: https://github.com/apache/spark/pull/35818 > Refactor framework so that JDBC dialects can compile expressions their own way > - > > Key: SPARK-38196 > URL: https://issues.apache.org/jira/browse/SPARK-38196 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.3.0 >Reporter: jiaan.geng >Assignee: jiaan.geng >Priority: Major > Fix For: 3.3.0 > > > https://issues.apache.org/jira/browse/SPARK-37960 provides a new framework to > represent catalyst expressions in DS V2 APIs. > Because the framework translates all catalyst expressions to a unified SQL > string and cannot keep compatibility between different JDBC databases, it > does not work well. > This PR refactors the framework so that JDBC dialects can compile expressions > their own way. > First, the framework translates catalyst expressions to DS V2 expressions. > Second, each JDBC dialect can compile DS V2 expressions to its own SQL > syntax. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
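The two-step design described above (translate Catalyst expressions to DS V2 expressions, then let each JDBC dialect compile them to its own SQL) can be sketched roughly as follows. The class and method names here are illustrative only, not Spark's actual API:

```python
# Sketch: a generic compiler renders a tiny expression tree to ANSI SQL,
# and a dialect subclass overrides only the constructs whose syntax differs.

class Col:
    def __init__(self, name):
        self.name = name

class Fn:
    def __init__(self, name, *args):
        self.name, self.args = name, args

class AnsiCompiler:
    def compile(self, e):
        if isinstance(e, Col):
            return e.name
        if isinstance(e, Fn) and e.name == "concat":
            # ANSI SQL spells string concatenation with the || operator.
            return " || ".join(self.compile(a) for a in e.args)
        raise ValueError(f"unsupported expression: {e!r}")

class MySqlLikeCompiler(AnsiCompiler):
    # A hypothetical dialect that spells concatenation as CONCAT(...).
    def compile(self, e):
        if isinstance(e, Fn) and e.name == "concat":
            args = ", ".join(self.compile(a) for a in e.args)
            return f"CONCAT({args})"
        return super().compile(e)

expr = Fn("concat", Col("first_name"), Col("last_name"))
ansi_sql = AnsiCompiler().compile(expr)
mysql_sql = MySqlLikeCompiler().compile(expr)
```

The same expression tree thus compiles to `first_name || last_name` for ANSI and `CONCAT(first_name, last_name)` for the MySQL-like dialect, which is the compatibility problem a single unified SQL string could not solve.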
[jira] [Created] (SPARK-38524) Change disable queue to capability limit way
Yikun Jiang created SPARK-38524: --- Summary: Change disable queue to capability limit way Key: SPARK-38524 URL: https://issues.apache.org/jira/browse/SPARK-38524 Project: Spark Issue Type: Sub-task Components: Kubernetes Affects Versions: 3.3.0 Reporter: Yikun Jiang As described at [https://volcano.sh/en/docs/queue/]: - weight is a soft constraint. - capability is a hard constraint. We had better use capability, to keep things simple and avoid being influenced by other queues. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
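The soft/hard distinction that motivates this change can be illustrated with a toy allocator. The functions and numbers below are illustrative only, not Volcano's actual scheduling logic:

```python
def weight_share(weights, total):
    """Soft constraint: weight only sets a queue's proportional share of
    the cluster; a queue may still borrow idle resources beyond it, so
    results depend on what the other queues are doing."""
    s = sum(weights.values())
    return {q: total * w / s for q, w in weights.items()}

def cap_allocation(requested, capability):
    """Hard constraint: capability is an absolute ceiling that an
    allocation can never exceed, regardless of what else is idle."""
    return {q: min(r, capability.get(q, r)) for q, r in requested.items()}

# Weight shares shift whenever another queue's weight changes ...
share = weight_share({"a": 1, "b": 3}, total=8)
# ... while a capability cap is unaffected by the other queues.
capped = cap_allocation({"queue0": 16, "queue1": 4}, {"queue0": 8})
```

This is why a capability limit is the simpler way to "disable" (or strictly bound) a queue in tests: the bound holds no matter what other queues exist.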
[jira] [Assigned] (SPARK-38523) Failure on referring to the corrupt record from CSV
[ https://issues.apache.org/jira/browse/SPARK-38523?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-38523: Assignee: (was: Apache Spark) > Failure on referring to the corrupt record from CSV > --- > > Key: SPARK-38523 > URL: https://issues.apache.org/jira/browse/SPARK-38523 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.3.0 >Reporter: Max Gekk >Priority: Major > > The file below has a invalid value in a field: > {code:java} > 0,2013-111_11 12:13:14 > 1,1983-08-04 {code} > where the timestamp 2013-111_11 12:13:14 is incorrect. > The query fails when it refers to the corrupt record column: > {code:java} > spark.read.format("csv") > .option("header", "true") > .schema(schema) > .load("csv_corrupt_record.csv") > .filter($"_corrupt_record".isNotNull) {code} > with the exception: > {code:java} > org.apache.spark.sql.AnalysisException: > Since Spark 2.3, the queries from raw JSON/CSV files are disallowed when the > referenced columns only include the internal corrupt record column > (named _corrupt_record by default). For example: > spark.read.schema(schema).csv(file).filter($"_corrupt_record".isNotNull).count() > and spark.read.schema(schema).csv(file).select("_corrupt_record").show(). > Instead, you can cache or save the parsed results and then send the same > query. > For example, val df = spark.read.schema(schema).csv(file).cache() and then > df.filter($"_corrupt_record".isNotNull).count(). > > at > org.apache.spark.sql.errors.QueryCompilationErrors$.queryFromRawFilesIncludeCorruptRecordColumnError(QueryCompilationErrors.scala:2047) > at > org.apache.spark.sql.execution.datasources.csv.CSVFileFormat.buildReader(CSVFileFormat.scala:116) > {code} -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-38523) Failure on referring to the corrupt record from CSV
[ https://issues.apache.org/jira/browse/SPARK-38523?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17504910#comment-17504910 ] Apache Spark commented on SPARK-38523: -- User 'MaxGekk' has created a pull request for this issue: https://github.com/apache/spark/pull/35817 > Failure on referring to the corrupt record from CSV > --- > > Key: SPARK-38523 > URL: https://issues.apache.org/jira/browse/SPARK-38523 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.3.0 >Reporter: Max Gekk >Priority: Major > > The file below has a invalid value in a field: > {code:java} > 0,2013-111_11 12:13:14 > 1,1983-08-04 {code} > where the timestamp 2013-111_11 12:13:14 is incorrect. > The query fails when it refers to the corrupt record column: > {code:java} > spark.read.format("csv") > .option("header", "true") > .schema(schema) > .load("csv_corrupt_record.csv") > .filter($"_corrupt_record".isNotNull) {code} > with the exception: > {code:java} > org.apache.spark.sql.AnalysisException: > Since Spark 2.3, the queries from raw JSON/CSV files are disallowed when the > referenced columns only include the internal corrupt record column > (named _corrupt_record by default). For example: > spark.read.schema(schema).csv(file).filter($"_corrupt_record".isNotNull).count() > and spark.read.schema(schema).csv(file).select("_corrupt_record").show(). > Instead, you can cache or save the parsed results and then send the same > query. > For example, val df = spark.read.schema(schema).csv(file).cache() and then > df.filter($"_corrupt_record".isNotNull).count(). 
> > at > org.apache.spark.sql.errors.QueryCompilationErrors$.queryFromRawFilesIncludeCorruptRecordColumnError(QueryCompilationErrors.scala:2047) > at > org.apache.spark.sql.execution.datasources.csv.CSVFileFormat.buildReader(CSVFileFormat.scala:116) > {code} -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-38523) Failure on referring to the corrupt record from CSV
[ https://issues.apache.org/jira/browse/SPARK-38523?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-38523: Assignee: Apache Spark > Failure on referring to the corrupt record from CSV > --- > > Key: SPARK-38523 > URL: https://issues.apache.org/jira/browse/SPARK-38523 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.3.0 >Reporter: Max Gekk >Assignee: Apache Spark >Priority: Major > > The file below has a invalid value in a field: > {code:java} > 0,2013-111_11 12:13:14 > 1,1983-08-04 {code} > where the timestamp 2013-111_11 12:13:14 is incorrect. > The query fails when it refers to the corrupt record column: > {code:java} > spark.read.format("csv") > .option("header", "true") > .schema(schema) > .load("csv_corrupt_record.csv") > .filter($"_corrupt_record".isNotNull) {code} > with the exception: > {code:java} > org.apache.spark.sql.AnalysisException: > Since Spark 2.3, the queries from raw JSON/CSV files are disallowed when the > referenced columns only include the internal corrupt record column > (named _corrupt_record by default). For example: > spark.read.schema(schema).csv(file).filter($"_corrupt_record".isNotNull).count() > and spark.read.schema(schema).csv(file).select("_corrupt_record").show(). > Instead, you can cache or save the parsed results and then send the same > query. > For example, val df = spark.read.schema(schema).csv(file).cache() and then > df.filter($"_corrupt_record".isNotNull).count(). > > at > org.apache.spark.sql.errors.QueryCompilationErrors$.queryFromRawFilesIncludeCorruptRecordColumnError(QueryCompilationErrors.scala:2047) > at > org.apache.spark.sql.execution.datasources.csv.CSVFileFormat.buildReader(CSVFileFormat.scala:116) > {code} -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-38523) Failure on referring to the corrupt record from CSV
[ https://issues.apache.org/jira/browse/SPARK-38523?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Max Gekk updated SPARK-38523: - Description: The file below has a invalid value in a field: {code:java} 0,2013-111_11 12:13:14 1,1983-08-04 {code} where the timestamp 2013-111_11 12:13:14 is incorrect. The query fails when it refers to the corrupt record column: {code:java} spark.read.format("csv") .option("header", "true") .schema(schema) .load("csv_corrupt_record.csv") .filter($"_corrupt_record".isNotNull) {code} with the exception: {code:java} org.apache.spark.sql.AnalysisException: Since Spark 2.3, the queries from raw JSON/CSV files are disallowed when the referenced columns only include the internal corrupt record column (named _corrupt_record by default). For example: spark.read.schema(schema).csv(file).filter($"_corrupt_record".isNotNull).count() and spark.read.schema(schema).csv(file).select("_corrupt_record").show(). Instead, you can cache or save the parsed results and then send the same query. For example, val df = spark.read.schema(schema).csv(file).cache() and then df.filter($"_corrupt_record".isNotNull).count(). at org.apache.spark.sql.errors.QueryCompilationErrors$.queryFromRawFilesIncludeCorruptRecordColumnError(QueryCompilationErrors.scala:2047) at org.apache.spark.sql.execution.datasources.csv.CSVFileFormat.buildReader(CSVFileFormat.scala:116) {code} was: The file below has a invalid value in a field: {code:java} 0,2013-111_11 12:13:14 1,1983-08-04 {code} where the timestamp 2013-111_11 12:13:14 is incorrect. 
The query fails when it refers to the corrupt record column: {code:java} spark.read.format("csv") .option("header", "true") .schema(schema) .load("csv_corrupt_record.csv") .filter($"_corrupt_record".isNotNull) {code} with the exception: > Failure on referring to the corrupt record from CSV > --- > > Key: SPARK-38523 > URL: https://issues.apache.org/jira/browse/SPARK-38523 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.3.0 >Reporter: Max Gekk >Priority: Major > > The file below has a invalid value in a field: > {code:java} > 0,2013-111_11 12:13:14 > 1,1983-08-04 {code} > where the timestamp 2013-111_11 12:13:14 is incorrect. > The query fails when it refers to the corrupt record column: > {code:java} > spark.read.format("csv") > .option("header", "true") > .schema(schema) > .load("csv_corrupt_record.csv") > .filter($"_corrupt_record".isNotNull) {code} > with the exception: > {code:java} > org.apache.spark.sql.AnalysisException: > Since Spark 2.3, the queries from raw JSON/CSV files are disallowed when the > referenced columns only include the internal corrupt record column > (named _corrupt_record by default). For example: > spark.read.schema(schema).csv(file).filter($"_corrupt_record".isNotNull).count() > and spark.read.schema(schema).csv(file).select("_corrupt_record").show(). > Instead, you can cache or save the parsed results and then send the same > query. > For example, val df = spark.read.schema(schema).csv(file).cache() and then > df.filter($"_corrupt_record".isNotNull).count(). > > at > org.apache.spark.sql.errors.QueryCompilationErrors$.queryFromRawFilesIncludeCorruptRecordColumnError(QueryCompilationErrors.scala:2047) > at > org.apache.spark.sql.execution.datasources.csv.CSVFileFormat.buildReader(CSVFileFormat.scala:116) > {code} -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-38523) Failure on referring to the corrupt record from CSV
Max Gekk created SPARK-38523: Summary: Failure on referring to the corrupt record from CSV Key: SPARK-38523 URL: https://issues.apache.org/jira/browse/SPARK-38523 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 3.3.0 Reporter: Max Gekk The file below has an invalid value in a field: {code:java} 0,2013-111_11 12:13:14 1,1983-08-04 {code} where the timestamp 2013-111_11 12:13:14 is incorrect. The query fails when it refers to the corrupt record column: {code:java} spark.read.format("csv") .option("header", "true") .schema(schema) .load("csv_corrupt_record.csv") .filter($"_corrupt_record".isNotNull) {code} with the exception: -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
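For readers unfamiliar with the feature: the corrupt-record column's semantics can be mimicked in plain Python. This is a simplified sketch of what Spark's PERMISSIVE CSV mode does, not Spark's parser; the accepted timestamp formats are assumptions for the example:

```python
import csv
import io
from datetime import datetime

def parse_with_corrupt_record(text):
    """Parse `id,timestamp` rows; a row whose timestamp fails to parse
    keeps its raw text in `_corrupt_record` (None for valid rows)."""
    rows = []
    for raw in csv.reader(io.StringIO(text)):
        id_, ts = raw
        parsed = None
        for fmt in ("%Y-%m-%d %H:%M:%S", "%Y-%m-%d"):
            try:
                parsed = datetime.strptime(ts, fmt)
                break
            except ValueError:
                pass
        rows.append({
            "id": int(id_),
            "ts": parsed,
            "_corrupt_record": None if parsed else ",".join(raw),
        })
    return rows

rows = parse_with_corrupt_record("0,2013-111_11 12:13:14\n1,1983-08-04\n")
```

In Spark itself, per the error message quoted above, the supported workaround is to materialize the parsed result first (e.g. `df.cache()`) and only then filter on `_corrupt_record`.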
[jira] [Resolved] (SPARK-38518) Implement `skipna` of `Series.all/Index.all` to exclude NA/null values
[ https://issues.apache.org/jira/browse/SPARK-38518?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-38518. -- Fix Version/s: 3.3.0 Resolution: Fixed Issue resolved by pull request 35813 [https://github.com/apache/spark/pull/35813] > Implement `skipna` of `Series.all/Index.all` to exclude NA/null values > -- > > Key: SPARK-38518 > URL: https://issues.apache.org/jira/browse/SPARK-38518 > Project: Spark > Issue Type: Improvement > Components: PySpark >Affects Versions: 3.3.0 >Reporter: Xinrong Meng >Assignee: Xinrong Meng >Priority: Major > Fix For: 3.3.0 > > > Implement `skipna` of `Series.all/Index.all` to exclude NA/null values. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
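The `skipna` semantics being implemented can be shown with plain Python over a list containing missing values (`None` standing in for NA). This is a sketch of the documented pandas behaviour that pandas-on-Spark mirrors, not the actual PySpark implementation:

```python
def all_values(values, skipna=True):
    """all() over values that may contain None (NA).
    skipna=True drops NA first; skipna=False treats NA as True,
    following pandas' documented behaviour for Series.all (NA is
    "truthy" because it is not equal to zero)."""
    if skipna:
        values = [v for v in values if v is not None]
    return all(True if v is None else bool(v) for v in values)

r1 = all_values([True, None, True])            # NA skipped
r2 = all_values([True, False, None])           # False wins either way
r3 = all_values([None], skipna=False)          # NA treated as True
```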
[jira] [Assigned] (SPARK-38518) Implement `skipna` of `Series.all/Index.all` to exclude NA/null values
[ https://issues.apache.org/jira/browse/SPARK-38518?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-38518: Assignee: Xinrong Meng > Implement `skipna` of `Series.all/Index.all` to exclude NA/null values > -- > > Key: SPARK-38518 > URL: https://issues.apache.org/jira/browse/SPARK-38518 > Project: Spark > Issue Type: Improvement > Components: PySpark >Affects Versions: 3.3.0 >Reporter: Xinrong Meng >Assignee: Xinrong Meng >Priority: Major > > Implement `skipna` of `Series.all/Index.all` to exclude NA/null values. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-38491) Support `ignore_index` of `Series.sort_values`
[ https://issues.apache.org/jira/browse/SPARK-38491?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-38491: Assignee: Xinrong Meng > Support `ignore_index` of `Series.sort_values` > -- > > Key: SPARK-38491 > URL: https://issues.apache.org/jira/browse/SPARK-38491 > Project: Spark > Issue Type: Improvement > Components: PySpark >Affects Versions: 3.3.0 >Reporter: Xinrong Meng >Assignee: Xinrong Meng >Priority: Major > > Support `ignore_index` of `Series.sort_values` -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-38491) Support `ignore_index` of `Series.sort_values`
[ https://issues.apache.org/jira/browse/SPARK-38491?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-38491. -- Fix Version/s: 3.3.0 Resolution: Fixed Issue resolved by pull request 35794 [https://github.com/apache/spark/pull/35794] > Support `ignore_index` of `Series.sort_values` > -- > > Key: SPARK-38491 > URL: https://issues.apache.org/jira/browse/SPARK-38491 > Project: Spark > Issue Type: Improvement > Components: PySpark >Affects Versions: 3.3.0 >Reporter: Xinrong Meng >Assignee: Xinrong Meng >Priority: Major > Fix For: 3.3.0 > > > Support `ignore_index` of `Series.sort_values` -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
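What `ignore_index` changes can be shown with a toy index/value series in plain Python. This is a sketch of the pandas semantics the feature follows, not the PySpark code:

```python
def sort_values(index, values, ignore_index=False):
    """Sort (index, value) pairs by value. With ignore_index=True the
    original index labels are discarded and the result is relabelled
    0..n-1, as in pandas."""
    pairs = sorted(zip(index, values), key=lambda p: p[1])
    if ignore_index:
        return list(range(len(pairs))), [v for _, v in pairs]
    return [i for i, _ in pairs], [v for _, v in pairs]

kept = sort_values(["a", "b", "c"], [3, 1, 2])
# labels follow their values when ignore_index is False
reset = sort_values(["a", "b", "c"], [3, 1, 2], ignore_index=True)
# labels are reset to a fresh 0..n-1 range when ignore_index is True
```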
[jira] [Resolved] (SPARK-38107) Use error classes in the compilation errors of python/pandas UDFs
[ https://issues.apache.org/jira/browse/SPARK-38107?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-38107. -- Fix Version/s: 3.3.0 Resolution: Fixed Issue resolved by pull request 35656 [https://github.com/apache/spark/pull/35656] > Use error classes in the compilation errors of python/pandas UDFs > - > > Key: SPARK-38107 > URL: https://issues.apache.org/jira/browse/SPARK-38107 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.3.0 >Reporter: Max Gekk >Assignee: Haejoon Lee >Priority: Major > Fix For: 3.3.0 > > > Migrate the following errors in QueryCompilationErrors: > * pandasUDFAggregateNotSupportedInPivotError > * groupAggPandasUDFUnsupportedByStreamingAggError > * cannotUseMixtureOfAggFunctionAndGroupAggPandasUDFError > * usePythonUDFInJoinConditionUnsupportedError > onto use error classes. Throw an implementation of SparkThrowable. Also write > a test per every error in QueryCompilationErrorsSuite. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-38107) Use error classes in the compilation errors of python/pandas UDFs
[ https://issues.apache.org/jira/browse/SPARK-38107?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-38107: Assignee: Haejoon Lee > Use error classes in the compilation errors of python/pandas UDFs > - > > Key: SPARK-38107 > URL: https://issues.apache.org/jira/browse/SPARK-38107 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.3.0 >Reporter: Max Gekk >Assignee: Haejoon Lee >Priority: Major > > Migrate the following errors in QueryCompilationErrors: > * pandasUDFAggregateNotSupportedInPivotError > * groupAggPandasUDFUnsupportedByStreamingAggError > * cannotUseMixtureOfAggFunctionAndGroupAggPandasUDFError > * usePythonUDFInJoinConditionUnsupportedError > onto use error classes. Throw an implementation of SparkThrowable. Also write > a test per every error in QueryCompilationErrorsSuite. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
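The migration pattern described here, replacing free-form exception messages with named error classes plus message templates, can be sketched as follows. The class name and template are illustrative, in the style of Spark's error-classes file, not its actual contents:

```python
# Hypothetical error-class registry: each class maps to a message
# template, and errors carry the class name plus formatted parameters,
# so tests can assert on the class rather than on message wording.

ERROR_CLASSES = {
    "UNSUPPORTED_FEATURE": "The feature is not supported: {feature}",
}

class SparkThrowableSketch(Exception):
    def __init__(self, error_class, **params):
        self.error_class = error_class
        super().__init__(ERROR_CLASSES[error_class].format(**params))

err = SparkThrowableSketch(
    "UNSUPPORTED_FEATURE",
    feature="Pandas UDF aggregate expressions in pivot",
)
```

Asserting on `err.error_class` in a test suite is the point of the migration: the message text can evolve without breaking tests.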
[jira] [Commented] (SPARK-38515) Volcano queue is not deleted
[ https://issues.apache.org/jira/browse/SPARK-38515?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17504838#comment-17504838 ] Yikun Jiang commented on SPARK-38515: - {code} docker rmi volcanosh/vc-scheduler:latest docker rmi volcanosh/vc-webhook-manager:latest docker rmi volcanosh/vc-controller-manager:latest kubectl apply -f https://raw.githubusercontent.com/volcano-sh/volcano/master/installer/volcano-development.yaml {code} Please try the above: first clean up the Docker images, then apply the new deployment. I also submitted an issue on the Volcano side to change `IfNotPresent` to `latest` [1] https://github.com/volcano-sh/volcano/issues/2072 > Volcano queue is not deleted > > > Key: SPARK-38515 > URL: https://issues.apache.org/jira/browse/SPARK-38515 > Project: Spark > Issue Type: Sub-task > Components: Kubernetes >Affects Versions: 3.3.0 >Reporter: Dongjoon Hyun >Priority: Critical > > {code} > $ k delete queue queue0 > Error from server: admission webhook "validatequeue.volcano.sh" denied the > request: only queue with state `Closed` can be deleted, queue `queue0` state > is `Open` > {code} > {code} > [info] org.apache.spark.deploy.k8s.integrationtest.VolcanoSuite *** ABORTED > *** (7 minutes, 40 seconds) > [info] io.fabric8.kubernetes.client.KubernetesClientException: Failure > executing: DELETE at: > https://44bea09e70a5147f6b5b347ec26de85f.gr7.us-west-2.eks.amazonaws.com/apis/scheduling.volcano.sh/v1beta1/queues/queue-2u-3g. > Message: admission webhook "validatequeue.volcano.sh" denied the request: > only queue with state `Closed` can be deleted, queue `queue-2u-3g` state is > `Open`. 
Received status: Status(apiVersion=v1, code=400, details=null, > kind=Status, message=admission webhook "validatequeue.volcano.sh" denied the > request: only queue with state `Closed` can be deleted, queue `queue-2u-3g` > state is `Open`, metadata=ListMeta(_continue=null, remainingItemCount=null, > resourceVersion=null, selfLink=null, additionalProperties={}), reason=null, > status=Failure, additionalProperties={}). > {code} -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-38508) Volcano feature doesn't work on EKS graviton instances
[ https://issues.apache.org/jira/browse/SPARK-38508?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17504836#comment-17504836 ] Yikun Jiang commented on SPARK-38508: - {code} docker rmi volcanosh/vc-scheduler-arm64:latest docker rmi volcanosh/vc-webhook-manager-arm64:latest docker rmi volcanosh/vc-controller-manager-arm64:latest kubectl apply -f https://raw.githubusercontent.com/volcano-sh/volcano/master/installer/volcano-development-arm64.yaml {code} Please try this: first clean up the Docker images, then apply the new deployment. I also submitted an issue on the Volcano side to change `IfNotPresent` to `latest` [1] https://github.com/volcano-sh/volcano/issues/2072 > Volcano feature doesn't work on EKS graviton instances > -- > > Key: SPARK-38508 > URL: https://issues.apache.org/jira/browse/SPARK-38508 > Project: Spark > Issue Type: Sub-task > Components: Kubernetes >Affects Versions: 3.3.0 >Reporter: Dongjoon Hyun >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-38507) DataFrame withColumn method not adding or replacing columns when alias is used
[ https://issues.apache.org/jira/browse/SPARK-38507?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17504835#comment-17504835 ] Alexandros Mavrommatis edited comment on SPARK-38507 at 3/11/22, 10:01 AM: --- [~dcoliversun] if you check the exception message it says: {code:java} cannot resolve '`df.field3`' given input columns: [df.field3, df.field1, df.field2]{code} so the schema comprehends and includes the alias as expected. As you say if "field2" was something different from "df.field2" then {code:java} df.select("df.field2").show(2){code} would throw an exception too but instead it returns a result. So I am pretty convinced that this is actually a bug. was (Author: amavrommatis): [~dcoliversun] if you check the exception message it says: {code:java} cannot resolve '`df.field3`' given input columns: [df.field3, df.field1, df.field2]{code} so the schema comprehends and includes the alias as expected. As you say if "field2" was something different from "df.field2" then {code:java} df.select("df.field2").show(2){code} would throw an exception too but it actually returns a result. So I am pretty convinced that this is actually a bug. 
> DataFrame withColumn method not adding or replacing columns when alias is used > -- > > Key: SPARK-38507 > URL: https://issues.apache.org/jira/browse/SPARK-38507 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.1.2 >Reporter: Alexandros Mavrommatis >Priority: Major > Labels: SQL, catalyst > > I have an input DataFrame *df* created as follows: > {code:java} > import spark.implicits._ > val df = List((5, 10), (6, 20)).toDF("field1", "field2").alias("df") {code} > When I execute either this command: > {code:java} > df.select("df.field2").show(2) {code} > or that one: > {code:java} > df.withColumn("df.field2", lit(0)).select("df.field2").show(2) {code} > I get the same result: > {code:java} > +--+ > |field2| > +--+ > | 10| > | 20| > +--+ {code} > Additionally, when I execute the following command: > {code:java} > df.withColumn("df.field3", lit(0)).select("df.field3").show(2){code} > I get this exception: > {code:java} > org.apache.spark.sql.AnalysisException: cannot resolve '`df.field3`' given > input columns: [df.field3, df.field1, df.field2]; 'Project ['df.field3] +- > Project [field1#7, field2#8, 0 AS df.field3#31] +- SubqueryAlias df > +- Project [_1#2 AS field1#7, _2#3 AS field2#8] +- LocalRelation > [_1#2, _2#3] at > org.apache.spark.sql.catalyst.analysis.package$AnalysisErrorAt.failAnalysis(package.scala:42) > at > org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$$nestedInanonfun$checkAnalysis$1$2.applyOrElse(CheckAnalysis.scala:155) > at > org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$$nestedInanonfun$checkAnalysis$1$2.applyOrElse(CheckAnalysis.scala:152) > at > org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformUp$2(TreeNode.scala:342) > at > org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:74) > at > org.apache.spark.sql.catalyst.trees.TreeNode.transformUp(TreeNode.scala:342) > at > 
org.apache.spark.sql.catalyst.plans.QueryPlan.$anonfun$transformExpressionsUp$1(QueryPlan.scala:104) > at > org.apache.spark.sql.catalyst.plans.QueryPlan.$anonfun$mapExpressions$1(QueryPlan.scala:116) > at > org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:74) > at > org.apache.spark.sql.catalyst.plans.QueryPlan.transformExpression$1(QueryPlan.scala:116) > at > org.apache.spark.sql.catalyst.plans.QueryPlan.recursiveTransform$1(QueryPlan.scala:127) > at > org.apache.spark.sql.catalyst.plans.QueryPlan.$anonfun$mapExpressions$3(QueryPlan.scala:132) > at > scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:238) > at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62) > at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55) > at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49) at > scala.collection.TraversableLike.map(TraversableLike.scala:238) at > scala.collection.TraversableLike.map$(TraversableLike.scala:231) at > scala.collection.AbstractTraversable.map(Traversable.scala:108) at > org.apache.spark.sql.catalyst.plans.QueryPlan.recursiveTransform$1(QueryPlan.scala:132) > at > org.apache.spark.sql.catalyst.plans.QueryPlan.$anonfun$mapExpressions$4(QueryPlan.scala:137) > at > org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:244) > at > org.apache.spark.sql.catalyst.plans.QueryPlan.mapExpressions(QueryPlan.scala:137) > at >
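Why the new column cannot be selected back can be illustrated with a toy name resolver. This is a strong simplification of Catalyst's resolution rules, not its actual code: an unquoted dotted name like "df.field3" is parsed as column `field3` qualified by alias `df`, which never matches a column whose literal name contains the dot.

```python
def resolve(name, columns, alias):
    """Resolve a column reference the way a qualified lookup would.
    An unquoted dotted name is always treated as alias-qualified, so a
    column whose *literal* name contains the dot is unreachable without
    backquoting it."""
    if "." in name:
        qualifier, _, col = name.partition(".")
        if qualifier == alias and col in columns:
            return col
        raise KeyError(f"cannot resolve {name!r} given input columns: {sorted(columns)}")
    if name in columns:
        return name
    raise KeyError(f"cannot resolve {name!r} given input columns: {sorted(columns)}")

# After withColumn("df.field3", ...) the literal column name is "df.field3".
cols = {"field1", "field2", "df.field3"}

# "df.field2" resolves to the *original* field2 via the alias, which is
# why withColumn("df.field2", lit(0)) appears to have no effect.
ok = resolve("df.field2", cols, alias="df")

# "df.field3" strips to "field3", which does not exist, mirroring the
# AnalysisException even though "df.field3" is listed among the inputs.
try:
    resolve("df.field3", cols, alias="df")
    unresolved = False
except KeyError:
    unresolved = True
```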
[jira] [Comment Edited] (SPARK-38507) DataFrame withColumn method not adding or replacing columns when alias is used
[ https://issues.apache.org/jira/browse/SPARK-38507?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17504835#comment-17504835 ] Alexandros Mavrommatis edited comment on SPARK-38507 at 3/11/22, 10:01 AM: --- [~dcoliversun] if you check the exception message, it says: {code:java} cannot resolve '`df.field3`' given input columns: [df.field3, df.field1, df.field2]{code} so the schema does include the alias-qualified column, as expected. As you say, if "field2" were something other than "df.field2", then {code:java} df.select("df.field2").show(2){code} would throw an exception too, but it actually returns a result. So I am fairly convinced that this is actually a bug.
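For context on the resolution behavior discussed in this thread: it is consistent with how Spark treats dots in column names generally. A name passed to withColumn is taken literally (including the dot), while select("df.field3") parses the dot as an <alias>.<column> qualifier, so the analyzer looks for a column "field3" under the alias "df". Backticks force the literal reading. A minimal sketch of the workaround, assuming a SparkSession in scope as {{spark}} (not code from the thread):

{code:java}
import org.apache.spark.sql.functions.lit
import spark.implicits._

val df = List((5, 10), (6, 20)).toDF("field1", "field2").alias("df")

// withColumn creates a column whose literal name is "df.field3" (dot included).
val df2 = df.withColumn("df.field3", lit(0))

// Unquoted, "df.field3" is resolved as column "field3" of alias "df",
// which does not exist -- hence the AnalysisException quoted above.
// Backtick-quoting selects the literal column name instead:
df2.select("`df.field3`").show(2)
{code}

With the backticks, select resolves the literal column added by withColumn; without them, the qualified lookup wins whenever the alias matches, which also explains why df.select("df.field2") returned the original field2 rather than the lit(0) column.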
[jira] [Commented] (SPARK-38522) Strengthen the contract on iterator method in StateStore
[ https://issues.apache.org/jira/browse/SPARK-38522?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17504779#comment-17504779 ] Apache Spark commented on SPARK-38522: -- User 'HeartSaVioR' has created a pull request for this issue: https://github.com/apache/spark/pull/35816 > Strengthen the contract on iterator method in StateStore > > > Key: SPARK-38522 > URL: https://issues.apache.org/jira/browse/SPARK-38522 > Project: Spark > Issue Type: Improvement > Components: Structured Streaming >Affects Versions: 3.3.0 >Reporter: Jungtaek Lim >Priority: Major > > The root cause of SPARK-38320 was that the logic initialized the iterator > first, performed some updates against the state store, and then iterated > through the iterator expecting all intervening updates to be visible. That > is not guaranteed by the RocksDB state store, nor by the contract of Java's > ConcurrentHashMap, which HDFSBackedStateStore uses. > It would be clearer to update the contract to draw a line on the behavioral > guarantee, so that callers do not form such an expectation. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
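The weakly consistent iterator behavior the description refers to can be observed with a plain java.util.concurrent.ConcurrentHashMap, no Spark required. A minimal illustrative sketch (not code from the issue):

{code:java}
import java.util.concurrent.ConcurrentHashMap

object WeaklyConsistentIteratorDemo extends App {
  val store = new ConcurrentHashMap[String, Int]()
  store.put("key-a", 1)

  // Create the iterator first, then update the map afterwards --
  // the same ordering as the SPARK-38320 logic.
  val it = store.entrySet().iterator()
  store.put("key-b", 2)

  // ConcurrentHashMap iterators are weakly consistent: they never throw
  // ConcurrentModificationException, but whether "key-b" (added after
  // iterator creation) is reflected is NOT guaranteed either way.
  var seen = 0
  while (it.hasNext) { it.next(); seen += 1 }
  println(s"entries seen: $seen (may be 1 or 2)")
}
{code}

Since visibility of post-creation updates is unspecified for both ConcurrentHashMap and the RocksDB state store, code that iterates expecting to see them is relying on an implementation accident -- which is exactly the expectation the proposed contract wording rules out.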
[jira] [Assigned] (SPARK-38522) Strengthen the contract on iterator method in StateStore
[ https://issues.apache.org/jira/browse/SPARK-38522?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-38522: Assignee: (was: Apache Spark)
[jira] [Assigned] (SPARK-38522) Strengthen the contract on iterator method in StateStore
[ https://issues.apache.org/jira/browse/SPARK-38522?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-38522: Assignee: Apache Spark