[jira] [Closed] (SPARK-36593) [Deprecated] Support the Volcano Job API

2022-03-11 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36593?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun closed SPARK-36593.
-

> [Deprecated] Support the Volcano Job API
> 
>
> Key: SPARK-36593
> URL: https://issues.apache.org/jira/browse/SPARK-36593
> Project: Spark
>  Issue Type: Sub-task
>  Components: Kubernetes
>Affects Versions: 3.3.0
>Reporter: Holden Karau
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-36060) Support backing off dynamic allocation increases if resources are "stuck"

2022-03-11 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36060?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-36060.
---
Resolution: Fixed

> Support backing off dynamic allocation increases if resources are "stuck"
> -
>
> Key: SPARK-36060
> URL: https://issues.apache.org/jira/browse/SPARK-36060
> Project: Spark
>  Issue Type: Sub-task
>  Components: Kubernetes
>Affects Versions: 3.2.0
>Reporter: Holden Karau
>Priority: Major
>
> In an over-subscribed environment we may enter a situation where our requests 
> for more pods are not going to be fulfilled. Adding more requests for more 
> pods is not going to help and may slow down the scheduler. We should detect 
> this situation and hold off on increasing pod requests until the scheduler 
> allocates more pods to us. We have a limited version of this in the Kube 
> scheduler itself, but it would be better to plumb this all the way through 
> to the DA logic.
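
A minimal sketch of the idea (illustrative only, not Spark's actual dynamic-allocation code; the state fields, delay values, and cap are assumptions):

{code:scala}
// Sketch of a backoff gate for new pod requests while earlier requests stay pending ("stuck").
case class BackoffState(stuckSince: Option[Long] = None, delayMs: Long = 1000L)

def shouldRequestMorePods(state: BackoffState, pendingPods: Int, nowMs: Long): (Boolean, BackoffState) =
  if (pendingPods == 0) {
    // Nothing is stuck: reset the backoff and allow new requests.
    (true, BackoffState())
  } else state.stuckSince match {
    // Pods just became pending: start the backoff window and hold off for now.
    case None => (false, state.copy(stuckSince = Some(nowMs)))
    // Still inside the backoff window: keep holding off instead of piling on requests.
    case Some(since) if nowMs - since < state.delayMs => (false, state)
    // Window elapsed but pods are still pending: allow one retry and double the delay (capped).
    case Some(_) => (true, BackoffState(Some(nowMs), math.min(state.delayMs * 2, 60000L)))
  }
{code}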



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Reopened] (SPARK-36060) Support backing off dynamic allocation increases if resources are "stuck"

2022-03-11 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36060?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reopened SPARK-36060:
---

> Support backing off dynamic allocation increases if resources are "stuck"
> -
>
> Key: SPARK-36060
> URL: https://issues.apache.org/jira/browse/SPARK-36060
> Project: Spark
>  Issue Type: Sub-task
>  Components: Kubernetes
>Affects Versions: 3.2.0
>Reporter: Holden Karau
>Priority: Major
>
> In an over-subscribed environment we may enter a situation where our requests 
> for more pods are not going to be fulfilled. Adding more requests for more 
> pods is not going to help and may slow down the scheduler. We should detect 
> this situation and hold off on increasing pod requests until the scheduler 
> allocates more pods to us. We have a limited version of this in the Kube 
> scheduler itself, but it would be better to plumb this all the way through 
> to the DA logic.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-38524) [TEST] Change disable queue to capability limit way

2022-03-11 Thread Dongjoon Hyun (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-38524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17505209#comment-17505209
 ] 

Dongjoon Hyun commented on SPARK-38524:
---

[~yikunkero], FYI, here is a tip:
- Remove `[TEST]` from the JIRA title.
- Add `Tests` to `Component/s:`.

> [TEST] Change disable queue to capability limit way
> ---
>
> Key: SPARK-38524
> URL: https://issues.apache.org/jira/browse/SPARK-38524
> Project: Spark
>  Issue Type: Sub-task
>  Components: Kubernetes
>Affects Versions: 3.3.0
>Reporter: Yikun Jiang
>Priority: Major
>
> As described at [https://volcano.sh/en/docs/queue/]:
> - weight is a soft constraint.
> - capability is a hard constraint.
> It is better to use capability to keep things simple and avoid being influenced by 
> other queues.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-38525) [TEST] Check resource after resource creation

2022-03-11 Thread Dongjoon Hyun (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-38525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17505210#comment-17505210
 ] 

Dongjoon Hyun commented on SPARK-38525:
---

FYI, here is a tip:
- Remove `[TEST]` from the JIRA title.
- Add `Tests` to `Component/s:`.

> [TEST] Check resource after resource creation
> -
>
> Key: SPARK-38525
> URL: https://issues.apache.org/jira/browse/SPARK-38525
> Project: Spark
>  Issue Type: Sub-task
>  Components: Kubernetes, Tests
>Affects Versions: 3.3.0
>Reporter: Yikun Jiang
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Closed] (SPARK-38135) Introduce `spark.kubernetes.job` scheduling-related configurations

2022-03-11 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38135?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun closed SPARK-38135.
-

> Introduce `spark.kubernetes.job` scheduling-related configurations
> --
>
> Key: SPARK-38135
> URL: https://issues.apache.org/jira/browse/SPARK-38135
> Project: Spark
>  Issue Type: Sub-task
>  Components: Kubernetes
>Affects Versions: 3.3.0
>Reporter: Yikun Jiang
>Priority: Major
>
> spark.kubernetes.job.minCPU: the minimum CPU resources required to run the job
> spark.kubernetes.job.minMemory: the minimum memory resources required to run the job
> spark.kubernetes.job.minMember: the minimum number of pods required to run the job
> spark.kubernetes.job.priorityClassName: the priority class name of the running job
> spark.kubernetes.job.queue: the queue to which the running job belongs



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-38511) Remove priorityClassName propagation in favor of explicit settings

2022-03-11 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38511?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-38511:
--
Parent: SPARK-36057
Issue Type: Sub-task  (was: Improvement)

> Remove priorityClassName propagation in favor of explicit settings
> --
>
> Key: SPARK-38511
> URL: https://issues.apache.org/jira/browse/SPARK-38511
> Project: Spark
>  Issue Type: Sub-task
>  Components: Kubernetes
>Affects Versions: 3.3.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Major
> Fix For: 3.3.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-38513) Move custom scheduler-specific configs to under `spark.kubernetes.scheduler.NAME` prefix

2022-03-11 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38513?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-38513:
--
Parent: SPARK-36057
Issue Type: Sub-task  (was: Improvement)

> Move custom scheduler-specific configs to under 
> `spark.kubernetes.scheduler.NAME` prefix
> 
>
> Key: SPARK-38513
> URL: https://issues.apache.org/jira/browse/SPARK-38513
> Project: Spark
>  Issue Type: Sub-task
>  Components: Kubernetes
>Affects Versions: 3.3.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Major
> Fix For: 3.3.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-38527) Set the minimum Volcano version

2022-03-11 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38527?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-38527:
--
Parent: SPARK-36057
Issue Type: Sub-task  (was: Documentation)

> Set the minimum Volcano version
> ---
>
> Key: SPARK-38527
> URL: https://issues.apache.org/jira/browse/SPARK-38527
> Project: Spark
>  Issue Type: Sub-task
>  Components: Documentation, Kubernetes
>Affects Versions: 3.3.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Major
> Fix For: 3.3.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-38533) DS V2 aggregate push-down supports project with alias

2022-03-11 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38533?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-38533:


Assignee: Apache Spark

> DS V2 aggregate push-down supports project with alias
> -
>
> Key: SPARK-38533
> URL: https://issues.apache.org/jira/browse/SPARK-38533
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: jiaan.geng
>Assignee: Apache Spark
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-38533) DS V2 aggregate push-down supports project with alias

2022-03-11 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-38533?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17505206#comment-17505206
 ] 

Apache Spark commented on SPARK-38533:
--

User 'beliefer' has created a pull request for this issue:
https://github.com/apache/spark/pull/35823

> DS V2 aggregate push-down supports project with alias
> -
>
> Key: SPARK-38533
> URL: https://issues.apache.org/jira/browse/SPARK-38533
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: jiaan.geng
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-38533) DS V2 aggregate push-down supports project with alias

2022-03-11 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38533?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-38533:


Assignee: (was: Apache Spark)

> DS V2 aggregate push-down supports project with alias
> -
>
> Key: SPARK-38533
> URL: https://issues.apache.org/jira/browse/SPARK-38533
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: jiaan.geng
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-38533) DS V2 aggregate push-down supports project with alias

2022-03-11 Thread jiaan.geng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38533?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

jiaan.geng updated SPARK-38533:
---
Summary: DS V2 aggregate push-down supports project with alias  (was: 
Aggregate push-down supports project with alias)

> DS V2 aggregate push-down supports project with alias
> -
>
> Key: SPARK-38533
> URL: https://issues.apache.org/jira/browse/SPARK-38533
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: jiaan.geng
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-38533) Aggregate push-down supports project with alias

2022-03-11 Thread jiaan.geng (Jira)
jiaan.geng created SPARK-38533:
--

 Summary: Aggregate push-down supports project with alias
 Key: SPARK-38533
 URL: https://issues.apache.org/jira/browse/SPARK-38533
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 3.3.0
Reporter: jiaan.geng






--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-38532) Add test case for invalid gapDuration of sessionwindow

2022-03-11 Thread nyingping (Jira)
nyingping created SPARK-38532:
-

 Summary: Add test case for invalid gapDuration of sessionwindow
 Key: SPARK-38532
 URL: https://issues.apache.org/jira/browse/SPARK-38532
 Project: Spark
  Issue Type: Test
  Components: Structured Streaming
Affects Versions: 3.2.1
Reporter: nyingping


Since dynamic gapDuration was added to the session window in 
[33691|https://github.com/apache/spark/pull/33691], users are allowed to pass an 
invalid gapDuration. However, for now, test cases are only added for zero and 
negative gapDuration. I think it is necessary to add test cases for an invalid 
gapDuration.
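
For reference, a minimal sketch of the kind of case this asks to cover (the column names and the "not a duration" literal are assumptions for illustration; {{session_window}} with a per-row gap expression is the feature added by apache/spark#33691; assumes a SparkSession named {{spark}}):

{code:scala}
import org.apache.spark.sql.functions._
import spark.implicits._

// Illustrative only: a per-row gapDuration expression that can yield an invalid duration string.
val events = Seq(("user1", "2022-03-11 10:00:00"), ("user2", "2022-03-11 10:01:00"))
  .toDF("user", "time")
  .withColumn("time", to_timestamp($"time"))

val sessions = events
  .groupBy(
    session_window($"time", when($"user" === "user1", "5 minutes").otherwise("not a duration")),
    $"user")
  .count()
{code}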



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-38526) fix misleading function alias name for RuntimeReplaceable

2022-03-11 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38526?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-38526:
-

Assignee: Wenchen Fan

> fix misleading function alias name for RuntimeReplaceable
> -
>
> Key: SPARK-38526
> URL: https://issues.apache.org/jira/browse/SPARK-38526
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Wenchen Fan
>Assignee: Wenchen Fan
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-38526) fix misleading function alias name for RuntimeReplaceable

2022-03-11 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38526?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-38526.
---
Fix Version/s: 3.3.0
   Resolution: Fixed

Issue resolved by pull request 35821
[https://github.com/apache/spark/pull/35821]

> fix misleading function alias name for RuntimeReplaceable
> -
>
> Key: SPARK-38526
> URL: https://issues.apache.org/jira/browse/SPARK-38526
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Wenchen Fan
>Assignee: Wenchen Fan
>Priority: Major
> Fix For: 3.3.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-38516) Add log4j-core, log4j-api and log4j-slf4j-impl to classpath if active hadoop-provided

2022-03-11 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38516?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-38516.
---
Fix Version/s: 3.3.0
   Resolution: Fixed

Issue resolved by pull request 35811
[https://github.com/apache/spark/pull/35811]

> Add log4j-core, log4j-api and log4j-slf4j-impl to classpath if active 
> hadoop-provided
> -
>
> Key: SPARK-38516
> URL: https://issues.apache.org/jira/browse/SPARK-38516
> Project: Spark
>  Issue Type: Bug
>  Components: Build
>Affects Versions: 3.3.0
>Reporter: Yuming Wang
>Assignee: Yuming Wang
>Priority: Major
> Fix For: 3.3.0
>
>
> {noformat}
> Error: A JNI error has occurred, please check your installation and try again
> Exception in thread "main" java.lang.NoClassDefFoundError: 
> org/apache/logging/log4j/core/Filter
>     at java.lang.Class.getDeclaredMethods0(Native Method)
>     at java.lang.Class.privateGetDeclaredMethods(Class.java:2701)
>     at java.lang.Class.privateGetMethodRecursive(Class.java:3048)
>     at java.lang.Class.getMethod0(Class.java:3018)
>     at java.lang.Class.getMethod(Class.java:1784)
>     at sun.launcher.LauncherHelper.validateMainClass(LauncherHelper.java:544)
>     at sun.launcher.LauncherHelper.checkAndLoadMain(LauncherHelper.java:526)
> Caused by: java.lang.ClassNotFoundException: 
> org.apache.logging.log4j.core.Filter
>     at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
>     at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
>     at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
>     at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
>     ... 7 more{noformat}
> {noformat}
> Exception in thread "main" java.lang.NoClassDefFoundError: 
> org/apache/logging/log4j/LogManager
>   at 
> org.apache.spark.deploy.yarn.SparkRackResolver.<init>(SparkRackResolver.scala:42)
>   at 
> org.apache.spark.deploy.yarn.SparkRackResolver$.get(SparkRackResolver.scala:114)
>   at 
> org.apache.spark.scheduler.cluster.YarnScheduler.<init>(YarnScheduler.scala:31)
>   at 
> org.apache.spark.scheduler.cluster.YarnClusterManager.createTaskScheduler(YarnClusterManager.scala:35)
>   at 
> org.apache.spark.SparkContext$.org$apache$spark$SparkContext$$createTaskScheduler(SparkContext.scala:2985)
>   at org.apache.spark.SparkContext.<init>(SparkContext.scala:563)
>   at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2704)
>   at 
> org.apache.spark.sql.SparkSession$Builder.$anonfun$getOrCreate$2(SparkSession.scala:953)
>   at scala.Option.getOrElse(Option.scala:189)
>   at 
> org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:947)
>   at 
> org.apache.spark.sql.hive.thriftserver.SparkSQLEnv$.init(SparkSQLEnv.scala:54)
>   at 
> org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.<init>(SparkSQLCLIDriver.scala:327)
>   at 
> org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver$.main(SparkSQLCLIDriver.scala:159)
>   at 
> org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.main(SparkSQLCLIDriver.scala)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
>   at 
> org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:958)
>   at 
> org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
>   at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
>   at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
>   at 
> org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1046)
>   at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1055)
>   at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
> Caused by: java.lang.ClassNotFoundException: 
> org.apache.logging.log4j.LogManager
>   at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
>   at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
>   at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
>   at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
>   ... 26 more
> {noformat}
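
For context, both errors above come from the Log4j 2 classes being missing at launch. A hedged workaround sketch for an sbt-built application (assuming a "hadoop provided" Spark distribution that lacks the Log4j 2 jars; the ticket itself fixes Spark's build so they land on the classpath):

{code:scala}
// build.sbt fragment (illustrative): explicitly provide the Log4j 2 jars that the
// "hadoop provided" classpath is missing, so the classes in the stack traces resolve at launch.
libraryDependencies ++= Seq(
  "org.apache.logging.log4j" % "log4j-core"       % "2.17.2",
  "org.apache.logging.log4j" % "log4j-api"        % "2.17.2",
  "org.apache.logging.log4j" % "log4j-slf4j-impl" % "2.17.2"
)
{code}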



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-38516) Add log4j-core, log4j-api and log4j-slf4j-impl to classpath if active hadoop-provided

2022-03-11 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38516?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-38516:
-

Assignee: Yuming Wang

> Add log4j-core, log4j-api and log4j-slf4j-impl to classpath if active 
> hadoop-provided
> -
>
> Key: SPARK-38516
> URL: https://issues.apache.org/jira/browse/SPARK-38516
> Project: Spark
>  Issue Type: Bug
>  Components: Build
>Affects Versions: 3.3.0
>Reporter: Yuming Wang
>Assignee: Yuming Wang
>Priority: Major
>
> {noformat}
> Error: A JNI error has occurred, please check your installation and try again
> Exception in thread "main" java.lang.NoClassDefFoundError: 
> org/apache/logging/log4j/core/Filter
>     at java.lang.Class.getDeclaredMethods0(Native Method)
>     at java.lang.Class.privateGetDeclaredMethods(Class.java:2701)
>     at java.lang.Class.privateGetMethodRecursive(Class.java:3048)
>     at java.lang.Class.getMethod0(Class.java:3018)
>     at java.lang.Class.getMethod(Class.java:1784)
>     at sun.launcher.LauncherHelper.validateMainClass(LauncherHelper.java:544)
>     at sun.launcher.LauncherHelper.checkAndLoadMain(LauncherHelper.java:526)
> Caused by: java.lang.ClassNotFoundException: 
> org.apache.logging.log4j.core.Filter
>     at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
>     at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
>     at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
>     at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
>     ... 7 more{noformat}
> {noformat}
> Exception in thread "main" java.lang.NoClassDefFoundError: 
> org/apache/logging/log4j/LogManager
>   at 
> org.apache.spark.deploy.yarn.SparkRackResolver.<init>(SparkRackResolver.scala:42)
>   at 
> org.apache.spark.deploy.yarn.SparkRackResolver$.get(SparkRackResolver.scala:114)
>   at 
> org.apache.spark.scheduler.cluster.YarnScheduler.<init>(YarnScheduler.scala:31)
>   at 
> org.apache.spark.scheduler.cluster.YarnClusterManager.createTaskScheduler(YarnClusterManager.scala:35)
>   at 
> org.apache.spark.SparkContext$.org$apache$spark$SparkContext$$createTaskScheduler(SparkContext.scala:2985)
>   at org.apache.spark.SparkContext.<init>(SparkContext.scala:563)
>   at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2704)
>   at 
> org.apache.spark.sql.SparkSession$Builder.$anonfun$getOrCreate$2(SparkSession.scala:953)
>   at scala.Option.getOrElse(Option.scala:189)
>   at 
> org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:947)
>   at 
> org.apache.spark.sql.hive.thriftserver.SparkSQLEnv$.init(SparkSQLEnv.scala:54)
>   at 
> org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.<init>(SparkSQLCLIDriver.scala:327)
>   at 
> org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver$.main(SparkSQLCLIDriver.scala:159)
>   at 
> org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.main(SparkSQLCLIDriver.scala)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
>   at 
> org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:958)
>   at 
> org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
>   at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
>   at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
>   at 
> org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1046)
>   at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1055)
>   at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
> Caused by: java.lang.ClassNotFoundException: 
> org.apache.logging.log4j.LogManager
>   at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
>   at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
>   at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
>   at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
>   ... 26 more
> {noformat}



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-38531) "Prune unrequired child index" branch of ColumnPruning has wrong condition

2022-03-11 Thread Min Yang (Jira)
Min Yang created SPARK-38531:


 Summary: "Prune unrequired child index" branch of ColumnPruning 
has wrong condition
 Key: SPARK-38531
 URL: https://issues.apache.org/jira/browse/SPARK-38531
 Project: Spark
  Issue Type: Bug
  Components: Optimizer
Affects Versions: 3.2.1
Reporter: Min Yang


The "prune unrequired references" branch has the condition:
{code:java}
case p @ Project(_, g: Generate) if p.references != g.outputSet => {code}
This is wrong, as generators like Inline will always enter this branch as long 
as the Project does not use all of the generator output.

 

Example:

 

input: col1 array<struct<a: struct<a: int>, b: int>>

 

Project(a.a as x)

- Generate(Inline(col1), ..., a, b)

 

p.references is [a]

g.outputSet is [a, b]

 

This bug means we never enter the GeneratorNestedColumnAliasing branch below, and 
thus miss some optimization opportunities. The condition should be
{code:java}
g.requiredChildOutput.contains(!p.references.contains(_)) {code}
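The proposed condition appears to intend an existence check; a minimal, Spark-agnostic restatement (the helper and parameter names below are illustrative, not Spark's actual API):
{code:scala}
// Sketch only: the branch should fire only when the generate node forwards some
// required child output attribute that the project above it never references.
def branchApplies(projectReferences: Set[String], requiredChildOutput: Seq[String]): Boolean =
  requiredChildOutput.exists(attr => !projectReferences.contains(attr))
{code}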



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-38530) GeneratorNestedColumnAliasing does not work correctly for some expressions

2022-03-11 Thread Min Yang (Jira)
Min Yang created SPARK-38530:


 Summary: GeneratorNestedColumnAliasing does not work correctly for 
some expressions
 Key: SPARK-38530
 URL: https://issues.apache.org/jira/browse/SPARK-38530
 Project: Spark
  Issue Type: Bug
  Components: Optimizer
Affects Versions: 3.2.1
Reporter: Min Yang


[https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/NestedColumnAliasing.scala#L226]

The code to collect ExtractValue expressions is wrong. We should do it in a 
bottom-up way instead of only checking 2 levels. It can cause an incorrect result if 
the expression looks like ExtractValue(ExtractValue(some_other_expr)).

 

An example to trigger the bug is:

 

input: , b: 
int

 

Project(ExtractValue(ExtractValue(CaseWhen([col.a == 1, col.b]), "a"), "a")

- Generate(Explode(col1))

 

We will incorrectly try to push down the whole expression into the input of the 
Explode; the CaseWhen then receives an array<...> as its input, so we will get a 
wrong result.
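
To illustrate the shape of the problem with a toy model (this is not Spark's NestedColumnAliasing code; the case classes and the check are invented for illustration):

{code:scala}
// Toy expression tree: an extraction chain is only safe to push below the generator
// if, walking all the way down, it bottoms out in a plain attribute reference.
sealed trait Expr { def children: Seq[Expr] }
case class Attr(name: String) extends Expr { val children: Seq[Expr] = Nil }
case class GetField(child: Expr, field: String) extends Expr { val children: Seq[Expr] = Seq(child) }
case class CaseWhen(branches: Seq[Expr]) extends Expr { val children: Seq[Expr] = branches }

// Recursing to the leaf (rather than inspecting only two levels) rejects chains whose
// innermost child is some other expression, like the CaseWhen in the example above.
def pushable(e: Expr): Boolean = e match {
  case Attr(_)            => true
  case GetField(child, _) => pushable(child)
  case _                  => false
}

val chain = GetField(GetField(CaseWhen(Seq(Attr("col.a"), Attr("col.b"))), "a"), "a")
assert(!pushable(chain))
{code}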



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-38529) GeneratorNestedColumnAliasing works incorrectly for non-Explode generators

2022-03-11 Thread Min Yang (Jira)
Min Yang created SPARK-38529:


 Summary: GeneratorNestedColumnAliasing works incorrectly for 
non-Explode generators
 Key: SPARK-38529
 URL: https://issues.apache.org/jira/browse/SPARK-38529
 Project: Spark
  Issue Type: Bug
  Components: Optimizer
Affects Versions: 3.2.1
Reporter: Min Yang


The Project(_, g: Generate) branch in GeneratorNestedColumnAliasing is only 
supposed to work for ExplodeBase generators, but we do not explicitly return for 
other types like Inline. Currently the bug is not triggered because there is 
another bug in the "prune unrequired child" branch of ColumnPruning which 
makes other generators like Inline always go to that branch even if it is not 
applicable.

 

An easy example to show the bug:

Input: col2 array<struct<field1: struct<field1: int>, field2: int>>

Project(field1.field1 as ...)

- Generate(Inline(col2), ..., field1, field2)

 

We will incorrectly try to push the .field1 access on field1 into the input of 
the Inline (col2).

 

 



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-38528) NullPointerException when selecting a generator in a Stream of aggregate expressions

2022-03-11 Thread Bruce Robbins (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-38528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17505149#comment-17505149
 ] 

Bruce Robbins commented on SPARK-38528:
---

This is a bug in {{ExtractGenerator}} in which an array ({{projectExprs}}) 
is updated from within a closure passed to a map operation (the array is 
external to the closure). If the sequence of expressions on which the map 
operation is called is a {{Stream}}, the map operation is evaluated lazily, 
so the array is not fully updated before the rule completes.
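
A self-contained illustration of the pitfall in plain Scala (not Spark code; the buffer and values are invented for the demo):

{code:scala}
import scala.collection.mutable.ArrayBuffer

// Mutating an external buffer from the closure passed to map is only reliable when map is strict.
val collected = ArrayBuffer.empty[Int]

val strict = Seq(1, 2, 3).map { i => collected += i; i }
// collected == ArrayBuffer(1, 2, 3): Seq#map ran the closure for every element.

collected.clear()
val lazily = Stream(1, 2, 3).map { i => collected += i; i }
// collected == ArrayBuffer(1): Stream#map forces only the head; the rest is deferred.

lazily.foreach(_ => ())
// collected == ArrayBuffer(1, 2, 3): fully populated only after the Stream is traversed,
// which is why the rule can complete before the external array is fully updated.
{code}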

> NullPointerException when selecting a generator in a Stream of aggregate 
> expressions
> 
>
> Key: SPARK-38528
> URL: https://issues.apache.org/jira/browse/SPARK-38528
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.1.3, 3.2.1, 3.3.0
>Reporter: Bruce Robbins
>Priority: Major
>
> Assume this dataframe:
> {noformat}
> val df = Seq(1, 2, 3).toDF("v")
> {noformat}
> This works:
> {noformat}
> df.select(Seq(explode(array(min($"v"), max($"v"))), sum($"v")): _*).collect
> {noformat}
> However, this doesn't:
> {noformat}
> df.select(Stream(explode(array(min($"v"), max($"v"))), sum($"v")): _*).collect
> {noformat}
> It throws this error:
> {noformat}
> java.lang.NullPointerException
>   at 
> org.apache.spark.sql.catalyst.analysis.Analyzer$GlobalAggregates$.$anonfun$containsAggregates$1(Analyzer.scala:2516)
>   at scala.collection.immutable.List.flatMap(List.scala:366)
>   at 
> org.apache.spark.sql.catalyst.analysis.Analyzer$GlobalAggregates$.containsAggregates(Analyzer.scala:2515)
>   at 
> org.apache.spark.sql.catalyst.analysis.Analyzer$GlobalAggregates$$anonfun$apply$31.applyOrElse(Analyzer.scala:2509)
>   at 
> org.apache.spark.sql.catalyst.analysis.Analyzer$GlobalAggregates$$anonfun$apply$31.applyOrElse(Analyzer.scala:2508)
> {noformat}
> The only difference between the two queries is that the first one uses 
> {{Seq}} to specify the varargs, whereas the second one uses {{Stream}}.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-38528) NullPointerException when selecting a generator in a Stream of aggregate expressions

2022-03-11 Thread Bruce Robbins (Jira)
Bruce Robbins created SPARK-38528:
-

 Summary: NullPointerException when selecting a generator in a 
Stream of aggregate expressions
 Key: SPARK-38528
 URL: https://issues.apache.org/jira/browse/SPARK-38528
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 3.2.1, 3.1.3, 3.3.0
Reporter: Bruce Robbins


Assume this dataframe:
{noformat}
val df = Seq(1, 2, 3).toDF("v")
{noformat}
This works:
{noformat}
df.select(Seq(explode(array(min($"v"), max($"v"))), sum($"v")): _*).collect
{noformat}
However, this doesn't:
{noformat}
df.select(Stream(explode(array(min($"v"), max($"v"))), sum($"v")): _*).collect
{noformat}
It throws this error:
{noformat}
java.lang.NullPointerException
  at 
org.apache.spark.sql.catalyst.analysis.Analyzer$GlobalAggregates$.$anonfun$containsAggregates$1(Analyzer.scala:2516)
  at scala.collection.immutable.List.flatMap(List.scala:366)
  at 
org.apache.spark.sql.catalyst.analysis.Analyzer$GlobalAggregates$.containsAggregates(Analyzer.scala:2515)
  at 
org.apache.spark.sql.catalyst.analysis.Analyzer$GlobalAggregates$$anonfun$apply$31.applyOrElse(Analyzer.scala:2509)
  at 
org.apache.spark.sql.catalyst.analysis.Analyzer$GlobalAggregates$$anonfun$apply$31.applyOrElse(Analyzer.scala:2508)
{noformat}
The only difference between the two queries is that the first one uses {{Seq}} 
to specify the varargs, whereas the second one uses {{Stream}}.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-38483) Column name or alias as an attribute of the PySpark Column class

2022-03-11 Thread Brian Schaefer (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-38483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17504529#comment-17504529
 ] 

Brian Schaefer edited comment on SPARK-38483 at 3/11/22, 9:57 PM:
--

The column name does differ between the two when selecting a struct field. 
However, I think it makes sense to return the name that the column _would_ take 
if it were selected. This seems fairly straightforward to handle:
{code:python}
>>> df = spark.createDataFrame([{"struct": {"outer_field": {"inner_field": 
>>> 1}}}])
>>> values = F.col("struct.outer_field.inner_field")
>>> print(df.select(values).schema[0].name)
inner_field
>>> print(values._jc.toString())
struct.outer_field.inner_field
>>> print(values._jc.toString().split(".")[-1]) 
inner_field{code}
 


was (Author: JIRAUSER286367):
The column name does differ between the two when selecting a struct field. 
However I think it makes sense to print out the name that the column _would_ 
take if it were selected. Seems like this should be fairly straightforward to 
handle:
{code:python}
>>> df = spark.createDataFrame([{"struct": {"outer_field": {"inner_field": 
>>> 1}}}])
>>> values = F.col("struct.outer_field.inner_field")
>>> print(df.select(values).schema[0].name)
inner_field
>>> print(values._jc.toString())
struct.outer_field.inner_field
>>> print(values._jc.toString().split(".")[-1]) 
inner_field{code}
 

> Column name or alias as an attribute of the PySpark Column class
> 
>
> Key: SPARK-38483
> URL: https://issues.apache.org/jira/browse/SPARK-38483
> Project: Spark
>  Issue Type: New Feature
>  Components: PySpark
>Affects Versions: 3.2.1
>Reporter: Brian Schaefer
>Priority: Minor
>  Labels: starter
>
> Having the name of a column as an attribute of PySpark {{Column}} class 
> instances can enable some convenient patterns, for example:
> Applying a function to a column and aliasing with the original name:
> {code:python}
> values = F.col("values")
> # repeating the column name as an alias
> distinct_values = F.array_distinct(values).alias("values")
> # re-using the existing column name
> distinct_values = F.array_distinct(values).alias(values._name){code}
> Checking the column name inside a custom function and applying conditional 
> logic on the name:
> {code:python}
> def custom_function(col: Column) -> Column:
> if col._name == "my_column":
> return col.astype("int")
> return col.astype("string"){code}
> The proposal in this issue is to add a property {{Column.\_name}} that 
> obtains the name or alias of a column in a similar way as currently done in 
> the {{Column.\_\_repr\_\_}} method: 
> [https://github.com/apache/spark/blob/master/python/pyspark/sql/column.py#L1062.]
>  The choice of {{_name}} intentionally avoids collision with the existing 
> {{Column.name}} method, which is an alias for {{{}Column.alias{}}}.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-38527) Set the minimum Volcano version

2022-03-11 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38527?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-38527:
-

Assignee: Dongjoon Hyun

> Set the minimum Volcano version
> ---
>
> Key: SPARK-38527
> URL: https://issues.apache.org/jira/browse/SPARK-38527
> Project: Spark
>  Issue Type: Documentation
>  Components: Documentation, Kubernetes
>Affects Versions: 3.3.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-38527) Set the minimum Volcano version

2022-03-11 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38527?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-38527.
---
Fix Version/s: 3.3.0
   Resolution: Fixed

Issue resolved by pull request 35822
[https://github.com/apache/spark/pull/35822]

> Set the minimum Volcano version
> ---
>
> Key: SPARK-38527
> URL: https://issues.apache.org/jira/browse/SPARK-38527
> Project: Spark
>  Issue Type: Documentation
>  Components: Documentation, Kubernetes
>Affects Versions: 3.3.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Major
> Fix For: 3.3.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-38508) Volcano feature doesn't work on EKS graviton instances

2022-03-11 Thread Dongjoon Hyun (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-38508?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17505066#comment-17505066
 ] 

Dongjoon Hyun commented on SPARK-38508:
---

Due to this issue, I created SPARK-38527. Using the `latest` tag is not a good 
practice at all.

> Volcano feature doesn't work on EKS graviton instances
> --
>
> Key: SPARK-38508
> URL: https://issues.apache.org/jira/browse/SPARK-38508
> Project: Spark
>  Issue Type: Sub-task
>  Components: Kubernetes
>Affects Versions: 3.3.0
>Reporter: Dongjoon Hyun
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-38527) Set the minimum Volcano version

2022-03-11 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-38527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17505055#comment-17505055
 ] 

Apache Spark commented on SPARK-38527:
--

User 'dongjoon-hyun' has created a pull request for this issue:
https://github.com/apache/spark/pull/35822

> Set the minimum Volcano version
> ---
>
> Key: SPARK-38527
> URL: https://issues.apache.org/jira/browse/SPARK-38527
> Project: Spark
>  Issue Type: Documentation
>  Components: Documentation, Kubernetes
>Affects Versions: 3.3.0
>Reporter: Dongjoon Hyun
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-38527) Set the minimum Volcano version

2022-03-11 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38527?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-38527:


Assignee: (was: Apache Spark)

> Set the minimum Volcano version
> ---
>
> Key: SPARK-38527
> URL: https://issues.apache.org/jira/browse/SPARK-38527
> Project: Spark
>  Issue Type: Documentation
>  Components: Documentation, Kubernetes
>Affects Versions: 3.3.0
>Reporter: Dongjoon Hyun
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-38527) Set the minimum Volcano version

2022-03-11 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38527?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-38527:


Assignee: Apache Spark

> Set the minimum Volcano version
> ---
>
> Key: SPARK-38527
> URL: https://issues.apache.org/jira/browse/SPARK-38527
> Project: Spark
>  Issue Type: Documentation
>  Components: Documentation, Kubernetes
>Affects Versions: 3.3.0
>Reporter: Dongjoon Hyun
>Assignee: Apache Spark
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-38527) Set the minimum Volcano version

2022-03-11 Thread Dongjoon Hyun (Jira)
Dongjoon Hyun created SPARK-38527:
-

 Summary: Set the minimum Volcano version
 Key: SPARK-38527
 URL: https://issues.apache.org/jira/browse/SPARK-38527
 Project: Spark
  Issue Type: Documentation
  Components: Documentation, Kubernetes
Affects Versions: 3.3.0
Reporter: Dongjoon Hyun






--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-38507) DataFrame withColumn method not adding or replacing columns when alias is used

2022-03-11 Thread Alexandros Mavrommatis (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-38507?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17504981#comment-17504981
 ] 

Alexandros Mavrommatis edited comment on SPARK-38507 at 3/11/22, 3:48 PM:
--

[~dcoliversun] Hello again. You may check the signature of select() and 
withColumn() methods in the following links:
 * 
[https://spark.apache.org/docs/3.1.2/api/scala/org/apache/spark/sql/Dataset.html#select(col:String,cols:String*):org.apache.spark.sql.DataFrame]
 * 
[https://spark.apache.org/docs/3.1.2/api/scala/org/apache/spark/sql/Dataset.html#withColumn(colName:String,col:org.apache.spark.sql.Column):org.apache.spark.sql.DataFrame]

As you can see they have the exact same signature, so it is pretty obvious to 
me that they expect the same input.

Additionally, as I mentioned before, the exception from the withColumn method call 
was the following:
{code:java}
cannot resolve '`df.field3`' given input columns: [df.field3, df.field1, 
df.field2] {code}
This clearly states that I am trying to access field "df.field3", which is actually 
part of the dataframe schema (it is clearly mentioned in the message).

In any case, if you have any doubts, please feel free to contact the Spark user 
email group.


was (Author: amavrommatis):
[~dcoliversun] Hello again. You may check the signature of select() and 
withColumn() methods in the following links:
 * 
[https://spark.apache.org/docs/3.1.2/api/scala/org/apache/spark/sql/Dataset.html#select(col:String,cols:String*):org.apache.spark.sql.DataFrame]
 * 
[https://spark.apache.org/docs/3.1.2/api/scala/org/apache/spark/sql/Dataset.html#withColumn(colName:String,col:org.apache.spark.sql.Column):org.apache.spark.sql.DataFrame]

As you can see they have the exact same signature, so it is pretty obvious to 
me that they expect the same input.

Additionally, as I mentioned before the exception of withColumn method call was 
the following:
{code:java}
cannot resolve '`df.field3`' given input columns: [df.field3, df.field1, 
df.field2] {code}
This clearly states that I try to access field "df.field3" which is actually 
part of the dataframe schema (it is clearly mentioned in the message).

 

In any case, if you have any doubts, please feel free to contact spark user 
email group.

> DataFrame withColumn method not adding or replacing columns when alias is used
> --
>
> Key: SPARK-38507
> URL: https://issues.apache.org/jira/browse/SPARK-38507
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.1.2
>Reporter: Alexandros Mavrommatis
>Priority: Major
>  Labels: SQL, catalyst
>
> I have an input DataFrame *df* created as follows:
> {code:java}
> import spark.implicits._
> val df = List((5, 10), (6, 20)).toDF("field1", "field2").alias("df") {code}
> When I execute either this command:
> {code:java}
> df.select("df.field2").show(2) {code}
> or that one:
> {code:java}
> df.withColumn("df.field2", lit(0)).select("df.field2").show(2) {code}
> I get the same result:
> {code:java}
> +------+
> |field2|
> +------+
> |    10|
> |    20|
> +------+ {code}
> Additionally, when I execute the following command:
> {code:java}
> df.withColumn("df.field3", lit(0)).select("df.field3").show(2){code}
> I get this exception:
> {code:java}
> org.apache.spark.sql.AnalysisException: cannot resolve '`df.field3`' given 
> input columns: [df.field3, df.field1, df.field2]; 'Project ['df.field3] +- 
> Project [field1#7, field2#8, 0 AS df.field3#31]    +- SubqueryAlias df       
> +- Project [_1#2 AS field1#7, _2#3 AS field2#8]          +- LocalRelation 
> [_1#2, _2#3]  at 
> org.apache.spark.sql.catalyst.analysis.package$AnalysisErrorAt.failAnalysis(package.scala:42)
>    at 
> org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$$nestedInanonfun$checkAnalysis$1$2.applyOrElse(CheckAnalysis.scala:155)
>    at 
> org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$$nestedInanonfun$checkAnalysis$1$2.applyOrElse(CheckAnalysis.scala:152)
>    at 
> org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformUp$2(TreeNode.scala:342)
>    at 
> org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:74)
>    at 
> org.apache.spark.sql.catalyst.trees.TreeNode.transformUp(TreeNode.scala:342)  
>  at 
> org.apache.spark.sql.catalyst.plans.QueryPlan.$anonfun$transformExpressionsUp$1(QueryPlan.scala:104)
>    at 
> org.apache.spark.sql.catalyst.plans.QueryPlan.$anonfun$mapExpressions$1(QueryPlan.scala:116)
>    at 
> org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:74)
>    at 
> org.apache.spark.sql.catalyst.plans.QueryPlan.transformExpression$1(QueryPlan.scala:116)
>    at 
> 

[jira] [Comment Edited] (SPARK-38507) DataFrame withColumn method not adding or replacing columns when alias is used

2022-03-11 Thread Alexandros Mavrommatis (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-38507?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17504981#comment-17504981
 ] 

Alexandros Mavrommatis edited comment on SPARK-38507 at 3/11/22, 3:48 PM:
--

[~dcoliversun] Hello again. You may check the signature of select() and 
withColumn() methods in the following links:
 * 
[https://spark.apache.org/docs/3.1.2/api/scala/org/apache/spark/sql/Dataset.html#select(col:String,cols:String*):org.apache.spark.sql.DataFrame]
 * 
[https://spark.apache.org/docs/3.1.2/api/scala/org/apache/spark/sql/Dataset.html#withColumn(colName:String,col:org.apache.spark.sql.Column):org.apache.spark.sql.DataFrame]

As you can see they have the exact same signature, so it is pretty obvious to 
me that they expect the same input.

Additionally, as I mentioned before the exception of withColumn method call was 
the following:
{code:java}
cannot resolve '`df.field3`' given input columns: [df.field3, df.field1, 
df.field2] {code}
This clearly states that I try to access field "df.field3" which is actually 
part of the dataframe schema (it is clearly mentioned in the message).

 

In any case, if you have any doubts, please feel free to contact spark user 
email group.


was (Author: amavrommatis):
[~dcoliversun] Hello again. You may check the signature of select() and 
withColumn() methods in the following links:
 * 
[https://spark.apache.org/docs/3.1.2/api/scala/org/apache/spark/sql/Dataset.html#select(col:String,cols:String*):org.apache.spark.sql.DataFrame]
 * 
[https://spark.apache.org/docs/3.1.2/api/scala/org/apache/spark/sql/Dataset.html#withColumn(colName:String,col:org.apache.spark.sql.Column):org.apache.spark.sql.DataFrame]

As you can see they have the exact same signature, so it is pretty obvious that 
they expect the same input.

Additionally, as I mentioned before the exception of withColumn method call was 
the following:

 
{code:java}
cannot resolve '`df.field3`' given input columns: [df.field3, df.field1, 
df.field2] {code}
This clearly states that I try to access field "df.field3" which is actually 
part of the dataframe schema (it is clearly mentioned in the message).

 

In any case, if you have any doubts, please feel free to contact spark user 
email group.

> DataFrame withColumn method not adding or replacing columns when alias is used
> --
>
> Key: SPARK-38507
> URL: https://issues.apache.org/jira/browse/SPARK-38507
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.1.2
>Reporter: Alexandros Mavrommatis
>Priority: Major
>  Labels: SQL, catalyst
>
> I have an input DataFrame *df* created as follows:
> {code:java}
> import spark.implicits._
> val df = List((5, 10), (6, 20)).toDF("field1", "field2").alias("df") {code}
> When I execute either this command:
> {code:java}
> df.select("df.field2").show(2) {code}
> or that one:
> {code:java}
> df.withColumn("df.field2", lit(0)).select("df.field2").show(2) {code}
> I get the same result:
> {code:java}
> +------+
> |field2|
> +------+
> |    10|
> |    20|
> +------+ {code}
> Additionally, when I execute the following command:
> {code:java}
> df.withColumn("df.field3", lit(0)).select("df.field3").show(2){code}
> I get this exception:
> {code:java}
> org.apache.spark.sql.AnalysisException: cannot resolve '`df.field3`' given 
> input columns: [df.field3, df.field1, df.field2]; 'Project ['df.field3] +- 
> Project [field1#7, field2#8, 0 AS df.field3#31]    +- SubqueryAlias df       
> +- Project [_1#2 AS field1#7, _2#3 AS field2#8]          +- LocalRelation 
> [_1#2, _2#3]  at 
> org.apache.spark.sql.catalyst.analysis.package$AnalysisErrorAt.failAnalysis(package.scala:42)
>    at 
> org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$$nestedInanonfun$checkAnalysis$1$2.applyOrElse(CheckAnalysis.scala:155)
>    at 
> org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$$nestedInanonfun$checkAnalysis$1$2.applyOrElse(CheckAnalysis.scala:152)
>    at 
> org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformUp$2(TreeNode.scala:342)
>    at 
> org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:74)
>    at 
> org.apache.spark.sql.catalyst.trees.TreeNode.transformUp(TreeNode.scala:342)  
>  at 
> org.apache.spark.sql.catalyst.plans.QueryPlan.$anonfun$transformExpressionsUp$1(QueryPlan.scala:104)
>    at 
> org.apache.spark.sql.catalyst.plans.QueryPlan.$anonfun$mapExpressions$1(QueryPlan.scala:116)
>    at 
> org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:74)
>    at 
> org.apache.spark.sql.catalyst.plans.QueryPlan.transformExpression$1(QueryPlan.scala:116)
>    at 
> 

[jira] [Commented] (SPARK-38507) DataFrame withColumn method not adding or replacing columns when alias is used

2022-03-11 Thread Alexandros Mavrommatis (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-38507?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17504981#comment-17504981
 ] 

Alexandros Mavrommatis commented on SPARK-38507:


[~dcoliversun] Hello again. You may check the signature of select() and 
withColumn() methods in the following links:
 * 
[https://spark.apache.org/docs/3.1.2/api/scala/org/apache/spark/sql/Dataset.html#select(col:String,cols:String*):org.apache.spark.sql.DataFrame]
 * 
[https://spark.apache.org/docs/3.1.2/api/scala/org/apache/spark/sql/Dataset.html#withColumn(colName:String,col:org.apache.spark.sql.Column):org.apache.spark.sql.DataFrame]

As you can see they have the exact same signature, so it is pretty obvious that 
they expect the same input.

Additionally, as I mentioned before the exception of withColumn method call was 
the following:

 
{code:java}
cannot resolve '`df.field3`' given input columns: [df.field3, df.field1, 
df.field2] {code}
This clearly states that I try to access field "df.field3" which is actually 
part of the dataframe schema (it is clearly mentioned in the message).

 

In any case, if you have any doubts, please feel free to contact spark user 
email group.

> DataFrame withColumn method not adding or replacing columns when alias is used
> --
>
> Key: SPARK-38507
> URL: https://issues.apache.org/jira/browse/SPARK-38507
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.1.2
>Reporter: Alexandros Mavrommatis
>Priority: Major
>  Labels: SQL, catalyst
>
> I have an input DataFrame *df* created as follows:
> {code:java}
> import spark.implicits._
> val df = List((5, 10), (6, 20)).toDF("field1", "field2").alias("df") {code}
> When I execute either this command:
> {code:java}
> df.select("df.field2").show(2) {code}
> or that one:
> {code:java}
> df.withColumn("df.field2", lit(0)).select("df.field2").show(2) {code}
> I get the same result:
> {code:java}
> +------+
> |field2|
> +------+
> |    10|
> |    20|
> +------+ {code}
> Additionally, when I execute the following command:
> {code:java}
> df.withColumn("df.field3", lit(0)).select("df.field3").show(2){code}
> I get this exception:
> {code:java}
> org.apache.spark.sql.AnalysisException: cannot resolve '`df.field3`' given 
> input columns: [df.field3, df.field1, df.field2]; 'Project ['df.field3] +- 
> Project [field1#7, field2#8, 0 AS df.field3#31]    +- SubqueryAlias df       
> +- Project [_1#2 AS field1#7, _2#3 AS field2#8]          +- LocalRelation 
> [_1#2, _2#3]  at 
> org.apache.spark.sql.catalyst.analysis.package$AnalysisErrorAt.failAnalysis(package.scala:42)
>    at 
> org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$$nestedInanonfun$checkAnalysis$1$2.applyOrElse(CheckAnalysis.scala:155)
>    at 
> org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$$nestedInanonfun$checkAnalysis$1$2.applyOrElse(CheckAnalysis.scala:152)
>    at 
> org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformUp$2(TreeNode.scala:342)
>    at 
> org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:74)
>    at 
> org.apache.spark.sql.catalyst.trees.TreeNode.transformUp(TreeNode.scala:342)  
>  at 
> org.apache.spark.sql.catalyst.plans.QueryPlan.$anonfun$transformExpressionsUp$1(QueryPlan.scala:104)
>    at 
> org.apache.spark.sql.catalyst.plans.QueryPlan.$anonfun$mapExpressions$1(QueryPlan.scala:116)
>    at 
> org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:74)
>    at 
> org.apache.spark.sql.catalyst.plans.QueryPlan.transformExpression$1(QueryPlan.scala:116)
>    at 
> org.apache.spark.sql.catalyst.plans.QueryPlan.recursiveTransform$1(QueryPlan.scala:127)
>    at 
> org.apache.spark.sql.catalyst.plans.QueryPlan.$anonfun$mapExpressions$3(QueryPlan.scala:132)
>    at 
> scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:238)   
> at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)   
> at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)  
>  at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)   at 
> scala.collection.TraversableLike.map(TraversableLike.scala:238)   at 
> scala.collection.TraversableLike.map$(TraversableLike.scala:231)   at 
> scala.collection.AbstractTraversable.map(Traversable.scala:108)   at 
> org.apache.spark.sql.catalyst.plans.QueryPlan.recursiveTransform$1(QueryPlan.scala:132)
>    at 
> org.apache.spark.sql.catalyst.plans.QueryPlan.$anonfun$mapExpressions$4(QueryPlan.scala:137)
>    at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:244)
>    at 
> org.apache.spark.sql.catalyst.plans.QueryPlan.mapExpressions(QueryPlan.scala:137)
>    at 
> 

[jira] [Commented] (SPARK-38507) DataFrame withColumn method not adding or replacing columns when alias is used

2022-03-11 Thread qian (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-38507?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17504971#comment-17504971
 ] 

qian commented on SPARK-38507:
--

[~amavrommatis] 

Method *select()* regards an input argument like _xx.xx_ as _table.column_, 
which is by design. I don't agree that this is actually a bug. If you stick to 
your point, you could email the Spark user mailing list about this case. :)
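
A minimal sketch of the distinction (reusing the {{df}} from the issue description; assumes a SparkSession named {{spark}}): {{withColumn}} here creates a column whose name literally contains a dot, an unquoted {{select("df.field3")}} parses the dot as alias.column, and backticks select the literal name.

{code:scala}
import org.apache.spark.sql.functions._
import spark.implicits._

val df = List((5, 10), (6, 20)).toDF("field1", "field2").alias("df")

val out = df.withColumn("df.field3", lit(0))   // adds a column literally named "df.field3"
out.select(col("`df.field3`")).show(2)         // backticks: resolve the literal column name
out.select($"df.field2").show(2)               // no backticks: resolved as alias "df", column "field2"
{code}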

> DataFrame withColumn method not adding or replacing columns when alias is used
> --
>
> Key: SPARK-38507
> URL: https://issues.apache.org/jira/browse/SPARK-38507
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.1.2
>Reporter: Alexandros Mavrommatis
>Priority: Major
>  Labels: SQL, catalyst
>
> I have an input DataFrame *df* created as follows:
> {code:java}
> import spark.implicits._
> val df = List((5, 10), (6, 20)).toDF("field1", "field2").alias("df") {code}
> When I execute either this command:
> {code:java}
> df.select("df.field2").show(2) {code}
> or that one:
> {code:java}
> df.withColumn("df.field2", lit(0)).select("df.field2").show(2) {code}
> I get the same result:
> {code:java}
> +--+
> |field2|
> +--+
> |    10|
> |    20|
> +--+ {code}
> Additionally, when I execute the following command:
> {code:java}
> df.withColumn("df.field3", lit(0)).select("df.field3").show(2){code}
> I get this exception:
> {code:java}
> org.apache.spark.sql.AnalysisException: cannot resolve '`df.field3`' given 
> input columns: [df.field3, df.field1, df.field2]; 'Project ['df.field3] +- 
> Project [field1#7, field2#8, 0 AS df.field3#31]    +- SubqueryAlias df       
> +- Project [_1#2 AS field1#7, _2#3 AS field2#8]          +- LocalRelation 
> [_1#2, _2#3]  at 
> org.apache.spark.sql.catalyst.analysis.package$AnalysisErrorAt.failAnalysis(package.scala:42)
>    at 
> org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$$nestedInanonfun$checkAnalysis$1$2.applyOrElse(CheckAnalysis.scala:155)
>    at 
> org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$$nestedInanonfun$checkAnalysis$1$2.applyOrElse(CheckAnalysis.scala:152)
>    at 
> org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformUp$2(TreeNode.scala:342)
>    at 
> org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:74)
>    at 
> org.apache.spark.sql.catalyst.trees.TreeNode.transformUp(TreeNode.scala:342)  
>  at 
> org.apache.spark.sql.catalyst.plans.QueryPlan.$anonfun$transformExpressionsUp$1(QueryPlan.scala:104)
>    at 
> org.apache.spark.sql.catalyst.plans.QueryPlan.$anonfun$mapExpressions$1(QueryPlan.scala:116)
>    at 
> org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:74)
>    at 
> org.apache.spark.sql.catalyst.plans.QueryPlan.transformExpression$1(QueryPlan.scala:116)
>    at 
> org.apache.spark.sql.catalyst.plans.QueryPlan.recursiveTransform$1(QueryPlan.scala:127)
>    at 
> org.apache.spark.sql.catalyst.plans.QueryPlan.$anonfun$mapExpressions$3(QueryPlan.scala:132)
>    at 
> scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:238)   
> at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)   
> at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)  
>  at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)   at 
> scala.collection.TraversableLike.map(TraversableLike.scala:238)   at 
> scala.collection.TraversableLike.map$(TraversableLike.scala:231)   at 
> scala.collection.AbstractTraversable.map(Traversable.scala:108)   at 
> org.apache.spark.sql.catalyst.plans.QueryPlan.recursiveTransform$1(QueryPlan.scala:132)
>    at 
> org.apache.spark.sql.catalyst.plans.QueryPlan.$anonfun$mapExpressions$4(QueryPlan.scala:137)
>    at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:244)
>    at 
> org.apache.spark.sql.catalyst.plans.QueryPlan.mapExpressions(QueryPlan.scala:137)
>    at 
> org.apache.spark.sql.catalyst.plans.QueryPlan.transformExpressionsUp(QueryPlan.scala:104)
>    at 
> org.apache.spark.sql.catalyst.analysis.CheckAnalysis.$anonfun$checkAnalysis$1(CheckAnalysis.scala:152)
>    at 
> org.apache.spark.sql.catalyst.analysis.CheckAnalysis.$anonfun$checkAnalysis$1$adapted(CheckAnalysis.scala:93)
>    at 
> org.apache.spark.sql.catalyst.trees.TreeNode.foreachUp(TreeNode.scala:184)   
> at 
> org.apache.spark.sql.catalyst.analysis.CheckAnalysis.checkAnalysis(CheckAnalysis.scala:93)
>    at 
> org.apache.spark.sql.catalyst.analysis.CheckAnalysis.checkAnalysis$(CheckAnalysis.scala:90)
>    at 
> org.apache.spark.sql.catalyst.analysis.Analyzer.checkAnalysis(Analyzer.scala:155)
>    at 
> org.apache.spark.sql.catalyst.analysis.Analyzer.$anonfun$executeAndCheck$1(Analyzer.scala:176)
>    at 

[jira] [Commented] (SPARK-38526) fix misleading function alias name for RuntimeReplaceable

2022-03-11 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-38526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17504954#comment-17504954
 ] 

Apache Spark commented on SPARK-38526:
--

User 'cloud-fan' has created a pull request for this issue:
https://github.com/apache/spark/pull/35821

> fix misleading function alias name for RuntimeReplaceable
> -
>
> Key: SPARK-38526
> URL: https://issues.apache.org/jira/browse/SPARK-38526
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Wenchen Fan
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-38526) fix misleading function alias name for RuntimeReplaceable

2022-03-11 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38526?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-38526:


Assignee: (was: Apache Spark)

> fix misleading function alias name for RuntimeReplaceable
> -
>
> Key: SPARK-38526
> URL: https://issues.apache.org/jira/browse/SPARK-38526
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Wenchen Fan
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-38526) fix misleading function alias name for RuntimeReplaceable

2022-03-11 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38526?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-38526:


Assignee: Apache Spark

> fix misleading function alias name for RuntimeReplaceable
> -
>
> Key: SPARK-38526
> URL: https://issues.apache.org/jira/browse/SPARK-38526
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Wenchen Fan
>Assignee: Apache Spark
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-38526) fix misleading function alias name for RuntimeReplaceable

2022-03-11 Thread Wenchen Fan (Jira)
Wenchen Fan created SPARK-38526:
---

 Summary: fix misleading function alias name for RuntimeReplaceable
 Key: SPARK-38526
 URL: https://issues.apache.org/jira/browse/SPARK-38526
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 3.3.0
Reporter: Wenchen Fan






--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-38525) [TEST] Check resource after resource creation

2022-03-11 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38525?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-38525:


Assignee: (was: Apache Spark)

> [TEST] Check resource after resource creation
> -
>
> Key: SPARK-38525
> URL: https://issues.apache.org/jira/browse/SPARK-38525
> Project: Spark
>  Issue Type: Sub-task
>  Components: Kubernetes, Tests
>Affects Versions: 3.3.0
>Reporter: Yikun Jiang
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-38525) [TEST] Check resource after resource creation

2022-03-11 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-38525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17504934#comment-17504934
 ] 

Apache Spark commented on SPARK-38525:
--

User 'Yikun' has created a pull request for this issue:
https://github.com/apache/spark/pull/35820

> [TEST] Check resource after resource creation
> -
>
> Key: SPARK-38525
> URL: https://issues.apache.org/jira/browse/SPARK-38525
> Project: Spark
>  Issue Type: Sub-task
>  Components: Kubernetes, Tests
>Affects Versions: 3.3.0
>Reporter: Yikun Jiang
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-38525) [TEST] Check resource after resource creation

2022-03-11 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38525?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-38525:


Assignee: Apache Spark

> [TEST] Check resource after resource creation
> -
>
> Key: SPARK-38525
> URL: https://issues.apache.org/jira/browse/SPARK-38525
> Project: Spark
>  Issue Type: Sub-task
>  Components: Kubernetes, Tests
>Affects Versions: 3.3.0
>Reporter: Yikun Jiang
>Assignee: Apache Spark
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-38525) [TEST] Check resource after resource creation

2022-03-11 Thread Yikun Jiang (Jira)
Yikun Jiang created SPARK-38525:
---

 Summary: [TEST] Check resource after resource creation
 Key: SPARK-38525
 URL: https://issues.apache.org/jira/browse/SPARK-38525
 Project: Spark
  Issue Type: Sub-task
  Components: Kubernetes, Tests
Affects Versions: 3.3.0
Reporter: Yikun Jiang






--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-38524) [TEST] Change disable queue to capability limit way

2022-03-11 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-38524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17504930#comment-17504930
 ] 

Apache Spark commented on SPARK-38524:
--

User 'Yikun' has created a pull request for this issue:
https://github.com/apache/spark/pull/35819

> [TEST] Change disable queue to capability limit way
> ---
>
> Key: SPARK-38524
> URL: https://issues.apache.org/jira/browse/SPARK-38524
> Project: Spark
>  Issue Type: Sub-task
>  Components: Kubernetes
>Affects Versions: 3.3.0
>Reporter: Yikun Jiang
>Priority: Major
>
> As described at [https://volcano.sh/en/docs/queue/]:
> - weight is a soft constraint.
> - capability is a hard constraint.
> We'd better use capability to keep things simple and avoid being influenced by 
> other queues.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-38524) [TEST] Change disable queue to capability limit way

2022-03-11 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-38524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17504928#comment-17504928
 ] 

Apache Spark commented on SPARK-38524:
--

User 'Yikun' has created a pull request for this issue:
https://github.com/apache/spark/pull/35819

> [TEST] Change disable queue to capability limit way
> ---
>
> Key: SPARK-38524
> URL: https://issues.apache.org/jira/browse/SPARK-38524
> Project: Spark
>  Issue Type: Sub-task
>  Components: Kubernetes
>Affects Versions: 3.3.0
>Reporter: Yikun Jiang
>Priority: Major
>
> As described at [https://volcano.sh/en/docs/queue/]:
> - weight is a soft constraint.
> - capability is a hard constraint.
> We'd better use capability to keep things simple and avoid being influenced by 
> other queues.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-38524) [TEST] Change disable queue to capability limit way

2022-03-11 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38524?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-38524:


Assignee: (was: Apache Spark)

> [TEST] Change disable queue to capability limit way
> ---
>
> Key: SPARK-38524
> URL: https://issues.apache.org/jira/browse/SPARK-38524
> Project: Spark
>  Issue Type: Sub-task
>  Components: Kubernetes
>Affects Versions: 3.3.0
>Reporter: Yikun Jiang
>Priority: Major
>
> As described at [https://volcano.sh/en/docs/queue/]:
> - weight is a soft constraint.
> - capability is a hard constraint.
> We'd better use capability to keep things simple and avoid being influenced by 
> other queues.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-38524) [TEST] Change disable queue to capability limit way

2022-03-11 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38524?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-38524:


Assignee: Apache Spark

> [TEST] Change disable queue to capability limit way
> ---
>
> Key: SPARK-38524
> URL: https://issues.apache.org/jira/browse/SPARK-38524
> Project: Spark
>  Issue Type: Sub-task
>  Components: Kubernetes
>Affects Versions: 3.3.0
>Reporter: Yikun Jiang
>Assignee: Apache Spark
>Priority: Major
>
> As described at [https://volcano.sh/en/docs/queue/]:
> - weight is a soft constraint.
> - capability is a hard constraint.
> We'd better use capability to keep things simple and avoid being influenced by 
> other queues.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-38524) [TEST] Change disable queue to capability limit way

2022-03-11 Thread Yikun Jiang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38524?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yikun Jiang updated SPARK-38524:

Summary: [TEST] Change disable queue to capability limit way  (was: Change 
disable queue to capability limit way)

> [TEST] Change disable queue to capability limit way
> ---
>
> Key: SPARK-38524
> URL: https://issues.apache.org/jira/browse/SPARK-38524
> Project: Spark
>  Issue Type: Sub-task
>  Components: Kubernetes
>Affects Versions: 3.3.0
>Reporter: Yikun Jiang
>Priority: Major
>
> As described at [https://volcano.sh/en/docs/queue/]:
> - weight is a soft constraint.
> - capability is a hard constraint.
> We'd better use capability to keep things simple and avoid being influenced by 
> other queues.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-38196) Reactor framework so as JDBC dialect could compile expression by self way

2022-03-11 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-38196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17504927#comment-17504927
 ] 

Apache Spark commented on SPARK-38196:
--

User 'cloud-fan' has created a pull request for this issue:
https://github.com/apache/spark/pull/35818

> Reactor framework so as JDBC dialect could compile expression by self way
> -
>
> Key: SPARK-38196
> URL: https://issues.apache.org/jira/browse/SPARK-38196
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: jiaan.geng
>Assignee: jiaan.geng
>Priority: Major
> Fix For: 3.3.0
>
>
> https://issues.apache.org/jira/browse/SPARK-37960 provides a new framework to 
> represent catalyst expressions in DS V2 APIs.
> Because that framework translates all catalyst expressions to a single, unified 
> SQL string, it cannot stay compatible across different JDBC databases, so it 
> does not work well.
> This PR refactors the framework so that each JDBC dialect can compile 
> expressions in its own way.
> First, the framework translates catalyst expressions to DS V2 expressions.
> Second, each JDBC dialect can compile the DS V2 expressions to its own SQL 
> syntax.
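
Roughly, the intended two-step flow can be sketched as below. The names are 
purely illustrative (not Spark's actual internal API); the point is only that 
translation and SQL generation are separate steps, with the dialect owning the 
second one.
{code:java}
// Hypothetical model of a DS V2 expression (illustrative names only).
sealed trait V2Expr
case class V2Field(name: String) extends V2Expr
case class V2Call(op: String, args: Seq[V2Expr]) extends V2Expr

// Each dialect renders a V2 expression as SQL in its own way, or returns
// None so the expression is simply not pushed down instead of failing.
trait Dialect {
  def compile(e: V2Expr): Option[String]
}

object BacktickQuotingDialect extends Dialect {
  def compile(e: V2Expr): Option[String] = e match {
    case V2Field(n)             => Some(s"`$n`")
    case V2Call("=", Seq(l, r)) => for (a <- compile(l); b <- compile(r)) yield s"$a = $b"
    case _                      => None // unsupported by this dialect
  }
}
{code}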



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-38524) Change disable queue to capability limit way

2022-03-11 Thread Yikun Jiang (Jira)
Yikun Jiang created SPARK-38524:
---

 Summary: Change disable queue to capability limit way
 Key: SPARK-38524
 URL: https://issues.apache.org/jira/browse/SPARK-38524
 Project: Spark
  Issue Type: Sub-task
  Components: Kubernetes
Affects Versions: 3.3.0
Reporter: Yikun Jiang


As described at [https://volcano.sh/en/docs/queue/]:

- weight is a soft constraint.

- capability is a hard constraint.

We'd better use capability to keep things simple and avoid being influenced by 
other queues.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-38523) Failure on referring to the corrupt record from CSV

2022-03-11 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38523?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-38523:


Assignee: (was: Apache Spark)

> Failure on referring to the corrupt record from CSV
> ---
>
> Key: SPARK-38523
> URL: https://issues.apache.org/jira/browse/SPARK-38523
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Max Gekk
>Priority: Major
>
> The file below has an invalid value in a field:
> {code:java}
> 0,2013-111_11 12:13:14
> 1,1983-08-04 {code}
> where the timestamp 2013-111_11 12:13:14 is incorrect.
> The query fails when it refers to the corrupt record column:
> {code:java}
> spark.read.format("csv")
>  .option("header", "true")
>  .schema(schema)
>  .load("csv_corrupt_record.csv")
>  .filter($"_corrupt_record".isNotNull) {code}
> with the exception:
> {code:java}
> org.apache.spark.sql.AnalysisException: 
> Since Spark 2.3, the queries from raw JSON/CSV files are disallowed when the
> referenced columns only include the internal corrupt record column
> (named _corrupt_record by default). For example:
> spark.read.schema(schema).csv(file).filter($"_corrupt_record".isNotNull).count()
> and spark.read.schema(schema).csv(file).select("_corrupt_record").show().
> Instead, you can cache or save the parsed results and then send the same 
> query.
> For example, val df = spark.read.schema(schema).csv(file).cache() and then
> df.filter($"_corrupt_record".isNotNull).count().
>       
>     at 
> org.apache.spark.sql.errors.QueryCompilationErrors$.queryFromRawFilesIncludeCorruptRecordColumnError(QueryCompilationErrors.scala:2047)
>     at 
> org.apache.spark.sql.execution.datasources.csv.CSVFileFormat.buildReader(CSVFileFormat.scala:116)
>  {code}



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-38523) Failure on referring to the corrupt record from CSV

2022-03-11 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-38523?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17504910#comment-17504910
 ] 

Apache Spark commented on SPARK-38523:
--

User 'MaxGekk' has created a pull request for this issue:
https://github.com/apache/spark/pull/35817

> Failure on referring to the corrupt record from CSV
> ---
>
> Key: SPARK-38523
> URL: https://issues.apache.org/jira/browse/SPARK-38523
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Max Gekk
>Priority: Major
>
> The file below has an invalid value in a field:
> {code:java}
> 0,2013-111_11 12:13:14
> 1,1983-08-04 {code}
> where the timestamp 2013-111_11 12:13:14 is incorrect.
> The query fails when it refers to the corrupt record column:
> {code:java}
> spark.read.format("csv")
>  .option("header", "true")
>  .schema(schema)
>  .load("csv_corrupt_record.csv")
>  .filter($"_corrupt_record".isNotNull) {code}
> with the exception:
> {code:java}
> org.apache.spark.sql.AnalysisException: 
> Since Spark 2.3, the queries from raw JSON/CSV files are disallowed when the
> referenced columns only include the internal corrupt record column
> (named _corrupt_record by default). For example:
> spark.read.schema(schema).csv(file).filter($"_corrupt_record".isNotNull).count()
> and spark.read.schema(schema).csv(file).select("_corrupt_record").show().
> Instead, you can cache or save the parsed results and then send the same 
> query.
> For example, val df = spark.read.schema(schema).csv(file).cache() and then
> df.filter($"_corrupt_record".isNotNull).count().
>       
>     at 
> org.apache.spark.sql.errors.QueryCompilationErrors$.queryFromRawFilesIncludeCorruptRecordColumnError(QueryCompilationErrors.scala:2047)
>     at 
> org.apache.spark.sql.execution.datasources.csv.CSVFileFormat.buildReader(CSVFileFormat.scala:116)
>  {code}
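
The error message itself describes the supported pattern; a minimal sketch of 
that cache-first workaround (assuming the same `spark`, `schema`, and file as in 
the description above):
{code:java}
import spark.implicits._

// Cache the parsed result first, then filter on the corrupt record column.
val parsed = spark.read
  .format("csv")
  .option("header", "true")
  .schema(schema)
  .load("csv_corrupt_record.csv")
  .cache()

parsed.filter($"_corrupt_record".isNotNull).show(false)
parsed.unpersist() // release the cached data once done
{code}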



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-38523) Failure on referring to the corrupt record from CSV

2022-03-11 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38523?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-38523:


Assignee: Apache Spark

> Failure on referring to the corrupt record from CSV
> ---
>
> Key: SPARK-38523
> URL: https://issues.apache.org/jira/browse/SPARK-38523
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Max Gekk
>Assignee: Apache Spark
>Priority: Major
>
> The file below has an invalid value in a field:
> {code:java}
> 0,2013-111_11 12:13:14
> 1,1983-08-04 {code}
> where the timestamp 2013-111_11 12:13:14 is incorrect.
> The query fails when it refers to the corrupt record column:
> {code:java}
> spark.read.format("csv")
>  .option("header", "true")
>  .schema(schema)
>  .load("csv_corrupt_record.csv")
>  .filter($"_corrupt_record".isNotNull) {code}
> with the exception:
> {code:java}
> org.apache.spark.sql.AnalysisException: 
> Since Spark 2.3, the queries from raw JSON/CSV files are disallowed when the
> referenced columns only include the internal corrupt record column
> (named _corrupt_record by default). For example:
> spark.read.schema(schema).csv(file).filter($"_corrupt_record".isNotNull).count()
> and spark.read.schema(schema).csv(file).select("_corrupt_record").show().
> Instead, you can cache or save the parsed results and then send the same 
> query.
> For example, val df = spark.read.schema(schema).csv(file).cache() and then
> df.filter($"_corrupt_record".isNotNull).count().
>       
>     at 
> org.apache.spark.sql.errors.QueryCompilationErrors$.queryFromRawFilesIncludeCorruptRecordColumnError(QueryCompilationErrors.scala:2047)
>     at 
> org.apache.spark.sql.execution.datasources.csv.CSVFileFormat.buildReader(CSVFileFormat.scala:116)
>  {code}



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-38523) Failure on referring to the corrupt record from CSV

2022-03-11 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38523?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-38523:


Assignee: Apache Spark

> Failure on referring to the corrupt record from CSV
> ---
>
> Key: SPARK-38523
> URL: https://issues.apache.org/jira/browse/SPARK-38523
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Max Gekk
>Assignee: Apache Spark
>Priority: Major
>
> The file below has an invalid value in a field:
> {code:java}
> 0,2013-111_11 12:13:14
> 1,1983-08-04 {code}
> where the timestamp 2013-111_11 12:13:14 is incorrect.
> The query fails when it refers to the corrupt record column:
> {code:java}
> spark.read.format("csv")
>  .option("header", "true")
>  .schema(schema)
>  .load("csv_corrupt_record.csv")
>  .filter($"_corrupt_record".isNotNull) {code}
> with the exception:
> {code:java}
> org.apache.spark.sql.AnalysisException: 
> Since Spark 2.3, the queries from raw JSON/CSV files are disallowed when the
> referenced columns only include the internal corrupt record column
> (named _corrupt_record by default). For example:
> spark.read.schema(schema).csv(file).filter($"_corrupt_record".isNotNull).count()
> and spark.read.schema(schema).csv(file).select("_corrupt_record").show().
> Instead, you can cache or save the parsed results and then send the same 
> query.
> For example, val df = spark.read.schema(schema).csv(file).cache() and then
> df.filter($"_corrupt_record".isNotNull).count().
>       
>     at 
> org.apache.spark.sql.errors.QueryCompilationErrors$.queryFromRawFilesIncludeCorruptRecordColumnError(QueryCompilationErrors.scala:2047)
>     at 
> org.apache.spark.sql.execution.datasources.csv.CSVFileFormat.buildReader(CSVFileFormat.scala:116)
>  {code}



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-38523) Failure on referring to the corrupt record from CSV

2022-03-11 Thread Max Gekk (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38523?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Max Gekk updated SPARK-38523:
-
Description: 
The file below has an invalid value in a field:
{code:java}
0,2013-111_11 12:13:14
1,1983-08-04 {code}
where the timestamp 2013-111_11 12:13:14 is incorrect.

The query fails when it refers to the corrupt record column:
{code:java}
spark.read.format("csv")
 .option("header", "true")
 .schema(schema)
 .load("csv_corrupt_record.csv")
 .filter($"_corrupt_record".isNotNull) {code}
with the exception:
{code:java}
org.apache.spark.sql.AnalysisException: 
Since Spark 2.3, the queries from raw JSON/CSV files are disallowed when the
referenced columns only include the internal corrupt record column
(named _corrupt_record by default). For example:
spark.read.schema(schema).csv(file).filter($"_corrupt_record".isNotNull).count()
and spark.read.schema(schema).csv(file).select("_corrupt_record").show().
Instead, you can cache or save the parsed results and then send the same query.
For example, val df = spark.read.schema(schema).csv(file).cache() and then
df.filter($"_corrupt_record".isNotNull).count().
      
    at 
org.apache.spark.sql.errors.QueryCompilationErrors$.queryFromRawFilesIncludeCorruptRecordColumnError(QueryCompilationErrors.scala:2047)
    at 
org.apache.spark.sql.execution.datasources.csv.CSVFileFormat.buildReader(CSVFileFormat.scala:116)
 {code}

  was:
The file below has an invalid value in a field:
{code:java}
0,2013-111_11 12:13:14
1,1983-08-04 {code}
where the timestamp 2013-111_11 12:13:14 is incorrect.

The query fails when it refers to the corrupt record column:
{code:java}
spark.read.format("csv")
 .option("header", "true")
 .schema(schema)
 .load("csv_corrupt_record.csv")
 .filter($"_corrupt_record".isNotNull) {code}
with the exception:

 


> Failure on referring to the corrupt record from CSV
> ---
>
> Key: SPARK-38523
> URL: https://issues.apache.org/jira/browse/SPARK-38523
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Max Gekk
>Priority: Major
>
> The file below has an invalid value in a field:
> {code:java}
> 0,2013-111_11 12:13:14
> 1,1983-08-04 {code}
> where the timestamp 2013-111_11 12:13:14 is incorrect.
> The query fails when it refers to the corrupt record column:
> {code:java}
> spark.read.format("csv")
>  .option("header", "true")
>  .schema(schema)
>  .load("csv_corrupt_record.csv")
>  .filter($"_corrupt_record".isNotNull) {code}
> with the exception:
> {code:java}
> org.apache.spark.sql.AnalysisException: 
> Since Spark 2.3, the queries from raw JSON/CSV files are disallowed when the
> referenced columns only include the internal corrupt record column
> (named _corrupt_record by default). For example:
> spark.read.schema(schema).csv(file).filter($"_corrupt_record".isNotNull).count()
> and spark.read.schema(schema).csv(file).select("_corrupt_record").show().
> Instead, you can cache or save the parsed results and then send the same 
> query.
> For example, val df = spark.read.schema(schema).csv(file).cache() and then
> df.filter($"_corrupt_record".isNotNull).count().
>       
>     at 
> org.apache.spark.sql.errors.QueryCompilationErrors$.queryFromRawFilesIncludeCorruptRecordColumnError(QueryCompilationErrors.scala:2047)
>     at 
> org.apache.spark.sql.execution.datasources.csv.CSVFileFormat.buildReader(CSVFileFormat.scala:116)
>  {code}



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-38523) Failure on referring to the corrupt record from CSV

2022-03-11 Thread Max Gekk (Jira)
Max Gekk created SPARK-38523:


 Summary: Failure on referring to the corrupt record from CSV
 Key: SPARK-38523
 URL: https://issues.apache.org/jira/browse/SPARK-38523
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 3.3.0
Reporter: Max Gekk


The file below has an invalid value in a field:
{code:java}
0,2013-111_11 12:13:14
1,1983-08-04 {code}
where the timestamp 2013-111_11 12:13:14 is incorrect.

The query fails when it refers to the corrupt record column:
{code:java}
spark.read.format("csv")
 .option("header", "true")
 .schema(schema)
 .load("csv_corrupt_record.csv")
 .filter($"_corrupt_record".isNotNull) {code}
with the exception:

 



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-38518) Implement `skipna` of `Series.all/Index.all` to exclude NA/null values

2022-03-11 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38518?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-38518.
--
Fix Version/s: 3.3.0
   Resolution: Fixed

Issue resolved by pull request 35813
[https://github.com/apache/spark/pull/35813]

> Implement `skipna` of `Series.all/Index.all` to exclude NA/null values
> --
>
> Key: SPARK-38518
> URL: https://issues.apache.org/jira/browse/SPARK-38518
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark
>Affects Versions: 3.3.0
>Reporter: Xinrong Meng
>Assignee: Xinrong Meng
>Priority: Major
> Fix For: 3.3.0
>
>
> Implement `skipna` of `Series.all/Index.all` to exclude NA/null values.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-38518) Implement `skipna` of `Series.all/Index.all` to exclude NA/null values

2022-03-11 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38518?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-38518:


Assignee: Xinrong Meng

> Implement `skipna` of `Series.all/Index.all` to exclude NA/null values
> --
>
> Key: SPARK-38518
> URL: https://issues.apache.org/jira/browse/SPARK-38518
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark
>Affects Versions: 3.3.0
>Reporter: Xinrong Meng
>Assignee: Xinrong Meng
>Priority: Major
>
> Implement `skipna` of `Series.all/Index.all` to exclude NA/null values.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-38491) Support `ignore_index` of `Series.sort_values`

2022-03-11 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38491?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-38491:


Assignee: Xinrong Meng

> Support `ignore_index` of `Series.sort_values`
> --
>
> Key: SPARK-38491
> URL: https://issues.apache.org/jira/browse/SPARK-38491
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark
>Affects Versions: 3.3.0
>Reporter: Xinrong Meng
>Assignee: Xinrong Meng
>Priority: Major
>
> Support `ignore_index` of `Series.sort_values`



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-38491) Support `ignore_index` of `Series.sort_values`

2022-03-11 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38491?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-38491.
--
Fix Version/s: 3.3.0
   Resolution: Fixed

Issue resolved by pull request 35794
[https://github.com/apache/spark/pull/35794]

> Support `ignore_index` of `Series.sort_values`
> --
>
> Key: SPARK-38491
> URL: https://issues.apache.org/jira/browse/SPARK-38491
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark
>Affects Versions: 3.3.0
>Reporter: Xinrong Meng
>Assignee: Xinrong Meng
>Priority: Major
> Fix For: 3.3.0
>
>
> Support `ignore_index` of `Series.sort_values`



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-38107) Use error classes in the compilation errors of python/pandas UDFs

2022-03-11 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38107?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-38107.
--
Fix Version/s: 3.3.0
   Resolution: Fixed

Issue resolved by pull request 35656
[https://github.com/apache/spark/pull/35656]

> Use error classes in the compilation errors of python/pandas UDFs
> -
>
> Key: SPARK-38107
> URL: https://issues.apache.org/jira/browse/SPARK-38107
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Max Gekk
>Assignee: Haejoon Lee
>Priority: Major
> Fix For: 3.3.0
>
>
> Migrate the following errors in QueryCompilationErrors:
> * pandasUDFAggregateNotSupportedInPivotError
> * groupAggPandasUDFUnsupportedByStreamingAggError
> * cannotUseMixtureOfAggFunctionAndGroupAggPandasUDFError
> * usePythonUDFInJoinConditionUnsupportedError
> to use error classes. Throw an implementation of SparkThrowable. Also write 
> a test for every error in QueryCompilationErrorsSuite.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-38107) Use error classes in the compilation errors of python/pandas UDFs

2022-03-11 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38107?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-38107:


Assignee: Haejoon Lee

> Use error classes in the compilation errors of python/pandas UDFs
> -
>
> Key: SPARK-38107
> URL: https://issues.apache.org/jira/browse/SPARK-38107
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Max Gekk
>Assignee: Haejoon Lee
>Priority: Major
>
> Migrate the following errors in QueryCompilationErrors:
> * pandasUDFAggregateNotSupportedInPivotError
> * groupAggPandasUDFUnsupportedByStreamingAggError
> * cannotUseMixtureOfAggFunctionAndGroupAggPandasUDFError
> * usePythonUDFInJoinConditionUnsupportedError
> to use error classes. Throw an implementation of SparkThrowable. Also write 
> a test for every error in QueryCompilationErrorsSuite.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-38515) Volcano queue is not deleted

2022-03-11 Thread Yikun Jiang (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-38515?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17504838#comment-17504838
 ] 

Yikun Jiang commented on SPARK-38515:
-

docker rmi volcanosh/vc-scheduler:latest

docker rmi volcanosh/vc-webhook-manager:latest

docker rmi volcanosh/vc-controller-manager:latest

kubectl apply -f 
https://raw.githubusercontent.com/volcano-sh/volcano/master/installer/volcano-development.yaml

 

Please try the above: first clean up the Docker images, then apply the new deployment.

 

I also submitted an issue on the Volcano side to change `IfNotPresent` to `latest`.

[1] https://github.com/volcano-sh/volcano/issues/2072

> Volcano queue is not deleted
> 
>
> Key: SPARK-38515
> URL: https://issues.apache.org/jira/browse/SPARK-38515
> Project: Spark
>  Issue Type: Sub-task
>  Components: Kubernetes
>Affects Versions: 3.3.0
>Reporter: Dongjoon Hyun
>Priority: Critical
>
> {code}
> $ k delete queue queue0
> Error from server: admission webhook "validatequeue.volcano.sh" denied the 
> request: only queue with state `Closed` can be deleted, queue `queue0` state 
> is `Open`
> {code}
> {code}
> [info] org.apache.spark.deploy.k8s.integrationtest.VolcanoSuite *** ABORTED 
> *** (7 minutes, 40 seconds)
> [info]   io.fabric8.kubernetes.client.KubernetesClientException: Failure 
> executing: DELETE at: 
> https://44bea09e70a5147f6b5b347ec26de85f.gr7.us-west-2.eks.amazonaws.com/apis/scheduling.volcano.sh/v1beta1/queues/queue-2u-3g.
>  Message: admission webhook "validatequeue.volcano.sh" denied the request: 
> only queue with state `Closed` can be deleted, queue `queue-2u-3g` state is 
> `Open`. Received status: Status(apiVersion=v1, code=400, details=null, 
> kind=Status, message=admission webhook "validatequeue.volcano.sh" denied the 
> request: only queue with state `Closed` can be deleted, queue `queue-2u-3g` 
> state is `Open`, metadata=ListMeta(_continue=null, remainingItemCount=null, 
> resourceVersion=null, selfLink=null, additionalProperties={}), reason=null, 
> status=Failure, additionalProperties={}).
> {code}



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-38508) Volcano feature doesn't work on EKS graviton instances

2022-03-11 Thread Yikun Jiang (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-38508?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17504836#comment-17504836
 ] 

Yikun Jiang commented on SPARK-38508:
-

docker rmi volcanosh/vc-scheduler-arm64:latest

docker rmi volcanosh/vc-webhook-manager-arm64:latest

docker rmi volcanosh/vc-controller-manager-arm64:latest

kubectl apply -f 
https://raw.githubusercontent.com/volcano-sh/volcano/master/installer/volcano-development-arm64.yaml

 

Please try this: first clean up the Docker images, then apply the new deployment.

I also submitted an issue on the Volcano side to change `IfNotPresent` to `latest`.

[1] https://github.com/volcano-sh/volcano/issues/2072

> Volcano feature doesn't work on EKS graviton instances
> --
>
> Key: SPARK-38508
> URL: https://issues.apache.org/jira/browse/SPARK-38508
> Project: Spark
>  Issue Type: Sub-task
>  Components: Kubernetes
>Affects Versions: 3.3.0
>Reporter: Dongjoon Hyun
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-38507) DataFrame withColumn method not adding or replacing columns when alias is used

2022-03-11 Thread Alexandros Mavrommatis (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-38507?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17504835#comment-17504835
 ] 

Alexandros Mavrommatis edited comment on SPARK-38507 at 3/11/22, 10:01 AM:
---

[~dcoliversun] if you check the exception message, it says:
{code:java}
cannot resolve '`df.field3`' given input columns: [df.field3, df.field1, 
df.field2]{code}
so the schema does include the alias-prefixed column as expected.

As you say, if "field2" were something different from "df.field2", then
{code:java}
df.select("df.field2").show(2){code}
would throw an exception too, but instead it returns a result. So I am pretty 
convinced that this is actually a bug.


was (Author: amavrommatis):
[~dcoliversun] if you check the exception message it says:
{code:java}
cannot resolve '`df.field3`' given input columns: [df.field3, df.field1, 
df.field2]{code}
so the schema comprehends and includes the alias as expected.

As you say if "field2" was something different from "df.field2" then 
{code:java}
df.select("df.field2").show(2){code}
would throw an exception too but it actually returns a result. So I am pretty 
convinced that this is actually a bug.

> DataFrame withColumn method not adding or replacing columns when alias is used
> --
>
> Key: SPARK-38507
> URL: https://issues.apache.org/jira/browse/SPARK-38507
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.1.2
>Reporter: Alexandros Mavrommatis
>Priority: Major
>  Labels: SQL, catalyst
>
> I have an input DataFrame *df* created as follows:
> {code:java}
> import spark.implicits._
> val df = List((5, 10), (6, 20)).toDF("field1", "field2").alias("df") {code}
> When I execute either this command:
> {code:java}
> df.select("df.field2").show(2) {code}
> or that one:
> {code:java}
> df.withColumn("df.field2", lit(0)).select("df.field2").show(2) {code}
> I get the same result:
> {code:java}
> +--+
> |field2|
> +--+
> |    10|
> |    20|
> +--+ {code}
> Additionally, when I execute the following command:
> {code:java}
> df.withColumn("df.field3", lit(0)).select("df.field3").show(2){code}
> I get this exception:
> {code:java}
> org.apache.spark.sql.AnalysisException: cannot resolve '`df.field3`' given 
> input columns: [df.field3, df.field1, df.field2]; 'Project ['df.field3] +- 
> Project [field1#7, field2#8, 0 AS df.field3#31]    +- SubqueryAlias df       
> +- Project [_1#2 AS field1#7, _2#3 AS field2#8]          +- LocalRelation 
> [_1#2, _2#3]  at 
> org.apache.spark.sql.catalyst.analysis.package$AnalysisErrorAt.failAnalysis(package.scala:42)
>    at 
> org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$$nestedInanonfun$checkAnalysis$1$2.applyOrElse(CheckAnalysis.scala:155)
>    at 
> org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$$nestedInanonfun$checkAnalysis$1$2.applyOrElse(CheckAnalysis.scala:152)
>    at 
> org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformUp$2(TreeNode.scala:342)
>    at 
> org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:74)
>    at 
> org.apache.spark.sql.catalyst.trees.TreeNode.transformUp(TreeNode.scala:342)  
>  at 
> org.apache.spark.sql.catalyst.plans.QueryPlan.$anonfun$transformExpressionsUp$1(QueryPlan.scala:104)
>    at 
> org.apache.spark.sql.catalyst.plans.QueryPlan.$anonfun$mapExpressions$1(QueryPlan.scala:116)
>    at 
> org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:74)
>    at 
> org.apache.spark.sql.catalyst.plans.QueryPlan.transformExpression$1(QueryPlan.scala:116)
>    at 
> org.apache.spark.sql.catalyst.plans.QueryPlan.recursiveTransform$1(QueryPlan.scala:127)
>    at 
> org.apache.spark.sql.catalyst.plans.QueryPlan.$anonfun$mapExpressions$3(QueryPlan.scala:132)
>    at 
> scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:238)   
> at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)   
> at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)  
>  at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)   at 
> scala.collection.TraversableLike.map(TraversableLike.scala:238)   at 
> scala.collection.TraversableLike.map$(TraversableLike.scala:231)   at 
> scala.collection.AbstractTraversable.map(Traversable.scala:108)   at 
> org.apache.spark.sql.catalyst.plans.QueryPlan.recursiveTransform$1(QueryPlan.scala:132)
>    at 
> org.apache.spark.sql.catalyst.plans.QueryPlan.$anonfun$mapExpressions$4(QueryPlan.scala:137)
>    at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:244)
>    at 
> org.apache.spark.sql.catalyst.plans.QueryPlan.mapExpressions(QueryPlan.scala:137)
>    at 
> 

[jira] [Comment Edited] (SPARK-38507) DataFrame withColumn method not adding or replacing columns when alias is used

2022-03-11 Thread Alexandros Mavrommatis (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-38507?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17504835#comment-17504835
 ] 

Alexandros Mavrommatis edited comment on SPARK-38507 at 3/11/22, 10:01 AM:
---

[~dcoliversun] if you check the exception message, it says:
{code:java}
cannot resolve '`df.field3`' given input columns: [df.field3, df.field1, 
df.field2]{code}
so the schema does include the alias-prefixed column as expected.

As you say, if "field2" were something different from "df.field2", then
{code:java}
df.select("df.field2").show(2){code}
would throw an exception too, but it actually returns a result. So I am pretty 
convinced that this is actually a bug.


was (Author: amavrommatis):
[~dcoliversun] if you check the exception message it says:
cannot resolve '`df.field3`' given input columns: [df.field3, df.field1, 
df.field2]
so the schema comprehends and includes the alias as expected.

As you say if "field2" was something different from "df.field2" then 
df.select("df.field2").show(2)
would throw an exception too but it actually returns a result. So I am pretty 
convinced that this is actually a bug.

> DataFrame withColumn method not adding or replacing columns when alias is used
> --
>
> Key: SPARK-38507
> URL: https://issues.apache.org/jira/browse/SPARK-38507
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.1.2
>Reporter: Alexandros Mavrommatis
>Priority: Major
>  Labels: SQL, catalyst
>
> I have an input DataFrame *df* created as follows:
> {code:java}
> import spark.implicits._
> val df = List((5, 10), (6, 20)).toDF("field1", "field2").alias("df") {code}
> When I execute either this command:
> {code:java}
> df.select("df.field2").show(2) {code}
> or that one:
> {code:java}
> df.withColumn("df.field2", lit(0)).select("df.field2").show(2) {code}
> I get the same result:
> {code:java}
> +--+
> |field2|
> +--+
> |    10|
> |    20|
> +--+ {code}
> Additionally, when I execute the following command:
> {code:java}
> df.withColumn("df.field3", lit(0)).select("df.field3").show(2){code}
> I get this exception:
> {code:java}
> org.apache.spark.sql.AnalysisException: cannot resolve '`df.field3`' given 
> input columns: [df.field3, df.field1, df.field2]; 'Project ['df.field3] +- 
> Project [field1#7, field2#8, 0 AS df.field3#31]    +- SubqueryAlias df       
> +- Project [_1#2 AS field1#7, _2#3 AS field2#8]          +- LocalRelation 
> [_1#2, _2#3]  at 
> org.apache.spark.sql.catalyst.analysis.package$AnalysisErrorAt.failAnalysis(package.scala:42)
>    at 
> org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$$nestedInanonfun$checkAnalysis$1$2.applyOrElse(CheckAnalysis.scala:155)
>    at 
> org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$$nestedInanonfun$checkAnalysis$1$2.applyOrElse(CheckAnalysis.scala:152)
>    at 
> org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformUp$2(TreeNode.scala:342)
>    at 
> org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:74)
>    at 
> org.apache.spark.sql.catalyst.trees.TreeNode.transformUp(TreeNode.scala:342)  
>  at 
> org.apache.spark.sql.catalyst.plans.QueryPlan.$anonfun$transformExpressionsUp$1(QueryPlan.scala:104)
>    at 
> org.apache.spark.sql.catalyst.plans.QueryPlan.$anonfun$mapExpressions$1(QueryPlan.scala:116)
>    at 
> org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:74)
>    at 
> org.apache.spark.sql.catalyst.plans.QueryPlan.transformExpression$1(QueryPlan.scala:116)
>    at 
> org.apache.spark.sql.catalyst.plans.QueryPlan.recursiveTransform$1(QueryPlan.scala:127)
>    at 
> org.apache.spark.sql.catalyst.plans.QueryPlan.$anonfun$mapExpressions$3(QueryPlan.scala:132)
>    at 
> scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:238)   
> at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)   
> at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)  
>  at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)   at 
> scala.collection.TraversableLike.map(TraversableLike.scala:238)   at 
> scala.collection.TraversableLike.map$(TraversableLike.scala:231)   at 
> scala.collection.AbstractTraversable.map(Traversable.scala:108)   at 
> org.apache.spark.sql.catalyst.plans.QueryPlan.recursiveTransform$1(QueryPlan.scala:132)
>    at 
> org.apache.spark.sql.catalyst.plans.QueryPlan.$anonfun$mapExpressions$4(QueryPlan.scala:137)
>    at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:244)
>    at 
> org.apache.spark.sql.catalyst.plans.QueryPlan.mapExpressions(QueryPlan.scala:137)
>    at 
> 

[jira] [Commented] (SPARK-38507) DataFrame withColumn method not adding or replacing columns when alias is used

2022-03-11 Thread Alexandros Mavrommatis (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-38507?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17504835#comment-17504835
 ] 

Alexandros Mavrommatis commented on SPARK-38507:


[~dcoliversun] if you check the exception message, it says:
cannot resolve '`df.field3`' given input columns: [df.field3, df.field1, 
df.field2]
so the schema does include the alias-prefixed column as expected.

As you say, if "field2" were something different from "df.field2", then
df.select("df.field2").show(2)
would throw an exception too, but it actually returns a result. So I am pretty 
convinced that this is actually a bug.

> DataFrame withColumn method not adding or replacing columns when alias is used
> --
>
> Key: SPARK-38507
> URL: https://issues.apache.org/jira/browse/SPARK-38507
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.1.2
>Reporter: Alexandros Mavrommatis
>Priority: Major
>  Labels: SQL, catalyst
>
> I have an input DataFrame *df* created as follows:
> {code:java}
> import spark.implicits._
> val df = List((5, 10), (6, 20)).toDF("field1", "field2").alias("df") {code}
> When I execute either this command:
> {code:java}
> df.select("df.field2").show(2) {code}
> or that one:
> {code:java}
> df.withColumn("df.field2", lit(0)).select("df.field2").show(2) {code}
> I get the same result:
> {code:java}
> +--+
> |field2|
> +--+
> |    10|
> |    20|
> +--+ {code}
> Additionally, when I execute the following command:
> {code:java}
> df.withColumn("df.field3", lit(0)).select("df.field3").show(2){code}
> I get this exception:
> {code:java}
> org.apache.spark.sql.AnalysisException: cannot resolve '`df.field3`' given 
> input columns: [df.field3, df.field1, df.field2]; 'Project ['df.field3] +- 
> Project [field1#7, field2#8, 0 AS df.field3#31]    +- SubqueryAlias df       
> +- Project [_1#2 AS field1#7, _2#3 AS field2#8]          +- LocalRelation 
> [_1#2, _2#3]  at 
> org.apache.spark.sql.catalyst.analysis.package$AnalysisErrorAt.failAnalysis(package.scala:42)
>    at 
> org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$$nestedInanonfun$checkAnalysis$1$2.applyOrElse(CheckAnalysis.scala:155)
>    at 
> org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$$nestedInanonfun$checkAnalysis$1$2.applyOrElse(CheckAnalysis.scala:152)
>    at 
> org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformUp$2(TreeNode.scala:342)
>    at 
> org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:74)
>    at 
> org.apache.spark.sql.catalyst.trees.TreeNode.transformUp(TreeNode.scala:342)  
>  at 
> org.apache.spark.sql.catalyst.plans.QueryPlan.$anonfun$transformExpressionsUp$1(QueryPlan.scala:104)
>    at 
> org.apache.spark.sql.catalyst.plans.QueryPlan.$anonfun$mapExpressions$1(QueryPlan.scala:116)
>    at 
> org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:74)
>    at 
> org.apache.spark.sql.catalyst.plans.QueryPlan.transformExpression$1(QueryPlan.scala:116)
>    at 
> org.apache.spark.sql.catalyst.plans.QueryPlan.recursiveTransform$1(QueryPlan.scala:127)
>    at 
> org.apache.spark.sql.catalyst.plans.QueryPlan.$anonfun$mapExpressions$3(QueryPlan.scala:132)
>    at 
> scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:238)   
> at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)   
> at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)  
>  at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)   at 
> scala.collection.TraversableLike.map(TraversableLike.scala:238)   at 
> scala.collection.TraversableLike.map$(TraversableLike.scala:231)   at 
> scala.collection.AbstractTraversable.map(Traversable.scala:108)   at 
> org.apache.spark.sql.catalyst.plans.QueryPlan.recursiveTransform$1(QueryPlan.scala:132)
>    at 
> org.apache.spark.sql.catalyst.plans.QueryPlan.$anonfun$mapExpressions$4(QueryPlan.scala:137)
>    at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:244)
>    at 
> org.apache.spark.sql.catalyst.plans.QueryPlan.mapExpressions(QueryPlan.scala:137)
>    at 
> org.apache.spark.sql.catalyst.plans.QueryPlan.transformExpressionsUp(QueryPlan.scala:104)
>    at 
> org.apache.spark.sql.catalyst.analysis.CheckAnalysis.$anonfun$checkAnalysis$1(CheckAnalysis.scala:152)
>    at 
> org.apache.spark.sql.catalyst.analysis.CheckAnalysis.$anonfun$checkAnalysis$1$adapted(CheckAnalysis.scala:93)
>    at 
> org.apache.spark.sql.catalyst.trees.TreeNode.foreachUp(TreeNode.scala:184)   
> at 
> org.apache.spark.sql.catalyst.analysis.CheckAnalysis.checkAnalysis(CheckAnalysis.scala:93)
>    at 
> 

[jira] [Commented] (SPARK-38522) Strengthen the contract on iterator method in StateStore

2022-03-11 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-38522?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17504779#comment-17504779
 ] 

Apache Spark commented on SPARK-38522:
--

User 'HeartSaVioR' has created a pull request for this issue:
https://github.com/apache/spark/pull/35816

> Strengthen the contract on iterator method in StateStore
> 
>
> Key: SPARK-38522
> URL: https://issues.apache.org/jira/browse/SPARK-38522
> Project: Spark
>  Issue Type: Improvement
>  Components: Structured Streaming
>Affects Versions: 3.3.0
>Reporter: Jungtaek Lim
>Priority: Major
>
> The root cause of SPARK-38320 was that the logic initialized the iterator 
> first, performed some updates against the state store, and then iterated 
> through the iterator expecting all updates made in between to be visible in it.
> That is not guaranteed by the RocksDB state store, and the contract of Java's 
> ConcurrentHashMap, which is used in HDFSBackedStateStore, does not guarantee 
> it either.
> It would be clearer to update the contract to draw a line on the behavioral 
> guarantee given to callers, so that callers do not rely on such an expectation.
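
A minimal sketch of the visibility pitfall described above, using only a plain
Java ConcurrentHashMap rather than Spark's actual StateStore API; the map, keys,
and values are made up purely for illustration:

{code:scala}
import java.util.concurrent.ConcurrentHashMap

object IteratorVisibilitySketch {
  def main(args: Array[String]): Unit = {
    val store = new ConcurrentHashMap[String, Int]()
    store.put("a", 1)

    // Obtain the iterator first, as the problematic logic did ...
    val it = store.entrySet().iterator()

    // ... then perform an update against the store afterwards.
    store.put("b", 2)

    // ConcurrentHashMap iterators are only weakly consistent: updates made
    // after the iterator was created may or may not be visible through it,
    // so whether "b" is printed here is not guaranteed.
    while (it.hasNext) {
      val e = it.next()
      println(s"${e.getKey} -> ${e.getValue}")
    }
  }
}
{code}

Whether the entry added after the iterator was created is printed is not
guaranteed, which is exactly the kind of expectation the strengthened contract
is meant to rule out.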



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-38522) Strengthen the contract on iterator method in StateStore

2022-03-11 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-38522?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17504778#comment-17504778
 ] 

Apache Spark commented on SPARK-38522:
--

User 'HeartSaVioR' has created a pull request for this issue:
https://github.com/apache/spark/pull/35816

> Strengthen the contract on iterator method in StateStore
> 
>
> Key: SPARK-38522
> URL: https://issues.apache.org/jira/browse/SPARK-38522
> Project: Spark
>  Issue Type: Improvement
>  Components: Structured Streaming
>Affects Versions: 3.3.0
>Reporter: Jungtaek Lim
>Priority: Major
>
> The root cause of SPARK-38320 was that the logic initialized the iterator 
> first, performed some updates against the state store, and then iterated 
> through the iterator expecting all updates made in between to be visible in it.
> That is not guaranteed by the RocksDB state store, and the contract of Java's 
> ConcurrentHashMap, which is used in HDFSBackedStateStore, does not guarantee 
> it either.
> It would be clearer to update the contract to draw a line on the behavioral 
> guarantee given to callers, so that callers do not rely on such an expectation.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-38522) Strengthen the contract on iterator method in StateStore

2022-03-11 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38522?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-38522:


Assignee: (was: Apache Spark)

> Strengthen the contract on iterator method in StateStore
> 
>
> Key: SPARK-38522
> URL: https://issues.apache.org/jira/browse/SPARK-38522
> Project: Spark
>  Issue Type: Improvement
>  Components: Structured Streaming
>Affects Versions: 3.3.0
>Reporter: Jungtaek Lim
>Priority: Major
>
> The root cause of SPARK-38320 was that the logic initialized the iterator 
> first, performed some updates against the state store, and then iterated 
> through the iterator expecting all updates made in between to be visible in it.
> That is not guaranteed by the RocksDB state store, and the contract of Java's 
> ConcurrentHashMap, which is used in HDFSBackedStateStore, does not guarantee 
> it either.
> It would be clearer to update the contract to draw a line on the behavioral 
> guarantee given to callers, so that callers do not rely on such an expectation.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-38522) Strengthen the contract on iterator method in StateStore

2022-03-11 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38522?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-38522:


Assignee: Apache Spark

> Strengthen the contract on iterator method in StateStore
> 
>
> Key: SPARK-38522
> URL: https://issues.apache.org/jira/browse/SPARK-38522
> Project: Spark
>  Issue Type: Improvement
>  Components: Structured Streaming
>Affects Versions: 3.3.0
>Reporter: Jungtaek Lim
>Assignee: Apache Spark
>Priority: Major
>
> The root cause of SPARK-38320 was that the logic initialized the iterator 
> first, performed some updates against the state store, and then iterated 
> through the iterator expecting all updates made in between to be visible in it.
> That is not guaranteed by the RocksDB state store, and the contract of Java's 
> ConcurrentHashMap, which is used in HDFSBackedStateStore, does not guarantee 
> it either.
> It would be clearer to update the contract to draw a line on the behavioral 
> guarantee given to callers, so that callers do not rely on such an expectation.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org