[jira] [Updated] (SPARK-24266) Spark client terminates while driver is still running
[ https://issues.apache.org/jira/browse/SPARK-24266?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-24266: -- Fix Version/s: 3.0.2 > Spark client terminates while driver is still running > - > > Key: SPARK-24266 > URL: https://issues.apache.org/jira/browse/SPARK-24266 > Project: Spark > Issue Type: Sub-task > Components: Kubernetes, Spark Core >Affects Versions: 2.3.0, 3.0.0 >Reporter: Chun Chen >Assignee: Stijn De Haes >Priority: Critical > Fix For: 3.0.2, 3.1.0 > > > {code} > Warning: Ignoring non-spark config property: Default=system properties > included when running spark-submit. > 18/05/11 14:50:12 WARN Config: Error reading service account token from: > [/var/run/secrets/kubernetes.io/serviceaccount/token]. Ignoring. > 18/05/11 14:50:12 INFO HadoopStepsOrchestrator: Hadoop Conf directory: > Some(/data/tesla/spark-2.2.0-k8s-0.5.0-bin-2.7.3/hadoop-conf) > 18/05/11 14:50:15 WARN NativeCodeLoader: Unable to load native-hadoop library > for your platform... using builtin-java classes where applicable > 18/05/11 14:50:15 WARN DomainSocketFactory: The short-circuit local reads > feature cannot be used because libhadoop cannot be loaded. > 18/05/11 14:50:16 INFO HadoopConfBootstrapImpl: HADOOP_CONF_DIR defined.
> Mounting Hadoop specific files > 18/05/11 14:50:17 INFO LoggingPodStatusWatcherImpl: State changed, new state: >pod name: spark-64-293-980-1526021412180-driver >namespace: tione-603074457 >labels: network -> FLOATINGIP, spark-app-selector -> > spark-2843da19c690485b93780ad7992a101e, spark-role -> driver >pod uid: 90558303-54e7-11e8-9e64-525400da65d8 >creation time: 2018-05-11T06:50:17Z >service account name: default >volumes: spark-local-dir-0-spark-local, spark-init-properties, > download-jars-volume, download-files, spark-init-secret, hadoop-properties, > default-token-xvjt9 >node name: N/A >start time: N/A >container images: N/A >phase: Pending >status: [] > 18/05/11 14:50:17 INFO LoggingPodStatusWatcherImpl: State changed, new state: >pod name: spark-64-293-980-1526021412180-driver >namespace: tione-603074457 >labels: network -> FLOATINGIP, spark-app-selector -> > spark-2843da19c690485b93780ad7992a101e, spark-role -> driver >pod uid: 90558303-54e7-11e8-9e64-525400da65d8 >creation time: 2018-05-11T06:50:17Z >service account name: default >volumes: spark-local-dir-0-spark-local, spark-init-properties, > download-jars-volume, download-files, spark-init-secret, hadoop-properties, > default-token-xvjt9 >node name: tbds-100-98-45-69 >start time: N/A >container images: N/A >phase: Pending >status: [] > 18/05/11 14:50:18 INFO LoggingPodStatusWatcherImpl: State changed, new state: >pod name: spark-64-293-980-1526021412180-driver >namespace: tione-603074457 >labels: network -> FLOATINGIP, spark-app-selector -> > spark-2843da19c690485b93780ad7992a101e, spark-role -> driver >pod uid: 90558303-54e7-11e8-9e64-525400da65d8 >creation time: 2018-05-11T06:50:17Z >service account name: default >volumes: spark-local-dir-0-spark-local, spark-init-properties, > download-jars-volume, download-files, spark-init-secret, hadoop-properties, > default-token-xvjt9 >node name: tbds-100-98-45-69 >start time: 2018-05-11T06:50:17Z >container images:
docker.oa.com:8080/gaia/spark-driver-cos:20180503_9 >phase: Pending >status: [ContainerStatus(containerID=null, > image=docker.oa.com:8080/gaia/spark-driver-cos:20180503_9, imageID=, > lastState=ContainerState(running=null, terminated=null, waiting=null, > additionalProperties={}), name=spark-kubernetes-driver, ready=false, > restartCount=0, state=ContainerState(running=null, terminated=null, > waiting=ContainerStateWaiting(message=null, reason=PodInitializing, > additionalProperties={}), additionalProperties={}), additionalProperties={})] > 18/05/11 14:50:19 INFO Client: Waiting for application spark-64-293-980 to > finish... > 18/05/11 14:50:25 INFO LoggingPodStatusWatcherImpl: State changed, new state: >pod name: spark-64-293-980-1526021412180-driver >namespace: tione-603074457 >labels: network -> FLOATINGIP, spark-app-selector -> > spark-2843da19c690485b93780ad7992a101e, spark-role -> driver >pod uid: 90558303-54e7-11e8-9e64-525400da65d8 >creation time: 2018-05-11T06:50:17Z >service account name: default >volumes: spark-local-dir-0-spark-local, spark-init-properties, > download-jars-volume, download-files, spark-init-secret, hadoop-properties, >
[jira] [Commented] (SPARK-29392) Remove use of deprecated symbol literal " 'name " syntax in favor of Symbol("name")
[ https://issues.apache.org/jira/browse/SPARK-29392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17225169#comment-17225169 ] Yang Jie commented on SPARK-29392: -- It seems that there are still many similar issues, especially in the catalyst and sql modules. Maven's compilation-warning log only prints 100 lines, so when we fix some of them, another batch of warnings appears. > Remove use of deprecated symbol literal " 'name " syntax in favor of > Symbol("name") > > > Key: SPARK-29392 > URL: https://issues.apache.org/jira/browse/SPARK-29392 > Project: Spark > Issue Type: Sub-task > Components: Spark Core, SQL, Tests >Affects Versions: 3.0.0 >Reporter: Sean R. Owen >Assignee: Sean R. Owen >Priority: Minor > Fix For: 3.0.0 > > > Example: > {code} > [WARNING] [Warn] > /Users/seanowen/Documents/spark_2.13/core/src/test/scala/org/apache/spark/memory/UnifiedMemoryManagerSuite.scala:308: > symbol literal is deprecated; use Symbol("assertInvariants") instead > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
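For readers tracking these fixes, the rewrite the warning asks for is mechanical; a minimal sketch in plain Scala (the symbol name is taken from the warning quoted above, the object name is illustrative):

```scala
// Hypothetical illustration of the migration this ticket tracks.
object SymbolMigrationSketch extends App {
  // Deprecated in Scala 2.13 (emits "symbol literal is deprecated"):
  //   val s = 'assertInvariants

  // Replacement: construct the Symbol explicitly.
  val s = Symbol("assertInvariants")

  // Symbols are interned, so equality and the name accessor are unchanged.
  assert(s == Symbol("assertInvariants"))
  assert(s.name == "assertInvariants")
  println(s)
}
```

In Spark test suites the literal often appears as a column reference through implicits (e.g. `df.select('col)`); those call sites rewrite the same way, to `df.select(Symbol("col"))`, or to the `$"col"` interpolator where the SQL implicits are in scope.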
[jira] [Resolved] (SPARK-33324) Upgrade kubernetes-client to 4.11.1
[ https://issues.apache.org/jira/browse/SPARK-33324?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-33324. --- Fix Version/s: 3.1.0 Resolution: Fixed Issue resolved by pull request 30233 [https://github.com/apache/spark/pull/30233] > Upgrade kubernetes-client to 4.11.1 > --- > > Key: SPARK-33324 > URL: https://issues.apache.org/jira/browse/SPARK-33324 > Project: Spark > Issue Type: Sub-task > Components: Build, Kubernetes >Affects Versions: 3.1.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Major > Fix For: 3.1.0 > >
[jira] [Assigned] (SPARK-33324) Upgrade kubernetes-client to 4.11.1
[ https://issues.apache.org/jira/browse/SPARK-33324?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-33324: - Assignee: Dongjoon Hyun > Upgrade kubernetes-client to 4.11.1 > --- > > Key: SPARK-33324 > URL: https://issues.apache.org/jira/browse/SPARK-33324 > Project: Spark > Issue Type: Sub-task > Components: Build, Kubernetes >Affects Versions: 3.1.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Major >
[jira] [Commented] (SPARK-33317) Spark Hive SQL returning empty dataframe
[ https://issues.apache.org/jira/browse/SPARK-33317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17225154#comment-17225154 ] Dongjoon Hyun commented on SPARK-33317: --- Since it's already resolved, it's okay, [~hyukjin.kwon]. > Spark Hive SQL returning empty dataframe > > > Key: SPARK-33317 > URL: https://issues.apache.org/jira/browse/SPARK-33317 > Project: Spark > Issue Type: Bug > Components: Spark Core, Spark Shell >Affects Versions: 2.4.6 >Reporter: Debadutta >Priority: Major > Attachments: farmers.csv, image-2020-11-03-13-30-12-049.png > > > I am trying to run a sql query on a hive table using hive connector in spark > but I am getting an empty dataframe. The query I am trying to run:- > {{sparkSession.sql("select fmid from farmers where fmid between ' 1000405134' > and '1000772585'")}} > This is failing but if I remove the leading whitespaces it works. > {{sparkSession.sql("select fmid from farmers where fmid between '1000405134' > and '1000772585'")}} > Currently, I am removing leading and trailing whitespaces as a workaround. > But the same query with whitespaces works fine in hive console.
[jira] [Comment Edited] (SPARK-33317) Spark Hive SQL returning empty dataframe
[ https://issues.apache.org/jira/browse/SPARK-33317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17225150#comment-17225150 ] Hyukjin Kwon edited comment on SPARK-33317 at 11/3/20, 6:00 AM: [~dongjoon], feel free to reopen if you think this ticket should be assessed further. I tend to take action on JIRAs a bit aggressively, so any correction to my action is welcome :-). was (Author: hyukjin.kwon): [~dongjoon], feel free to reopen if you think this ticket should be assessed further. I tend to take action on JIRAs a bit aggressively, so welcome to any correction to my action :-). > Spark Hive SQL returning empty dataframe > > > Key: SPARK-33317 > URL: https://issues.apache.org/jira/browse/SPARK-33317 > Project: Spark > Issue Type: Bug > Components: Spark Core, Spark Shell >Affects Versions: 2.4.6 >Reporter: Debadutta >Priority: Major > Attachments: farmers.csv, image-2020-11-03-13-30-12-049.png > > > I am trying to run a sql query on a hive table using hive connector in spark > but I am getting an empty dataframe. The query I am trying to run:- > {{sparkSession.sql("select fmid from farmers where fmid between ' 1000405134' > and '1000772585'")}} > This is failing but if I remove the leading whitespaces it works. > {{sparkSession.sql("select fmid from farmers where fmid between '1000405134' > and '1000772585'")}} > Currently, I am removing leading and trailing whitespaces as a workaround. > But the same query with whitespaces works fine in hive console.
[jira] [Commented] (SPARK-33317) Spark Hive SQL returning empty dataframe
[ https://issues.apache.org/jira/browse/SPARK-33317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17225150#comment-17225150 ] Hyukjin Kwon commented on SPARK-33317: -- [~dongjoon], feel free to reopen if you think this ticket should be assessed further. I tend to take action on JIRAs a bit aggressively, so welcome to any correction to my action :-). > Spark Hive SQL returning empty dataframe > > > Key: SPARK-33317 > URL: https://issues.apache.org/jira/browse/SPARK-33317 > Project: Spark > Issue Type: Bug > Components: Spark Core, Spark Shell >Affects Versions: 2.4.6 >Reporter: Debadutta >Priority: Major > Attachments: farmers.csv, image-2020-11-03-13-30-12-049.png > > > I am trying to run a sql query on a hive table using hive connector in spark > but I am getting an empty dataframe. The query I am trying to run:- > {{sparkSession.sql("select fmid from farmers where fmid between ' 1000405134' > and '1000772585'")}} > This is failing but if I remove the leading whitespaces it works. > {{sparkSession.sql("select fmid from farmers where fmid between '1000405134' > and '1000772585'")}} > Currently, I am removing leading and trailing whitespaces as a workaround. > But the same query with whitespaces works fine in hive console.
[jira] [Commented] (SPARK-33156) Upgrade GithubAction image from 18.04 to 20.04
[ https://issues.apache.org/jira/browse/SPARK-33156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17225149#comment-17225149 ] Dongjoon Hyun commented on SPARK-33156: --- This is backported as a preparation for the AMPLab Jenkins Farm OS upgrade (to `Ubuntu 20.04`). > Upgrade GithubAction image from 18.04 to 20.04 > -- > > Key: SPARK-33156 > URL: https://issues.apache.org/jira/browse/SPARK-33156 > Project: Spark > Issue Type: Improvement > Components: Project Infra >Affects Versions: 3.1.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Minor > Fix For: 3.0.2, 3.1.0 > >
[jira] [Updated] (SPARK-33156) Upgrade GithubAction image from 18.04 to 20.04
[ https://issues.apache.org/jira/browse/SPARK-33156?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-33156: -- Priority: Major (was: Minor) > Upgrade GithubAction image from 18.04 to 20.04 > -- > > Key: SPARK-33156 > URL: https://issues.apache.org/jira/browse/SPARK-33156 > Project: Spark > Issue Type: Improvement > Components: Project Infra >Affects Versions: 3.1.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Major > Fix For: 3.0.2, 3.1.0 > >
[jira] [Updated] (SPARK-33156) Upgrade GithubAction image from 18.04 to 20.04
[ https://issues.apache.org/jira/browse/SPARK-33156?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-33156: -- Fix Version/s: 3.0.2 > Upgrade GithubAction image from 18.04 to 20.04 > -- > > Key: SPARK-33156 > URL: https://issues.apache.org/jira/browse/SPARK-33156 > Project: Spark > Issue Type: Improvement > Components: Project Infra >Affects Versions: 3.1.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Minor > Fix For: 3.0.2, 3.1.0 > >
[jira] [Commented] (SPARK-33317) Spark Hive SQL returning empty dataframe
[ https://issues.apache.org/jira/browse/SPARK-33317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17225147#comment-17225147 ] Dongjoon Hyun commented on SPARK-33317: --- [~hyukjin.kwon]. The reported case was incorrect from the beginning, but we had better try to understand the reported situation correctly. > Spark Hive SQL returning empty dataframe > > > Key: SPARK-33317 > URL: https://issues.apache.org/jira/browse/SPARK-33317 > Project: Spark > Issue Type: Bug > Components: Spark Core, Spark Shell >Affects Versions: 2.4.6 >Reporter: Debadutta >Priority: Major > Attachments: farmers.csv, image-2020-11-03-13-30-12-049.png > > > I am trying to run a sql query on a hive table using hive connector in spark > but I am getting an empty dataframe. The query I am trying to run:- > {{sparkSession.sql("select fmid from farmers where fmid between ' 1000405134' > and '1000772585'")}} > This is failing but if I remove the leading whitespaces it works. > {{sparkSession.sql("select fmid from farmers where fmid between '1000405134' > and '1000772585'")}} > Currently, I am removing leading and trailing whitespaces as a workaround. > But the same query with whitespaces works fine in hive console.
[jira] [Commented] (SPARK-33317) Spark Hive SQL returning empty dataframe
[ https://issues.apache.org/jira/browse/SPARK-33317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17225144#comment-17225144 ] Dongjoon Hyun commented on SPARK-33317: --- Hi, [~qwe1398775315]. You are right there but the reason I asked [~dgodnaik] about the background and context is that he reported an empty dataframe. He is describing another situation and I'm trying to understand his procedure. > Spark Hive SQL returning empty dataframe > > > Key: SPARK-33317 > URL: https://issues.apache.org/jira/browse/SPARK-33317 > Project: Spark > Issue Type: Bug > Components: Spark Core, Spark Shell >Affects Versions: 2.4.6 >Reporter: Debadutta >Priority: Major > Attachments: farmers.csv, image-2020-11-03-13-30-12-049.png > > > I am trying to run a sql query on a hive table using hive connector in spark > but I am getting an empty dataframe. The query I am trying to run:- > {{sparkSession.sql("select fmid from farmers where fmid between ' 1000405134' > and '1000772585'")}} > This is failing but if I remove the leading whitespaces it works. > {{sparkSession.sql("select fmid from farmers where fmid between '1000405134' > and '1000772585'")}} > Currently, I am removing leading and trailing whitespaces as a workaround. > But the same query with whitespaces works fine in hive console.
[jira] [Resolved] (SPARK-33317) Spark Hive SQL returning empty dataframe
[ https://issues.apache.org/jira/browse/SPARK-33317?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-33317. -- Resolution: Not A Problem > Spark Hive SQL returning empty dataframe > > > Key: SPARK-33317 > URL: https://issues.apache.org/jira/browse/SPARK-33317 > Project: Spark > Issue Type: Bug > Components: Spark Core, Spark Shell >Affects Versions: 2.4.6 >Reporter: Debadutta >Priority: Major > Attachments: farmers.csv, image-2020-11-03-13-30-12-049.png > > > I am trying to run a sql query on a hive table using hive connector in spark > but I am getting an empty dataframe. The query I am trying to run:- > {{sparkSession.sql("select fmid from farmers where fmid between ' 1000405134' > and '1000772585'")}} > This is failing but if I remove the leading whitespaces it works. > {{sparkSession.sql("select fmid from farmers where fmid between '1000405134' > and '1000772585'")}} > Currently, I am removing leading and trailing whitespaces as a workaround. > But the same query with whitespaces works fine in hive console.
[jira] [Commented] (SPARK-33317) Spark Hive SQL returning empty dataframe
[ https://issues.apache.org/jira/browse/SPARK-33317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17225141#comment-17225141 ] Liu Neng commented on SPARK-33317: -- I ran these SQL queries on Spark 3.0.0: condition 1 (between ' 1000405134' and '1000772585') finds 6012 records, and condition 2 (between '1000405134' and '1000772585') finds 2798 records. I found that the comparator in codegen is UTF8String !image-2020-11-03-13-30-12-049.png! " 1000405134" is smaller than "1000405134". I don't think this is an issue, because the values being compared are Strings, not Numbers. I analyzed the parse tree; "1000405134" is a String literal. > Spark Hive SQL returning empty dataframe > > > Key: SPARK-33317 > URL: https://issues.apache.org/jira/browse/SPARK-33317 > Project: Spark > Issue Type: Bug > Components: Spark Core, Spark Shell >Affects Versions: 2.4.6 >Reporter: Debadutta >Priority: Major > Attachments: farmers.csv, image-2020-11-03-13-30-12-049.png > > > I am trying to run a sql query on a hive table using hive connector in spark > but I am getting an empty dataframe. The query I am trying to run:- > {{sparkSession.sql("select fmid from farmers where fmid between ' 1000405134' > and '1000772585'")}} > This is failing but if I remove the leading whitespaces it works. > {{sparkSession.sql("select fmid from farmers where fmid between '1000405134' > and '1000772585'")}} > Currently, I am removing leading and trailing whitespaces as a workaround. > But the same query with whitespaces works fine in hive console.
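The ordering behavior described in the comment above can be checked without Spark; a minimal sketch in plain Scala using `java.lang.String.compareTo` (UTF8String's byte-wise comparison agrees with it for ASCII data; the literal values come from the reported query, the object name is illustrative):

```scala
// Hypothetical illustration: BETWEEN on string operands is a pair of
// lexicographic comparisons, not numeric ones.
object StringBetweenSketch extends App {
  val paddedLower = " 1000405134" // lower bound with the leading space
  val lower       = "1000405134"  // lower bound without whitespace
  val upper       = "1000772585"

  // A space (0x20) sorts before every ASCII digit (0x30..0x39), so the
  // padded bound is strictly smaller than the unpadded one...
  assert(paddedLower.compareTo(lower) < 0)

  // ...which widens the range: any value in the unpadded range is also in
  // the padded one (consistent with 6012 vs 2798 records above).
  assert(lower.compareTo(paddedLower) >= 0 && lower.compareTo(upper) <= 0)

  // Conversely, a stored value that itself carries a leading space falls
  // below the unpadded lower bound and is filtered out.
  assert(" 1000405134".compareTo(lower) < 0)
}
```

This also suggests why the Hive console may appear to behave differently: if either operand is coerced to a numeric type there, leading whitespace can be discarded during the cast, turning the comparison numeric.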
[jira] [Updated] (SPARK-33317) Spark Hive SQL returning empty dataframe
[ https://issues.apache.org/jira/browse/SPARK-33317?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liu Neng updated SPARK-33317: - Attachment: image-2020-11-03-13-30-12-049.png > Spark Hive SQL returning empty dataframe > > > Key: SPARK-33317 > URL: https://issues.apache.org/jira/browse/SPARK-33317 > Project: Spark > Issue Type: Bug > Components: Spark Core, Spark Shell >Affects Versions: 2.4.6 >Reporter: Debadutta >Priority: Major > Attachments: farmers.csv, image-2020-11-03-13-30-12-049.png > > > I am trying to run a sql query on a hive table using hive connector in spark > but I am getting an empty dataframe. The query I am trying to run:- > {{sparkSession.sql("select fmid from farmers where fmid between ' 1000405134' > and '1000772585'")}} > This is failing but if I remove the leading whitespaces it works. > {{sparkSession.sql("select fmid from farmers where fmid between '1000405134' > and '1000772585'")}} > Currently, I am removing leading and trailing whitespaces as a workaround. > But the same query with whitespaces works fine in hive console.
[jira] [Issue Comment Deleted] (SPARK-33324) Upgrade kubernetes-client to 4.11.1
[ https://issues.apache.org/jira/browse/SPARK-33324?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-33324: -- Comment: was deleted (was: User 'dongjoon-hyun' has created a pull request for this issue: https://github.com/apache/spark/pull/30233) > Upgrade kubernetes-client to 4.11.1 > --- > > Key: SPARK-33324 > URL: https://issues.apache.org/jira/browse/SPARK-33324 > Project: Spark > Issue Type: Sub-task > Components: Build, Kubernetes >Affects Versions: 3.1.0 >Reporter: Dongjoon Hyun >Priority: Major >
[jira] [Commented] (SPARK-33324) Upgrade kubernetes-client to 4.11.1
[ https://issues.apache.org/jira/browse/SPARK-33324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17225126#comment-17225126 ] Apache Spark commented on SPARK-33324: -- User 'dongjoon-hyun' has created a pull request for this issue: https://github.com/apache/spark/pull/30233 > Upgrade kubernetes-client to 4.11.1 > --- > > Key: SPARK-33324 > URL: https://issues.apache.org/jira/browse/SPARK-33324 > Project: Spark > Issue Type: Sub-task > Components: Build, Kubernetes >Affects Versions: 3.1.0 >Reporter: Dongjoon Hyun >Priority: Major >
[jira] [Commented] (SPARK-33324) Upgrade kubernetes-client to 4.11.1
[ https://issues.apache.org/jira/browse/SPARK-33324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17225127#comment-17225127 ] Apache Spark commented on SPARK-33324: -- User 'dongjoon-hyun' has created a pull request for this issue: https://github.com/apache/spark/pull/30233 > Upgrade kubernetes-client to 4.11.1 > --- > > Key: SPARK-33324 > URL: https://issues.apache.org/jira/browse/SPARK-33324 > Project: Spark > Issue Type: Sub-task > Components: Build, Kubernetes >Affects Versions: 3.1.0 >Reporter: Dongjoon Hyun >Priority: Major >
[jira] [Assigned] (SPARK-33324) Upgrade kubernetes-client to 4.11.1
[ https://issues.apache.org/jira/browse/SPARK-33324?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-33324: Assignee: Apache Spark > Upgrade kubernetes-client to 4.11.1 > --- > > Key: SPARK-33324 > URL: https://issues.apache.org/jira/browse/SPARK-33324 > Project: Spark > Issue Type: Sub-task > Components: Build, Kubernetes >Affects Versions: 3.1.0 >Reporter: Dongjoon Hyun >Assignee: Apache Spark >Priority: Major >
[jira] [Assigned] (SPARK-33324) Upgrade kubernetes-client to 4.11.1
[ https://issues.apache.org/jira/browse/SPARK-33324?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-33324: Assignee: (was: Apache Spark) > Upgrade kubernetes-client to 4.11.1 > --- > > Key: SPARK-33324 > URL: https://issues.apache.org/jira/browse/SPARK-33324 > Project: Spark > Issue Type: Sub-task > Components: Build, Kubernetes >Affects Versions: 3.1.0 >Reporter: Dongjoon Hyun >Priority: Major >
[jira] [Created] (SPARK-33324) Upgrade kubernetes-client to 4.11.1
Dongjoon Hyun created SPARK-33324: - Summary: Upgrade kubernetes-client to 4.11.1 Key: SPARK-33324 URL: https://issues.apache.org/jira/browse/SPARK-33324 Project: Spark Issue Type: Sub-task Components: Build, Kubernetes Affects Versions: 3.1.0 Reporter: Dongjoon Hyun
[jira] [Updated] (SPARK-24266) Spark client terminates while driver is still running
[ https://issues.apache.org/jira/browse/SPARK-24266?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-24266: -- Priority: Critical (was: Major) > Spark client terminates while driver is still running > - > > Key: SPARK-24266 > URL: https://issues.apache.org/jira/browse/SPARK-24266 > Project: Spark > Issue Type: Sub-task > Components: Kubernetes, Spark Core >Affects Versions: 2.3.0, 3.0.0 >Reporter: Chun Chen >Assignee: Stijn De Haes >Priority: Critical > Fix For: 3.1.0 > > > {code} > Warning: Ignoring non-spark config property: Default=system properties > included when running spark-submit. > 18/05/11 14:50:12 WARN Config: Error reading service account token from: > [/var/run/secrets/kubernetes.io/serviceaccount/token]. Ignoring. > 18/05/11 14:50:12 INFO HadoopStepsOrchestrator: Hadoop Conf directory: > Some(/data/tesla/spark-2.2.0-k8s-0.5.0-bin-2.7.3/hadoop-conf) > 18/05/11 14:50:15 WARN NativeCodeLoader: Unable to load native-hadoop library > for your platform... using builtin-java classes where applicable > 18/05/11 14:50:15 WARN DomainSocketFactory: The short-circuit local reads > feature cannot be used because libhadoop cannot be loaded. > 18/05/11 14:50:16 INFO HadoopConfBootstrapImpl: HADOOP_CONF_DIR defined.
> Mounting Hadoop specific files > 18/05/11 14:50:17 INFO LoggingPodStatusWatcherImpl: State changed, new state: >pod name: spark-64-293-980-1526021412180-driver >namespace: tione-603074457 >labels: network -> FLOATINGIP, spark-app-selector -> > spark-2843da19c690485b93780ad7992a101e, spark-role -> driver >pod uid: 90558303-54e7-11e8-9e64-525400da65d8 >creation time: 2018-05-11T06:50:17Z >service account name: default >volumes: spark-local-dir-0-spark-local, spark-init-properties, > download-jars-volume, download-files, spark-init-secret, hadoop-properties, > default-token-xvjt9 >node name: N/A >start time: N/A >container images: N/A >phase: Pending >status: [] > 18/05/11 14:50:17 INFO LoggingPodStatusWatcherImpl: State changed, new state: >pod name: spark-64-293-980-1526021412180-driver >namespace: tione-603074457 >labels: network -> FLOATINGIP, spark-app-selector -> > spark-2843da19c690485b93780ad7992a101e, spark-role -> driver >pod uid: 90558303-54e7-11e8-9e64-525400da65d8 >creation time: 2018-05-11T06:50:17Z >service account name: default >volumes: spark-local-dir-0-spark-local, spark-init-properties, > download-jars-volume, download-files, spark-init-secret, hadoop-properties, > default-token-xvjt9 >node name: tbds-100-98-45-69 >start time: N/A >container images: N/A >phase: Pending >status: [] > 18/05/11 14:50:18 INFO LoggingPodStatusWatcherImpl: State changed, new state: >pod name: spark-64-293-980-1526021412180-driver >namespace: tione-603074457 >labels: network -> FLOATINGIP, spark-app-selector -> > spark-2843da19c690485b93780ad7992a101e, spark-role -> driver >pod uid: 90558303-54e7-11e8-9e64-525400da65d8 >creation time: 2018-05-11T06:50:17Z >service account name: default >volumes: spark-local-dir-0-spark-local, spark-init-properties, > download-jars-volume, download-files, spark-init-secret, hadoop-properties, > default-token-xvjt9 >node name: tbds-100-98-45-69 >start time: 2018-05-11T06:50:17Z >container images:
docker.oa.com:8080/gaia/spark-driver-cos:20180503_9 >phase: Pending >status: [ContainerStatus(containerID=null, > image=docker.oa.com:8080/gaia/spark-driver-cos:20180503_9, imageID=, > lastState=ContainerState(running=null, terminated=null, waiting=null, > additionalProperties={}), name=spark-kubernetes-driver, ready=false, > restartCount=0, state=ContainerState(running=null, terminated=null, > waiting=ContainerStateWaiting(message=null, reason=PodInitializing, > additionalProperties={}), additionalProperties={}), additionalProperties={})] > 18/05/11 14:50:19 INFO Client: Waiting for application spark-64-293-980 to > finish... > 18/05/11 14:50:25 INFO LoggingPodStatusWatcherImpl: State changed, new state: >pod name: spark-64-293-980-1526021412180-driver >namespace: tione-603074457 >labels: network -> FLOATINGIP, spark-app-selector -> > spark-2843da19c690485b93780ad7992a101e, spark-role -> driver >pod uid: 90558303-54e7-11e8-9e64-525400da65d8 >creation time: 2018-05-11T06:50:17Z >service account name: default >volumes: spark-local-dir-0-spark-local, spark-init-properties, > download-jars-volume, download-files, spark-init-secret, hadoop-properties, >
[jira] [Assigned] (SPARK-24266) Spark client terminates while driver is still running
[ https://issues.apache.org/jira/browse/SPARK-24266?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-24266: - Assignee: Stijn De Haes > Spark client terminates while driver is still running > - > > Key: SPARK-24266 > URL: https://issues.apache.org/jira/browse/SPARK-24266 > Project: Spark > Issue Type: Sub-task > Components: Kubernetes, Spark Core >Affects Versions: 2.3.0, 3.0.0 >Reporter: Chun Chen >Assignee: Stijn De Haes >Priority: Major > Fix For: 3.1.0 > > > {code} > Warning: Ignoring non-spark config property: Default=system properties > included when running spark-submit. > 18/05/11 14:50:12 WARN Config: Error reading service account token from: > [/var/run/secrets/kubernetes.io/serviceaccount/token]. Ignoring. > 18/05/11 14:50:12 INFO HadoopStepsOrchestrator: Hadoop Conf directory: > Some(/data/tesla/spark-2.2.0-k8s-0.5.0-bin-2.7.3/hadoop-conf) > 18/05/11 14:50:15 WARN NativeCodeLoader: Unable to load native-hadoop library > for your platform... using builtin-java classes where applicable > 18/05/11 14:50:15 WARN DomainSocketFactory: The short-circuit local reads > feature cannot be used because libhadoop cannot be loaded. > 18/05/11 14:50:16 INFO HadoopConfBootstrapImpl: HADOOP_CONF_DIR defined.
> Mounting Hadoop specific files > 18/05/11 14:50:17 INFO LoggingPodStatusWatcherImpl: State changed, new state: >pod name: spark-64-293-980-1526021412180-driver >namespace: tione-603074457 >labels: network -> FLOATINGIP, spark-app-selector -> > spark-2843da19c690485b93780ad7992a101e, spark-role -> driver >pod uid: 90558303-54e7-11e8-9e64-525400da65d8 >creation time: 2018-05-11T06:50:17Z >service account name: default >volumes: spark-local-dir-0-spark-local, spark-init-properties, > download-jars-volume, download-files, spark-init-secret, hadoop-properties, > default-token-xvjt9 >node name: N/A >start time: N/A >container images: N/A >phase: Pending >status: [] > 18/05/11 14:50:17 INFO LoggingPodStatusWatcherImpl: State changed, new state: >pod name: spark-64-293-980-1526021412180-driver >namespace: tione-603074457 >labels: network -> FLOATINGIP, spark-app-selector -> > spark-2843da19c690485b93780ad7992a101e, spark-role -> driver >pod uid: 90558303-54e7-11e8-9e64-525400da65d8 >creation time: 2018-05-11T06:50:17Z >service account name: default >volumes: spark-local-dir-0-spark-local, spark-init-properties, > download-jars-volume, download-files, spark-init-secret, hadoop-properties, > default-token-xvjt9 >node name: tbds-100-98-45-69 >start time: N/A >container images: N/A >phase: Pending >status: [] > 18/05/11 14:50:18 INFO LoggingPodStatusWatcherImpl: State changed, new state: >pod name: spark-64-293-980-1526021412180-driver >namespace: tione-603074457 >labels: network -> FLOATINGIP, spark-app-selector -> > spark-2843da19c690485b93780ad7992a101e, spark-role -> driver >pod uid: 90558303-54e7-11e8-9e64-525400da65d8 >creation time: 2018-05-11T06:50:17Z >service account name: default >volumes: spark-local-dir-0-spark-local, spark-init-properties, > download-jars-volume, download-files, spark-init-secret, hadoop-properties, > default-token-xvjt9 >node name: tbds-100-98-45-69 >start time: 2018-05-11T06:50:17Z >container images: 
docker.oa.com:8080/gaia/spark-driver-cos:20180503_9 >phase: Pending >status: [ContainerStatus(containerID=null, > image=docker.oa.com:8080/gaia/spark-driver-cos:20180503_9, imageID=, > lastState=ContainerState(running=null, terminated=null, waiting=null, > additionalProperties={}), name=spark-kubernetes-driver, ready=false, > restartCount=0, state=ContainerState(running=null, terminated=null, > waiting=ContainerStateWaiting(message=null, reason=PodInitializing, > additionalProperties={}), additionalProperties={}), additionalProperties={})] > 18/05/11 14:50:19 INFO Client: Waiting for application spark-64-293-980 to > finish... > 18/05/11 14:50:25 INFO LoggingPodStatusWatcherImpl: State changed, new state: >pod name: spark-64-293-980-1526021412180-driver >namespace: tione-603074457 >labels: network -> FLOATINGIP, spark-app-selector -> > spark-2843da19c690485b93780ad7992a101e, spark-role -> driver >pod uid: 90558303-54e7-11e8-9e64-525400da65d8 >creation time: 2018-05-11T06:50:17Z >service account name: default >volumes: spark-local-dir-0-spark-local, spark-init-properties, > download-jars-volume, download-files, spark-init-secret, hadoop-properties, >
[jira] [Commented] (SPARK-24266) Spark client terminates while driver is still running
[ https://issues.apache.org/jira/browse/SPARK-24266?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17225123#comment-17225123 ] Dongjoon Hyun commented on SPARK-24266: --- Please see the on-going backport PR. The validation seems to fail on branch-3.0. > Spark client terminates while driver is still running > - > > Key: SPARK-24266 > URL: https://issues.apache.org/jira/browse/SPARK-24266 > Project: Spark > Issue Type: Sub-task > Components: Kubernetes, Spark Core >Affects Versions: 2.3.0, 3.0.0 >Reporter: Chun Chen >Priority: Major > Fix For: 3.1.0
[jira] [Updated] (SPARK-24266) Spark client terminates while driver is still running
[ https://issues.apache.org/jira/browse/SPARK-24266?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-24266: -- Parent: SPARK-33005 Issue Type: Sub-task (was: Bug) > Spark client terminates while driver is still running > - > > Key: SPARK-24266 > URL: https://issues.apache.org/jira/browse/SPARK-24266 > Project: Spark > Issue Type: Sub-task > Components: Kubernetes, Spark Core >Affects Versions: 2.3.0, 3.0.0 >Reporter: Chun Chen >Priority: Major > Fix For: 3.1.0
[jira] [Commented] (SPARK-33156) Upgrade GithubAction image from 18.04 to 20.04
[ https://issues.apache.org/jira/browse/SPARK-33156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17225115#comment-17225115 ] Apache Spark commented on SPARK-33156: -- User 'dongjoon-hyun' has created a pull request for this issue: https://github.com/apache/spark/pull/30232 > Upgrade GithubAction image from 18.04 to 20.04 > -- > > Key: SPARK-33156 > URL: https://issues.apache.org/jira/browse/SPARK-33156 > Project: Spark > Issue Type: Improvement > Components: Project Infra >Affects Versions: 3.1.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Minor > Fix For: 3.1.0 > > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-33156) Upgrade GithubAction image from 18.04 to 20.04
[ https://issues.apache.org/jira/browse/SPARK-33156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17225114#comment-17225114 ] Apache Spark commented on SPARK-33156: -- User 'dongjoon-hyun' has created a pull request for this issue: https://github.com/apache/spark/pull/30231
[jira] [Commented] (SPARK-33300) Rule SimplifyCasts will not work for nested columns
[ https://issues.apache.org/jira/browse/SPARK-33300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17225089#comment-17225089 ] chendihao commented on SPARK-33300: --- Great, and thanks [~EveLiao]. I'm not familiar with the Catalyst optimizer, but it should recursively run the rule on child expressions. It's easy to reproduce in Spark 3.0; please let me know if you need any help. > Rule SimplifyCasts will not work for nested columns > --- > > Key: SPARK-33300 > URL: https://issues.apache.org/jira/browse/SPARK-33300 > Project: Spark > Issue Type: Bug > Components: Optimizer, SQL >Affects Versions: 3.0.0 >Reporter: chendihao >Priority: Minor > > We use SparkSQL and Catalyst to optimize the Spark job. We have read the > source code and tested the SimplifyCasts rule, which works for simple SQL > without a nested cast. > The SQL "select cast(string_date as string) from t1" will be optimized. > {code:java} > == Analyzed Logical Plan == > string_date: string > Project [cast(string_date#12 as string) AS string_date#24] > +- SubqueryAlias t1 > +- LogicalRDD [name#8, c1#9, c2#10, c5#11L, string_date#12, > string_timestamp#13, timestamp_field#14, bool_field#15], false > == Optimized Logical Plan == > Project [string_date#12] > +- LogicalRDD [name#8, c1#9, c2#10, c5#11L, string_date#12, > string_timestamp#13, timestamp_field#14, bool_field#15], false > {code} > However, it fails to optimize a nested cast like "select > cast(cast(string_date as string) as string) from t1". 
> {code:java} > == Analyzed Logical Plan == > CAST(CAST(string_date AS STRING) AS STRING): string > Project [cast(cast(string_date#12 as string) as string) AS > CAST(CAST(string_date AS STRING) AS STRING)#24] > +- SubqueryAlias t1 > +- LogicalRDD [name#8, c1#9, c2#10, c5#11L, string_date#12, > string_timestamp#13, timestamp_field#14, bool_field#15], false > == Optimized Logical Plan == > Project [string_date#12 AS CAST(CAST(string_date AS STRING) AS STRING)#24] > +- LogicalRDD [name#8, c1#9, c2#10, c5#11L, string_date#12, > string_timestamp#13, timestamp_field#14, bool_field#15], false > {code}
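The recursion the comment above asks for can be illustrated with a small, hypothetical sketch in plain Python — `Attr`, `Cast`, and `simplify_casts` are illustrative names, not Catalyst's actual classes: a cast-elimination rule only collapses `cast(cast(x as string) as string)` down to `x` if it simplifies the child expression first and then checks whether the remaining cast is a no-op.

```python
from dataclasses import dataclass

# Hypothetical miniature expression tree -- Attr, Cast, and simplify_casts
# are illustrative names, not Catalyst's actual classes.

@dataclass
class Attr:
    name: str
    dtype: str

@dataclass
class Cast:
    child: object
    dtype: str

def simplify_casts(expr):
    """Collapse no-op casts, recursing into child expressions bottom-up."""
    if isinstance(expr, Cast):
        child = simplify_casts(expr.child)   # simplify the child first
        if child.dtype == expr.dtype:
            return child                     # cast(x as T) where x: T  ->  x
        return Cast(child, expr.dtype)
    return expr

# cast(cast(string_date as string) as string) collapses all the way down
# only because the rule visits the inner Cast before testing the outer one.
col = Attr("string_date", "string")
print(simplify_casts(Cast(Cast(col, "string"), "string")))
# -> Attr(name='string_date', dtype='string')
```

Without the recursive call the outer cast would see a `Cast` child of the right type, return it unchanged, and leave the inner no-op cast in the plan — which matches the behavior the reporter describes.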
[jira] [Assigned] (SPARK-33323) Add query resolved check before convert hive relation
[ https://issues.apache.org/jira/browse/SPARK-33323?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-33323: Assignee: (was: Apache Spark) > Add query resolved check before convert hive relation > - > > Key: SPARK-33323 > URL: https://issues.apache.org/jira/browse/SPARK-33323 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.1.0 >Reporter: ulysses you >Priority: Minor > > Add query resolved check before convert hive relation.
[jira] [Assigned] (SPARK-33323) Add query resolved check before convert hive relation
[ https://issues.apache.org/jira/browse/SPARK-33323?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-33323: Assignee: Apache Spark
[jira] [Commented] (SPARK-33323) Add query resolved check before convert hive relation
[ https://issues.apache.org/jira/browse/SPARK-33323?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17225087#comment-17225087 ] Apache Spark commented on SPARK-33323: -- User 'ulysses-you' has created a pull request for this issue: https://github.com/apache/spark/pull/30230
[jira] [Created] (SPARK-33323) Add query resolved check before convert hive relation
ulysses you created SPARK-33323: --- Summary: Add query resolved check before convert hive relation Key: SPARK-33323 URL: https://issues.apache.org/jira/browse/SPARK-33323 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 3.1.0 Reporter: ulysses you Add query resolved check before convert hive relation.
[jira] [Commented] (SPARK-33312) Provide latest Spark 2.4.7 runnable distribution
[ https://issues.apache.org/jira/browse/SPARK-33312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17225084#comment-17225084 ] Prateek Dubey commented on SPARK-33312: --- Thanks [~dongjoon] and [~hyukjin.kwon]. If Spark 2.4.8 is releasing in Dec 2020, I think I can wait till then :). Also, I'll follow the snapshots approach for now as mentioned by [~hyukjin.kwon] > Provide latest Spark 2.4.7 runnable distribution > > > Key: SPARK-33312 > URL: https://issues.apache.org/jira/browse/SPARK-33312 > Project: Spark > Issue Type: Task > Components: Build >Affects Versions: 2.4.7 >Reporter: Prateek Dubey >Priority: Major > > Not sure if this is the right approach, however it would be great if the latest > Spark 2.4.7 runnable distribution could be provided here - > [https://spark.apache.org/downloads.html] > Currently it seems the last build was done on Sept 12th, 2020. > I'm working on running Spark workloads on EKS using EKS IRSA. I'm able to run > Spark workloads on EKS using IRSA with Spark 3.0/ Hadoop 3.2, however I want > to do the same with Spark 2.4.7/ Hadoop 2.7. > Recently this PR was merged into 2.4.x - > [https://github.com/apache/spark/pull/29877] and therefore I'm in need of the > latest Spark distribution > > PS: I tried building the latest Spark 2.4.7 myself as well using Maven, however > there are too many errors every time it reaches R, therefore it would be > great if the Spark community itself could provide the latest build.
[jira] [Commented] (SPARK-33245) Add built-in UDF - GETBIT
[ https://issues.apache.org/jira/browse/SPARK-33245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17225078#comment-17225078 ] Yuming Wang commented on SPARK-33245: - We can use {{substring(bin(col),-8,1)}} instead. > Add built-in UDF - GETBIT > -- > > Key: SPARK-33245 > URL: https://issues.apache.org/jira/browse/SPARK-33245 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.1.0 >Reporter: Yuming Wang >Priority: Major > > Teradata, Impala, Snowflake and Yellowbrick support this function: > https://docs.teradata.com/reader/kmuOwjp1zEYg98JsB8fu_A/PK1oV1b2jqvG~ohRnOro9w > https://docs.cloudera.com/runtime/7.2.0/impala-sql-reference/topics/impala-bit-functions.html#bit_functions__getbit > https://docs.snowflake.com/en/sql-reference/functions/getbit.html > https://www.yellowbrick.com/docs/2.2/ybd_sqlref/getbit.html
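For intuition, the suggested workaround and a direct GETBIT can be sketched in plain Python. This is illustrative only: `substring(bin(col),-8,1)` in the comment above extracts the eighth binary digit from the right (bit index 7), and generalizing it to an arbitrary bit index `n` is my assumption, not something the comment states.

```python
def getbit(x: int, n: int) -> int:
    # Direct definition: bit n of x, counting from 0 at the least
    # significant bit.
    return (x >> n) & 1

def getbit_via_bin(x: int, n: int) -> str:
    # Mimics the SQL workaround: take one character n+1 positions from the
    # end of the binary string, as substring(bin(col), -(n+1), 1) would.
    # Returns '' when x has fewer than n+1 binary digits -- a case a
    # built-in GETBIT handles naturally by returning 0.
    s = format(x, "b")
    return s[-(n + 1)] if len(s) > n else ""

print(getbit(0b10110100, 7))          # -> 1
print(getbit_via_bin(0b10110100, 7))  # -> '1'
print(getbit_via_bin(5, 7))           # empty string: bin(5) has only 3 digits
```

The empty-string edge case for short values is one practical argument for a built-in function over the string workaround.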
[jira] [Updated] (SPARK-33285) Too many "Auto-application to `()` is deprecated." related compilation warnings
[ https://issues.apache.org/jira/browse/SPARK-33285?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yang Jie updated SPARK-33285: - Description: There are too many "Auto-application to `()` is deprecated." related compilation warnings when compile with Scala 2.13 like {code:java} [WARNING] [Warn] /spark-src/core/src/test/scala/org/apache/spark/PartitioningSuite.scala:246: Auto-application to `()` is deprecated. Supply the empty argument list `()` explicitly to invoke method stdev, or remove the empty argument list from its definition (Java-defined methods are exempt). In Scala 3, an unapplied method like this will be eta-expanded into a function. {code} There are a lot of them, but it's easy to fix. If there is a definition as follows: {code:java} Class Foo { def bar(): Unit = {} } val foo = new Foo{code} Should be {code:java} foo.bar() {code} not {code:java} foo.bar {code} was: There are too many "Auto-application to `()` is deprecated." related compilation warnings when compile with Scala 2.13 like {code:java} [WARNING] [Warn] /spark-src/core/src/test/scala/org/apache/spark/PartitioningSuite.scala:246: Auto-application to `()` is deprecated. Supply the empty argument list `()` explicitly to invoke method stdev, or remove the empty argument list from its definition (Java-defined methods are exempt). In Scala 3, an unapplied method like this will be eta-expanded into a function. [WARNING] [Warn] /spark-src/core/src/test/scala/org/apache/spark/PartitioningSuite.scala:247: Auto-application to `()` is deprecated. Supply the empty argument list `()` explicitly to invoke method variance, or remove the empty argument list from its definition (Java-defined methods are exempt). In Scala 3, an unapplied method like this will be eta-expanded into a function. [WARNING] [Warn] /spark-src/core/src/test/scala/org/apache/spark/PartitioningSuite.scala:247: Auto-application to `()` is deprecated. 
Supply the empty argument list `()` explicitly to invoke method popVariance, or remove the empty argument list from its definition (Java-defined methods are exempt). In Scala 3, an unapplied method like this will be eta-expanded into a function. [WARNING] [Warn] /spark-src/core/src/test/scala/org/apache/spark/PartitioningSuite.scala:248: Auto-application to `()` is deprecated. Supply the empty argument list `()` explicitly to invoke method stdev, or remove the empty argument list from its definition (Java-defined methods are exempt). In Scala 3, an unapplied method like this will be eta-expanded into a function. [WARNING] [Warn] /spark-src/core/src/test/scala/org/apache/spark/PartitioningSuite.scala:248: Auto-application to `()` is deprecated. Supply the empty argument list `()` explicitly to invoke method popStdev, or remove the empty argument list from its definition (Java-defined methods are exempt). In Scala 3, an unapplied method like this will be eta-expanded into a function. {code} Maybe these will mask some of the more important compilation warnings > Too many "Auto-application to `()` is deprecated." related compilation > warnings > > > Key: SPARK-33285 > URL: https://issues.apache.org/jira/browse/SPARK-33285 > Project: Spark > Issue Type: Sub-task > Components: Build >Affects Versions: 3.1.0 >Reporter: Yang Jie >Priority: Minor > > There are too many "Auto-application to `()` is deprecated." related > compilation warnings when compile with Scala 2.13 like > {code:java} > [WARNING] [Warn] > /spark-src/core/src/test/scala/org/apache/spark/PartitioningSuite.scala:246: > Auto-application to `()` is deprecated. Supply the empty argument list `()` > explicitly to invoke method stdev, > or remove the empty argument list from its definition (Java-defined methods > are exempt). > In Scala 3, an unapplied method like this will be eta-expanded into a > function. > {code} > There are a lot of them, but it's easy to fix. 
> If there is a definition as follows: > {code:java} > Class Foo { >def bar(): Unit = {} > } > val foo = new Foo{code} > Should be > {code:java} > foo.bar() > {code} > not > {code:java} > foo.bar {code}
[jira] [Updated] (SPARK-33285) Too many "Auto-application to `()` is deprecated." related compilation warnings
[ https://issues.apache.org/jira/browse/SPARK-33285?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yang Jie updated SPARK-33285: - Description: There are too many "Auto-application to `()` is deprecated." related compilation warnings when compile with Scala 2.13 like {code:java} [WARNING] [Warn] /spark-src/core/src/test/scala/org/apache/spark/PartitioningSuite.scala:246: Auto-application to `()` is deprecated. Supply the empty argument list `()` explicitly to invoke method stdev, or remove the empty argument list from its definition (Java-defined methods are exempt). In Scala 3, an unapplied method like this will be eta-expanded into a function. {code} A lot of them, but it's easy to fix. If there is a definition as follows: {code:java} Class Foo { def bar(): Unit = {} } val foo = new Foo{code} Should be {code:java} foo.bar() {code} not {code:java} foo.bar {code} was: There are too many "Auto-application to `()` is deprecated." related compilation warnings when compile with Scala 2.13 like {code:java} [WARNING] [Warn] /spark-src/core/src/test/scala/org/apache/spark/PartitioningSuite.scala:246: Auto-application to `()` is deprecated. Supply the empty argument list `()` explicitly to invoke method stdev, or remove the empty argument list from its definition (Java-defined methods are exempt). In Scala 3, an unapplied method like this will be eta-expanded into a function. {code} There are a lot of them, but it's easy to fix. If there is a definition as follows: {code:java} Class Foo { def bar(): Unit = {} } val foo = new Foo{code} Should be {code:java} foo.bar() {code} not {code:java} foo.bar {code} > Too many "Auto-application to `()` is deprecated." related compilation > warnings > > > Key: SPARK-33285 > URL: https://issues.apache.org/jira/browse/SPARK-33285 > Project: Spark > Issue Type: Sub-task > Components: Build >Affects Versions: 3.1.0 >Reporter: Yang Jie >Priority: Minor > > There are too many "Auto-application to `()` is deprecated." 
related > compilation warnings when compile with Scala 2.13 like > {code:java} > [WARNING] [Warn] > /spark-src/core/src/test/scala/org/apache/spark/PartitioningSuite.scala:246: > Auto-application to `()` is deprecated. Supply the empty argument list `()` > explicitly to invoke method stdev, > or remove the empty argument list from its definition (Java-defined methods > are exempt). > In Scala 3, an unapplied method like this will be eta-expanded into a > function. > {code} > A lot of them, but it's easy to fix. > If there is a definition as follows: > {code:java} > Class Foo { >def bar(): Unit = {} > } > val foo = new Foo{code} > Should be > {code:java} > foo.bar() > {code} > not > {code:java} > foo.bar {code}
[jira] [Updated] (SPARK-33322) Dataframe: data is wrongly presented because of column name
[ https://issues.apache.org/jira/browse/SPARK-33322?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mihaly Hazag updated SPARK-33322: - Attachment: image-2020-11-03-14-57-09-433.png > Dataframe: data is wrongly presented because of column name > --- > > Key: SPARK-33322 > URL: https://issues.apache.org/jira/browse/SPARK-33322 > Project: Spark > Issue Type: Bug > Components: PySpark >Affects Versions: 2.4.5 >Reporter: Mihaly Hazag >Priority: Major > Attachments: image-2020-11-03-14-57-09-433.png, > image-2020-11-03-14-57-37-308.png > > > Consider the code below: `some_text` column got the `some_int` value, while > its value is null in the dataframe. > !image-2020-11-03-14-42-52-840.png! > > Renaming the field from `some_text` to `some_apple`, fixed the problem! > !image-2020-11-03-14-43-13-528.png! > > Here is the code to reproduce the problem > {code:python} > from datetime import datetime > from pyspark.sql import Row > from pyspark.sql.types import StructType, StructField, DateType, StringType, > IntegerType > > schema = StructType( > [ > StructField('dfdt', DateType(), True), > StructField('some_text', StringType(), True), > StructField('some_int', IntegerType(), True), > ] > ) > > test_df = spark.createDataFrame([ > Row(dfdt=datetime.strptime('2020-12-18', '%Y-%m-%d'), some_text='cdsvg', > some_int=100) > ], schema) > > display(test_df) > {code} >
[jira] [Updated] (SPARK-33322) Dataframe: data is wrongly presented because of column name
[ https://issues.apache.org/jira/browse/SPARK-33322?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mihaly Hazag updated SPARK-33322: - Description: Consider the code below: `some_text` column got the `some_int` value, while its value is null in the dataframe. !image-2020-11-03-14-57-09-433.png! Renaming the field from `some_text` to `some_apple`, fixed the problem! Here is the code to reproduce the problem {code:python} from datetime import datetime from pyspark.sql import Row from pyspark.sql.types import StructType, StructField, DateType, StringType, IntegerType schema = StructType( [ StructField('dfdt', DateType(), True), StructField('some_text', StringType(), True), StructField('some_int', IntegerType(), True), ] ) test_df = spark.createDataFrame([ Row(dfdt=datetime.strptime('2020-12-18', '%Y-%m-%d'), some_text='cdsvg', some_int=100) ], schema) display(test_df) {code} was: Consider the code below: `some_text` column got the `some_int` value, while its value is null in the dataframe. !image-2020-11-03-14-42-52-840.png! Renaming the field from `some_text` to `some_apple`, fixed the problem! !image-2020-11-03-14-43-13-528.png! 
Here is the code to reproduce the problem {code:python} from datetime import datetime from pyspark.sql import Row from pyspark.sql.types import StructType, StructField, DateType, StringType, IntegerType schema = StructType( [ StructField('dfdt', DateType(), True), StructField('some_text', StringType(), True), StructField('some_int', IntegerType(), True), ] ) test_df = spark.createDataFrame([ Row(dfdt=datetime.strptime('2020-12-18', '%Y-%m-%d'), some_text='cdsvg', some_int=100) ], schema) display(test_df) {code} > Dataframe: data is wrongly presented because of column name > --- > > Key: SPARK-33322 > URL: https://issues.apache.org/jira/browse/SPARK-33322 > Project: Spark > Issue Type: Bug > Components: PySpark >Affects Versions: 2.4.5 >Reporter: Mihaly Hazag >Priority: Major > Attachments: image-2020-11-03-14-57-09-433.png, > image-2020-11-03-14-57-37-308.png > > > Consider the code below: `some_text` column got the `some_int` value, while > its value is null in the dataframe. > !image-2020-11-03-14-57-09-433.png! > > Renaming the field from `some_text` to `some_apple`, fixed the problem! > > > Here is the code to reproduce the problem > {code:python} > from datetime import datetime > from pyspark.sql import Row > from pyspark.sql.types import StructType, StructField, DateType, StringType, > IntegerType > > schema = StructType( > [ > StructField('dfdt', DateType(), True), > StructField('some_text', StringType(), True), > StructField('some_int', IntegerType(), True), > ] > ) > > test_df = spark.createDataFrame([ > Row(dfdt=datetime.strptime('2020-12-18', '%Y-%m-%d'), some_text='cdsvg', > some_int=100) > ], schema) > > display(test_df) > {code} > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-33322) Dataframe: data is wrongly presented because of column name
[ https://issues.apache.org/jira/browse/SPARK-33322?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mihaly Hazag updated SPARK-33322: - Description: Consider the code below: `some_text` column got the `some_int` value, while its value is null in the dataframe. !image-2020-11-03-14-57-09-433.png! Renaming the field from `some_text` to `some_apple`, fixed the problem! !image-2020-11-03-14-57-37-308.png! Here is the code to reproduce the problem {code:python} from datetime import datetime from pyspark.sql import Row from pyspark.sql.types import StructType, StructField, DateType, StringType, IntegerType schema = StructType( [ StructField('dfdt', DateType(), True), StructField('some_text', StringType(), True), StructField('some_int', IntegerType(), True), ] ) test_df = spark.createDataFrame([ Row(dfdt=datetime.strptime('2020-12-18', '%Y-%m-%d'), some_text='cdsvg', some_int=100) ], schema) display(test_df) {code} was: Consider the code below: `some_text` column got the `some_int` value, while its value is null in the dataframe. !image-2020-11-03-14-57-09-433.png! Renaming the field from `some_text` to `some_apple`, fixed the problem! 
Here is the code to reproduce the problem {code:python} from datetime import datetime from pyspark.sql import Row from pyspark.sql.types import StructType, StructField, DateType, StringType, IntegerType schema = StructType( [ StructField('dfdt', DateType(), True), StructField('some_text', StringType(), True), StructField('some_int', IntegerType(), True), ] ) test_df = spark.createDataFrame([ Row(dfdt=datetime.strptime('2020-12-18', '%Y-%m-%d'), some_text='cdsvg', some_int=100) ], schema) display(test_df) {code} > Dataframe: data is wrongly presented because of column name > --- > > Key: SPARK-33322 > URL: https://issues.apache.org/jira/browse/SPARK-33322 > Project: Spark > Issue Type: Bug > Components: PySpark >Affects Versions: 2.4.5 >Reporter: Mihaly Hazag >Priority: Major > Attachments: image-2020-11-03-14-57-09-433.png, > image-2020-11-03-14-57-37-308.png > > > Consider the code below: `some_text` column got the `some_int` value, while > its value is null in the dataframe. > !image-2020-11-03-14-57-09-433.png! > > Renaming the field from `some_text` to `some_apple`, fixed the problem! > !image-2020-11-03-14-57-37-308.png! > > > Here is the code to reproduce the problem > {code:python} > from datetime import datetime > from pyspark.sql import Row > from pyspark.sql.types import StructType, StructField, DateType, StringType, > IntegerType > > schema = StructType( > [ > StructField('dfdt', DateType(), True), > StructField('some_text', StringType(), True), > StructField('some_int', IntegerType(), True), > ] > ) > > test_df = spark.createDataFrame([ > Row(dfdt=datetime.strptime('2020-12-18', '%Y-%m-%d'), some_text='cdsvg', > some_int=100) > ], schema) > > display(test_df) > {code} > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-33322) Dataframe: data is wrongly presented because of column name
[ https://issues.apache.org/jira/browse/SPARK-33322?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mihaly Hazag updated SPARK-33322: - Attachment: image-2020-11-03-14-57-37-308.png > Dataframe: data is wrongly presented because of column name > --- > > Key: SPARK-33322 > URL: https://issues.apache.org/jira/browse/SPARK-33322 > Project: Spark > Issue Type: Bug > Components: PySpark >Affects Versions: 2.4.5 >Reporter: Mihaly Hazag >Priority: Major > Attachments: image-2020-11-03-14-57-09-433.png, > image-2020-11-03-14-57-37-308.png > > > Consider the code below: `some_text` column got the `some_int` value, while > its value is null in the dataframe. > !image-2020-11-03-14-57-09-433.png! > > Renaming the field from `some_text` to `some_apple`, fixed the problem! > > > Here is the code to reproduce the problem > {code:python} > from datetime import datetime > from pyspark.sql import Row > from pyspark.sql.types import StructType, StructField, DateType, StringType, > IntegerType > > schema = StructType( > [ > StructField('dfdt', DateType(), True), > StructField('some_text', StringType(), True), > StructField('some_int', IntegerType(), True), > ] > ) > > test_df = spark.createDataFrame([ > Row(dfdt=datetime.strptime('2020-12-18', '%Y-%m-%d'), some_text='cdsvg', > some_int=100) > ], schema) > > display(test_df) > {code} > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-33322) Dataframe: data is wrongly presented because of column name
Mihaly Hazag created SPARK-33322:
Summary: Dataframe: data is wrongly presented because of column name
Key: SPARK-33322
URL: https://issues.apache.org/jira/browse/SPARK-33322
Project: Spark
Issue Type: Bug
Components: PySpark
Affects Versions: 2.4.5
Reporter: Mihaly Hazag

Consider the code below: the `some_text` column got the `some_int` value, while its own value is null in the dataframe. !image-2020-11-03-14-42-52-840.png! Renaming the field from `some_text` to `some_apple` fixed the problem! !image-2020-11-03-14-43-13-528.png!

Here is the code to reproduce the problem:
{code:python}
from datetime import datetime
from pyspark.sql import Row
from pyspark.sql.types import StructType, StructField, DateType, StringType, IntegerType

schema = StructType(
    [
        StructField('dfdt', DateType(), True),
        StructField('some_text', StringType(), True),
        StructField('some_int', IntegerType(), True),
    ]
)

test_df = spark.createDataFrame([
    Row(dfdt=datetime.strptime('2020-12-18', '%Y-%m-%d'), some_text='cdsvg', some_int=100)
], schema)

display(test_df)
{code}

-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
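[Editorial note on the likely cause, not stated in the ticket: in Spark 2.x, `pyspark.sql.Row` sorted keyword-argument field names alphabetically (that sorting was removed in Spark 3.0), so the value order can stop matching the declared schema order. A plain-Python sketch of the misalignment, with no Spark dependency:]

```python
# Sketch (plain Python, no Spark needed): Spark 2.x Row sorted keyword
# field names alphabetically, so values were paired with the schema
# columns positionally in the wrong order. 'some_int' sorts before
# 'some_text', which is why some_text shows 100 and why renaming it
# to 'some_apple' makes the orders line up again.
from datetime import datetime

schema_order = ['dfdt', 'some_text', 'some_int']  # order declared in StructType

def row_values(**kwargs):
    # Mimics Spark 2.x Row construction: names sorted, values in that order
    names = sorted(kwargs)
    return names, [kwargs[n] for n in names]

names, values = row_values(
    dfdt=datetime(2020, 12, 18), some_text='cdsvg', some_int=100)
print(names)                 # ['dfdt', 'some_int', 'some_text']

# createDataFrame pairs the values with the schema by position:
as_seen = dict(zip(schema_order, values))
print(as_seen['some_text'])  # 100  (the some_int value)

# Renaming some_text -> some_apple makes the sorted order match the schema:
names2, _ = row_values(
    dfdt=datetime(2020, 12, 18), some_apple='cdsvg', some_int=100)
print(names2)                # ['dfdt', 'some_apple', 'some_int']
```

[Under this reading, passing plain tuples in schema order, or upgrading to Spark 3.0, would avoid the misassignment.]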
[jira] [Commented] (SPARK-33321) Migrate ANALYZE TABLE to new resolution framework
[ https://issues.apache.org/jira/browse/SPARK-33321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17225044#comment-17225044 ] Apache Spark commented on SPARK-33321: -- User 'imback82' has created a pull request for this issue: https://github.com/apache/spark/pull/30229 > Migrate ANALYZE TABLE to new resolution framework > - > > Key: SPARK-33321 > URL: https://issues.apache.org/jira/browse/SPARK-33321 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.1.0 >Reporter: Terry Kim >Priority: Minor > > Migrate ANALYZE TABLE to new resolution framework. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-33321) Migrate ANALYZE TABLE to new resolution framework
[ https://issues.apache.org/jira/browse/SPARK-33321?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-33321: Assignee: Apache Spark > Migrate ANALYZE TABLE to new resolution framework > - > > Key: SPARK-33321 > URL: https://issues.apache.org/jira/browse/SPARK-33321 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.1.0 >Reporter: Terry Kim >Assignee: Apache Spark >Priority: Minor > > Migrate ANALYZE TABLE to new resolution framework. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-33321) Migrate ANALYZE TABLE to new resolution framework
[ https://issues.apache.org/jira/browse/SPARK-33321?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-33321: Assignee: (was: Apache Spark) > Migrate ANALYZE TABLE to new resolution framework > - > > Key: SPARK-33321 > URL: https://issues.apache.org/jira/browse/SPARK-33321 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.1.0 >Reporter: Terry Kim >Priority: Minor > > Migrate ANALYZE TABLE to new resolution framework. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-33321) Migrate ANALYZE TABLE to new resolution framework
[ https://issues.apache.org/jira/browse/SPARK-33321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17225043#comment-17225043 ] Apache Spark commented on SPARK-33321: -- User 'imback82' has created a pull request for this issue: https://github.com/apache/spark/pull/30229 > Migrate ANALYZE TABLE to new resolution framework > - > > Key: SPARK-33321 > URL: https://issues.apache.org/jira/browse/SPARK-33321 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.1.0 >Reporter: Terry Kim >Priority: Minor > > Migrate ANALYZE TABLE to new resolution framework. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-33250) Migration to NumPy documentation style in SQL (pyspark.sql.*)
[ https://issues.apache.org/jira/browse/SPARK-33250?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-33250. -- Fix Version/s: 3.1.0 Resolution: Fixed Issue resolved by pull request 30181 [https://github.com/apache/spark/pull/30181] > Migration to NumPy documentation style in SQL (pyspark.sql.*) > - > > Key: SPARK-33250 > URL: https://issues.apache.org/jira/browse/SPARK-33250 > Project: Spark > Issue Type: Sub-task > Components: Documentation, PySpark >Affects Versions: 3.1.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Major > Fix For: 3.1.0 > > > Migration to NumPy documentation style in SQL (pyspark.sql.*) -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-33250) Migration to NumPy documentation style in SQL (pyspark.sql.*)
[ https://issues.apache.org/jira/browse/SPARK-33250?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-33250: Assignee: Hyukjin Kwon > Migration to NumPy documentation style in SQL (pyspark.sql.*) > - > > Key: SPARK-33250 > URL: https://issues.apache.org/jira/browse/SPARK-33250 > Project: Spark > Issue Type: Sub-task > Components: Documentation, PySpark >Affects Versions: 3.1.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Major > > Migration to NumPy documentation style in SQL (pyspark.sql.*) -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-33321) Migrate ANALYZE TABLE to new resolution framework
Terry Kim created SPARK-33321: - Summary: Migrate ANALYZE TABLE to new resolution framework Key: SPARK-33321 URL: https://issues.apache.org/jira/browse/SPARK-33321 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 3.1.0 Reporter: Terry Kim Migrate ANALYZE TABLE to new resolution framework. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-33300) Rule SimplifyCasts will not work for nested columns
[ https://issues.apache.org/jira/browse/SPARK-33300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17224937#comment-17224937 ] Aoyuan Liao commented on SPARK-33300: - I would like to work on this. > Rule SimplifyCasts will not work for nested columns > --- > > Key: SPARK-33300 > URL: https://issues.apache.org/jira/browse/SPARK-33300 > Project: Spark > Issue Type: Bug > Components: Optimizer, SQL >Affects Versions: 3.0.0 >Reporter: chendihao >Priority: Minor > > We use SparkSQL and Catalyst to optimize the Spark job. We have read the > source code and tested the SimplifyCasts rule, which works for simple SQL > without a nested cast. > The SQL "select cast(string_date as string) from t1" will be optimized. > {code:java} > == Analyzed Logical Plan == > string_date: string > Project [cast(string_date#12 as string) AS string_date#24] > +- SubqueryAlias t1 > +- LogicalRDD [name#8, c1#9, c2#10, c5#11L, string_date#12, > string_timestamp#13, timestamp_field#14, bool_field#15], false > == Optimized Logical Plan == > Project [string_date#12] > +- LogicalRDD [name#8, c1#9, c2#10, c5#11L, string_date#12, > string_timestamp#13, timestamp_field#14, bool_field#15], false > {code} > However, it fails to optimize a nested cast like "select > cast(cast(string_date as string) as string) from t1". 
> {code:java} > == Analyzed Logical Plan == > CAST(CAST(string_date AS STRING) AS STRING): string > Project [cast(cast(string_date#12 as string) as string) AS > CAST(CAST(string_date AS STRING) AS STRING)#24] > +- SubqueryAlias t1 > +- LogicalRDD [name#8, c1#9, c2#10, c5#11L, string_date#12, > string_timestamp#13, timestamp_field#14, bool_field#15], false > == Optimized Logical Plan == > Project [string_date#12 AS CAST(CAST(string_date AS STRING) AS STRING)#24] > +- LogicalRDD [name#8, c1#9, c2#10, c5#11L, string_date#12, > string_timestamp#13, timestamp_field#14, bool_field#15], false > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
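[For context, a hypothetical miniature of a SimplifyCasts-style rewrite — illustration only, not Catalyst's actual code; the class names are invented — showing how a bottom-up traversal collapses the nested redundant cast reported above:]

```python
# Hypothetical miniature of a SimplifyCasts-style optimizer rule.
# A cast is a no-op when its child already has the target type;
# rewriting bottom-up (children first) collapses
# cast(cast(x as string) as string) all the way down to x.
from dataclasses import dataclass

@dataclass
class Attr:          # a column reference with a known data type
    name: str
    dtype: str

@dataclass
class Cast:          # cast(child as dtype)
    child: object
    dtype: str

def simplify_casts(e):
    if isinstance(e, Cast):
        child = simplify_casts(e.child)  # bottom-up: simplify children first
        if child.dtype == e.dtype:       # redundant cast -> drop it
            return child
        return Cast(child, e.dtype)
    return e

string_date = Attr('string_date', 'string')
expr = Cast(Cast(string_date, 'string'), 'string')
print(simplify_casts(expr))  # Attr(name='string_date', dtype='string')
```

[A top-down single pass that only inspects the outer node would leave the inner cast behind, which matches the symptom described in the ticket.]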
[jira] [Commented] (SPARK-24432) Add support for dynamic resource allocation
[ https://issues.apache.org/jira/browse/SPARK-24432?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17224905#comment-17224905 ] Dongjoon Hyun commented on SPARK-24432: --- BTW, FYI, for the K8s environment, the following is the current status. - The initial K8s dynamic allocation already shipped in Apache Spark 3.0.0 with shuffle tracking. - The K8s dynamic allocation with storage migration is already in the `master` branch for Apache Spark 3.1.0. > Add support for dynamic resource allocation > --- > > Key: SPARK-24432 > URL: https://issues.apache.org/jira/browse/SPARK-24432 > Project: Spark > Issue Type: New Feature > Components: Kubernetes, Spark Core >Affects Versions: 3.1.0 >Reporter: Yinan Li >Priority: Major > > This is an umbrella ticket for work on adding support for dynamic resource > allocation into the Kubernetes mode. This requires a Kubernetes-specific > external shuffle service. The feature is available in our fork at > github.com/apache-spark-on-k8s/spark. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
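[For reference, the shuffle-tracking flavor of dynamic allocation described above is driven purely by configuration; a sketch of plausible `spark-submit` settings for Spark 3.0+ on K8s — the API-server address, jar path, and executor bounds are placeholders:]

```shell
# Sketch: K8s dynamic allocation with shuffle tracking (Spark 3.0+).
# <api-server> and the executor bounds below are placeholders.
spark-submit \
  --master k8s://https://<api-server>:6443 \
  --deploy-mode cluster \
  --conf spark.dynamicAllocation.enabled=true \
  --conf spark.dynamicAllocation.shuffleTracking.enabled=true \
  --conf spark.dynamicAllocation.minExecutors=1 \
  --conf spark.dynamicAllocation.maxExecutors=10 \
  local:///opt/spark/examples/jars/spark-examples.jar
```

[Shuffle tracking lets executors holding shuffle data be retained instead of requiring the external shuffle service that this umbrella ticket originally called for.]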
[jira] [Comment Edited] (SPARK-24432) Add support for dynamic resource allocation
[ https://issues.apache.org/jira/browse/SPARK-24432?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17224905#comment-17224905 ] Dongjoon Hyun edited comment on SPARK-24432 at 11/2/20, 7:17 PM: - BTW, FYI, for the K8s environment, the following is the current status. - The initial K8s dynamic allocation already shipped in Apache Spark 3.0.0 with shuffle tracking. - The K8s dynamic allocation with storage migration between executors is already in the `master` branch for Apache Spark 3.1.0. was (Author: dongjoon): BTW, FYI, for K8s environment, the followings are the current status. - The initial K8s dynamic allocation is already shipped at Apache Spark 3.0.0 with shuffle tracking. - The K8s dynamic allocation with storage migration is already in `master` branch for Apache Spark 3.1.0. > Add support for dynamic resource allocation > --- > > Key: SPARK-24432 > URL: https://issues.apache.org/jira/browse/SPARK-24432 > Project: Spark > Issue Type: New Feature > Components: Kubernetes, Spark Core >Affects Versions: 3.1.0 >Reporter: Yinan Li >Priority: Major > > This is an umbrella ticket for work on adding support for dynamic resource > allocation into the Kubernetes mode. This requires a Kubernetes-specific > external shuffle service. The feature is available in our fork at > github.com/apache-spark-on-k8s/spark. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-24432) Add support for dynamic resource allocation
[ https://issues.apache.org/jira/browse/SPARK-24432?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17224901#comment-17224901 ] Dongjoon Hyun edited comment on SPARK-24432 at 11/2/20, 7:15 PM: - SPARK-30602 is focusing on YARN environment. I don't think that is targeting K8s yet. But, I agree with [~aryaKetan] that this issue should be refreshed. was (Author: dongjoon): SPARK-30602 is focusing on YARN environment. I don't think that is targeting K8s yet. > Add support for dynamic resource allocation > --- > > Key: SPARK-24432 > URL: https://issues.apache.org/jira/browse/SPARK-24432 > Project: Spark > Issue Type: New Feature > Components: Kubernetes, Spark Core >Affects Versions: 3.1.0 >Reporter: Yinan Li >Priority: Major > > This is an umbrella ticket for work on adding support for dynamic resource > allocation into the Kubernetes mode. This requires a Kubernetes-specific > external shuffle service. The feature is available in our fork at > github.com/apache-spark-on-k8s/spark. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-24432) Add support for dynamic resource allocation
[ https://issues.apache.org/jira/browse/SPARK-24432?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17224901#comment-17224901 ] Dongjoon Hyun commented on SPARK-24432: --- SPARK-30602 is focusing on YARN environment. I don't think that is targeting K8s yet. > Add support for dynamic resource allocation > --- > > Key: SPARK-24432 > URL: https://issues.apache.org/jira/browse/SPARK-24432 > Project: Spark > Issue Type: New Feature > Components: Kubernetes, Spark Core >Affects Versions: 3.1.0 >Reporter: Yinan Li >Priority: Major > > This is an umbrella ticket for work on adding support for dynamic resource > allocation into the Kubernetes mode. This requires a Kubernetes-specific > external shuffle service. The feature is available in our fork at > github.com/apache-spark-on-k8s/spark. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-33282) Replace Probot Autolabeler with Github Action
[ https://issues.apache.org/jira/browse/SPARK-33282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17224900#comment-17224900 ] Dongjoon Hyun commented on SPARK-33282: --- +1 > Replace Probot Autolabeler with Github Action > - > > Key: SPARK-33282 > URL: https://issues.apache.org/jira/browse/SPARK-33282 > Project: Spark > Issue Type: Task > Components: Project Infra >Affects Versions: 3.0.1 >Reporter: Kyle Bendickson >Priority: Major > > The Probot Autolabeler that we were using in both the Iceberg and the Spark > repo is no longer working. I've confirmed that with the developer, github user > [at]mithro, who has indicated that the Probot Autolabeler is end of life and > will not be maintained moving forward. > PRs have not been labeled for a few weeks now. > > As I'm already interfacing with ASF Infra to have the probot permissions > revoked from the Iceberg repo, and I've already submitted a patch to switch > Iceberg to the standard github labeler action, I figured I would go ahead and > volunteer myself to switch the Spark repo as well. > I will have a patch to switch to the new github labeler open within a few > days. > > Also thank you [~blue] (or [~holden]) for shepherding this! I didn't exactly > ask, but it was understood in our group meeting for Iceberg that I'd be > converting our labeler there so I figured I'd tackle the spark issue while > I'm getting my hands into the labeling configs anyway =) -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-33317) Spark Hive SQL returning empty dataframe
[ https://issues.apache.org/jira/browse/SPARK-33317?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-33317: -- Priority: Major (was: Blocker) > Spark Hive SQL returning empty dataframe > > > Key: SPARK-33317 > URL: https://issues.apache.org/jira/browse/SPARK-33317 > Project: Spark > Issue Type: Bug > Components: Spark Core, Spark Shell >Affects Versions: 2.4.6 >Reporter: Debadutta >Priority: Major > Attachments: farmers.csv > > > I am trying to run a sql query on a hive table using hive connector in spark > but I am getting an empty dataframe. The query I am trying to run:- > {{sparkSession.sql("select fmid from farmers where fmid between ' 1000405134' > and '1000772585'")}} > This is failing but if I remove the leading whitespaces it works. > {{sparkSession.sql("select fmid from farmers where fmid between '1000405134' > and '1000772585'")}} > Currently, I am removing leading and trailing whitespaces as a workaround. > But the same query with whitespaces works fine in hive console. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-33317) Spark Hive SQL returning empty dataframe
[ https://issues.apache.org/jira/browse/SPARK-33317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17224896#comment-17224896 ] Dongjoon Hyun commented on SPARK-33317: --- In Apache Spark 2.4.7, the following is the result for me. Could you provide your procedure?
{code}
scala> spark.version
res0: String = 2.4.7

scala> spark.read.option("header", true).csv("/tmp/csv/farmers.csv").createOrReplaceTempView("farmers")

scala> sql("select fmid from farmers where fmid between ' 1000405134' and '1000772585' limit 3").show
+----------+
|      fmid|
+----------+
|1000405134|
|1000159765|
|1000489848|
+----------+
{code}
> Spark Hive SQL returning empty dataframe > > > Key: SPARK-33317 > URL: https://issues.apache.org/jira/browse/SPARK-33317 > Project: Spark > Issue Type: Bug > Components: Spark Core, Spark Shell >Affects Versions: 2.4.6 >Reporter: Debadutta >Priority: Blocker > Attachments: farmers.csv > > > I am trying to run a sql query on a hive table using hive connector in spark > but I am getting an empty dataframe. The query I am trying to run:- > {{sparkSession.sql("select fmid from farmers where fmid between ' 1000405134' > and '1000772585'")}} > This is failing but if I remove the leading whitespaces it works. > {{sparkSession.sql("select fmid from farmers where fmid between '1000405134' > and '1000772585'")}} > Currently, I am removing leading and trailing whitespaces as a workaround. > But the same query with whitespaces works fine in hive console. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
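[A side note on why the rows in that repro still match: string BETWEEN compares lexicographically, and the ASCII space sorts before every digit, so a lower bound with a stray leading space is still below any digit-leading value. This can be checked in plain Python, which uses the same ordering for ASCII strings:]

```python
# ASCII space (0x20) sorts before '0'..'9' (0x30..0x39), so a lower
# bound that starts with a space is smaller than every digit-leading
# string, and the BETWEEN predicate in the repro above still matches.
lower, upper = ' 1000405134', '1000772585'
for fmid in ['1000405134', '1000159765', '1000489848']:
    assert lower <= fmid <= upper
print(' ' < '0')  # True: the space makes the bound strictly smaller
```

[Whether that lexicographic result or Hive's behavior is the expected one for the reporter's table is exactly what the request for a reproduction procedure is trying to pin down.]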
[jira] [Resolved] (SPARK-32029) Check spark context is stopped when getting active session
[ https://issues.apache.org/jira/browse/SPARK-32029?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-32029. --- Resolution: Won't Do Please see the discussion on the closed PR. > Check spark context is stopped when getting active session > - > > Key: SPARK-32029 > URL: https://issues.apache.org/jira/browse/SPARK-32029 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.1.0 >Reporter: ulysses you >Priority: Minor > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-33299) Unify schema parsing in from_json/from_csv across all APIs
[ https://issues.apache.org/jira/browse/SPARK-33299?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-33299: - Assignee: Maxim Gekk > Unify schema parsing in from_json/from_csv across all APIs > -- > > Key: SPARK-33299 > URL: https://issues.apache.org/jira/browse/SPARK-33299 > Project: Spark > Issue Type: New Feature > Components: SQL >Affects Versions: 3.1.0 >Reporter: Maxim Gekk >Assignee: Maxim Gekk >Priority: Major > > Currently, from_json() has extra capability in Scala API. It accepts schema > in JSON format but other API (SQL, Python, R) lacks the feature. The ticket > aims to unify all APIs, and support schemas in JSON format everywhere. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-33299) Unify schema parsing in from_json/from_csv across all APIs
[ https://issues.apache.org/jira/browse/SPARK-33299?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-33299. --- Fix Version/s: 3.1.0 Resolution: Fixed Issue resolved by pull request 30226 [https://github.com/apache/spark/pull/30226] > Unify schema parsing in from_json/from_csv across all APIs > -- > > Key: SPARK-33299 > URL: https://issues.apache.org/jira/browse/SPARK-33299 > Project: Spark > Issue Type: New Feature > Components: SQL >Affects Versions: 3.1.0 >Reporter: Maxim Gekk >Assignee: Maxim Gekk >Priority: Major > Fix For: 3.1.0 > > > Currently, from_json() has extra capability in Scala API. It accepts schema > in JSON format but other API (SQL, Python, R) lacks the feature. The ticket > aims to unify all APIs, and support schemas in JSON format everywhere. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-33317) Spark Hive SQL returning empty dataframe
[ https://issues.apache.org/jira/browse/SPARK-33317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17224829#comment-17224829 ] Debadutta commented on SPARK-33317: --- [^farmers.csv] Attached the farmers dataset. By hive connector I mean metastore-based default way to run sql query on hive within spark context. > Spark Hive SQL returning empty dataframe > > > Key: SPARK-33317 > URL: https://issues.apache.org/jira/browse/SPARK-33317 > Project: Spark > Issue Type: Bug > Components: Spark Core, Spark Shell >Affects Versions: 2.4.6 >Reporter: Debadutta >Priority: Blocker > Attachments: farmers.csv > > > I am trying to run a sql query on a hive table using hive connector in spark > but I am getting an empty dataframe. The query I am trying to run:- > {{sparkSession.sql("select fmid from farmers where fmid between ' 1000405134' > and '1000772585'")}} > This is failing but if I remove the leading whitespaces it works. > {{sparkSession.sql("select fmid from farmers where fmid between '1000405134' > and '1000772585'")}} > Currently, I am removing leading and trailing whitespaces as a workaround. > But the same query with whitespaces works fine in hive console. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-33312) Provide latest Spark 2.4.7 runnable distribution
[ https://issues.apache.org/jira/browse/SPARK-33312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17224827#comment-17224827 ] Dongjoon Hyun commented on SPARK-33312: --- Hi, [~dprateek]. Apache Spark 2.4.7 is already voted and released our website. You should ask Apache Spark 2.4.8 release and Apache Spark has a release cadence. 2.4.8 will be released at early December 2020. So, please wait for a month. > Provide latest Spark 2.4.7 runnable distribution > > > Key: SPARK-33312 > URL: https://issues.apache.org/jira/browse/SPARK-33312 > Project: Spark > Issue Type: Task > Components: Build >Affects Versions: 2.4.7 >Reporter: Prateek Dubey >Priority: Major > > Not sure if this is the right approach, however it would be great if latest > Spark 2.4.7 runnable distribution can be provided here - > [https://spark.apache.org/downloads.html] > Currently it seems the last build was done on Sept 12th' 2020. > I'm working on running Spark workloads on EKS using EKS IRSA. I'm able to run > Spark workloads on EKS using IRSA with Spark 3.0/ Hadoop 3.2, however I want > to do the same with Spark 2.4.7/ Hadoop 2.7. > Recently this PR was merged with 2.4.x - > [https://github.com/apache/spark/pull/29877] and therefore I'm in need of > latest Spark distribution > > PS: I tried building latest Spark 2.4.7 myself as well using Maven, however > there are too many errors every-time when it reaches R, therefore it would be > great if Spark community itself can provide the latest build. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Closed] (SPARK-33312) Provide latest Spark 2.4.7 runnable distribution
[ https://issues.apache.org/jira/browse/SPARK-33312?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun closed SPARK-33312. - > Provide latest Spark 2.4.7 runnable distribution > > > Key: SPARK-33312 > URL: https://issues.apache.org/jira/browse/SPARK-33312 > Project: Spark > Issue Type: Task > Components: Build >Affects Versions: 2.4.7 >Reporter: Prateek Dubey >Priority: Major > > Not sure if this is the right approach, however it would be great if latest > Spark 2.4.7 runnable distribution can be provided here - > [https://spark.apache.org/downloads.html] > Currently it seems the last build was done on Sept 12th' 2020. > I'm working on running Spark workloads on EKS using EKS IRSA. I'm able to run > Spark workloads on EKS using IRSA with Spark 3.0/ Hadoop 3.2, however I want > to do the same with Spark 2.4.7/ Hadoop 2.7. > Recently this PR was merged with 2.4.x - > [https://github.com/apache/spark/pull/29877] and therefore I'm in need of > latest Spark distribution > > PS: I tried building latest Spark 2.4.7 myself as well using Maven, however > there are too many errors every-time when it reaches R, therefore it would be > great if Spark community itself can provide the latest build. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-33312) Provide latest Spark 2.4.7 runnable distribution
[ https://issues.apache.org/jira/browse/SPARK-33312?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-33312. --- Resolution: Not A Problem > Provide latest Spark 2.4.7 runnable distribution > > > Key: SPARK-33312 > URL: https://issues.apache.org/jira/browse/SPARK-33312 > Project: Spark > Issue Type: Task > Components: Build >Affects Versions: 2.4.7 >Reporter: Prateek Dubey >Priority: Major > > Not sure if this is the right approach, however it would be great if latest > Spark 2.4.7 runnable distribution can be provided here - > [https://spark.apache.org/downloads.html] > Currently it seems the last build was done on Sept 12th' 2020. > I'm working on running Spark workloads on EKS using EKS IRSA. I'm able to run > Spark workloads on EKS using IRSA with Spark 3.0/ Hadoop 3.2, however I want > to do the same with Spark 2.4.7/ Hadoop 2.7. > Recently this PR was merged with 2.4.x - > [https://github.com/apache/spark/pull/29877] and therefore I'm in need of > latest Spark distribution > > PS: I tried building latest Spark 2.4.7 myself as well using Maven, however > there are too many errors every-time when it reaches R, therefore it would be > great if Spark community itself can provide the latest build. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-33312) Provide latest Spark 2.4.7 runnable distribution
[ https://issues.apache.org/jira/browse/SPARK-33312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17224827#comment-17224827 ] Dongjoon Hyun edited comment on SPARK-33312 at 11/2/20, 5:30 PM: - Hi, [~dprateek]. Apache Spark 2.4.7 is already voted and released at our website. You should ask Apache Spark 2.4.8 release and Apache Spark has a release cadence. 2.4.8 will be released at early December 2020. So, please wait for a month. was (Author: dongjoon): Hi, [~dprateek]. Apache Spark 2.4.7 is already voted and released our website. You should ask Apache Spark 2.4.8 release and Apache Spark has a release cadence. 2.4.8 will be released at early December 2020. So, please wait for a month. > Provide latest Spark 2.4.7 runnable distribution > > > Key: SPARK-33312 > URL: https://issues.apache.org/jira/browse/SPARK-33312 > Project: Spark > Issue Type: Task > Components: Build >Affects Versions: 2.4.7 >Reporter: Prateek Dubey >Priority: Major > > Not sure if this is the right approach, however it would be great if latest > Spark 2.4.7 runnable distribution can be provided here - > [https://spark.apache.org/downloads.html] > Currently it seems the last build was done on Sept 12th' 2020. > I'm working on running Spark workloads on EKS using EKS IRSA. I'm able to run > Spark workloads on EKS using IRSA with Spark 3.0/ Hadoop 3.2, however I want > to do the same with Spark 2.4.7/ Hadoop 2.7. > Recently this PR was merged with 2.4.x - > [https://github.com/apache/spark/pull/29877] and therefore I'm in need of > latest Spark distribution > > PS: I tried building latest Spark 2.4.7 myself as well using Maven, however > there are too many errors every-time when it reaches R, therefore it would be > great if Spark community itself can provide the latest build. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-33317) Spark Hive SQL returning empty dataframe
[ https://issues.apache.org/jira/browse/SPARK-33317?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Debadutta updated SPARK-33317: -- Attachment: farmers.csv > Spark Hive SQL returning empty dataframe > > > Key: SPARK-33317 > URL: https://issues.apache.org/jira/browse/SPARK-33317 > Project: Spark > Issue Type: Bug > Components: Spark Core, Spark Shell >Affects Versions: 2.4.6 >Reporter: Debadutta >Priority: Blocker > Attachments: farmers.csv > > > I am trying to run a sql query on a hive table using hive connector in spark > but I am getting an empty dataframe. The query I am trying to run:- > {{sparkSession.sql("select fmid from farmers where fmid between ' 1000405134' > and '1000772585'")}} > This is failing but if I remove the leading whitespaces it works. > {{sparkSession.sql("select fmid from farmers where fmid between '1000405134' > and '1000772585'")}} > Currently, I am removing leading and trailing whitespaces as a workaround. > But the same query with whitespaces works fine in hive console. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-33317) Spark Hive SQL returning empty dataframe
[ https://issues.apache.org/jira/browse/SPARK-33317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17224824#comment-17224824 ] Dongjoon Hyun commented on SPARK-33317: --- Hi, [~dgodnaik]. 1. What is the data inside the `farmers` table? 2. Which `hive connector` are you referring to? > Spark Hive SQL returning empty dataframe > > > Key: SPARK-33317 > URL: https://issues.apache.org/jira/browse/SPARK-33317 > Project: Spark > Issue Type: Bug > Components: Spark Core, Spark Shell >Affects Versions: 2.4.6 >Reporter: Debadutta >Priority: Blocker > > I am trying to run a SQL query on a Hive table using the Hive connector in Spark, > but I am getting an empty dataframe. The query I am trying to run: > {{sparkSession.sql("select fmid from farmers where fmid between ' 1000405134' > and '1000772585'")}} > This fails, but if I remove the leading whitespace it works. > {{sparkSession.sql("select fmid from farmers where fmid between '1000405134' > and '1000772585'")}} > Currently, I am removing leading and trailing whitespace as a workaround. > But the same query with whitespace works fine in the Hive console. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
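A pure-Python illustration (not Spark itself, and not confirmed as the root cause in the ticket) of why a leading space can matter: under string semantics a space sorts before every digit, so a padded bound is a different bound, while under numeric semantics the space is ignorable padding. Which coercion Spark or Hive applies to `fmid` here is an open question in the report.

```python
# With string semantics, the leading space is significant: ' ' (0x20)
# sorts before every digit, so ' 1000405134' is not the same bound as
# '1000405134', and space-padded values fall outside the unpadded range.
assert " 1000405134" < "1000405134"
assert not ("1000405134" <= " 1000500000" <= "1000772585")

# With numeric semantics, leading whitespace is just ignorable padding,
# which is why trimming (the reporter's workaround) makes the two forms agree.
assert int(" 1000405134") == int("1000405134")
```

This is only a sketch of the comparison semantics involved; the reporter's trim workaround sidesteps the question entirely.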
[jira] [Commented] (SPARK-33318) Ability to set Dynamodb table name while reading from Kinesis
[ https://issues.apache.org/jira/browse/SPARK-33318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17224822#comment-17224822 ] Dongjoon Hyun commented on SPARK-33318: --- Thank you for filing a JIRA issue, [~chethan_g]. For new features and improvements, the Apache Spark community delivers them in a new release such as 3.1.0. > Ability to set Dynamodb table name while reading from Kinesis > - > > Key: SPARK-33318 > URL: https://issues.apache.org/jira/browse/SPARK-33318 > Project: Spark > Issue Type: Improvement > Components: Structured Streaming >Affects Versions: 3.1.0 >Reporter: chethan gowda >Priority: Minor > > * Need the ability to set the DynamoDB table name while reading data from Kinesis. The > KCL library provides the ability to set the DynamoDB table name; example: > [https://aws.amazon.com/premiumsupport/knowledge-center/kinesis-kcl-apps-dynamodb-table/] > . We would like to have a similar interface to pass the DynamoDB table name. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-33318) Ability to set Dynamodb table name while reading from Kinesis
[ https://issues.apache.org/jira/browse/SPARK-33318?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-33318: -- Affects Version/s: (was: 2.4.7) 3.1.0 > Ability to set Dynamodb table name while reading from Kinesis > - > > Key: SPARK-33318 > URL: https://issues.apache.org/jira/browse/SPARK-33318 > Project: Spark > Issue Type: Improvement > Components: Structured Streaming >Affects Versions: 3.1.0 >Reporter: chethan gowda >Priority: Minor > > * Need the ability to set dynamodb table while reading data from kinesis. The > KCL library provides the ability to set dynamodb table name. example: > [https://aws.amazon.com/premiumsupport/knowledge-center/kinesis-kcl-apps-dynamodb-table/] > . We would like to have a similar interface to pass the dynamodb table. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-26365) spark-submit for k8s cluster doesn't propagate exit code
[ https://issues.apache.org/jira/browse/SPARK-26365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17224799#comment-17224799 ] Itay Bittan commented on SPARK-26365: - thanks [~oscar.bonilla]. We ended up with a temporary solution: {code:java} spark-submit .. 2>&1 | tee output.log ; grep -q \"exit code: 0\" output.log{code} > spark-submit for k8s cluster doesn't propagate exit code > > > Key: SPARK-26365 > URL: https://issues.apache.org/jira/browse/SPARK-26365 > Project: Spark > Issue Type: Bug > Components: Kubernetes, Spark Core, Spark Submit >Affects Versions: 2.3.2, 2.4.0 >Reporter: Oscar Bonilla >Priority: Minor > Attachments: spark-2.4.5-raise-exception-k8s-failure.patch, > spark-3.0.0-raise-exception-k8s-failure.patch > > > When launching apps using spark-submit in a kubernetes cluster, if the Spark > applications fails (returns exit code = 1 for example), spark-submit will > still exit gracefully and return exit code = 0. > This is problematic, since there's no way to know if there's been a problem > with the Spark application. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
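The tee-and-grep workaround quoted above can be sketched end-to-end; here the spark-submit output is simulated with a `run_job` stand-in (the log text is made up for illustration), so only the pattern is shown, not a real cluster run:

```shell
# Stand-in for `spark-submit ... 2>&1`; in real use, replace the printf
# with the actual spark-submit invocation. The log line is simulated.
run_job() {
  printf 'LoggingPodStatusWatcher: phase: Succeeded\nContainer final status: exit code: 0\n'
}

# Capture the full log, then recover a usable exit status by grepping for
# the driver's reported exit code, since spark-submit itself returns 0.
run_job | tee output.log >/dev/null
if grep -q "exit code: 0" output.log; then
  echo "driver succeeded"
else
  echo "driver failed"
fi
rm -f output.log
```

This is fragile (it depends on the exact log wording), which is why the attached patches that make spark-submit raise on failure are the more robust fix.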
[jira] [Assigned] (SPARK-33319) Add all built-in SerDes to HiveSerDeReadWriteSuite
[ https://issues.apache.org/jira/browse/SPARK-33319?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-33319: - Assignee: Yuming Wang > Add all built-in SerDes to HiveSerDeReadWriteSuite > -- > > Key: SPARK-33319 > URL: https://issues.apache.org/jira/browse/SPARK-33319 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.1.0 >Reporter: Yuming Wang >Assignee: Yuming Wang >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-33319) Add all built-in SerDes to HiveSerDeReadWriteSuite
[ https://issues.apache.org/jira/browse/SPARK-33319?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-33319. --- Fix Version/s: 3.1.0 Resolution: Fixed Issue resolved by pull request 30228 [https://github.com/apache/spark/pull/30228] > Add all built-in SerDes to HiveSerDeReadWriteSuite > -- > > Key: SPARK-33319 > URL: https://issues.apache.org/jira/browse/SPARK-33319 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.1.0 >Reporter: Yuming Wang >Assignee: Yuming Wang >Priority: Major > Fix For: 3.1.0 > > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-33320) ExecutorMetrics are not written to CSV and StatsD sinks
[ https://issues.apache.org/jira/browse/SPARK-33320?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Podlovics updated SPARK-33320: Affects Version/s: (was: 2.4.6) 2.4.4 Environment: I was using Spark 2.4.4 on EMR with YARN. The relevant part of the config is below: {noformat} spark.metrics.executorMetricsSource.enabled=true spark.eventLog.logStageExecutorMetrics=true spark.metrics.conf.*.sink.csv.class=org.apache.spark.metrics.sink.CsvSink spark.metrics.conf.*.sink.servlet.class=org.apache.spark.metrics.sink.MetricsServlet spark.metrics.conf.*.sink.servlet.path=/home/hadoop/metrics/json spark.metrics.conf.*.sink.statsd.class=org.apache.spark.metrics.sink.StatsdSink spark.metrics.conf.*.sink.statsd.host=localhost spark.metrics.conf.*.sink.statsd.port=8125 spark.metrics.conf.*.sink.statsd.period=10 spark.metrics.conf.*.sink.statsd.unit=seconds spark.metrics.conf.*.sink.statsd.prefix=spark master.sink.servlet.path=/home/hadoop/metrics/master/json applications.sink.servlet.path=/home/hadoop/metrics/applications/json {noformat} was: I used the following configuration while running Spark on YARN: {noformat} spark.metrics.executorMetricsSource.enabled=true spark.eventLog.logStageExecutorMetrics=true spark.metrics.conf.*.sink.csv.class=org.apache.spark.metrics.sink.CsvSink spark.metrics.conf.*.sink.servlet.class=org.apache.spark.metrics.sink.MetricsServlet spark.metrics.conf.*.sink.servlet.path=/home/hadoop/metrics/json spark.metrics.conf.*.sink.statsd.class=org.apache.spark.metrics.sink.StatsdSink spark.metrics.conf.*.sink.statsd.host=localhost spark.metrics.conf.*.sink.statsd.port=8125 spark.metrics.conf.*.sink.statsd.period=10 spark.metrics.conf.*.sink.statsd.unit=seconds spark.metrics.conf.*.sink.statsd.prefix=spark master.sink.servlet.path=/home/hadoop/metrics/master/json applications.sink.servlet.path=/home/hadoop/metrics/applications/json {noformat} > ExecutorMetrics are not written to CSV and StatsD sinks > --- > > Key: SPARK-33320 > URL: 
https://issues.apache.org/jira/browse/SPARK-33320 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 2.4.4 > Environment: I was using Spark 2.4.4 on EMR with YARN. The relevant > part of the config is below: > {noformat} > spark.metrics.executorMetricsSource.enabled=true > spark.eventLog.logStageExecutorMetrics=true > spark.metrics.conf.*.sink.csv.class=org.apache.spark.metrics.sink.CsvSink > spark.metrics.conf.*.sink.servlet.class=org.apache.spark.metrics.sink.MetricsServlet > spark.metrics.conf.*.sink.servlet.path=/home/hadoop/metrics/json > spark.metrics.conf.*.sink.statsd.class=org.apache.spark.metrics.sink.StatsdSink > spark.metrics.conf.*.sink.statsd.host=localhost > spark.metrics.conf.*.sink.statsd.port=8125 > spark.metrics.conf.*.sink.statsd.period=10 > spark.metrics.conf.*.sink.statsd.unit=seconds > spark.metrics.conf.*.sink.statsd.prefix=spark > master.sink.servlet.path=/home/hadoop/metrics/master/json > applications.sink.servlet.path=/home/hadoop/metrics/applications/json > {noformat} >Reporter: Peter Podlovics >Priority: Major > > Metrics from the {{ExecutorMetrics}} namespace are not written to the CSV and > StatsD sinks, even though some of them is available through the REST API > (e.g.: {{memoryMetrics.usedOnHeapStorageMemory}}). > I couldn't find the {{ExecutorMetrics}} either on the driver or the workers. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-33320) ExecutorMetrics are not written to CSV and StatsD sinks
[ https://issues.apache.org/jira/browse/SPARK-33320?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Podlovics updated SPARK-33320: Affects Version/s: (was: 2.2.1) 2.4.6 > ExecutorMetrics are not written to CSV and StatsD sinks > --- > > Key: SPARK-33320 > URL: https://issues.apache.org/jira/browse/SPARK-33320 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 2.4.6 > Environment: I used the following configuration while running Spark > on YARN: > {noformat} > spark.metrics.executorMetricsSource.enabled=true > spark.eventLog.logStageExecutorMetrics=true > spark.metrics.conf.*.sink.csv.class=org.apache.spark.metrics.sink.CsvSink > spark.metrics.conf.*.sink.servlet.class=org.apache.spark.metrics.sink.MetricsServlet > spark.metrics.conf.*.sink.servlet.path=/home/hadoop/metrics/json > spark.metrics.conf.*.sink.statsd.class=org.apache.spark.metrics.sink.StatsdSink > spark.metrics.conf.*.sink.statsd.host=localhost > spark.metrics.conf.*.sink.statsd.port=8125 > spark.metrics.conf.*.sink.statsd.period=10 > spark.metrics.conf.*.sink.statsd.unit=seconds > spark.metrics.conf.*.sink.statsd.prefix=spark > master.sink.servlet.path=/home/hadoop/metrics/master/json > applications.sink.servlet.path=/home/hadoop/metrics/applications/json > {noformat} >Reporter: Peter Podlovics >Priority: Major > > Metrics from the {{ExecutorMetrics}} namespace are not written to the CSV and > StatsD sinks, even though some of them is available through the REST API > (e.g.: {{memoryMetrics.usedOnHeapStorageMemory}}). > I couldn't find the {{ExecutorMetrics}} either on the driver or the workers. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-33320) ExecutorMetrics are not written to CSV and StatsD sinks
Peter Podlovics created SPARK-33320: --- Summary: ExecutorMetrics are not written to CSV and StatsD sinks Key: SPARK-33320 URL: https://issues.apache.org/jira/browse/SPARK-33320 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 2.2.1 Environment: I used the following configuration while running Spark on YARN: {noformat} spark.metrics.executorMetricsSource.enabled=true spark.eventLog.logStageExecutorMetrics=true spark.metrics.conf.*.sink.csv.class=org.apache.spark.metrics.sink.CsvSink spark.metrics.conf.*.sink.servlet.class=org.apache.spark.metrics.sink.MetricsServlet spark.metrics.conf.*.sink.servlet.path=/home/hadoop/metrics/json spark.metrics.conf.*.sink.statsd.class=org.apache.spark.metrics.sink.StatsdSink spark.metrics.conf.*.sink.statsd.host=localhost spark.metrics.conf.*.sink.statsd.port=8125 spark.metrics.conf.*.sink.statsd.period=10 spark.metrics.conf.*.sink.statsd.unit=seconds spark.metrics.conf.*.sink.statsd.prefix=spark master.sink.servlet.path=/home/hadoop/metrics/master/json applications.sink.servlet.path=/home/hadoop/metrics/applications/json {noformat} Reporter: Peter Podlovics Metrics from the {{ExecutorMetrics}} namespace are not written to the CSV and StatsD sinks, even though some of them are available through the REST API (e.g. {{memoryMetrics.usedOnHeapStorageMemory}}). I couldn't find the {{ExecutorMetrics}} on either the driver or the workers. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-33273) Fix Flaky Test: ThriftServerQueryTestSuite. subquery_scalar_subquery_scalar_subquery_select_sql
[ https://issues.apache.org/jira/browse/SPARK-33273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17224705#comment-17224705 ] Yuming Wang commented on SPARK-33273: - I have no idea. I cannot reproduce locally. > Fix Flaky Test: ThriftServerQueryTestSuite. > subquery_scalar_subquery_scalar_subquery_select_sql > --- > > Key: SPARK-33273 > URL: https://issues.apache.org/jira/browse/SPARK-33273 > Project: Spark > Issue Type: Bug > Components: SQL, Tests >Affects Versions: 3.1.0 >Reporter: Dongjoon Hyun >Priority: Blocker > Labels: correctness > > - > https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130369/testReport/org.apache.spark.sql.hive.thriftserver/ThriftServerQueryTestSuite/subquery_scalar_subquery_scalar_subquery_select_sql/ > {code} > [info] - subquery/scalar-subquery/scalar-subquery-select.sql *** FAILED *** > (3 seconds, 877 milliseconds) > [info] Expected "[1]0 2017-05-04 01:01:0...", but got "[]0 > 2017-05-04 01:01:0..." Result did not match for query #3 > [info] SELECT (SELECT min(t3d) FROM t3) min_t3d, > [info] (SELECT max(t2h) FROM t2) max_t2h > [info] FROM t1 > [info] WHERE t1a = 'val1c' (ThriftServerQueryTestSuite.scala:197) > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-33319) Add all built-in SerDes to HiveSerDeReadWriteSuite
[ https://issues.apache.org/jira/browse/SPARK-33319?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-33319: Assignee: (was: Apache Spark) > Add all built-in SerDes to HiveSerDeReadWriteSuite > -- > > Key: SPARK-33319 > URL: https://issues.apache.org/jira/browse/SPARK-33319 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.1.0 >Reporter: Yuming Wang >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-33319) Add all built-in SerDes to HiveSerDeReadWriteSuite
[ https://issues.apache.org/jira/browse/SPARK-33319?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-33319: Assignee: Apache Spark > Add all built-in SerDes to HiveSerDeReadWriteSuite > -- > > Key: SPARK-33319 > URL: https://issues.apache.org/jira/browse/SPARK-33319 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.1.0 >Reporter: Yuming Wang >Assignee: Apache Spark >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-33319) Add all built-in SerDes to HiveSerDeReadWriteSuite
[ https://issues.apache.org/jira/browse/SPARK-33319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17224704#comment-17224704 ] Apache Spark commented on SPARK-33319: -- User 'wangyum' has created a pull request for this issue: https://github.com/apache/spark/pull/30228 > Add all built-in SerDes to HiveSerDeReadWriteSuite > -- > > Key: SPARK-33319 > URL: https://issues.apache.org/jira/browse/SPARK-33319 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.1.0 >Reporter: Yuming Wang >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-33319) Add all built-in SerDes to HiveSerDeReadWriteSuite
Yuming Wang created SPARK-33319: --- Summary: Add all built-in SerDes to HiveSerDeReadWriteSuite Key: SPARK-33319 URL: https://issues.apache.org/jira/browse/SPARK-33319 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 3.1.0 Reporter: Yuming Wang -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-33257) Support Column inputs in PySpark ordering functions (asc*, desc*)
[ https://issues.apache.org/jira/browse/SPARK-33257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17224666#comment-17224666 ] Apache Spark commented on SPARK-33257: -- User 'zero323' has created a pull request for this issue: https://github.com/apache/spark/pull/30227 > Support Column inputs in PySpark ordering functions (asc*, desc*) > - > > Key: SPARK-33257 > URL: https://issues.apache.org/jira/browse/SPARK-33257 > Project: Spark > Issue Type: Bug > Components: PySpark, SQL >Affects Versions: 3.1.0 >Reporter: Maciej Szymkiewicz >Priority: Major > > According to SPARK-26979, PySpark functions should support both {{Column}} > and {{str}} arguments, when possible. > However, the following ordering functions support only {{str}}: > - {{asc}} > - {{desc}} > - {{asc_nulls_first}} > - {{asc_nulls_last}} > - {{desc_nulls_first}} > - {{desc_nulls_last}} > This is because the Scala side doesn't provide {{Column => Column}} variants. > To fix this, we can do one of the following: > - Call the corresponding {{Column}} methods as > [suggested|https://github.com/apache/spark/pull/30143#discussion_r512366978] > by [~hyukjin.kwon] > - Add the missing signatures on the Scala side. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-33257) Support Column inputs in PySpark ordering functions (asc*, desc*)
[ https://issues.apache.org/jira/browse/SPARK-33257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17224665#comment-17224665 ] Apache Spark commented on SPARK-33257: -- User 'zero323' has created a pull request for this issue: https://github.com/apache/spark/pull/30227 > Support Column inputs in PySpark ordering functions (asc*, desc*) > - > > Key: SPARK-33257 > URL: https://issues.apache.org/jira/browse/SPARK-33257 > Project: Spark > Issue Type: Bug > Components: PySpark, SQL >Affects Versions: 3.1.0 >Reporter: Maciej Szymkiewicz >Priority: Major > > According to SPARK-26979, PySpark functions should support both {{Column}} > and {{str}} arguments, when possible. > However, the following ordering support only {{str}} > - {{asc}} > - {{desc}} > - {{asc_nulls_first}} > - {{asc_nulls_last}} > - {{desc_nulls_first}} > - {{desc_nulls_last}} > support only {{str}}. This is because Scala side doesn't provide {{Column => > Column}} variants. > To fix this, we do one of the following: > - Call corresponding {{Column}} methods as > [suggested|https://github.com/apache/spark/pull/30143#discussion_r512366978] > by [~hyukjin.kwon] > - Add missing signatures on Scala side. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-33257) Support Column inputs in PySpark ordering functions (asc*, desc*)
[ https://issues.apache.org/jira/browse/SPARK-33257?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-33257: Assignee: (was: Apache Spark) > Support Column inputs in PySpark ordering functions (asc*, desc*) > - > > Key: SPARK-33257 > URL: https://issues.apache.org/jira/browse/SPARK-33257 > Project: Spark > Issue Type: Bug > Components: PySpark, SQL >Affects Versions: 3.1.0 >Reporter: Maciej Szymkiewicz >Priority: Major > > According to SPARK-26979, PySpark functions should support both {{Column}} > and {{str}} arguments, when possible. > However, the following ordering support only {{str}} > - {{asc}} > - {{desc}} > - {{asc_nulls_first}} > - {{asc_nulls_last}} > - {{desc_nulls_first}} > - {{desc_nulls_last}} > support only {{str}}. This is because Scala side doesn't provide {{Column => > Column}} variants. > To fix this, we do one of the following: > - Call corresponding {{Column}} methods as > [suggested|https://github.com/apache/spark/pull/30143#discussion_r512366978] > by [~hyukjin.kwon] > - Add missing signatures on Scala side. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-33257) Support Column inputs in PySpark ordering functions (asc*, desc*)
[ https://issues.apache.org/jira/browse/SPARK-33257?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-33257: Assignee: Apache Spark > Support Column inputs in PySpark ordering functions (asc*, desc*) > - > > Key: SPARK-33257 > URL: https://issues.apache.org/jira/browse/SPARK-33257 > Project: Spark > Issue Type: Bug > Components: PySpark, SQL >Affects Versions: 3.1.0 >Reporter: Maciej Szymkiewicz >Assignee: Apache Spark >Priority: Major > > According to SPARK-26979, PySpark functions should support both {{Column}} > and {{str}} arguments, when possible. > However, the following ordering support only {{str}} > - {{asc}} > - {{desc}} > - {{asc_nulls_first}} > - {{asc_nulls_last}} > - {{desc_nulls_first}} > - {{desc_nulls_last}} > support only {{str}}. This is because Scala side doesn't provide {{Column => > Column}} variants. > To fix this, we do one of the following: > - Call corresponding {{Column}} methods as > [suggested|https://github.com/apache/spark/pull/30143#discussion_r512366978] > by [~hyukjin.kwon] > - Add missing signatures on Scala side. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
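The first option the SPARK-33257 description mentions — promote a `str` to a `Column` and delegate to the Column method — can be sketched in plain Python. The `Column` class below is a simplified stand-in, not the real `pyspark.sql.Column`, and the returned strings are only placeholders for the real ordering expressions:

```python
class Column:
    """Minimal stand-in for pyspark.sql.Column (illustration only)."""

    def __init__(self, name):
        self.name = name

    def asc_nulls_last(self):
        # Placeholder for the real ordering expression the JVM would build.
        return f"{self.name} ASC NULLS LAST"


def asc_nulls_last(col):
    # Accept either a column name or a Column, per the SPARK-26979
    # convention: strings are promoted to Columns first, then the
    # Column method is delegated to, so both spellings agree.
    if isinstance(col, str):
        col = Column(col)
    return col.asc_nulls_last()


assert asc_nulls_last("age") == asc_nulls_last(Column("age"))
```

The same dispatch would apply to the other five functions listed in the ticket; the alternative fix is adding `Column => Column` overloads on the Scala side instead.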
[jira] [Created] (SPARK-33318) Ability to set Dynamodb table name while reading from Kinesis
chethan gowda created SPARK-33318: - Summary: Ability to set Dynamodb table name while reading from Kinesis Key: SPARK-33318 URL: https://issues.apache.org/jira/browse/SPARK-33318 Project: Spark Issue Type: Improvement Components: Structured Streaming Affects Versions: 2.4.7 Reporter: chethan gowda * Need the ability to set dynamodb table while reading data from kinesis. The KCL library provides the ability to set dynamodb table name. example: [https://aws.amazon.com/premiumsupport/knowledge-center/kinesis-kcl-apps-dynamodb-table/] . We would like to have a similar interface to pass the dynamodb table. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-26365) spark-submit for k8s cluster doesn't propagate exit code
[ https://issues.apache.org/jira/browse/SPARK-26365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17224648#comment-17224648 ] Oscar Cassetti commented on SPARK-26365: [~itayb] I have a patch [^spark-3.0.0-raise-exception-k8s-failure.patch] which I tested for spark-3.0.0. It is not pretty, but it does the job. I also have one for v2.4.5 [^spark-2.4.5-raise-exception-k8s-failure.patch]; again, the code is a bit ugly, but I have been using it in production since June. > spark-submit for k8s cluster doesn't propagate exit code > > > Key: SPARK-26365 > URL: https://issues.apache.org/jira/browse/SPARK-26365 > Project: Spark > Issue Type: Bug > Components: Kubernetes, Spark Core, Spark Submit >Affects Versions: 2.3.2, 2.4.0 >Reporter: Oscar Bonilla >Priority: Minor > Attachments: spark-2.4.5-raise-exception-k8s-failure.patch, > spark-3.0.0-raise-exception-k8s-failure.patch > > > When launching apps using spark-submit in a kubernetes cluster, if the Spark > application fails (returns exit code = 1 for example), spark-submit will > still exit gracefully and return exit code = 0. > This is problematic, since there's no way to know if there's been a problem > with the Spark application. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-26365) spark-submit for k8s cluster doesn't propagate exit code
[ https://issues.apache.org/jira/browse/SPARK-26365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Oscar Cassetti updated SPARK-26365: --- Attachment: spark-2.4.5-raise-exception-k8s-failure.patch > spark-submit for k8s cluster doesn't propagate exit code > > > Key: SPARK-26365 > URL: https://issues.apache.org/jira/browse/SPARK-26365 > Project: Spark > Issue Type: Bug > Components: Kubernetes, Spark Core, Spark Submit >Affects Versions: 2.3.2, 2.4.0 >Reporter: Oscar Bonilla >Priority: Minor > Attachments: spark-2.4.5-raise-exception-k8s-failure.patch, > spark-3.0.0-raise-exception-k8s-failure.patch > > > When launching apps using spark-submit in a kubernetes cluster, if the Spark > applications fails (returns exit code = 1 for example), spark-submit will > still exit gracefully and return exit code = 0. > This is problematic, since there's no way to know if there's been a problem > with the Spark application. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-26365) spark-submit for k8s cluster doesn't propagate exit code
[ https://issues.apache.org/jira/browse/SPARK-26365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Oscar Cassetti updated SPARK-26365: --- Attachment: spark-3.0.0-raise-exception-k8s-failure.patch > spark-submit for k8s cluster doesn't propagate exit code > > > Key: SPARK-26365 > URL: https://issues.apache.org/jira/browse/SPARK-26365 > Project: Spark > Issue Type: Bug > Components: Kubernetes, Spark Core, Spark Submit > Affects Versions: 2.3.2, 2.4.0 > Reporter: Oscar Bonilla > Priority: Minor > Attachments: spark-3.0.0-raise-exception-k8s-failure.patch > > > When launching apps using spark-submit in a Kubernetes cluster, if the Spark application fails (returns exit code = 1, for example), spark-submit will still exit gracefully and return exit code = 0. > This is problematic, since there's no way to know whether there's been a problem with the Spark application.
[jira] [Commented] (SPARK-33299) Unify schema parsing in from_json/from_csv across all APIs
[ https://issues.apache.org/jira/browse/SPARK-33299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17224602#comment-17224602 ] Apache Spark commented on SPARK-33299: -- User 'MaxGekk' has created a pull request for this issue: https://github.com/apache/spark/pull/30226 > Unify schema parsing in from_json/from_csv across all APIs > -- > > Key: SPARK-33299 > URL: https://issues.apache.org/jira/browse/SPARK-33299 > Project: Spark > Issue Type: New Feature > Components: SQL > Affects Versions: 3.1.0 > Reporter: Maxim Gekk > Priority: Major > > Currently, from_json() has an extra capability in the Scala API: it accepts a schema in JSON format, but the other APIs (SQL, Python, R) lack the feature. The ticket aims to unify all APIs and support schemas in JSON format everywhere.
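For readers unfamiliar with the schema format the ticket refers to: the JSON form accepted by the Scala from_json is the serialized representation of a StructType (what StructType.json() produces). Below is a minimal, Spark-free sketch of that shape; the field names are illustrative, not taken from the ticket:

```python
import json

# Serialized StructType schema of the kind the Scala from_json API accepts.
# The "type"/"fields"/"nullable"/"metadata" keys mirror StructType.json()
# output; the field names themselves are made up for illustration.
schema_json = json.dumps({
    "type": "struct",
    "fields": [
        {"name": "id", "type": "long", "nullable": False, "metadata": {}},
        {"name": "label", "type": "string", "nullable": True, "metadata": {}},
    ],
})

def field_names(schema: str) -> list:
    """Extract the top-level field names from a serialized struct schema."""
    parsed = json.loads(schema)
    assert parsed["type"] == "struct"
    return [f["name"] for f in parsed["fields"]]
```

Unifying the APIs would mean a string like `schema_json` is accepted wherever a DDL schema string (`"id LONG, label STRING"`) already is.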
[jira] [Commented] (SPARK-28180) Encoding CSV to Pojo works with Encoders.bean on RDD but fails on asserts when attempting it from a Dataset
[ https://issues.apache.org/jira/browse/SPARK-28180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17224601#comment-17224601 ] Julien commented on SPARK-28180: Not too sure what your issue is. Is it to indicate that a meaningful error message should be produced? BTW, stating that "_Scala_ costs a lot to _Spark_" cannot be correct: Spark is implemented in Scala :) However, the Java API is not at the same level as the Scala API. I myself have several issues with the Encoders.bean tool and with not being able to construct my own encoders for Java POJOs. Automatic is not always a good idea. In your case, the automatic parsing of the POJO getters to list the encoder fields is not helpful... > Encoding CSV to Pojo works with Encoders.bean on RDD but fails on asserts when attempting it from a Dataset > --- > > Key: SPARK-28180 > URL: https://issues.apache.org/jira/browse/SPARK-28180 > Project: Spark > Issue Type: Bug > Components: SQL > Affects Versions: 2.4.3 > Environment: Debian 9, Java 8. > Reporter: Marc Le Bihan > Priority: Major > > I am converting an _RDD_ Spark program to a _Dataset_ one. > Previously it converted a CSV file, mapped with the help of a Jackson loader, to an RDD of Enterprise objects with Encoders.bean(Entreprise.class); now it does the conversion more simply, by loading the CSV content into a Dataset and applying _Encoders.bean(Entreprise.class)_ on it.
> {code:java}
> Dataset<Row> csv = this.session.read().format("csv")
>    .option("header","true").option("quote", "\"").option("escape", "\"")
>    .load(source.getAbsolutePath())
>    .selectExpr(
>       "ActivitePrincipaleUniteLegale as ActivitePrincipale",
>       "CAST(AnneeCategorieEntreprise as INTEGER) as AnneeCategorieEntreprise",
>       "CAST(AnneeEffectifsUniteLegale as INTEGER) as AnneeValiditeEffectifSalarie",
>       "CAST(CaractereEmployeurUniteLegale == 'O' as BOOLEAN) as CaractereEmployeur",
>       "CategorieEntreprise",
>       "CategorieJuridiqueUniteLegale as CategorieJuridique",
>       "DateCreationUniteLegale as DateCreationEntreprise", "DateDebut as DateDebutHistorisation", "DateDernierTraitementUniteLegale as DateDernierTraitement",
>       "DenominationUniteLegale as Denomination",
>       "DenominationUsuelle1UniteLegale as DenominationUsuelle1",
>       "DenominationUsuelle2UniteLegale as DenominationUsuelle2",
>       "DenominationUsuelle3UniteLegale as DenominationUsuelle3",
>       "CAST(EconomieSocialeSolidaireUniteLegale == 'O' as BOOLEAN) as EconomieSocialeSolidaire",
>       "CAST(EtatAdministratifUniteLegale == 'A' as BOOLEAN) as Active",
>       "IdentifiantAssociationUniteLegale as IdentifiantAssociation",
>       "NicSiegeUniteLegale as NicSiege",
>       "CAST(NombrePeriodesUniteLegale as INTEGER) as NombrePeriodes",
>       "NomenclatureActivitePrincipaleUniteLegale as NomenclatureActivitePrincipale",
>       "NomUniteLegale as NomNaissance", "NomUsageUniteLegale as NomUsage",
>       "Prenom1UniteLegale as Prenom1", "Prenom2UniteLegale as Prenom2",
>       "Prenom3UniteLegale as Prenom3", "Prenom4UniteLegale as Prenom4",
>       "PrenomUsuelUniteLegale as PrenomUsuel",
>       "PseudonymeUniteLegale as Pseudonyme",
>       "SexeUniteLegale as Sexe",
>       "SigleUniteLegale as Sigle",
>       "Siren",
>       "TrancheEffectifsUniteLegale as TrancheEffectifSalarie"
>    );
> {code}
> The _Dataset_ is successfully created.
But the following call of _Encoders.bean(Enterprise.class)_ fails:
> {code:java}
> java.lang.AssertionError: assertion failed
> at scala.Predef$.assert(Predef.scala:208)
> at org.apache.spark.sql.catalyst.encoders.ExpressionEncoder$.javaBean(ExpressionEncoder.scala:87)
> at org.apache.spark.sql.Encoders$.bean(Encoders.scala:142)
> at org.apache.spark.sql.Encoders.bean(Encoders.scala)
> at fr.ecoemploi.spark.entreprise.EntrepriseService.dsEntreprises(EntrepriseService.java:178)
> at test.fr.ecoemploi.spark.entreprise.EntreprisesIT.datasetEntreprises(EntreprisesIT.java:72)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at org.junit.platform.commons.util.ReflectionUtils.invokeMethod(ReflectionUtils.java:532)
> at org.junit.jupiter.engine.execution.ExecutableInvoker.invoke(ExecutableInvoker.java:115) >
[jira] [Commented] (SPARK-33060) approxSimilarityJoin in Structured Stream causes state to explode in size
[ https://issues.apache.org/jira/browse/SPARK-33060?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17224589#comment-17224589 ] Bram van den Akker commented on SPARK-33060: [~tdas] any idea how this could be addressed? > approxSimilarityJoin in Structured Stream causes state to explode in size > - > > Key: SPARK-33060 > URL: https://issues.apache.org/jira/browse/SPARK-33060 > Project: Spark > Issue Type: Bug > Components: ML, PySpark, Structured Streaming > Affects Versions: 3.0.0 > Reporter: Bram van den Akker > Priority: Major > Attachments: Screenshot 2020-10-01 at 16.03.26.png > > > I'm writing a PySpark application that joins a static and a streaming dataframe together using the approxSimilarityJoin function from the ML package. Because of the high volume of data, we need to apply a watermark to make sure a minimal amount of state is preserved. However, the [approxSimilarityJoin scala code contains a `distinct` action|https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/ml/feature/LSH.scala#L289] right after it joins the two datasets together. This call results in state being created to account for late-arriving data. > Watermarks created in the PySpark code are being ignored and still lead to the state accumulating in size. > My expectation is that the watermarking is lost somewhere in the communication from Python to Scala. > I've created [this Stackoverflow question|https://stackoverflow.com/questions/64157104/stream-static-join-without-aggregation-still-results-in-accumulating-spark-state] earlier this week, but after more investigation this really seems like a bug rather than a user error.
[jira] [Commented] (SPARK-26365) spark-submit for k8s cluster doesn't propagate exit code
[ https://issues.apache.org/jira/browse/SPARK-26365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17224569#comment-17224569 ] Itay Bittan commented on SPARK-26365: - Hi, we are having the same issue. It's critical in a scenario that triggers another job based on the first app's success/failure. Any idea for a workaround in the meantime? > spark-submit for k8s cluster doesn't propagate exit code > > > Key: SPARK-26365 > URL: https://issues.apache.org/jira/browse/SPARK-26365 > Project: Spark > Issue Type: Bug > Components: Kubernetes, Spark Core, Spark Submit > Affects Versions: 2.3.2, 2.4.0 > Reporter: Oscar Bonilla > Priority: Minor > > When launching apps using spark-submit in a Kubernetes cluster, if the Spark application fails (returns exit code = 1, for example), spark-submit will still exit gracefully and return exit code = 0. > This is problematic, since there's no way to know whether there's been a problem with the Spark application.
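One workaround (my own suggestion, not something from the attached patches) is to stop trusting spark-submit's exit code and instead inspect the driver pod's terminal phase with kubectl, then map that phase to an exit code yourself. A minimal sketch, where the pod name and namespace are whatever your submission used:

```python
import subprocess

def exit_code_for_phase(phase: str) -> int:
    """Map a Kubernetes pod phase ("Succeeded", "Failed", ...) to an exit code."""
    return 0 if phase == "Succeeded" else 1

def driver_phase(pod_name: str, namespace: str) -> str:
    """Ask kubectl for the driver pod's current phase (hypothetical helper:
    assumes kubectl is on PATH and configured for the right cluster)."""
    out = subprocess.run(
        ["kubectl", "get", "pod", pod_name, "-n", namespace,
         "-o", "jsonpath={.status.phase}"],
        capture_output=True, text=True, check=True,
    )
    return out.stdout.strip()
```

A wrapper script would poll `driver_phase(...)` until the pod leaves `Running`, then `sys.exit(exit_code_for_phase(phase))`, which gives downstream schedulers a real signal.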
[jira] [Assigned] (SPARK-33187) Add a check on the number of returned partitions in the HiveShim#getPartitionsByFilter method
[ https://issues.apache.org/jira/browse/SPARK-33187?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-33187: Assignee: (was: Apache Spark) > Add a check on the number of returned partitions in the HiveShim#getPartitionsByFilter method > - > > Key: SPARK-33187 > URL: https://issues.apache.org/jira/browse/SPARK-33187 > Project: Spark > Issue Type: Improvement > Components: SQL > Affects Versions: 3.0.1 > Reporter: jinhai > Priority: Major > > In the method Shim#getPartitionsByFilter, when the filter is empty or when the Hive table has a large number of partitions, calling getAllPartitionsMethod or getPartitionsByFilterMethod will result in a Driver OOM. > I think we need to add a check on the number of returned partitions by calling Hive#getNumPartitionsByFilter, and add a SQLConf spark.sql.hive.metastorePartitionLimit with a default value of 100_000
[jira] [Assigned] (SPARK-33187) Add a check on the number of returned partitions in the HiveShim#getPartitionsByFilter method
[ https://issues.apache.org/jira/browse/SPARK-33187?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-33187: Assignee: Apache Spark > Add a check on the number of returned partitions in the HiveShim#getPartitionsByFilter method > - > > Key: SPARK-33187 > URL: https://issues.apache.org/jira/browse/SPARK-33187 > Project: Spark > Issue Type: Improvement > Components: SQL > Affects Versions: 3.0.1 > Reporter: jinhai > Assignee: Apache Spark > Priority: Major > > In the method Shim#getPartitionsByFilter, when the filter is empty or when the Hive table has a large number of partitions, calling getAllPartitionsMethod or getPartitionsByFilterMethod will result in a Driver OOM. > I think we need to add a check on the number of returned partitions by calling Hive#getNumPartitionsByFilter, and add a SQLConf spark.sql.hive.metastorePartitionLimit with a default value of 100_000
[jira] [Commented] (SPARK-33187) Add a check on the number of returned partitions in the HiveShim#getPartitionsByFilter method
[ https://issues.apache.org/jira/browse/SPARK-33187?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17224567#comment-17224567 ] Apache Spark commented on SPARK-33187: -- User 'manbuyun' has created a pull request for this issue: https://github.com/apache/spark/pull/30225 > Add a check on the number of returned partitions in the HiveShim#getPartitionsByFilter method > - > > Key: SPARK-33187 > URL: https://issues.apache.org/jira/browse/SPARK-33187 > Project: Spark > Issue Type: Improvement > Components: SQL > Affects Versions: 3.0.1 > Reporter: jinhai > Priority: Major > > In the method Shim#getPartitionsByFilter, when the filter is empty or when the Hive table has a large number of partitions, calling getAllPartitionsMethod or getPartitionsByFilterMethod will result in a Driver OOM. > I think we need to add a check on the number of returned partitions by calling Hive#getNumPartitionsByFilter, and add a SQLConf spark.sql.hive.metastorePartitionLimit with a default value of 100_000
[jira] [Created] (SPARK-33317) Spark Hive SQL returning empty dataframe
Debadutta created SPARK-33317: - Summary: Spark Hive SQL returning empty dataframe Key: SPARK-33317 URL: https://issues.apache.org/jira/browse/SPARK-33317 Project: Spark Issue Type: Bug Components: Spark Core, Spark Shell Affects Versions: 2.4.6 Reporter: Debadutta I am trying to run a SQL query on a Hive table using the Hive connector in Spark, but I am getting an empty dataframe. The query I am trying to run: {{sparkSession.sql("select fmid from farmers where fmid between ' 1000405134' and '1000772585'")}} This fails, but if I remove the leading whitespace it works: {{sparkSession.sql("select fmid from farmers where fmid between '1000405134' and '1000772585'")}} Currently, I am removing leading and trailing whitespace as a workaround, but the same query with whitespace works fine in the Hive console.
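The reporter's interim workaround can be sketched as a small query builder that trims the bound literals before interpolating them; the table and column names below mirror the report, and the helper itself is illustrative:

```python
def build_between_query(lo: str, hi: str) -> str:
    """Build the BETWEEN query with both bound literals trimmed, since the
    leading whitespace in ' 1000405134' is what makes the Spark query above
    come back empty while the trimmed form works."""
    lo, hi = lo.strip(), hi.strip()
    return (
        "select fmid from farmers "
        f"where fmid between '{lo}' and '{hi}'"
    )
```

In real code the values should of course go through parameter binding rather than f-string interpolation; the sketch only isolates the trimming step.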
[jira] [Assigned] (SPARK-33316) Support nullable Avro schemas for non-nullable data in Avro writing
[ https://issues.apache.org/jira/browse/SPARK-33316?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-33316: Assignee: Apache Spark > Support nullable Avro schemas for non-nullable data in Avro writing > --- > > Key: SPARK-33316 > URL: https://issues.apache.org/jira/browse/SPARK-33316 > Project: Spark > Issue Type: Improvement > Components: SQL > Affects Versions: 2.4.0, 3.0.0, 3.0.1 > Reporter: Bo Zhang > Assignee: Apache Spark > Priority: Major > > Currently, when users try to use nullable Avro schemas for non-nullable data in Avro writing, Spark will throw an IncompatibleSchemaException. > There are some cases where users do not have full control over the nullability of the data, or of the Avro schemas they have to use. We should support nullable Avro schemas for non-nullable data in Avro writing for better usability.
[jira] [Assigned] (SPARK-33316) Support nullable Avro schemas for non-nullable data in Avro writing
[ https://issues.apache.org/jira/browse/SPARK-33316?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-33316: Assignee: (was: Apache Spark) > Support nullable Avro schemas for non-nullable data in Avro writing > --- > > Key: SPARK-33316 > URL: https://issues.apache.org/jira/browse/SPARK-33316 > Project: Spark > Issue Type: Improvement > Components: SQL > Affects Versions: 2.4.0, 3.0.0, 3.0.1 > Reporter: Bo Zhang > Priority: Major > > Currently, when users try to use nullable Avro schemas for non-nullable data in Avro writing, Spark will throw an IncompatibleSchemaException. > There are some cases where users do not have full control over the nullability of the data, or of the Avro schemas they have to use. We should support nullable Avro schemas for non-nullable data in Avro writing for better usability.
[jira] [Commented] (SPARK-33316) Support nullable Avro schemas for non-nullable data in Avro writing
[ https://issues.apache.org/jira/browse/SPARK-33316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17224566#comment-17224566 ] Apache Spark commented on SPARK-33316: -- User 'bozhang2820' has created a pull request for this issue: https://github.com/apache/spark/pull/30224 > Support nullable Avro schemas for non-nullable data in Avro writing > --- > > Key: SPARK-33316 > URL: https://issues.apache.org/jira/browse/SPARK-33316 > Project: Spark > Issue Type: Improvement > Components: SQL > Affects Versions: 2.4.0, 3.0.0, 3.0.1 > Reporter: Bo Zhang > Priority: Major > > Currently, when users try to use nullable Avro schemas for non-nullable data in Avro writing, Spark will throw an IncompatibleSchemaException. > There are some cases where users do not have full control over the nullability of the data, or of the Avro schemas they have to use. We should support nullable Avro schemas for non-nullable data in Avro writing for better usability.
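For context on the ticket above: a "nullable" Avro schema is one whose field type is a union containing `"null"`, per the Avro specification. A small sketch of what such a schema looks like (the record and field names are illustrative, not from the ticket):

```python
import json

# An Avro record whose "label" field is nullable: its type is the union
# ["null", "string"]. The ticket asks Spark to accept a schema like this
# even when the DataFrame column being written is non-nullable.
avro_schema = json.dumps({
    "type": "record",
    "name": "Example",
    "fields": [
        {"name": "id", "type": "long"},
        {"name": "label", "type": ["null", "string"], "default": None},
    ],
})

def is_nullable(field_type) -> bool:
    """An Avro field is nullable when its type is a union containing "null"."""
    return isinstance(field_type, list) and "null" in field_type
```

Writing non-nullable data with the nullable schema is safe in one direction (every non-null value is a valid member of the union), which is the usability argument the description makes.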
[jira] [Updated] (SPARK-33187) Add a check on the number of returned partitions in the HiveShim#getPartitionsByFilter method
[ https://issues.apache.org/jira/browse/SPARK-33187?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] jinhai updated SPARK-33187: --- Description: In the method Shim#getPartitionsByFilter, when the filter is empty or when the Hive table has a large number of partitions, calling getAllPartitionsMethod or getPartitionsByFilterMethod will result in a Driver OOM. I think we need to add a check on the number of returned partitions by calling Hive#getNumPartitionsByFilter, and add a SQLConf spark.sql.hive.metastorePartitionLimit with a default value of 100_000 was: In the method Shim#getPartitionsByFilter, when the filter is empty or when the Hive table has a large number of partitions, calling getAllPartitionsMethod or getPartitionsByFilterMethod will result in a Driver OOM. I think we need to add a check on the number of returned partitions by calling Hive#getNumPartitionsByFilter, and add a SQLConf spark.sql.hive.exceeded.partition.limit with a default value of 100_000 > Add a check on the number of returned partitions in the HiveShim#getPartitionsByFilter method > - > > Key: SPARK-33187 > URL: https://issues.apache.org/jira/browse/SPARK-33187 > Project: Spark > Issue Type: Improvement > Components: SQL > Affects Versions: 3.0.1 > Reporter: jinhai > Priority: Major > > In the method Shim#getPartitionsByFilter, when the filter is empty or when the Hive table has a large number of partitions, calling getAllPartitionsMethod or getPartitionsByFilterMethod will result in a Driver OOM. > I think we need to add a check on the number of returned partitions by calling Hive#getNumPartitionsByFilter, and add a SQLConf spark.sql.hive.metastorePartitionLimit with a default value of 100_000
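The proposed check amounts to a fail-fast guard: ask the metastore for the partition count first, and abort before fetching metadata that would OOM the driver. The configuration key and default below are taken from the description; the function shape is my own sketch, not the actual HiveShim change:

```python
# Proposed default for spark.sql.hive.metastorePartitionLimit (per the ticket).
DEFAULT_PARTITION_LIMIT = 100_000

def check_partition_count(num_partitions: int,
                          limit: int = DEFAULT_PARTITION_LIMIT) -> None:
    """Fail fast when a table's partition count exceeds the configured limit,
    instead of fetching all partition metadata and OOMing the driver."""
    if num_partitions > limit:
        raise RuntimeError(
            f"Hive table has {num_partitions} partitions, exceeding the limit "
            f"of {limit} (spark.sql.hive.metastorePartitionLimit)."
        )
```

In the real change, `num_partitions` would come from `Hive#getNumPartitionsByFilter`, which is cheap relative to materializing every partition object.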
[jira] [Commented] (SPARK-33306) TimezoneID is needed when there is a cast from Date to String
[ https://issues.apache.org/jira/browse/SPARK-33306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17224525#comment-17224525 ] Apache Spark commented on SPARK-33306: -- User 'MaxGekk' has created a pull request for this issue: https://github.com/apache/spark/pull/30223 > TimezoneID is needed when there is a cast from Date to String > > > Key: SPARK-33306 > URL: https://issues.apache.org/jira/browse/SPARK-33306 > Project: Spark > Issue Type: Bug > Components: SQL > Affects Versions: 3.0.0, 3.0.1, 3.1.0 > Reporter: EdisonWang > Assignee: EdisonWang > Priority: Major > Fix For: 3.0.2, 3.1.0 > > > A simple way to reproduce this is
> {code}
> spark-shell --conf spark.sql.legacy.typeCoercion.datetimeToString.enabled
> scala> sql("""
> select a.d1 from
> (select to_date(concat('2000-01-0', id)) as d1 from range(1, 2)) a
> join
> (select concat('2000-01-0', id) as d2 from range(1, 2)) b
> on a.d1 = b.d2
> """).show
> {code}
> It will throw
> {code}
> java.util.NoSuchElementException: None.get
> at scala.None$.get(Option.scala:529)
> at scala.None$.get(Option.scala:527)
> at org.apache.spark.sql.catalyst.expressions.TimeZoneAwareExpression.zoneId(datetimeExpressions.scala:56)
> at org.apache.spark.sql.catalyst.expressions.TimeZoneAwareExpression.zoneId$(datetimeExpressions.scala:56)
> at org.apache.spark.sql.catalyst.expressions.CastBase.zoneId$lzycompute(Cast.scala:253)
> at org.apache.spark.sql.catalyst.expressions.CastBase.zoneId(Cast.scala:253)
> at org.apache.spark.sql.catalyst.expressions.CastBase.dateFormatter$lzycompute(Cast.scala:287)
> at org.apache.spark.sql.catalyst.expressions.CastBase.dateFormatter(Cast.scala:287)
> {code}