[jira] [Updated] (SPARK-24266) Spark client terminates while driver is still running

2020-11-02 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-24266?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-24266:
--
Fix Version/s: 3.0.2

> Spark client terminates while driver is still running
> -
>
> Key: SPARK-24266
> URL: https://issues.apache.org/jira/browse/SPARK-24266
> Project: Spark
>  Issue Type: Sub-task
>  Components: Kubernetes, Spark Core
>Affects Versions: 2.3.0, 3.0.0
>Reporter: Chun Chen
>Assignee: Stijn De Haes
>Priority: Critical
> Fix For: 3.0.2, 3.1.0
>
>
> {code}
> Warning: Ignoring non-spark config property: Default=system properties 
> included when running spark-submit.
> 18/05/11 14:50:12 WARN Config: Error reading service account token from: 
> [/var/run/secrets/kubernetes.io/serviceaccount/token]. Ignoring.
> 18/05/11 14:50:12 INFO HadoopStepsOrchestrator: Hadoop Conf directory: 
> Some(/data/tesla/spark-2.2.0-k8s-0.5.0-bin-2.7.3/hadoop-conf)
> 18/05/11 14:50:15 WARN NativeCodeLoader: Unable to load native-hadoop library 
> for your platform... using builtin-java classes where applicable
> 18/05/11 14:50:15 WARN DomainSocketFactory: The short-circuit local reads 
> feature cannot be used because libhadoop cannot be loaded.
> 18/05/11 14:50:16 INFO HadoopConfBootstrapImpl: HADOOP_CONF_DIR defined. 
> Mounting Hadoop specific files
> 18/05/11 14:50:17 INFO LoggingPodStatusWatcherImpl: State changed, new state: 
>pod name: spark-64-293-980-1526021412180-driver
>namespace: tione-603074457
>labels: network -> FLOATINGIP, spark-app-selector -> 
> spark-2843da19c690485b93780ad7992a101e, spark-role -> driver
>pod uid: 90558303-54e7-11e8-9e64-525400da65d8
>creation time: 2018-05-11T06:50:17Z
>service account name: default
>volumes: spark-local-dir-0-spark-local, spark-init-properties, 
> download-jars-volume, download-files, spark-init-secret, hadoop-properties, 
> default-token-xvjt9
>node name: N/A
>start time: N/A
>container images: N/A
>phase: Pending
>status: []
> 18/05/11 14:50:17 INFO LoggingPodStatusWatcherImpl: State changed, new state: 
>pod name: spark-64-293-980-1526021412180-driver
>namespace: tione-603074457
>labels: network -> FLOATINGIP, spark-app-selector -> 
> spark-2843da19c690485b93780ad7992a101e, spark-role -> driver
>pod uid: 90558303-54e7-11e8-9e64-525400da65d8
>creation time: 2018-05-11T06:50:17Z
>service account name: default
>volumes: spark-local-dir-0-spark-local, spark-init-properties, 
> download-jars-volume, download-files, spark-init-secret, hadoop-properties, 
> default-token-xvjt9
>node name: tbds-100-98-45-69
>start time: N/A
>container images: N/A
>phase: Pending
>status: []
> 18/05/11 14:50:18 INFO LoggingPodStatusWatcherImpl: State changed, new state: 
>pod name: spark-64-293-980-1526021412180-driver
>namespace: tione-603074457
>labels: network -> FLOATINGIP, spark-app-selector -> 
> spark-2843da19c690485b93780ad7992a101e, spark-role -> driver
>pod uid: 90558303-54e7-11e8-9e64-525400da65d8
>creation time: 2018-05-11T06:50:17Z
>service account name: default
>volumes: spark-local-dir-0-spark-local, spark-init-properties, 
> download-jars-volume, download-files, spark-init-secret, hadoop-properties, 
> default-token-xvjt9
>node name: tbds-100-98-45-69
>start time: 2018-05-11T06:50:17Z
>container images: docker.oa.com:8080/gaia/spark-driver-cos:20180503_9
>phase: Pending
>status: [ContainerStatus(containerID=null, 
> image=docker.oa.com:8080/gaia/spark-driver-cos:20180503_9, imageID=, 
> lastState=ContainerState(running=null, terminated=null, waiting=null, 
> additionalProperties={}), name=spark-kubernetes-driver, ready=false, 
> restartCount=0, state=ContainerState(running=null, terminated=null, 
> waiting=ContainerStateWaiting(message=null, reason=PodInitializing, 
> additionalProperties={}), additionalProperties={}), additionalProperties={})]
> 18/05/11 14:50:19 INFO Client: Waiting for application spark-64-293-980 to 
> finish...
> 18/05/11 14:50:25 INFO LoggingPodStatusWatcherImpl: State changed, new state: 
>pod name: spark-64-293-980-1526021412180-driver
>namespace: tione-603074457
>labels: network -> FLOATINGIP, spark-app-selector -> 
> spark-2843da19c690485b93780ad7992a101e, spark-role -> driver
>pod uid: 90558303-54e7-11e8-9e64-525400da65d8
>creation time: 2018-05-11T06:50:17Z
>service account name: default
>volumes: spark-local-dir-0-spark-local, spark-init-properties, 
> download-jars-volume, download-files, spark-init-secret, hadoop-properties, 
> 

[jira] [Commented] (SPARK-29392) Remove use of deprecated symbol literal " 'name " syntax in favor Symbol("name")

2020-11-02 Thread Yang Jie (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-29392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17225169#comment-17225169
 ] 

Yang Jie commented on SPARK-29392:
--

It seems that there are still many similar issues, especially in the catalyst 
and sql modules. Maven's compilation warning log only prints 100 lines, so when 
we fix some of them, another batch of compilation warnings appears.
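
For reference, a minimal sketch of the mechanical change this ticket asks for; 
the symbol name comes from the warning quoted below and is only illustrative:

{code:scala}
// Deprecated under Scala 2.13: the single-quote symbol literal syntax.
val before = 'assertInvariants

// Preferred replacement: construct the Symbol explicitly.
val after = Symbol("assertInvariants")
{code}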

 

> Remove use of deprecated symbol literal " 'name " syntax in favor 
> Symbol("name")
> 
>
> Key: SPARK-29392
> URL: https://issues.apache.org/jira/browse/SPARK-29392
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core, SQL, Tests
>Affects Versions: 3.0.0
>Reporter: Sean R. Owen
>Assignee: Sean R. Owen
>Priority: Minor
> Fix For: 3.0.0
>
>
> Example:
> {code}
> [WARNING] [Warn] 
> /Users/seanowen/Documents/spark_2.13/core/src/test/scala/org/apache/spark/memory/UnifiedMemoryManagerSuite.scala:308:
>  symbol literal is deprecated; use Symbol("assertInvariants") instead
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-33324) Upgrade kubernetes-client to 4.11.1

2020-11-02 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33324?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-33324.
---
Fix Version/s: 3.1.0
   Resolution: Fixed

Issue resolved by pull request 30233
[https://github.com/apache/spark/pull/30233]

> Upgrade kubernetes-client to 4.11.1
> ---
>
> Key: SPARK-33324
> URL: https://issues.apache.org/jira/browse/SPARK-33324
> Project: Spark
>  Issue Type: Sub-task
>  Components: Build, Kubernetes
>Affects Versions: 3.1.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Major
> Fix For: 3.1.0
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-33324) Upgrade kubernetes-client to 4.11.1

2020-11-02 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33324?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-33324:
-

Assignee: Dongjoon Hyun

> Upgrade kubernetes-client to 4.11.1
> ---
>
> Key: SPARK-33324
> URL: https://issues.apache.org/jira/browse/SPARK-33324
> Project: Spark
>  Issue Type: Sub-task
>  Components: Build, Kubernetes
>Affects Versions: 3.1.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-33317) Spark Hive SQL returning empty dataframe

2020-11-02 Thread Dongjoon Hyun (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17225154#comment-17225154
 ] 

Dongjoon Hyun commented on SPARK-33317:
---

Since it's already resolved, it's okay, [~hyukjin.kwon].

> Spark Hive SQL returning empty dataframe
> 
>
> Key: SPARK-33317
> URL: https://issues.apache.org/jira/browse/SPARK-33317
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core, Spark Shell
>Affects Versions: 2.4.6
>Reporter: Debadutta
>Priority: Major
> Attachments: farmers.csv, image-2020-11-03-13-30-12-049.png
>
>
> I am trying to run a SQL query on a Hive table using the Hive connector in 
> Spark, but I am getting an empty dataframe. The query I am trying to run:
> {{sparkSession.sql("select fmid from farmers where fmid between ' 1000405134' 
> and '1000772585'")}}
> This fails, but if I remove the leading whitespace it works:
> {{sparkSession.sql("select fmid from farmers where fmid between '1000405134' 
> and '1000772585'")}}
> Currently, I am removing leading and trailing whitespace as a workaround, but 
> the same query with whitespace works fine in the Hive console.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-33317) Spark Hive SQL returning empty dataframe

2020-11-02 Thread Hyukjin Kwon (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17225150#comment-17225150
 ] 

Hyukjin Kwon edited comment on SPARK-33317 at 11/3/20, 6:00 AM:


[~dongjoon], feel free to reopen if you think this ticket should be assessed 
further. I tend to take action on JIRAs a bit aggressively, so any correction 
to my actions is welcome :-).


was (Author: hyukjin.kwon):
[~dongjoon], feel free to reopen if you think this ticket should be accessed 
further. I tend to take an action to JIRAs a bit aggressively so welcome to any 
correction to my action :-).

> Spark Hive SQL returning empty dataframe
> 
>
> Key: SPARK-33317
> URL: https://issues.apache.org/jira/browse/SPARK-33317
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core, Spark Shell
>Affects Versions: 2.4.6
>Reporter: Debadutta
>Priority: Major
> Attachments: farmers.csv, image-2020-11-03-13-30-12-049.png
>
>
> I am trying to run a SQL query on a Hive table using the Hive connector in 
> Spark, but I am getting an empty dataframe. The query I am trying to run:
> {{sparkSession.sql("select fmid from farmers where fmid between ' 1000405134' 
> and '1000772585'")}}
> This fails, but if I remove the leading whitespace it works:
> {{sparkSession.sql("select fmid from farmers where fmid between '1000405134' 
> and '1000772585'")}}
> Currently, I am removing leading and trailing whitespace as a workaround, but 
> the same query with whitespace works fine in the Hive console.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-33317) Spark Hive SQL returning empty dataframe

2020-11-02 Thread Hyukjin Kwon (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17225150#comment-17225150
 ] 

Hyukjin Kwon commented on SPARK-33317:
--

[~dongjoon], feel free to reopen if you think this ticket should be assessed 
further. I tend to take action on JIRAs a bit aggressively, so any correction 
to my actions is welcome :-).

> Spark Hive SQL returning empty dataframe
> 
>
> Key: SPARK-33317
> URL: https://issues.apache.org/jira/browse/SPARK-33317
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core, Spark Shell
>Affects Versions: 2.4.6
>Reporter: Debadutta
>Priority: Major
> Attachments: farmers.csv, image-2020-11-03-13-30-12-049.png
>
>
> I am trying to run a SQL query on a Hive table using the Hive connector in 
> Spark, but I am getting an empty dataframe. The query I am trying to run:
> {{sparkSession.sql("select fmid from farmers where fmid between ' 1000405134' 
> and '1000772585'")}}
> This fails, but if I remove the leading whitespace it works:
> {{sparkSession.sql("select fmid from farmers where fmid between '1000405134' 
> and '1000772585'")}}
> Currently, I am removing leading and trailing whitespace as a workaround, but 
> the same query with whitespace works fine in the Hive console.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-33156) Upgrade GithubAction image from 18.04 to 20.04

2020-11-02 Thread Dongjoon Hyun (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17225149#comment-17225149
 ] 

Dongjoon Hyun commented on SPARK-33156:
---

This is backported in preparation for the AMPLab Jenkins farm OS upgrade (to 
`Ubuntu 20.04`).

> Upgrade GithubAction image from 18.04 to 20.04
> --
>
> Key: SPARK-33156
> URL: https://issues.apache.org/jira/browse/SPARK-33156
> Project: Spark
>  Issue Type: Improvement
>  Components: Project Infra
>Affects Versions: 3.1.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Minor
> Fix For: 3.0.2, 3.1.0
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-33156) Upgrade GithubAction image from 18.04 to 20.04

2020-11-02 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33156?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-33156:
--
Priority: Major  (was: Minor)

> Upgrade GithubAction image from 18.04 to 20.04
> --
>
> Key: SPARK-33156
> URL: https://issues.apache.org/jira/browse/SPARK-33156
> Project: Spark
>  Issue Type: Improvement
>  Components: Project Infra
>Affects Versions: 3.1.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Major
> Fix For: 3.0.2, 3.1.0
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-33156) Upgrade GithubAction image from 18.04 to 20.04

2020-11-02 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33156?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-33156:
--
Fix Version/s: 3.0.2

> Upgrade GithubAction image from 18.04 to 20.04
> --
>
> Key: SPARK-33156
> URL: https://issues.apache.org/jira/browse/SPARK-33156
> Project: Spark
>  Issue Type: Improvement
>  Components: Project Infra
>Affects Versions: 3.1.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Minor
> Fix For: 3.0.2, 3.1.0
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-33317) Spark Hive SQL returning empty dataframe

2020-11-02 Thread Dongjoon Hyun (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17225147#comment-17225147
 ] 

Dongjoon Hyun commented on SPARK-33317:
---

[~hyukjin.kwon], the reported case was incorrect from the beginning, but we 
should still try to understand the reported situation correctly.

> Spark Hive SQL returning empty dataframe
> 
>
> Key: SPARK-33317
> URL: https://issues.apache.org/jira/browse/SPARK-33317
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core, Spark Shell
>Affects Versions: 2.4.6
>Reporter: Debadutta
>Priority: Major
> Attachments: farmers.csv, image-2020-11-03-13-30-12-049.png
>
>
> I am trying to run a SQL query on a Hive table using the Hive connector in 
> Spark, but I am getting an empty dataframe. The query I am trying to run:
> {{sparkSession.sql("select fmid from farmers where fmid between ' 1000405134' 
> and '1000772585'")}}
> This fails, but if I remove the leading whitespace it works:
> {{sparkSession.sql("select fmid from farmers where fmid between '1000405134' 
> and '1000772585'")}}
> Currently, I am removing leading and trailing whitespace as a workaround, but 
> the same query with whitespace works fine in the Hive console.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-33317) Spark Hive SQL returning empty dataframe

2020-11-02 Thread Dongjoon Hyun (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17225144#comment-17225144
 ] 

Dongjoon Hyun commented on SPARK-33317:
---

Hi, [~qwe1398775315]. You are right, but the reason I asked [~dgodnaik] about 
the background and context is that he reported an empty dataframe. He is 
describing a different situation, and I'm trying to understand his procedure.

> Spark Hive SQL returning empty dataframe
> 
>
> Key: SPARK-33317
> URL: https://issues.apache.org/jira/browse/SPARK-33317
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core, Spark Shell
>Affects Versions: 2.4.6
>Reporter: Debadutta
>Priority: Major
> Attachments: farmers.csv, image-2020-11-03-13-30-12-049.png
>
>
> I am trying to run a SQL query on a Hive table using the Hive connector in 
> Spark, but I am getting an empty dataframe. The query I am trying to run:
> {{sparkSession.sql("select fmid from farmers where fmid between ' 1000405134' 
> and '1000772585'")}}
> This fails, but if I remove the leading whitespace it works:
> {{sparkSession.sql("select fmid from farmers where fmid between '1000405134' 
> and '1000772585'")}}
> Currently, I am removing leading and trailing whitespace as a workaround, but 
> the same query with whitespace works fine in the Hive console.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-33317) Spark Hive SQL returning empty dataframe

2020-11-02 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33317?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-33317.
--
Resolution: Not A Problem

> Spark Hive SQL returning empty dataframe
> 
>
> Key: SPARK-33317
> URL: https://issues.apache.org/jira/browse/SPARK-33317
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core, Spark Shell
>Affects Versions: 2.4.6
>Reporter: Debadutta
>Priority: Major
> Attachments: farmers.csv, image-2020-11-03-13-30-12-049.png
>
>
> I am trying to run a SQL query on a Hive table using the Hive connector in 
> Spark, but I am getting an empty dataframe. The query I am trying to run:
> {{sparkSession.sql("select fmid from farmers where fmid between ' 1000405134' 
> and '1000772585'")}}
> This fails, but if I remove the leading whitespace it works:
> {{sparkSession.sql("select fmid from farmers where fmid between '1000405134' 
> and '1000772585'")}}
> Currently, I am removing leading and trailing whitespace as a workaround, but 
> the same query with whitespace works fine in the Hive console.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-33317) Spark Hive SQL returning empty dataframe

2020-11-02 Thread Liu Neng (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17225141#comment-17225141
 ] 

Liu Neng commented on SPARK-33317:
--

I ran both queries on Spark 3.0.0: condition 1 (between ' 1000405134' and 
'1000772585') finds 6012 records, while condition 2 (between '1000405134' and 
'1000772585') finds 2798 records.

I found that the comparator used in codegen is UTF8String:

!image-2020-11-03-13-30-12-049.png!

" 1000405134" is smaller than "1000405134".

I don't think this is an issue, because the values are compared as strings, not 
numbers.

I also analyzed the parse tree: "1000405134" is a string literal.
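
To make the string-comparison behaviour above concrete, here is a minimal, 
self-contained sketch; the cast-based workaround at the end is an illustrative 
assumption, not the reporter's code:

{code:scala}
import org.apache.spark.sql.SparkSession

object BetweenWhitespaceDemo {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("between-demo").getOrCreate()

    // Lexicographic comparison: a leading space (0x20) sorts before the digit '1' (0x31),
    // so ' 1000405134' is "smaller" than '1000405134' when compared as strings.
    spark.sql("SELECT ' 1000405134' < '1000405134' AS space_sorts_first").show()

    // BETWEEN on string values uses the same lexicographic ordering, which is why the
    // two conditions above return different row counts.
    spark.sql("SELECT '1000500000' BETWEEN ' 1000405134' AND '1000772585' AS in_range").show()

    // One possible workaround: cast to a numeric type so incidental whitespace in the
    // literal no longer affects the comparison.
    spark.sql("SELECT CAST(' 1000405134' AS BIGINT) AS lower_bound").show()

    spark.stop()
  }
}
{code}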

> Spark Hive SQL returning empty dataframe
> 
>
> Key: SPARK-33317
> URL: https://issues.apache.org/jira/browse/SPARK-33317
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core, Spark Shell
>Affects Versions: 2.4.6
>Reporter: Debadutta
>Priority: Major
> Attachments: farmers.csv, image-2020-11-03-13-30-12-049.png
>
>
> I am trying to run a SQL query on a Hive table using the Hive connector in 
> Spark, but I am getting an empty dataframe. The query I am trying to run:
> {{sparkSession.sql("select fmid from farmers where fmid between ' 1000405134' 
> and '1000772585'")}}
> This fails, but if I remove the leading whitespace it works:
> {{sparkSession.sql("select fmid from farmers where fmid between '1000405134' 
> and '1000772585'")}}
> Currently, I am removing leading and trailing whitespace as a workaround, but 
> the same query with whitespace works fine in the Hive console.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-33317) Spark Hive SQL returning empty dataframe

2020-11-02 Thread Liu Neng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33317?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Liu Neng updated SPARK-33317:
-
Attachment: image-2020-11-03-13-30-12-049.png

> Spark Hive SQL returning empty dataframe
> 
>
> Key: SPARK-33317
> URL: https://issues.apache.org/jira/browse/SPARK-33317
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core, Spark Shell
>Affects Versions: 2.4.6
>Reporter: Debadutta
>Priority: Major
> Attachments: farmers.csv, image-2020-11-03-13-30-12-049.png
>
>
> I am trying to run a SQL query on a Hive table using the Hive connector in 
> Spark, but I am getting an empty dataframe. The query I am trying to run:
> {{sparkSession.sql("select fmid from farmers where fmid between ' 1000405134' 
> and '1000772585'")}}
> This fails, but if I remove the leading whitespace it works:
> {{sparkSession.sql("select fmid from farmers where fmid between '1000405134' 
> and '1000772585'")}}
> Currently, I am removing leading and trailing whitespace as a workaround, but 
> the same query with whitespace works fine in the Hive console.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Issue Comment Deleted] (SPARK-33324) Upgrade kubernetes-client to 4.11.1

2020-11-02 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33324?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-33324:
--
Comment: was deleted

(was: User 'dongjoon-hyun' has created a pull request for this issue:
https://github.com/apache/spark/pull/30233)

> Upgrade kubernetes-client to 4.11.1
> ---
>
> Key: SPARK-33324
> URL: https://issues.apache.org/jira/browse/SPARK-33324
> Project: Spark
>  Issue Type: Sub-task
>  Components: Build, Kubernetes
>Affects Versions: 3.1.0
>Reporter: Dongjoon Hyun
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-33324) Upgrade kubernetes-client to 4.11.1

2020-11-02 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17225126#comment-17225126
 ] 

Apache Spark commented on SPARK-33324:
--

User 'dongjoon-hyun' has created a pull request for this issue:
https://github.com/apache/spark/pull/30233

> Upgrade kubernetes-client to 4.11.1
> ---
>
> Key: SPARK-33324
> URL: https://issues.apache.org/jira/browse/SPARK-33324
> Project: Spark
>  Issue Type: Sub-task
>  Components: Build, Kubernetes
>Affects Versions: 3.1.0
>Reporter: Dongjoon Hyun
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-33324) Upgrade kubernetes-client to 4.11.1

2020-11-02 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17225127#comment-17225127
 ] 

Apache Spark commented on SPARK-33324:
--

User 'dongjoon-hyun' has created a pull request for this issue:
https://github.com/apache/spark/pull/30233

> Upgrade kubernetes-client to 4.11.1
> ---
>
> Key: SPARK-33324
> URL: https://issues.apache.org/jira/browse/SPARK-33324
> Project: Spark
>  Issue Type: Sub-task
>  Components: Build, Kubernetes
>Affects Versions: 3.1.0
>Reporter: Dongjoon Hyun
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-33324) Upgrade kubernetes-client to 4.11.1

2020-11-02 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33324?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-33324:


Assignee: Apache Spark

> Upgrade kubernetes-client to 4.11.1
> ---
>
> Key: SPARK-33324
> URL: https://issues.apache.org/jira/browse/SPARK-33324
> Project: Spark
>  Issue Type: Sub-task
>  Components: Build, Kubernetes
>Affects Versions: 3.1.0
>Reporter: Dongjoon Hyun
>Assignee: Apache Spark
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-33324) Upgrade kubernetes-client to 4.11.1

2020-11-02 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33324?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-33324:


Assignee: (was: Apache Spark)

> Upgrade kubernetes-client to 4.11.1
> ---
>
> Key: SPARK-33324
> URL: https://issues.apache.org/jira/browse/SPARK-33324
> Project: Spark
>  Issue Type: Sub-task
>  Components: Build, Kubernetes
>Affects Versions: 3.1.0
>Reporter: Dongjoon Hyun
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-33324) Upgrade kubernetes-client to 4.11.1

2020-11-02 Thread Dongjoon Hyun (Jira)
Dongjoon Hyun created SPARK-33324:
-

 Summary: Upgrade kubernetes-client to 4.11.1
 Key: SPARK-33324
 URL: https://issues.apache.org/jira/browse/SPARK-33324
 Project: Spark
  Issue Type: Sub-task
  Components: Build, Kubernetes
Affects Versions: 3.1.0
Reporter: Dongjoon Hyun






--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-24266) Spark client terminates while driver is still running

2020-11-02 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-24266?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-24266:
--
Priority: Critical  (was: Major)

> Spark client terminates while driver is still running
> -
>
> Key: SPARK-24266
> URL: https://issues.apache.org/jira/browse/SPARK-24266
> Project: Spark
>  Issue Type: Sub-task
>  Components: Kubernetes, Spark Core
>Affects Versions: 2.3.0, 3.0.0
>Reporter: Chun Chen
>Assignee: Stijn De Haes
>Priority: Critical
> Fix For: 3.1.0
>
>
> {code}
> Warning: Ignoring non-spark config property: Default=system properties 
> included when running spark-submit.
> 18/05/11 14:50:12 WARN Config: Error reading service account token from: 
> [/var/run/secrets/kubernetes.io/serviceaccount/token]. Ignoring.
> 18/05/11 14:50:12 INFO HadoopStepsOrchestrator: Hadoop Conf directory: 
> Some(/data/tesla/spark-2.2.0-k8s-0.5.0-bin-2.7.3/hadoop-conf)
> 18/05/11 14:50:15 WARN NativeCodeLoader: Unable to load native-hadoop library 
> for your platform... using builtin-java classes where applicable
> 18/05/11 14:50:15 WARN DomainSocketFactory: The short-circuit local reads 
> feature cannot be used because libhadoop cannot be loaded.
> 18/05/11 14:50:16 INFO HadoopConfBootstrapImpl: HADOOP_CONF_DIR defined. 
> Mounting Hadoop specific files
> 18/05/11 14:50:17 INFO LoggingPodStatusWatcherImpl: State changed, new state: 
>pod name: spark-64-293-980-1526021412180-driver
>namespace: tione-603074457
>labels: network -> FLOATINGIP, spark-app-selector -> 
> spark-2843da19c690485b93780ad7992a101e, spark-role -> driver
>pod uid: 90558303-54e7-11e8-9e64-525400da65d8
>creation time: 2018-05-11T06:50:17Z
>service account name: default
>volumes: spark-local-dir-0-spark-local, spark-init-properties, 
> download-jars-volume, download-files, spark-init-secret, hadoop-properties, 
> default-token-xvjt9
>node name: N/A
>start time: N/A
>container images: N/A
>phase: Pending
>status: []
> 18/05/11 14:50:17 INFO LoggingPodStatusWatcherImpl: State changed, new state: 
>pod name: spark-64-293-980-1526021412180-driver
>namespace: tione-603074457
>labels: network -> FLOATINGIP, spark-app-selector -> 
> spark-2843da19c690485b93780ad7992a101e, spark-role -> driver
>pod uid: 90558303-54e7-11e8-9e64-525400da65d8
>creation time: 2018-05-11T06:50:17Z
>service account name: default
>volumes: spark-local-dir-0-spark-local, spark-init-properties, 
> download-jars-volume, download-files, spark-init-secret, hadoop-properties, 
> default-token-xvjt9
>node name: tbds-100-98-45-69
>start time: N/A
>container images: N/A
>phase: Pending
>status: []
> 18/05/11 14:50:18 INFO LoggingPodStatusWatcherImpl: State changed, new state: 
>pod name: spark-64-293-980-1526021412180-driver
>namespace: tione-603074457
>labels: network -> FLOATINGIP, spark-app-selector -> 
> spark-2843da19c690485b93780ad7992a101e, spark-role -> driver
>pod uid: 90558303-54e7-11e8-9e64-525400da65d8
>creation time: 2018-05-11T06:50:17Z
>service account name: default
>volumes: spark-local-dir-0-spark-local, spark-init-properties, 
> download-jars-volume, download-files, spark-init-secret, hadoop-properties, 
> default-token-xvjt9
>node name: tbds-100-98-45-69
>start time: 2018-05-11T06:50:17Z
>container images: docker.oa.com:8080/gaia/spark-driver-cos:20180503_9
>phase: Pending
>status: [ContainerStatus(containerID=null, 
> image=docker.oa.com:8080/gaia/spark-driver-cos:20180503_9, imageID=, 
> lastState=ContainerState(running=null, terminated=null, waiting=null, 
> additionalProperties={}), name=spark-kubernetes-driver, ready=false, 
> restartCount=0, state=ContainerState(running=null, terminated=null, 
> waiting=ContainerStateWaiting(message=null, reason=PodInitializing, 
> additionalProperties={}), additionalProperties={}), additionalProperties={})]
> 18/05/11 14:50:19 INFO Client: Waiting for application spark-64-293-980 to 
> finish...
> 18/05/11 14:50:25 INFO LoggingPodStatusWatcherImpl: State changed, new state: 
>pod name: spark-64-293-980-1526021412180-driver
>namespace: tione-603074457
>labels: network -> FLOATINGIP, spark-app-selector -> 
> spark-2843da19c690485b93780ad7992a101e, spark-role -> driver
>pod uid: 90558303-54e7-11e8-9e64-525400da65d8
>creation time: 2018-05-11T06:50:17Z
>service account name: default
>volumes: spark-local-dir-0-spark-local, spark-init-properties, 
> download-jars-volume, download-files, spark-init-secret, hadoop-properties, 
> 

[jira] [Assigned] (SPARK-24266) Spark client terminates while driver is still running

2020-11-02 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-24266?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-24266:
-

Assignee: Stijn De Haes

> Spark client terminates while driver is still running
> -
>
> Key: SPARK-24266
> URL: https://issues.apache.org/jira/browse/SPARK-24266
> Project: Spark
>  Issue Type: Sub-task
>  Components: Kubernetes, Spark Core
>Affects Versions: 2.3.0, 3.0.0
>Reporter: Chun Chen
>Assignee: Stijn De Haes
>Priority: Major
> Fix For: 3.1.0
>
>
> {code}
> Warning: Ignoring non-spark config property: Default=system properties 
> included when running spark-submit.
> 18/05/11 14:50:12 WARN Config: Error reading service account token from: 
> [/var/run/secrets/kubernetes.io/serviceaccount/token]. Ignoring.
> 18/05/11 14:50:12 INFO HadoopStepsOrchestrator: Hadoop Conf directory: 
> Some(/data/tesla/spark-2.2.0-k8s-0.5.0-bin-2.7.3/hadoop-conf)
> 18/05/11 14:50:15 WARN NativeCodeLoader: Unable to load native-hadoop library 
> for your platform... using builtin-java classes where applicable
> 18/05/11 14:50:15 WARN DomainSocketFactory: The short-circuit local reads 
> feature cannot be used because libhadoop cannot be loaded.
> 18/05/11 14:50:16 INFO HadoopConfBootstrapImpl: HADOOP_CONF_DIR defined. 
> Mounting Hadoop specific files
> 18/05/11 14:50:17 INFO LoggingPodStatusWatcherImpl: State changed, new state: 
>pod name: spark-64-293-980-1526021412180-driver
>namespace: tione-603074457
>labels: network -> FLOATINGIP, spark-app-selector -> 
> spark-2843da19c690485b93780ad7992a101e, spark-role -> driver
>pod uid: 90558303-54e7-11e8-9e64-525400da65d8
>creation time: 2018-05-11T06:50:17Z
>service account name: default
>volumes: spark-local-dir-0-spark-local, spark-init-properties, 
> download-jars-volume, download-files, spark-init-secret, hadoop-properties, 
> default-token-xvjt9
>node name: N/A
>start time: N/A
>container images: N/A
>phase: Pending
>status: []
> 18/05/11 14:50:17 INFO LoggingPodStatusWatcherImpl: State changed, new state: 
>pod name: spark-64-293-980-1526021412180-driver
>namespace: tione-603074457
>labels: network -> FLOATINGIP, spark-app-selector -> 
> spark-2843da19c690485b93780ad7992a101e, spark-role -> driver
>pod uid: 90558303-54e7-11e8-9e64-525400da65d8
>creation time: 2018-05-11T06:50:17Z
>service account name: default
>volumes: spark-local-dir-0-spark-local, spark-init-properties, 
> download-jars-volume, download-files, spark-init-secret, hadoop-properties, 
> default-token-xvjt9
>node name: tbds-100-98-45-69
>start time: N/A
>container images: N/A
>phase: Pending
>status: []
> 18/05/11 14:50:18 INFO LoggingPodStatusWatcherImpl: State changed, new state: 
>pod name: spark-64-293-980-1526021412180-driver
>namespace: tione-603074457
>labels: network -> FLOATINGIP, spark-app-selector -> 
> spark-2843da19c690485b93780ad7992a101e, spark-role -> driver
>pod uid: 90558303-54e7-11e8-9e64-525400da65d8
>creation time: 2018-05-11T06:50:17Z
>service account name: default
>volumes: spark-local-dir-0-spark-local, spark-init-properties, 
> download-jars-volume, download-files, spark-init-secret, hadoop-properties, 
> default-token-xvjt9
>node name: tbds-100-98-45-69
>start time: 2018-05-11T06:50:17Z
>container images: docker.oa.com:8080/gaia/spark-driver-cos:20180503_9
>phase: Pending
>status: [ContainerStatus(containerID=null, 
> image=docker.oa.com:8080/gaia/spark-driver-cos:20180503_9, imageID=, 
> lastState=ContainerState(running=null, terminated=null, waiting=null, 
> additionalProperties={}), name=spark-kubernetes-driver, ready=false, 
> restartCount=0, state=ContainerState(running=null, terminated=null, 
> waiting=ContainerStateWaiting(message=null, reason=PodInitializing, 
> additionalProperties={}), additionalProperties={}), additionalProperties={})]
> 18/05/11 14:50:19 INFO Client: Waiting for application spark-64-293-980 to 
> finish...
> 18/05/11 14:50:25 INFO LoggingPodStatusWatcherImpl: State changed, new state: 
>pod name: spark-64-293-980-1526021412180-driver
>namespace: tione-603074457
>labels: network -> FLOATINGIP, spark-app-selector -> 
> spark-2843da19c690485b93780ad7992a101e, spark-role -> driver
>pod uid: 90558303-54e7-11e8-9e64-525400da65d8
>creation time: 2018-05-11T06:50:17Z
>service account name: default
>volumes: spark-local-dir-0-spark-local, spark-init-properties, 
> download-jars-volume, download-files, spark-init-secret, hadoop-properties, 
> 

[jira] [Commented] (SPARK-24266) Spark client terminates while driver is still running

2020-11-02 Thread Dongjoon Hyun (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-24266?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17225123#comment-17225123
 ] 

Dongjoon Hyun commented on SPARK-24266:
---

Please see the on-going backport PR. The validation seems to fail on branch-3.0.

> Spark client terminates while driver is still running
> -
>
> Key: SPARK-24266
> URL: https://issues.apache.org/jira/browse/SPARK-24266
> Project: Spark
>  Issue Type: Sub-task
>  Components: Kubernetes, Spark Core
>Affects Versions: 2.3.0, 3.0.0
>Reporter: Chun Chen
>Priority: Major
> Fix For: 3.1.0
>
>
> {code}
> Warning: Ignoring non-spark config property: Default=system properties 
> included when running spark-submit.
> 18/05/11 14:50:12 WARN Config: Error reading service account token from: 
> [/var/run/secrets/kubernetes.io/serviceaccount/token]. Ignoring.
> 18/05/11 14:50:12 INFO HadoopStepsOrchestrator: Hadoop Conf directory: 
> Some(/data/tesla/spark-2.2.0-k8s-0.5.0-bin-2.7.3/hadoop-conf)
> 18/05/11 14:50:15 WARN NativeCodeLoader: Unable to load native-hadoop library 
> for your platform... using builtin-java classes where applicable
> 18/05/11 14:50:15 WARN DomainSocketFactory: The short-circuit local reads 
> feature cannot be used because libhadoop cannot be loaded.
> 18/05/11 14:50:16 INFO HadoopConfBootstrapImpl: HADOOP_CONF_DIR defined. 
> Mounting Hadoop specific files
> 18/05/11 14:50:17 INFO LoggingPodStatusWatcherImpl: State changed, new state: 
>pod name: spark-64-293-980-1526021412180-driver
>namespace: tione-603074457
>labels: network -> FLOATINGIP, spark-app-selector -> 
> spark-2843da19c690485b93780ad7992a101e, spark-role -> driver
>pod uid: 90558303-54e7-11e8-9e64-525400da65d8
>creation time: 2018-05-11T06:50:17Z
>service account name: default
>volumes: spark-local-dir-0-spark-local, spark-init-properties, 
> download-jars-volume, download-files, spark-init-secret, hadoop-properties, 
> default-token-xvjt9
>node name: N/A
>start time: N/A
>container images: N/A
>phase: Pending
>status: []
> 18/05/11 14:50:17 INFO LoggingPodStatusWatcherImpl: State changed, new state: 
>pod name: spark-64-293-980-1526021412180-driver
>namespace: tione-603074457
>labels: network -> FLOATINGIP, spark-app-selector -> 
> spark-2843da19c690485b93780ad7992a101e, spark-role -> driver
>pod uid: 90558303-54e7-11e8-9e64-525400da65d8
>creation time: 2018-05-11T06:50:17Z
>service account name: default
>volumes: spark-local-dir-0-spark-local, spark-init-properties, 
> download-jars-volume, download-files, spark-init-secret, hadoop-properties, 
> default-token-xvjt9
>node name: tbds-100-98-45-69
>start time: N/A
>container images: N/A
>phase: Pending
>status: []
> 18/05/11 14:50:18 INFO LoggingPodStatusWatcherImpl: State changed, new state: 
>pod name: spark-64-293-980-1526021412180-driver
>namespace: tione-603074457
>labels: network -> FLOATINGIP, spark-app-selector -> 
> spark-2843da19c690485b93780ad7992a101e, spark-role -> driver
>pod uid: 90558303-54e7-11e8-9e64-525400da65d8
>creation time: 2018-05-11T06:50:17Z
>service account name: default
>volumes: spark-local-dir-0-spark-local, spark-init-properties, 
> download-jars-volume, download-files, spark-init-secret, hadoop-properties, 
> default-token-xvjt9
>node name: tbds-100-98-45-69
>start time: 2018-05-11T06:50:17Z
>container images: docker.oa.com:8080/gaia/spark-driver-cos:20180503_9
>phase: Pending
>status: [ContainerStatus(containerID=null, 
> image=docker.oa.com:8080/gaia/spark-driver-cos:20180503_9, imageID=, 
> lastState=ContainerState(running=null, terminated=null, waiting=null, 
> additionalProperties={}), name=spark-kubernetes-driver, ready=false, 
> restartCount=0, state=ContainerState(running=null, terminated=null, 
> waiting=ContainerStateWaiting(message=null, reason=PodInitializing, 
> additionalProperties={}), additionalProperties={}), additionalProperties={})]
> 18/05/11 14:50:19 INFO Client: Waiting for application spark-64-293-980 to 
> finish...
> 18/05/11 14:50:25 INFO LoggingPodStatusWatcherImpl: State changed, new state: 
>pod name: spark-64-293-980-1526021412180-driver
>namespace: tione-603074457
>labels: network -> FLOATINGIP, spark-app-selector -> 
> spark-2843da19c690485b93780ad7992a101e, spark-role -> driver
>pod uid: 90558303-54e7-11e8-9e64-525400da65d8
>creation time: 2018-05-11T06:50:17Z
>service account name: default
>volumes: spark-local-dir-0-spark-local, spark-init-properties, 
> download-jars-volume, download-files, 

[jira] [Updated] (SPARK-24266) Spark client terminates while driver is still running

2020-11-02 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-24266?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-24266:
--
Parent: SPARK-33005
Issue Type: Sub-task  (was: Bug)

> Spark client terminates while driver is still running
> -
>
> Key: SPARK-24266
> URL: https://issues.apache.org/jira/browse/SPARK-24266
> Project: Spark
>  Issue Type: Sub-task
>  Components: Kubernetes, Spark Core
>Affects Versions: 2.3.0, 3.0.0
>Reporter: Chun Chen
>Priority: Major
> Fix For: 3.1.0
>
>
> {code}
> Warning: Ignoring non-spark config property: Default=system properties 
> included when running spark-submit.
> 18/05/11 14:50:12 WARN Config: Error reading service account token from: 
> [/var/run/secrets/kubernetes.io/serviceaccount/token]. Ignoring.
> 18/05/11 14:50:12 INFO HadoopStepsOrchestrator: Hadoop Conf directory: 
> Some(/data/tesla/spark-2.2.0-k8s-0.5.0-bin-2.7.3/hadoop-conf)
> 18/05/11 14:50:15 WARN NativeCodeLoader: Unable to load native-hadoop library 
> for your platform... using builtin-java classes where applicable
> 18/05/11 14:50:15 WARN DomainSocketFactory: The short-circuit local reads 
> feature cannot be used because libhadoop cannot be loaded.
> 18/05/11 14:50:16 INFO HadoopConfBootstrapImpl: HADOOP_CONF_DIR defined. 
> Mounting Hadoop specific files
> 18/05/11 14:50:17 INFO LoggingPodStatusWatcherImpl: State changed, new state: 
>pod name: spark-64-293-980-1526021412180-driver
>namespace: tione-603074457
>labels: network -> FLOATINGIP, spark-app-selector -> 
> spark-2843da19c690485b93780ad7992a101e, spark-role -> driver
>pod uid: 90558303-54e7-11e8-9e64-525400da65d8
>creation time: 2018-05-11T06:50:17Z
>service account name: default
>volumes: spark-local-dir-0-spark-local, spark-init-properties, 
> download-jars-volume, download-files, spark-init-secret, hadoop-properties, 
> default-token-xvjt9
>node name: N/A
>start time: N/A
>container images: N/A
>phase: Pending
>status: []
> 18/05/11 14:50:17 INFO LoggingPodStatusWatcherImpl: State changed, new state: 
>pod name: spark-64-293-980-1526021412180-driver
>namespace: tione-603074457
>labels: network -> FLOATINGIP, spark-app-selector -> 
> spark-2843da19c690485b93780ad7992a101e, spark-role -> driver
>pod uid: 90558303-54e7-11e8-9e64-525400da65d8
>creation time: 2018-05-11T06:50:17Z
>service account name: default
>volumes: spark-local-dir-0-spark-local, spark-init-properties, 
> download-jars-volume, download-files, spark-init-secret, hadoop-properties, 
> default-token-xvjt9
>node name: tbds-100-98-45-69
>start time: N/A
>container images: N/A
>phase: Pending
>status: []
> 18/05/11 14:50:18 INFO LoggingPodStatusWatcherImpl: State changed, new state: 
>pod name: spark-64-293-980-1526021412180-driver
>namespace: tione-603074457
>labels: network -> FLOATINGIP, spark-app-selector -> 
> spark-2843da19c690485b93780ad7992a101e, spark-role -> driver
>pod uid: 90558303-54e7-11e8-9e64-525400da65d8
>creation time: 2018-05-11T06:50:17Z
>service account name: default
>volumes: spark-local-dir-0-spark-local, spark-init-properties, 
> download-jars-volume, download-files, spark-init-secret, hadoop-properties, 
> default-token-xvjt9
>node name: tbds-100-98-45-69
>start time: 2018-05-11T06:50:17Z
>container images: docker.oa.com:8080/gaia/spark-driver-cos:20180503_9
>phase: Pending
>status: [ContainerStatus(containerID=null, 
> image=docker.oa.com:8080/gaia/spark-driver-cos:20180503_9, imageID=, 
> lastState=ContainerState(running=null, terminated=null, waiting=null, 
> additionalProperties={}), name=spark-kubernetes-driver, ready=false, 
> restartCount=0, state=ContainerState(running=null, terminated=null, 
> waiting=ContainerStateWaiting(message=null, reason=PodInitializing, 
> additionalProperties={}), additionalProperties={}), additionalProperties={})]
> 18/05/11 14:50:19 INFO Client: Waiting for application spark-64-293-980 to 
> finish...
> 18/05/11 14:50:25 INFO LoggingPodStatusWatcherImpl: State changed, new state: 
>pod name: spark-64-293-980-1526021412180-driver
>namespace: tione-603074457
>labels: network -> FLOATINGIP, spark-app-selector -> 
> spark-2843da19c690485b93780ad7992a101e, spark-role -> driver
>pod uid: 90558303-54e7-11e8-9e64-525400da65d8
>creation time: 2018-05-11T06:50:17Z
>service account name: default
>volumes: spark-local-dir-0-spark-local, spark-init-properties, 
> download-jars-volume, download-files, spark-init-secret, hadoop-properties, 
> 

[jira] [Commented] (SPARK-33156) Upgrade GithubAction image from 18.04 to 20.04

2020-11-02 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17225115#comment-17225115
 ] 

Apache Spark commented on SPARK-33156:
--

User 'dongjoon-hyun' has created a pull request for this issue:
https://github.com/apache/spark/pull/30232

> Upgrade GithubAction image from 18.04 to 20.04
> --
>
> Key: SPARK-33156
> URL: https://issues.apache.org/jira/browse/SPARK-33156
> Project: Spark
>  Issue Type: Improvement
>  Components: Project Infra
>Affects Versions: 3.1.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Minor
> Fix For: 3.1.0
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-33156) Upgrade GithubAction image from 18.04 to 20.04

2020-11-02 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17225114#comment-17225114
 ] 

Apache Spark commented on SPARK-33156:
--

User 'dongjoon-hyun' has created a pull request for this issue:
https://github.com/apache/spark/pull/30231

> Upgrade GithubAction image from 18.04 to 20.04
> --
>
> Key: SPARK-33156
> URL: https://issues.apache.org/jira/browse/SPARK-33156
> Project: Spark
>  Issue Type: Improvement
>  Components: Project Infra
>Affects Versions: 3.1.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Minor
> Fix For: 3.1.0
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-33300) Rule SimplifyCasts will not work for nested columns

2020-11-02 Thread chendihao (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17225089#comment-17225089
 ] 

chendihao commented on SPARK-33300:
---

Great, and thanks [~EveLiao]. I'm not familiar with the Catalyst optimizer, but 
it should recursively apply the rule to child expressions. It's easy to 
reproduce in Spark 3.0; please let me know if you need any help.
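
For anyone picking this up, a small self-contained reproduction sketch; the 
table name and sample row are illustrative, and only the two queries come from 
the report:

{code:scala}
import org.apache.spark.sql.SparkSession

object NestedCastRepro {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("nested-cast-repro").getOrCreate()
    import spark.implicits._

    // Minimal stand-in for the reporter's table: string_date is already a string column.
    Seq(("a", "2020-11-02")).toDF("name", "string_date").createOrReplaceTempView("t1")

    // Single cast: SimplifyCasts removes the redundant cast from the optimized plan.
    spark.sql("select cast(string_date as string) from t1").explain(true)

    // Nested cast: inspect the optimized plan to check whether both casts are eliminated.
    spark.sql("select cast(cast(string_date as string) as string) from t1").explain(true)

    spark.stop()
  }
}
{code}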

> Rule SimplifyCasts will not work for nested columns
> ---
>
> Key: SPARK-33300
> URL: https://issues.apache.org/jira/browse/SPARK-33300
> Project: Spark
>  Issue Type: Bug
>  Components: Optimizer, SQL
>Affects Versions: 3.0.0
>Reporter: chendihao
>Priority: Minor
>
> We use Spark SQL and Catalyst to optimize our Spark jobs. We have read the 
> source code and tested the SimplifyCasts rule, which works for simple SQL 
> without nested casts.
> The SQL "select cast(string_date as string) from t1" is optimized as expected.
> {code:java}
> == Analyzed Logical Plan ==
> string_date: string
> Project [cast(string_date#12 as string) AS string_date#24]
> +- SubqueryAlias t1
>  +- LogicalRDD [name#8, c1#9, c2#10, c5#11L, string_date#12, 
> string_timestamp#13, timestamp_field#14, bool_field#15], false
> == Optimized Logical Plan ==
> Project [string_date#12]
> +- LogicalRDD [name#8, c1#9, c2#10, c5#11L, string_date#12, 
> string_timestamp#13, timestamp_field#14, bool_field#15], false
> {code}
> However, it fails to optimize a nested cast such as "select 
> cast(cast(string_date as string) as string) from t1".
> {code:java}
> == Analyzed Logical Plan ==
> CAST(CAST(string_date AS STRING) AS STRING): string
> Project [cast(cast(string_date#12 as string) as string) AS 
> CAST(CAST(string_date AS STRING) AS STRING)#24]
> +- SubqueryAlias t1
>  +- LogicalRDD [name#8, c1#9, c2#10, c5#11L, string_date#12, 
> string_timestamp#13, timestamp_field#14, bool_field#15], false
> == Optimized Logical Plan ==
> Project [string_date#12 AS CAST(CAST(string_date AS STRING) AS STRING)#24]
> +- LogicalRDD [name#8, c1#9, c2#10, c5#11L, string_date#12, 
> string_timestamp#13, timestamp_field#14, bool_field#15], false
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-33323) Add query resolved check before convert hive relation

2020-11-02 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33323?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-33323:


Assignee: (was: Apache Spark)

> Add query resolved check before convert hive relation
> -
>
> Key: SPARK-33323
> URL: https://issues.apache.org/jira/browse/SPARK-33323
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: ulysses you
>Priority: Minor
>
> Add a check that the query is resolved before converting the Hive relation.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-33323) Add query resolved check before convert hive relation

2020-11-02 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33323?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-33323:


Assignee: Apache Spark

> Add query resolved check before convert hive relation
> -
>
> Key: SPARK-33323
> URL: https://issues.apache.org/jira/browse/SPARK-33323
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: ulysses you
>Assignee: Apache Spark
>Priority: Minor
>
> Add a check that the query is resolved before converting the Hive relation.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-33323) Add query resolved check before convert hive relation

2020-11-02 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33323?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17225087#comment-17225087
 ] 

Apache Spark commented on SPARK-33323:
--

User 'ulysses-you' has created a pull request for this issue:
https://github.com/apache/spark/pull/30230

> Add query resolved check before convert hive relation
> -
>
> Key: SPARK-33323
> URL: https://issues.apache.org/jira/browse/SPARK-33323
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: ulysses you
>Priority: Minor
>
> Add a check that the query is resolved before converting the Hive relation.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-33323) Add query resolved check before convert hive relation

2020-11-02 Thread ulysses you (Jira)
ulysses you created SPARK-33323:
---

 Summary: Add query resolved check before convert hive relation
 Key: SPARK-33323
 URL: https://issues.apache.org/jira/browse/SPARK-33323
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 3.1.0
Reporter: ulysses you


Add a check that the query is resolved before converting the Hive relation.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-33312) Provide latest Spark 2.4.7 runnable distribution

2020-11-02 Thread Prateek Dubey (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17225084#comment-17225084
 ] 

Prateek Dubey commented on SPARK-33312:
---

Thanks [~dongjoon] and [~hyukjin.kwon] 

If Spark 2.4.8 is being released in Dec 2020, I think I can wait till then :). 
Also, I'll follow the snapshots approach mentioned by [~hyukjin.kwon] for now.

> Provide latest Spark 2.4.7 runnable distribution
> 
>
> Key: SPARK-33312
> URL: https://issues.apache.org/jira/browse/SPARK-33312
> Project: Spark
>  Issue Type: Task
>  Components: Build
>Affects Versions: 2.4.7
>Reporter: Prateek Dubey
>Priority: Major
>
> Not sure if this is the right approach, but it would be great if the latest 
> Spark 2.4.7 runnable distribution could be provided here: 
> [https://spark.apache.org/downloads.html]
> Currently it seems the last build was done on September 12th, 2020.
> I'm working on running Spark workloads on EKS using EKS IRSA. I'm able to run 
> Spark workloads on EKS using IRSA with Spark 3.0 / Hadoop 3.2, but I want to 
> do the same with Spark 2.4.7 / Hadoop 2.7.
> Recently this PR was merged into 2.4.x - 
> [https://github.com/apache/spark/pull/29877] - and therefore I need the 
> latest Spark distribution.
>  
> PS: I tried building the latest Spark 2.4.7 myself with Maven, but there are 
> too many errors every time it reaches R, so it would be great if the Spark 
> community itself could provide an up-to-date build.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-33245) Add built-in UDF - GETBIT

2020-11-02 Thread Yuming Wang (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17225078#comment-17225078
 ] 

Yuming Wang commented on SPARK-33245:
-

We can use {{substring(bin(col),-8,1)}} instead.
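
A minimal sketch of that substitution; generalising it to an arbitrary bit 
position n is my own assumption, and note the result is a string rather than an 
integer:

{code:scala}
import org.apache.spark.sql.SparkSession

object GetBitWorkaround {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("getbit-workaround").getOrCreate()

    // bin(200) = '11001000'; taking 1 character starting 8 from the end reads bit 7,
    // roughly what GETBIT(200, 7) returns in the engines listed below.
    spark.sql("SELECT substring(bin(200), -8, 1) AS bit7").show()

    // Assumed general pattern: GETBIT(col, n) ~ substring(bin(col), -(n + 1), 1),
    // valid only while n + 1 does not exceed length(bin(col)).
    spark.sql("SELECT substring(bin(200), -4, 1) AS bit3").show()

    spark.stop()
  }
}
{code}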

> Add built-in UDF - GETBIT 
> --
>
> Key: SPARK-33245
> URL: https://issues.apache.org/jira/browse/SPARK-33245
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: Yuming Wang
>Priority: Major
>
> Teradata, Impala, Snowflake and Yellowbrick support this function:
> https://docs.teradata.com/reader/kmuOwjp1zEYg98JsB8fu_A/PK1oV1b2jqvG~ohRnOro9w
> https://docs.cloudera.com/runtime/7.2.0/impala-sql-reference/topics/impala-bit-functions.html#bit_functions__getbit
> https://docs.snowflake.com/en/sql-reference/functions/getbit.html
> https://www.yellowbrick.com/docs/2.2/ybd_sqlref/getbit.html



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-33285) Too many "Auto-application to `()` is deprecated." related compilation warnings

2020-11-02 Thread Yang Jie (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33285?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yang Jie updated SPARK-33285:
-
Description: 
There are too many "Auto-application to `()` is deprecated." compilation 
warnings when compiling with Scala 2.13, like:
{code:java}
[WARNING] [Warn] 
/spark-src/core/src/test/scala/org/apache/spark/PartitioningSuite.scala:246: 
Auto-application to `()` is deprecated. Supply the empty argument list `()` 
explicitly to invoke method stdev,
or remove the empty argument list from its definition (Java-defined methods are 
exempt).
In Scala 3, an unapplied method like this will be eta-expanded into a function.
{code}
There are a lot of them, but they are easy to fix.

If there is a definition as follows:
{code:java}
class Foo {
   def bar(): Unit = {}
}

val foo = new Foo{code}
Should be
{code:java}
foo.bar()
{code}
not
{code:java}
foo.bar {code}

  was:
There are too many  "Auto-application to `()` is deprecated." related 
compilation warnings when compile with Scala 2.13 like
{code:java}
[WARNING] [Warn] 
/spark-src/core/src/test/scala/org/apache/spark/PartitioningSuite.scala:246: 
Auto-application to `()` is deprecated. Supply the empty argument list `()` 
explicitly to invoke method stdev,
or remove the empty argument list from its definition (Java-defined methods are 
exempt).
In Scala 3, an unapplied method like this will be eta-expanded into a function.
[WARNING] [Warn] 
/spark-src/core/src/test/scala/org/apache/spark/PartitioningSuite.scala:247: 
Auto-application to `()` is deprecated. Supply the empty argument list `()` 
explicitly to invoke method variance,
or remove the empty argument list from its definition (Java-defined methods are 
exempt).
In Scala 3, an unapplied method like this will be eta-expanded into a function.
[WARNING] [Warn] 
/spark-src/core/src/test/scala/org/apache/spark/PartitioningSuite.scala:247: 
Auto-application to `()` is deprecated. Supply the empty argument list `()` 
explicitly to invoke method popVariance,
or remove the empty argument list from its definition (Java-defined methods are 
exempt).
In Scala 3, an unapplied method like this will be eta-expanded into a function.
[WARNING] [Warn] 
/spark-src/core/src/test/scala/org/apache/spark/PartitioningSuite.scala:248: 
Auto-application to `()` is deprecated. Supply the empty argument list `()` 
explicitly to invoke method stdev,
or remove the empty argument list from its definition (Java-defined methods are 
exempt).
In Scala 3, an unapplied method like this will be eta-expanded into a function.
[WARNING] [Warn] 
/spark-src/core/src/test/scala/org/apache/spark/PartitioningSuite.scala:248: 
Auto-application to `()` is deprecated. Supply the empty argument list `()` 
explicitly to invoke method popStdev,
or remove the empty argument list from its definition (Java-defined methods are 
exempt).
In Scala 3, an unapplied method like this will be eta-expanded into a function.
{code}
Maybe these will mask some of the more important compilation warnings

 


> Too many "Auto-application to `()` is deprecated."  related compilation 
> warnings
> 
>
> Key: SPARK-33285
> URL: https://issues.apache.org/jira/browse/SPARK-33285
> Project: Spark
>  Issue Type: Sub-task
>  Components: Build
>Affects Versions: 3.1.0
>Reporter: Yang Jie
>Priority: Minor
>
> There are too many "Auto-application to `()` is deprecated." related 
> compilation warnings when compiling with Scala 2.13, like:
> {code:java}
> [WARNING] [Warn] 
> /spark-src/core/src/test/scala/org/apache/spark/PartitioningSuite.scala:246: 
> Auto-application to `()` is deprecated. Supply the empty argument list `()` 
> explicitly to invoke method stdev,
> or remove the empty argument list from its definition (Java-defined methods 
> are exempt).
> In Scala 3, an unapplied method like this will be eta-expanded into a 
> function.
> {code}
> There are a lot of them, but they are easy to fix.
> If there is a definition as follows:
> {code:java}
> class Foo {
>def bar(): Unit = {}
> }
> val foo = new Foo{code}
> Should be
> {code:java}
> foo.bar()
> {code}
> not
> {code:java}
> foo.bar {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-33285) Too many "Auto-application to `()` is deprecated." related compilation warnings

2020-11-02 Thread Yang Jie (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33285?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yang Jie updated SPARK-33285:
-
Description: 
There are too many "Auto-application to `()` is deprecated." related 
compilation warnings when compiling with Scala 2.13, like:
{code:java}
[WARNING] [Warn] 
/spark-src/core/src/test/scala/org/apache/spark/PartitioningSuite.scala:246: 
Auto-application to `()` is deprecated. Supply the empty argument list `()` 
explicitly to invoke method stdev,
or remove the empty argument list from its definition (Java-defined methods are 
exempt).
In Scala 3, an unapplied method like this will be eta-expanded into a function.
{code}
A lot of them, but they are easy to fix.

If there is a definition as follows:
{code:java}
class Foo {
   def bar(): Unit = {}
}

val foo = new Foo{code}
Should be
{code:java}
foo.bar()
{code}
not
{code:java}
foo.bar {code}

  was:
There are too many  "Auto-application to `()` is deprecated." related 
compilation warnings when compile with Scala 2.13 like
{code:java}
[WARNING] [Warn] 
/spark-src/core/src/test/scala/org/apache/spark/PartitioningSuite.scala:246: 
Auto-application to `()` is deprecated. Supply the empty argument list `()` 
explicitly to invoke method stdev,
or remove the empty argument list from its definition (Java-defined methods are 
exempt).
In Scala 3, an unapplied method like this will be eta-expanded into a function.
{code}
There are a lot of them, but it's easy to fix.

If there is a definition as follows:
{code:java}
Class Foo {
   def bar(): Unit = {}
}

val foo = new Foo{code}
Should be
{code:java}
foo.bar()
{code}
not
{code:java}
foo.bar {code}


> Too many "Auto-application to `()` is deprecated."  related compilation 
> warnings
> 
>
> Key: SPARK-33285
> URL: https://issues.apache.org/jira/browse/SPARK-33285
> Project: Spark
>  Issue Type: Sub-task
>  Components: Build
>Affects Versions: 3.1.0
>Reporter: Yang Jie
>Priority: Minor
>
> There are too many "Auto-application to `()` is deprecated." related 
> compilation warnings when compiling with Scala 2.13, like:
> {code:java}
> [WARNING] [Warn] 
> /spark-src/core/src/test/scala/org/apache/spark/PartitioningSuite.scala:246: 
> Auto-application to `()` is deprecated. Supply the empty argument list `()` 
> explicitly to invoke method stdev,
> or remove the empty argument list from its definition (Java-defined methods 
> are exempt).
> In Scala 3, an unapplied method like this will be eta-expanded into a 
> function.
> {code}
> A lot of them, but they are easy to fix.
> If there is a definition as follows:
> {code:java}
> class Foo {
>def bar(): Unit = {}
> }
> val foo = new Foo{code}
> Should be
> {code:java}
> foo.bar()
> {code}
> not
> {code:java}
> foo.bar {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-33322) Dataframe: data is wrongly presented because of column name

2020-11-02 Thread Mihaly Hazag (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33322?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mihaly Hazag updated SPARK-33322:
-
Attachment: image-2020-11-03-14-57-09-433.png

> Dataframe: data is wrongly presented because of column name
> ---
>
> Key: SPARK-33322
> URL: https://issues.apache.org/jira/browse/SPARK-33322
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 2.4.5
>Reporter: Mihaly Hazag
>Priority: Major
> Attachments: image-2020-11-03-14-57-09-433.png, 
> image-2020-11-03-14-57-37-308.png
>
>
> Consider the code below: the `some_text` column is shown with the `some_int` 
> value, while its actual value in the dataframe is null.
>   !image-2020-11-03-14-42-52-840.png!
>  
> Renaming the field from `some_text` to `some_apple` fixed the problem! 
> !image-2020-11-03-14-43-13-528.png!
>  
> Here is the code to reproduce the problem
> {code:python}
> from datetime import datetime
> from pyspark.sql import Row
> from pyspark.sql.types import StructType, StructField, DateType, StringType, 
> IntegerType
>  
> schema = StructType(
>   [
>     StructField('dfdt', DateType(), True),
>     StructField('some_text', StringType(), True),
>     StructField('some_int', IntegerType(), True),
>   ]
> )
>  
> test_df = spark.createDataFrame([
>   Row(dfdt=datetime.strptime('2020-12-18', '%Y-%m-%d'), some_text='cdsvg', 
> some_int=100)
> ], schema)
>  
> display(test_df)
> {code}
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-33322) Dataframe: data is wrongly presented because of column name

2020-11-02 Thread Mihaly Hazag (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33322?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mihaly Hazag updated SPARK-33322:
-
Description: 
Consider the code below: the `some_text` column is shown with the `some_int` 
value, while its actual value in the dataframe is null.

   !image-2020-11-03-14-57-09-433.png!

 

Renaming the field from `some_text` to `some_apple` fixed the problem! 

 

 

Here is the code to reproduce the problem
{code:python}
from datetime import datetime
from pyspark.sql import Row
from pyspark.sql.types import StructType, StructField, DateType, StringType, 
IntegerType
 
schema = StructType(
  [
    StructField('dfdt', DateType(), True),
    StructField('some_text', StringType(), True),
    StructField('some_int', IntegerType(), True),
  ]
)
 
test_df = spark.createDataFrame([
  Row(dfdt=datetime.strptime('2020-12-18', '%Y-%m-%d'), some_text='cdsvg', 
some_int=100)
], schema)
 
display(test_df)
{code}
 

  was:
Consider the code below: `some_text` column got the `some_int` value, while its 
value is null in the dataframe.

  !image-2020-11-03-14-42-52-840.png!

 

Renaming the field from `some_text` to `some_apple`, fixed the problem! 

!image-2020-11-03-14-43-13-528.png!

 

Here is the code to reproduce the problem
{code:python}
from datetime import datetime
from pyspark.sql import Row
from pyspark.sql.types import StructType, StructField, DateType, StringType, 
IntegerType
 
schema = StructType(
  [
    StructField('dfdt', DateType(), True),
    StructField('some_text', StringType(), True),
    StructField('some_int', IntegerType(), True),
  ]
)
 
test_df = spark.createDataFrame([
  Row(dfdt=datetime.strptime('2020-12-18', '%Y-%m-%d'), some_text='cdsvg', 
some_int=100)
], schema)
 
display(test_df)
{code}
 


> Dataframe: data is wrongly presented because of column name
> ---
>
> Key: SPARK-33322
> URL: https://issues.apache.org/jira/browse/SPARK-33322
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 2.4.5
>Reporter: Mihaly Hazag
>Priority: Major
> Attachments: image-2020-11-03-14-57-09-433.png, 
> image-2020-11-03-14-57-37-308.png
>
>
> Consider the code below: the `some_text` column is shown with the `some_int` 
> value, while its actual value in the dataframe is null.
>    !image-2020-11-03-14-57-09-433.png!
>  
> Renaming the field from `some_text` to `some_apple` fixed the problem! 
>  
>  
> Here is the code to reproduce the problem
> {code:python}
> from datetime import datetime
> from pyspark.sql import Row
> from pyspark.sql.types import StructType, StructField, DateType, StringType, 
> IntegerType
>  
> schema = StructType(
>   [
>     StructField('dfdt', DateType(), True),
>     StructField('some_text', StringType(), True),
>     StructField('some_int', IntegerType(), True),
>   ]
> )
>  
> test_df = spark.createDataFrame([
>   Row(dfdt=datetime.strptime('2020-12-18', '%Y-%m-%d'), some_text='cdsvg', 
> some_int=100)
> ], schema)
>  
> display(test_df)
> {code}
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-33322) Dataframe: data is wrongly presented because of column name

2020-11-02 Thread Mihaly Hazag (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33322?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mihaly Hazag updated SPARK-33322:
-
Description: 
Consider the code below: the `some_text` column is shown with the `some_int` 
value, while its actual value in the dataframe is null.

   !image-2020-11-03-14-57-09-433.png!

 

Renaming the field from `some_text` to `some_apple` fixed the problem! 

!image-2020-11-03-14-57-37-308.png!

 

 

Here is the code to reproduce the problem
{code:python}
from datetime import datetime
from pyspark.sql import Row
from pyspark.sql.types import StructType, StructField, DateType, StringType, 
IntegerType
 
schema = StructType(
  [
    StructField('dfdt', DateType(), True),
    StructField('some_text', StringType(), True),
    StructField('some_int', IntegerType(), True),
  ]
)
 
test_df = spark.createDataFrame([
  Row(dfdt=datetime.strptime('2020-12-18', '%Y-%m-%d'), some_text='cdsvg', 
some_int=100)
], schema)
 
display(test_df)
{code}
 

  was:
Consider the code below: `some_text` column got the `some_int` value, while its 
value is null in the dataframe.

   !image-2020-11-03-14-57-09-433.png!

 

Renaming the field from `some_text` to `some_apple`, fixed the problem! 

 

 

Here is the code to reproduce the problem
{code:python}
from datetime import datetime
from pyspark.sql import Row
from pyspark.sql.types import StructType, StructField, DateType, StringType, 
IntegerType
 
schema = StructType(
  [
    StructField('dfdt', DateType(), True),
    StructField('some_text', StringType(), True),
    StructField('some_int', IntegerType(), True),
  ]
)
 
test_df = spark.createDataFrame([
  Row(dfdt=datetime.strptime('2020-12-18', '%Y-%m-%d'), some_text='cdsvg', 
some_int=100)
], schema)
 
display(test_df)
{code}
 


> Dataframe: data is wrongly presented because of column name
> ---
>
> Key: SPARK-33322
> URL: https://issues.apache.org/jira/browse/SPARK-33322
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 2.4.5
>Reporter: Mihaly Hazag
>Priority: Major
> Attachments: image-2020-11-03-14-57-09-433.png, 
> image-2020-11-03-14-57-37-308.png
>
>
> Consider the code below: the `some_text` column is shown with the `some_int` 
> value, while its actual value in the dataframe is null.
>    !image-2020-11-03-14-57-09-433.png!
>  
> Renaming the field from `some_text` to `some_apple` fixed the problem! 
> !image-2020-11-03-14-57-37-308.png!
>  
>  
> Here is the code to reproduce the problem
> {code:python}
> from datetime import datetime
> from pyspark.sql import Row
> from pyspark.sql.types import StructType, StructField, DateType, StringType, 
> IntegerType
>  
> schema = StructType(
>   [
>     StructField('dfdt', DateType(), True),
>     StructField('some_text', StringType(), True),
>     StructField('some_int', IntegerType(), True),
>   ]
> )
>  
> test_df = spark.createDataFrame([
>   Row(dfdt=datetime.strptime('2020-12-18', '%Y-%m-%d'), some_text='cdsvg', 
> some_int=100)
> ], schema)
>  
> display(test_df)
> {code}
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-33322) Dataframe: data is wrongly presented because of column name

2020-11-02 Thread Mihaly Hazag (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33322?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mihaly Hazag updated SPARK-33322:
-
Attachment: image-2020-11-03-14-57-37-308.png

> Dataframe: data is wrongly presented because of column name
> ---
>
> Key: SPARK-33322
> URL: https://issues.apache.org/jira/browse/SPARK-33322
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 2.4.5
>Reporter: Mihaly Hazag
>Priority: Major
> Attachments: image-2020-11-03-14-57-09-433.png, 
> image-2020-11-03-14-57-37-308.png
>
>
> Consider the code below: the `some_text` column is shown with the `some_int` 
> value, while its actual value in the dataframe is null.
>    !image-2020-11-03-14-57-09-433.png!
>  
> Renaming the field from `some_text` to `some_apple` fixed the problem! 
>  
>  
> Here is the code to reproduce the problem
> {code:python}
> from datetime import datetime
> from pyspark.sql import Row
> from pyspark.sql.types import StructType, StructField, DateType, StringType, 
> IntegerType
>  
> schema = StructType(
>   [
>     StructField('dfdt', DateType(), True),
>     StructField('some_text', StringType(), True),
>     StructField('some_int', IntegerType(), True),
>   ]
> )
>  
> test_df = spark.createDataFrame([
>   Row(dfdt=datetime.strptime('2020-12-18', '%Y-%m-%d'), some_text='cdsvg', 
> some_int=100)
> ], schema)
>  
> display(test_df)
> {code}
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-33322) Dataframe: data is wrongly presented because of column name

2020-11-02 Thread Mihaly Hazag (Jira)
Mihaly Hazag created SPARK-33322:


 Summary: Dataframe: data is wrongly presented because of column 
name
 Key: SPARK-33322
 URL: https://issues.apache.org/jira/browse/SPARK-33322
 Project: Spark
  Issue Type: Bug
  Components: PySpark
Affects Versions: 2.4.5
Reporter: Mihaly Hazag


Consider the code below: the `some_text` column is shown with the `some_int` 
value, while its actual value in the dataframe is null.

  !image-2020-11-03-14-42-52-840.png!

 

Renaming the field from `some_text` to `some_apple` fixed the problem! 

!image-2020-11-03-14-43-13-528.png!

 

Here is the code to reproduce the problem
{code:python}
from datetime import datetime
from pyspark.sql import Row
from pyspark.sql.types import StructType, StructField, DateType, StringType, 
IntegerType
 
schema = StructType(
  [
    StructField('dfdt', DateType(), True),
    StructField('some_text', StringType(), True),
    StructField('some_int', IntegerType(), True),
  ]
)
 
test_df = spark.createDataFrame([
  Row(dfdt=datetime.strptime('2020-12-18', '%Y-%m-%d'), some_text='cdsvg', 
some_int=100)
], schema)
 
display(test_df)
{code}
 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-33321) Migrate ANALYZE TABLE to new resolution framework

2020-11-02 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17225044#comment-17225044
 ] 

Apache Spark commented on SPARK-33321:
--

User 'imback82' has created a pull request for this issue:
https://github.com/apache/spark/pull/30229

> Migrate ANALYZE TABLE to new resolution framework
> -
>
> Key: SPARK-33321
> URL: https://issues.apache.org/jira/browse/SPARK-33321
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: Terry Kim
>Priority: Minor
>
> Migrate ANALYZE TABLE to new resolution framework.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-33321) Migrate ANALYZE TABLE to new resolution framework

2020-11-02 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33321?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-33321:


Assignee: Apache Spark

> Migrate ANALYZE TABLE to new resolution framework
> -
>
> Key: SPARK-33321
> URL: https://issues.apache.org/jira/browse/SPARK-33321
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: Terry Kim
>Assignee: Apache Spark
>Priority: Minor
>
> Migrate ANALYZE TABLE to new resolution framework.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-33321) Migrate ANALYZE TABLE to new resolution framework

2020-11-02 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33321?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-33321:


Assignee: (was: Apache Spark)

> Migrate ANALYZE TABLE to new resolution framework
> -
>
> Key: SPARK-33321
> URL: https://issues.apache.org/jira/browse/SPARK-33321
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: Terry Kim
>Priority: Minor
>
> Migrate ANALYZE TABLE to new resolution framework.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-33321) Migrate ANALYZE TABLE to new resolution framework

2020-11-02 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17225043#comment-17225043
 ] 

Apache Spark commented on SPARK-33321:
--

User 'imback82' has created a pull request for this issue:
https://github.com/apache/spark/pull/30229

> Migrate ANALYZE TABLE to new resolution framework
> -
>
> Key: SPARK-33321
> URL: https://issues.apache.org/jira/browse/SPARK-33321
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: Terry Kim
>Priority: Minor
>
> Migrate ANALYZE TABLE to new resolution framework.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-33250) Migration to NumPy documentation style in SQL (pyspark.sql.*)

2020-11-02 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33250?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-33250.
--
Fix Version/s: 3.1.0
   Resolution: Fixed

Issue resolved by pull request 30181
[https://github.com/apache/spark/pull/30181]

> Migration to NumPy documentation style in SQL (pyspark.sql.*)
> -
>
> Key: SPARK-33250
> URL: https://issues.apache.org/jira/browse/SPARK-33250
> Project: Spark
>  Issue Type: Sub-task
>  Components: Documentation, PySpark
>Affects Versions: 3.1.0
>Reporter: Hyukjin Kwon
>Assignee: Hyukjin Kwon
>Priority: Major
> Fix For: 3.1.0
>
>
> Migration to NumPy documentation style in SQL (pyspark.sql.*)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-33250) Migration to NumPy documentation style in SQL (pyspark.sql.*)

2020-11-02 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33250?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-33250:


Assignee: Hyukjin Kwon

> Migration to NumPy documentation style in SQL (pyspark.sql.*)
> -
>
> Key: SPARK-33250
> URL: https://issues.apache.org/jira/browse/SPARK-33250
> Project: Spark
>  Issue Type: Sub-task
>  Components: Documentation, PySpark
>Affects Versions: 3.1.0
>Reporter: Hyukjin Kwon
>Assignee: Hyukjin Kwon
>Priority: Major
>
> Migration to NumPy documentation style in SQL (pyspark.sql.*)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-33321) Migrate ANALYZE TABLE to new resolution framework

2020-11-02 Thread Terry Kim (Jira)
Terry Kim created SPARK-33321:
-

 Summary: Migrate ANALYZE TABLE to new resolution framework
 Key: SPARK-33321
 URL: https://issues.apache.org/jira/browse/SPARK-33321
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.1.0
Reporter: Terry Kim


Migrate ANALYZE TABLE to new resolution framework.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-33300) Rule SimplifyCasts will not work for nested columns

2020-11-02 Thread Aoyuan Liao (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17224937#comment-17224937
 ] 

Aoyuan Liao commented on SPARK-33300:
-

I would like to work on this.

> Rule SimplifyCasts will not work for nested columns
> ---
>
> Key: SPARK-33300
> URL: https://issues.apache.org/jira/browse/SPARK-33300
> Project: Spark
>  Issue Type: Bug
>  Components: Optimizer, SQL
>Affects Versions: 3.0.0
>Reporter: chendihao
>Priority: Minor
>
> We use SparkSQL and Catalyst to optimize the Spark job. We have read the 
> source code and test the rule of SimplifyCasts which will work for simple SQL 
> without nested cast.
> The SQL "select cast(string_date as string) from t1" will be optimized.
> {code:java}
> == Analyzed Logical Plan ==
> string_date: string
> Project [cast(string_date#12 as string) AS string_date#24]
> +- SubqueryAlias t1
>  +- LogicalRDD [name#8, c1#9, c2#10, c5#11L, string_date#12, 
> string_timestamp#13, timestamp_field#14, bool_field#15], false
> == Optimized Logical Plan ==
> Project [string_date#12]
> +- LogicalRDD [name#8, c1#9, c2#10, c5#11L, string_date#12, 
> string_timestamp#13, timestamp_field#14, bool_field#15], false
> {code}
> However, it fail to optimize with the nested cast like this "select 
> cast(cast(string_date as string) as string) from t1".
> {code:java}
> == Analyzed Logical Plan ==
> CAST(CAST(string_date AS STRING) AS STRING): string
> Project [cast(cast(string_date#12 as string) as string) AS 
> CAST(CAST(string_date AS STRING) AS STRING)#24]
> +- SubqueryAlias t1
>  +- LogicalRDD [name#8, c1#9, c2#10, c5#11L, string_date#12, 
> string_timestamp#13, timestamp_field#14, bool_field#15], false
> == Optimized Logical Plan ==
> Project [string_date#12 AS CAST(CAST(string_date AS STRING) AS STRING)#24]
> +- LogicalRDD [name#8, c1#9, c2#10, c5#11L, string_date#12, 
> string_timestamp#13, timestamp_field#14, bool_field#15], false
> {code}
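For reference, a minimal spark-shell sketch of how the plans quoted above can be reproduced (the temp view {{t1}} setup and the sample value are assumed; only the {{string_date}} column from the plans is used):
{code}
// Assumed setup: a temp view t1 exposing a string_date column.
scala> Seq("2020-01-01").toDF("string_date").createOrReplaceTempView("t1")

// Single cast: SimplifyCasts drops the redundant cast from the optimized plan.
scala> spark.sql("select cast(string_date as string) from t1").explain(true)

// Nested cast: compare the optimized plan with the one quoted in the report above.
scala> spark.sql("select cast(cast(string_date as string) as string) from t1").explain(true)
{code}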



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-24432) Add support for dynamic resource allocation

2020-11-02 Thread Dongjoon Hyun (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-24432?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17224905#comment-17224905
 ] 

Dongjoon Hyun commented on SPARK-24432:
---

BTW, FYI, for the K8s environment, the following is the current status.
- The initial K8s dynamic allocation already shipped in Apache Spark 3.0.0 with 
shuffle tracking.
- K8s dynamic allocation with storage migration is already in the `master` 
branch for Apache Spark 3.1.0.
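For reference (not part of the original comment), a minimal Scala sketch of enabling the shuffle-tracking-based dynamic allocation that shipped in 3.0.0; in practice these properties are usually passed via {{spark-submit --conf}}, and the executor bounds below are illustrative only:
{code:java}
import org.apache.spark.sql.SparkSession

// Shuffle-tracking-based dynamic allocation (Spark 3.0+): no external shuffle
// service is required, which is what makes it usable on K8s.
val spark = SparkSession.builder()
  .config("spark.dynamicAllocation.enabled", "true")
  .config("spark.dynamicAllocation.shuffleTracking.enabled", "true")
  .config("spark.dynamicAllocation.minExecutors", "1")
  .config("spark.dynamicAllocation.maxExecutors", "10")
  .getOrCreate()
{code}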


> Add support for dynamic resource allocation
> ---
>
> Key: SPARK-24432
> URL: https://issues.apache.org/jira/browse/SPARK-24432
> Project: Spark
>  Issue Type: New Feature
>  Components: Kubernetes, Spark Core
>Affects Versions: 3.1.0
>Reporter: Yinan Li
>Priority: Major
>
> This is an umbrella ticket for work on adding support for dynamic resource 
> allocation into the Kubernetes mode. This requires a Kubernetes-specific 
> external shuffle service. The feature is available in our fork at 
> github.com/apache-spark-on-k8s/spark.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-24432) Add support for dynamic resource allocation

2020-11-02 Thread Dongjoon Hyun (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-24432?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17224905#comment-17224905
 ] 

Dongjoon Hyun edited comment on SPARK-24432 at 11/2/20, 7:17 PM:
-

BTW, FYI, for the K8s environment, the following is the current status.
- The initial K8s dynamic allocation already shipped in Apache Spark 3.0.0 with 
shuffle tracking.
- K8s dynamic allocation with storage migration between executors is already in 
the `master` branch for Apache Spark 3.1.0.



was (Author: dongjoon):
BTW, FYI, for K8s environment, the followings are the current status.
- The initial K8s dynamic allocation is already shipped at Apache Spark 3.0.0 
with shuffle tracking.
- The K8s dynamic allocation with storage migration is already in `master` 
branch for Apache Spark 3.1.0.


> Add support for dynamic resource allocation
> ---
>
> Key: SPARK-24432
> URL: https://issues.apache.org/jira/browse/SPARK-24432
> Project: Spark
>  Issue Type: New Feature
>  Components: Kubernetes, Spark Core
>Affects Versions: 3.1.0
>Reporter: Yinan Li
>Priority: Major
>
> This is an umbrella ticket for work on adding support for dynamic resource 
> allocation into the Kubernetes mode. This requires a Kubernetes-specific 
> external shuffle service. The feature is available in our fork at 
> github.com/apache-spark-on-k8s/spark.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-24432) Add support for dynamic resource allocation

2020-11-02 Thread Dongjoon Hyun (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-24432?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17224901#comment-17224901
 ] 

Dongjoon Hyun edited comment on SPARK-24432 at 11/2/20, 7:15 PM:
-

SPARK-30602 is focusing on YARN environment. I don't think that is targeting 
K8s yet.

But, I agree with [~aryaKetan] that this issue should be refreshed.


was (Author: dongjoon):
SPARK-30602 is focusing on YARN environment. I don't think that is targeting 
K8s yet.

> Add support for dynamic resource allocation
> ---
>
> Key: SPARK-24432
> URL: https://issues.apache.org/jira/browse/SPARK-24432
> Project: Spark
>  Issue Type: New Feature
>  Components: Kubernetes, Spark Core
>Affects Versions: 3.1.0
>Reporter: Yinan Li
>Priority: Major
>
> This is an umbrella ticket for work on adding support for dynamic resource 
> allocation into the Kubernetes mode. This requires a Kubernetes-specific 
> external shuffle service. The feature is available in our fork at 
> github.com/apache-spark-on-k8s/spark.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-24432) Add support for dynamic resource allocation

2020-11-02 Thread Dongjoon Hyun (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-24432?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17224901#comment-17224901
 ] 

Dongjoon Hyun commented on SPARK-24432:
---

SPARK-30602 is focusing on YARN environment. I don't think that is targeting 
K8s yet.

> Add support for dynamic resource allocation
> ---
>
> Key: SPARK-24432
> URL: https://issues.apache.org/jira/browse/SPARK-24432
> Project: Spark
>  Issue Type: New Feature
>  Components: Kubernetes, Spark Core
>Affects Versions: 3.1.0
>Reporter: Yinan Li
>Priority: Major
>
> This is an umbrella ticket for work on adding support for dynamic resource 
> allocation into the Kubernetes mode. This requires a Kubernetes-specific 
> external shuffle service. The feature is available in our fork at 
> github.com/apache-spark-on-k8s/spark.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-33282) Replace Probot Autolabeler with Github Action

2020-11-02 Thread Dongjoon Hyun (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17224900#comment-17224900
 ] 

Dongjoon Hyun commented on SPARK-33282:
---

+1

> Replace Probot Autolabeler with Github Action
> -
>
> Key: SPARK-33282
> URL: https://issues.apache.org/jira/browse/SPARK-33282
> Project: Spark
>  Issue Type: Task
>  Components: Project Infra
>Affects Versions: 3.0.1
>Reporter: Kyle Bendickson
>Priority: Major
>
> The Probot Autolabeler that we were using in both the Iceberg and the Spark 
> repo is no longer working. I've confirmed that with the developer, github user 
> [at]mithro, who has indicated that the Probot Autolabeler is end of life and 
> will not be maintained moving forward.
> PRs have not been labeled for a few weeks now.
>  
> As I'm already interfacing with ASF Infra to have the probot permissions 
> revoked from the Iceberg repo, and I've already submitted a patch to switch 
> Iceberg to the standard github labeler action, I figured I would go ahead and 
> volunteer myself to switch the Spark repo as well.
> I will have a patch to switch to the new github labeler open within a few 
> days.
>  
> Also thank you [~blue] (or [~holden]) for shepherding this! I didn't exactly 
> ask, but it was understood in our group meeting for Iceberg that I'd be 
> converting our labeler there so I figured I'd tackle the spark issue while 
> I'm getting my hands into the labeling configs anyway =)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-33317) Spark Hive SQL returning empty dataframe

2020-11-02 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33317?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-33317:
--
Priority: Major  (was: Blocker)

> Spark Hive SQL returning empty dataframe
> 
>
> Key: SPARK-33317
> URL: https://issues.apache.org/jira/browse/SPARK-33317
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core, Spark Shell
>Affects Versions: 2.4.6
>Reporter: Debadutta
>Priority: Major
> Attachments: farmers.csv
>
>
> I am trying to run a sql query on a hive table using hive connector in spark 
> but I am getting an empty dataframe. The query I am trying to run:-
> {{sparkSession.sql("select fmid from farmers where fmid between ' 1000405134' 
> and '1000772585'")}}
> This is failing but if I remove the leading whitespaces it works.
> {{sparkSession.sql("select fmid from farmers where fmid between '1000405134' 
> and '1000772585'")}}
> Currently, I am removing leading and trailing whitespaces as a workaround. 
> But the same query with whitespaces works fine in hive console.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-33317) Spark Hive SQL returning empty dataframe

2020-11-02 Thread Dongjoon Hyun (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17224896#comment-17224896
 ] 

Dongjoon Hyun commented on SPARK-33317:
---

In Apache Spark 2.4.7, the following is the result for me. Could you provide 
your procedure?

{code}
scala> spark.version
res0: String = 2.4.7

scala> spark.read.option("header", 
true).csv("/tmp/csv/farmers.csv").createOrReplaceTempView("farmers")

scala> sql("select fmid from farmers where fmid between ' 1000405134' and 
'1000772585' limit 3").show
+--+
|  fmid|
+--+
|1000405134|
|1000159765|
|1000489848|
+--+
{code}

> Spark Hive SQL returning empty dataframe
> 
>
> Key: SPARK-33317
> URL: https://issues.apache.org/jira/browse/SPARK-33317
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core, Spark Shell
>Affects Versions: 2.4.6
>Reporter: Debadutta
>Priority: Blocker
> Attachments: farmers.csv
>
>
> I am trying to run a sql query on a hive table using hive connector in spark 
> but I am getting an empty dataframe. The query I am trying to run:-
> {{sparkSession.sql("select fmid from farmers where fmid between ' 1000405134' 
> and '1000772585'")}}
> This is failing but if I remove the leading whitespaces it works.
> {{sparkSession.sql("select fmid from farmers where fmid between '1000405134' 
> and '1000772585'")}}
> Currently, I am removing leading and trailing whitespaces as a workaround. 
> But the same query with whitespaces works fine in hive console.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-32029) Check spark context is stoped when get active session

2020-11-02 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-32029?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-32029.
---
Resolution: Won't Do

Please see the discussion on the closed PR.

> Check spark context is stoped when get active session
> -
>
> Key: SPARK-32029
> URL: https://issues.apache.org/jira/browse/SPARK-32029
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: ulysses you
>Priority: Minor
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-33299) Unify schema parsing in from_json/from_csv across all APIs

2020-11-02 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33299?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-33299:
-

Assignee: Maxim Gekk

> Unify schema parsing in from_json/from_csv across all APIs
> --
>
> Key: SPARK-33299
> URL: https://issues.apache.org/jira/browse/SPARK-33299
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: Maxim Gekk
>Assignee: Maxim Gekk
>Priority: Major
>
> Currently, from_json() has extra capability in Scala API. It accepts schema 
> in JSON format but other API (SQL, Python, R) lacks the feature. The ticket 
> aims to unify all APIs, and support schemas in JSON format everywhere.
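For context, a minimal Scala sketch of the existing capability described above (the JSON-format schema string and sample data are illustrative; the schema is parsed explicitly with {{DataType.fromJson}} to keep the example self-contained):
{code:java}
import org.apache.spark.sql.functions.from_json
import org.apache.spark.sql.types.DataType

// A schema in JSON format, as produced by StructType.json; values are illustrative.
val jsonSchema =
  """{"type":"struct","fields":[{"name":"a","type":"integer","nullable":true,"metadata":{}}]}"""

// In spark-shell (spark.implicits._ is in scope): parse the JSON schema and use it with from_json.
val parsed = Seq("""{"a": 1}""").toDF("value")
  .select(from_json($"value", DataType.fromJson(jsonSchema)).as("parsed"))
{code}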



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-33299) Unify schema parsing in from_json/from_csv across all APIs

2020-11-02 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33299?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-33299.
---
Fix Version/s: 3.1.0
   Resolution: Fixed

Issue resolved by pull request 30226
[https://github.com/apache/spark/pull/30226]

> Unify schema parsing in from_json/from_csv across all APIs
> --
>
> Key: SPARK-33299
> URL: https://issues.apache.org/jira/browse/SPARK-33299
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: Maxim Gekk
>Assignee: Maxim Gekk
>Priority: Major
> Fix For: 3.1.0
>
>
> Currently, from_json() has extra capability in Scala API. It accepts schema 
> in JSON format but other API (SQL, Python, R) lacks the feature. The ticket 
> aims to unify all APIs, and support schemas in JSON format everywhere.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-33317) Spark Hive SQL returning empty dataframe

2020-11-02 Thread Debadutta (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17224829#comment-17224829
 ] 

Debadutta commented on SPARK-33317:
---

[^farmers.csv] Attached the farmers dataset. 
By hive connector I mean the default metastore-based way to run SQL queries on 
Hive within the Spark context. 

> Spark Hive SQL returning empty dataframe
> 
>
> Key: SPARK-33317
> URL: https://issues.apache.org/jira/browse/SPARK-33317
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core, Spark Shell
>Affects Versions: 2.4.6
>Reporter: Debadutta
>Priority: Blocker
> Attachments: farmers.csv
>
>
> I am trying to run a sql query on a hive table using hive connector in spark 
> but I am getting an empty dataframe. The query I am trying to run:-
> {{sparkSession.sql("select fmid from farmers where fmid between ' 1000405134' 
> and '1000772585'")}}
> This is failing but if I remove the leading whitespaces it works.
> {{sparkSession.sql("select fmid from farmers where fmid between '1000405134' 
> and '1000772585'")}}
> Currently, I am removing leading and trailing whitespaces as a workaround. 
> But the same query with whitespaces works fine in hive console.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-33312) Provide latest Spark 2.4.7 runnable distribution

2020-11-02 Thread Dongjoon Hyun (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17224827#comment-17224827
 ] 

Dongjoon Hyun commented on SPARK-33312:
---

Hi, [~dprateek]. Apache Spark 2.4.7 has already been voted on and released on our website.
You should ask for an Apache Spark 2.4.8 release, and Apache Spark has a release 
cadence.
2.4.8 will be released in early December 2020. So, please wait for a month.

> Provide latest Spark 2.4.7 runnable distribution
> 
>
> Key: SPARK-33312
> URL: https://issues.apache.org/jira/browse/SPARK-33312
> Project: Spark
>  Issue Type: Task
>  Components: Build
>Affects Versions: 2.4.7
>Reporter: Prateek Dubey
>Priority: Major
>
> Not sure if this is the right approach, however it would be great if latest 
> Spark 2.4.7 runnable distribution can be provided here - 
> [https://spark.apache.org/downloads.html]
> Currently it seems the last build was done on Sept 12th' 2020. 
> I'm working on running Spark workloads on EKS using EKS IRSA. I'm able to run 
> Spark workloads on EKS using IRSA with Spark 3.0/ Hadoop 3.2, however I want 
> to do the same with Spark 2.4.7/ Hadoop 2.7. 
> Recently this PR was merged with 2.4.x - 
> [https://github.com/apache/spark/pull/29877] and therefore I'm in need of 
> latest Spark distribution 
>  
> PS: I tried building the latest Spark 2.4.7 myself as well using Maven, however 
> there are too many errors every time it reaches R, therefore it would be 
> great if the Spark community itself could provide the latest build.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Closed] (SPARK-33312) Provide latest Spark 2.4.7 runnable distribution

2020-11-02 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33312?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun closed SPARK-33312.
-

> Provide latest Spark 2.4.7 runnable distribution
> 
>
> Key: SPARK-33312
> URL: https://issues.apache.org/jira/browse/SPARK-33312
> Project: Spark
>  Issue Type: Task
>  Components: Build
>Affects Versions: 2.4.7
>Reporter: Prateek Dubey
>Priority: Major
>
> Not sure if this is the right approach, however it would be great if latest 
> Spark 2.4.7 runnable distribution can be provided here - 
> [https://spark.apache.org/downloads.html]
> Currently it seems the last build was done on Sept 12th' 2020. 
> I'm working on running Spark workloads on EKS using EKS IRSA. I'm able to run 
> Spark workloads on EKS using IRSA with Spark 3.0/ Hadoop 3.2, however I want 
> to do the same with Spark 2.4.7/ Hadoop 2.7. 
> Recently this PR was merged with 2.4.x - 
> [https://github.com/apache/spark/pull/29877] and therefore I'm in need of 
> latest Spark distribution 
>  
> PS: I tried building the latest Spark 2.4.7 myself as well using Maven, however 
> there are too many errors every time it reaches R, therefore it would be 
> great if the Spark community itself could provide the latest build.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-33312) Provide latest Spark 2.4.7 runnable distribution

2020-11-02 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33312?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-33312.
---
Resolution: Not A Problem

> Provide latest Spark 2.4.7 runnable distribution
> 
>
> Key: SPARK-33312
> URL: https://issues.apache.org/jira/browse/SPARK-33312
> Project: Spark
>  Issue Type: Task
>  Components: Build
>Affects Versions: 2.4.7
>Reporter: Prateek Dubey
>Priority: Major
>
> Not sure if this is the right approach, however it would be great if latest 
> Spark 2.4.7 runnable distribution can be provided here - 
> [https://spark.apache.org/downloads.html]
> Currently it seems the last build was done on Sept 12th' 2020. 
> I'm working on running Spark workloads on EKS using EKS IRSA. I'm able to run 
> Spark workloads on EKS using IRSA with Spark 3.0/ Hadoop 3.2, however I want 
> to do the same with Spark 2.4.7/ Hadoop 2.7. 
> Recently this PR was merged with 2.4.x - 
> [https://github.com/apache/spark/pull/29877] and therefore I'm in need of 
> latest Spark distribution 
>  
> PS: I tried building the latest Spark 2.4.7 myself as well using Maven, however 
> there are too many errors every time it reaches R, therefore it would be 
> great if the Spark community itself could provide the latest build.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-33312) Provide latest Spark 2.4.7 runnable distribution

2020-11-02 Thread Dongjoon Hyun (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17224827#comment-17224827
 ] 

Dongjoon Hyun edited comment on SPARK-33312 at 11/2/20, 5:30 PM:
-

Hi, [~dprateek]. Apache Spark 2.4.7 has already been voted on and released at our 
website.
You should ask for an Apache Spark 2.4.8 release, and Apache Spark has a release 
cadence.
2.4.8 will be released in early December 2020. So, please wait for a month.


was (Author: dongjoon):
Hi, [~dprateek]. Apache Spark 2.4.7 is already voted and released our website.
You should ask Apache Spark 2.4.8 release and Apache Spark has a release 
cadence.
2.4.8 will be released at early December 2020. So, please wait for a month.

> Provide latest Spark 2.4.7 runnable distribution
> 
>
> Key: SPARK-33312
> URL: https://issues.apache.org/jira/browse/SPARK-33312
> Project: Spark
>  Issue Type: Task
>  Components: Build
>Affects Versions: 2.4.7
>Reporter: Prateek Dubey
>Priority: Major
>
> Not sure if this is the right approach, however it would be great if latest 
> Spark 2.4.7 runnable distribution can be provided here - 
> [https://spark.apache.org/downloads.html]
> Currently it seems the last build was done on Sept 12th' 2020. 
> I'm working on running Spark workloads on EKS using EKS IRSA. I'm able to run 
> Spark workloads on EKS using IRSA with Spark 3.0/ Hadoop 3.2, however I want 
> to do the same with Spark 2.4.7/ Hadoop 2.7. 
> Recently this PR was merged with 2.4.x - 
> [https://github.com/apache/spark/pull/29877] and therefore I'm in need of 
> latest Spark distribution 
>  
> PS: I tried building the latest Spark 2.4.7 myself as well using Maven, however 
> there are too many errors every time it reaches R, therefore it would be 
> great if the Spark community itself could provide the latest build.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-33317) Spark Hive SQL returning empty dataframe

2020-11-02 Thread Debadutta (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33317?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Debadutta updated SPARK-33317:
--
Attachment: farmers.csv

> Spark Hive SQL returning empty dataframe
> 
>
> Key: SPARK-33317
> URL: https://issues.apache.org/jira/browse/SPARK-33317
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core, Spark Shell
>Affects Versions: 2.4.6
>Reporter: Debadutta
>Priority: Blocker
> Attachments: farmers.csv
>
>
> I am trying to run a sql query on a hive table using hive connector in spark 
> but I am getting an empty dataframe. The query I am trying to run:-
> {{sparkSession.sql("select fmid from farmers where fmid between ' 1000405134' 
> and '1000772585'")}}
> This is failing but if I remove the leading whitespaces it works.
> {{sparkSession.sql("select fmid from farmers where fmid between '1000405134' 
> and '1000772585'")}}
> Currently, I am removing leading and trailing whitespaces as a workaround. 
> But the same query with whitespaces works fine in hive console.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-33317) Spark Hive SQL returning empty dataframe

2020-11-02 Thread Dongjoon Hyun (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17224824#comment-17224824
 ] 

Dongjoon Hyun commented on SPARK-33317:
---

Hi, [~dgodnaik].
1. What is the data inside `farmers` table?
2. What is `hive connector` you are referring?

> Spark Hive SQL returning empty dataframe
> 
>
> Key: SPARK-33317
> URL: https://issues.apache.org/jira/browse/SPARK-33317
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core, Spark Shell
>Affects Versions: 2.4.6
>Reporter: Debadutta
>Priority: Blocker
>
> I am trying to run a sql query on a hive table using hive connector in spark 
> but I am getting an empty dataframe. The query I am trying to run:-
> {{sparkSession.sql("select fmid from farmers where fmid between ' 1000405134' 
> and '1000772585'")}}
> This is failing but if I remove the leading whitespaces it works.
> {{sparkSession.sql("select fmid from farmers where fmid between '1000405134' 
> and '1000772585'")}}
> Currently, I am removing leading and trailing whitespaces as a workaround. 
> But the same query with whitespaces works fine in hive console.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-33318) Ability to set Dynamodb table name while reading from Kinesis

2020-11-02 Thread Dongjoon Hyun (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17224822#comment-17224822
 ] 

Dongjoon Hyun commented on SPARK-33318:
---

Thank you for filing a JIRA issue, [~chethan_g].
For new features and improvements, the Apache Spark community delivers them in a 
new release like 3.1.0. 

> Ability to set Dynamodb table name while reading from Kinesis
> -
>
> Key: SPARK-33318
> URL: https://issues.apache.org/jira/browse/SPARK-33318
> Project: Spark
>  Issue Type: Improvement
>  Components: Structured Streaming
>Affects Versions: 3.1.0
>Reporter: chethan gowda
>Priority: Minor
>
> * Need the ability to set the dynamodb table name while reading data from kinesis. The 
> KCL library provides the ability to set dynamodb table name. example: 
> [https://aws.amazon.com/premiumsupport/knowledge-center/kinesis-kcl-apps-dynamodb-table/]
>  . We would like to have a similar interface to pass the dynamodb table.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-33318) Ability to set Dynamodb table name while reading from Kinesis

2020-11-02 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33318?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-33318:
--
Affects Version/s: (was: 2.4.7)
   3.1.0

> Ability to set Dynamodb table name while reading from Kinesis
> -
>
> Key: SPARK-33318
> URL: https://issues.apache.org/jira/browse/SPARK-33318
> Project: Spark
>  Issue Type: Improvement
>  Components: Structured Streaming
>Affects Versions: 3.1.0
>Reporter: chethan gowda
>Priority: Minor
>
> * Need the ability to set the dynamodb table name while reading data from kinesis. The 
> KCL library provides the ability to set dynamodb table name. example: 
> [https://aws.amazon.com/premiumsupport/knowledge-center/kinesis-kcl-apps-dynamodb-table/]
>  . We would like to have a similar interface to pass the dynamodb table.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-26365) spark-submit for k8s cluster doesn't propagate exit code

2020-11-02 Thread Itay Bittan (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-26365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17224799#comment-17224799
 ] 

Itay Bittan commented on SPARK-26365:
-

thanks [~oscar.bonilla].

We ended up with a temporary solution:
{code:java}
spark-submit .. 2>&1 | tee output.log ; grep -q "exit code: 0" output.log{code}

> spark-submit for k8s cluster doesn't propagate exit code
> 
>
> Key: SPARK-26365
> URL: https://issues.apache.org/jira/browse/SPARK-26365
> Project: Spark
>  Issue Type: Bug
>  Components: Kubernetes, Spark Core, Spark Submit
>Affects Versions: 2.3.2, 2.4.0
>Reporter: Oscar Bonilla
>Priority: Minor
> Attachments: spark-2.4.5-raise-exception-k8s-failure.patch, 
> spark-3.0.0-raise-exception-k8s-failure.patch
>
>
> When launching apps using spark-submit in a kubernetes cluster, if the Spark 
> applications fails (returns exit code = 1 for example), spark-submit will 
> still exit gracefully and return exit code = 0.
> This is problematic, since there's no way to know if there's been a problem 
> with the Spark application.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-33319) Add all built-in SerDes to HiveSerDeReadWriteSuite

2020-11-02 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33319?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-33319:
-

Assignee: Yuming Wang

> Add all built-in SerDes to HiveSerDeReadWriteSuite
> --
>
> Key: SPARK-33319
> URL: https://issues.apache.org/jira/browse/SPARK-33319
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: Yuming Wang
>Assignee: Yuming Wang
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-33319) Add all built-in SerDes to HiveSerDeReadWriteSuite

2020-11-02 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33319?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-33319.
---
Fix Version/s: 3.1.0
   Resolution: Fixed

Issue resolved by pull request 30228
[https://github.com/apache/spark/pull/30228]

> Add all built-in SerDes to HiveSerDeReadWriteSuite
> --
>
> Key: SPARK-33319
> URL: https://issues.apache.org/jira/browse/SPARK-33319
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: Yuming Wang
>Assignee: Yuming Wang
>Priority: Major
> Fix For: 3.1.0
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-33320) ExecutorMetrics are not written to CSV and StatsD sinks

2020-11-02 Thread Peter Podlovics (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33320?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Podlovics updated SPARK-33320:

Affects Version/s: (was: 2.4.6)
   2.4.4
  Environment: 
I was using Spark 2.4.4 on EMR with YARN. The relevant part of the config is 
below:
{noformat}
spark.metrics.executorMetricsSource.enabled=true
spark.eventLog.logStageExecutorMetrics=true
spark.metrics.conf.*.sink.csv.class=org.apache.spark.metrics.sink.CsvSink
spark.metrics.conf.*.sink.servlet.class=org.apache.spark.metrics.sink.MetricsServlet
spark.metrics.conf.*.sink.servlet.path=/home/hadoop/metrics/json
spark.metrics.conf.*.sink.statsd.class=org.apache.spark.metrics.sink.StatsdSink
spark.metrics.conf.*.sink.statsd.host=localhost
spark.metrics.conf.*.sink.statsd.port=8125
spark.metrics.conf.*.sink.statsd.period=10
spark.metrics.conf.*.sink.statsd.unit=seconds
spark.metrics.conf.*.sink.statsd.prefix=spark
master.sink.servlet.path=/home/hadoop/metrics/master/json
applications.sink.servlet.path=/home/hadoop/metrics/applications/json
{noformat}

  was:
I used the following configuration while running Spark on YARN:
{noformat}
spark.metrics.executorMetricsSource.enabled=true
spark.eventLog.logStageExecutorMetrics=true
spark.metrics.conf.*.sink.csv.class=org.apache.spark.metrics.sink.CsvSink
spark.metrics.conf.*.sink.servlet.class=org.apache.spark.metrics.sink.MetricsServlet
spark.metrics.conf.*.sink.servlet.path=/home/hadoop/metrics/json
spark.metrics.conf.*.sink.statsd.class=org.apache.spark.metrics.sink.StatsdSink
spark.metrics.conf.*.sink.statsd.host=localhost
spark.metrics.conf.*.sink.statsd.port=8125
spark.metrics.conf.*.sink.statsd.period=10
spark.metrics.conf.*.sink.statsd.unit=seconds
spark.metrics.conf.*.sink.statsd.prefix=spark
master.sink.servlet.path=/home/hadoop/metrics/master/json
applications.sink.servlet.path=/home/hadoop/metrics/applications/json
{noformat}


> ExecutorMetrics are not written to CSV and StatsD sinks
> ---
>
> Key: SPARK-33320
> URL: https://issues.apache.org/jira/browse/SPARK-33320
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.4.4
> Environment: I was using Spark 2.4.4 on EMR with YARN. The relevant 
> part of the config is below:
> {noformat}
> spark.metrics.executorMetricsSource.enabled=true
> spark.eventLog.logStageExecutorMetrics=true
> spark.metrics.conf.*.sink.csv.class=org.apache.spark.metrics.sink.CsvSink
> spark.metrics.conf.*.sink.servlet.class=org.apache.spark.metrics.sink.MetricsServlet
> spark.metrics.conf.*.sink.servlet.path=/home/hadoop/metrics/json
> spark.metrics.conf.*.sink.statsd.class=org.apache.spark.metrics.sink.StatsdSink
> spark.metrics.conf.*.sink.statsd.host=localhost
> spark.metrics.conf.*.sink.statsd.port=8125
> spark.metrics.conf.*.sink.statsd.period=10
> spark.metrics.conf.*.sink.statsd.unit=seconds
> spark.metrics.conf.*.sink.statsd.prefix=spark
> master.sink.servlet.path=/home/hadoop/metrics/master/json
> applications.sink.servlet.path=/home/hadoop/metrics/applications/json
> {noformat}
>Reporter: Peter Podlovics
>Priority: Major
>
> Metrics from the {{ExecutorMetrics}} namespace are not written to the CSV and 
> StatsD sinks, even though some of them are available through the REST API 
> (e.g.: {{memoryMetrics.usedOnHeapStorageMemory}}).
> I couldn't find the {{ExecutorMetrics}} either on the driver or the workers. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-33320) ExecutorMetrics are not written to CSV and StatsD sinks

2020-11-02 Thread Peter Podlovics (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33320?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Podlovics updated SPARK-33320:

Affects Version/s: (was: 2.2.1)
   2.4.6

> ExecutorMetrics are not written to CSV and StatsD sinks
> ---
>
> Key: SPARK-33320
> URL: https://issues.apache.org/jira/browse/SPARK-33320
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.4.6
> Environment: I used the following configuration while running Spark 
> on YARN:
> {noformat}
> spark.metrics.executorMetricsSource.enabled=true
> spark.eventLog.logStageExecutorMetrics=true
> spark.metrics.conf.*.sink.csv.class=org.apache.spark.metrics.sink.CsvSink
> spark.metrics.conf.*.sink.servlet.class=org.apache.spark.metrics.sink.MetricsServlet
> spark.metrics.conf.*.sink.servlet.path=/home/hadoop/metrics/json
> spark.metrics.conf.*.sink.statsd.class=org.apache.spark.metrics.sink.StatsdSink
> spark.metrics.conf.*.sink.statsd.host=localhost
> spark.metrics.conf.*.sink.statsd.port=8125
> spark.metrics.conf.*.sink.statsd.period=10
> spark.metrics.conf.*.sink.statsd.unit=seconds
> spark.metrics.conf.*.sink.statsd.prefix=spark
> master.sink.servlet.path=/home/hadoop/metrics/master/json
> applications.sink.servlet.path=/home/hadoop/metrics/applications/json
> {noformat}
>Reporter: Peter Podlovics
>Priority: Major
>
> Metrics from the {{ExecutorMetrics}} namespace are not written to the CSV and 
> StatsD sinks, even though some of them are available through the REST API 
> (e.g.: {{memoryMetrics.usedOnHeapStorageMemory}}).
> I couldn't find the {{ExecutorMetrics}} either on the driver or the workers. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-33320) ExecutorMetrics are not written to CSV and StatsD sinks

2020-11-02 Thread Peter Podlovics (Jira)
Peter Podlovics created SPARK-33320:
---

 Summary: ExecutorMetrics are not written to CSV and StatsD sinks
 Key: SPARK-33320
 URL: https://issues.apache.org/jira/browse/SPARK-33320
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 2.2.1
 Environment: I used the following configuration while running Spark on 
YARN:
{noformat}
spark.metrics.executorMetricsSource.enabled=true
spark.eventLog.logStageExecutorMetrics=true
spark.metrics.conf.*.sink.csv.class=org.apache.spark.metrics.sink.CsvSink
spark.metrics.conf.*.sink.servlet.class=org.apache.spark.metrics.sink.MetricsServlet
spark.metrics.conf.*.sink.servlet.path=/home/hadoop/metrics/json
spark.metrics.conf.*.sink.statsd.class=org.apache.spark.metrics.sink.StatsdSink
spark.metrics.conf.*.sink.statsd.host=localhost
spark.metrics.conf.*.sink.statsd.port=8125
spark.metrics.conf.*.sink.statsd.period=10
spark.metrics.conf.*.sink.statsd.unit=seconds
spark.metrics.conf.*.sink.statsd.prefix=spark
master.sink.servlet.path=/home/hadoop/metrics/master/json
applications.sink.servlet.path=/home/hadoop/metrics/applications/json
{noformat}
Reporter: Peter Podlovics


Metrics from the {{ExecutorMetrics}} namespace are not written to the CSV and 
StatsD sinks, even though some of them are available through the REST API (e.g.: 
{{memoryMetrics.usedOnHeapStorageMemory}}).

I couldn't find the {{ExecutorMetrics}} either on the driver or the workers. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-33273) Fix Flaky Test: ThriftServerQueryTestSuite. subquery_scalar_subquery_scalar_subquery_select_sql

2020-11-02 Thread Yuming Wang (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17224705#comment-17224705
 ] 

Yuming Wang commented on SPARK-33273:
-

I have no idea. I cannot reproduce locally.

> Fix Flaky Test: ThriftServerQueryTestSuite. 
> subquery_scalar_subquery_scalar_subquery_select_sql
> ---
>
> Key: SPARK-33273
> URL: https://issues.apache.org/jira/browse/SPARK-33273
> Project: Spark
>  Issue Type: Bug
>  Components: SQL, Tests
>Affects Versions: 3.1.0
>Reporter: Dongjoon Hyun
>Priority: Blocker
>  Labels: correctness
>
> - 
> https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130369/testReport/org.apache.spark.sql.hive.thriftserver/ThriftServerQueryTestSuite/subquery_scalar_subquery_scalar_subquery_select_sql/
> {code}
> [info] - subquery/scalar-subquery/scalar-subquery-select.sql *** FAILED *** 
> (3 seconds, 877 milliseconds)
> [info]   Expected "[1]0   2017-05-04 01:01:0...", but got "[]0
> 2017-05-04 01:01:0..." Result did not match for query #3
> [info]   SELECT (SELECT min(t3d) FROM t3) min_t3d,
> [info]  (SELECT max(t2h) FROM t2) max_t2h
> [info]   FROM   t1
> [info]   WHERE  t1a = 'val1c' (ThriftServerQueryTestSuite.scala:197)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-33319) Add all built-in SerDes to HiveSerDeReadWriteSuite

2020-11-02 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33319?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-33319:


Assignee: (was: Apache Spark)

> Add all built-in SerDes to HiveSerDeReadWriteSuite
> --
>
> Key: SPARK-33319
> URL: https://issues.apache.org/jira/browse/SPARK-33319
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: Yuming Wang
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-33319) Add all built-in SerDes to HiveSerDeReadWriteSuite

2020-11-02 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33319?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-33319:


Assignee: Apache Spark

> Add all built-in SerDes to HiveSerDeReadWriteSuite
> --
>
> Key: SPARK-33319
> URL: https://issues.apache.org/jira/browse/SPARK-33319
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: Yuming Wang
>Assignee: Apache Spark
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-33319) Add all built-in SerDes to HiveSerDeReadWriteSuite

2020-11-02 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17224704#comment-17224704
 ] 

Apache Spark commented on SPARK-33319:
--

User 'wangyum' has created a pull request for this issue:
https://github.com/apache/spark/pull/30228

> Add all built-in SerDes to HiveSerDeReadWriteSuite
> --
>
> Key: SPARK-33319
> URL: https://issues.apache.org/jira/browse/SPARK-33319
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: Yuming Wang
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-33319) Add all built-in SerDes to HiveSerDeReadWriteSuite

2020-11-02 Thread Yuming Wang (Jira)
Yuming Wang created SPARK-33319:
---

 Summary: Add all built-in SerDes to HiveSerDeReadWriteSuite
 Key: SPARK-33319
 URL: https://issues.apache.org/jira/browse/SPARK-33319
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 3.1.0
Reporter: Yuming Wang






--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-33257) Support Column inputs in PySpark ordering functions (asc*, desc*)

2020-11-02 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17224666#comment-17224666
 ] 

Apache Spark commented on SPARK-33257:
--

User 'zero323' has created a pull request for this issue:
https://github.com/apache/spark/pull/30227

> Support Column inputs in PySpark ordering functions (asc*, desc*)
> -
>
> Key: SPARK-33257
> URL: https://issues.apache.org/jira/browse/SPARK-33257
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark, SQL
>Affects Versions: 3.1.0
>Reporter: Maciej Szymkiewicz
>Priority: Major
>
> According to SPARK-26979, PySpark functions should support both {{Column}} 
> and {{str}} arguments, when possible.
> However, the following ordering functions support only {{str}}:
> - {{asc}}
> - {{desc}}
> - {{asc_nulls_first}}
> - {{asc_nulls_last}}
> - {{desc_nulls_first}}
> - {{desc_nulls_last}}
> This is because the Scala side doesn't provide {{Column => 
> Column}} variants.
> To fix this, we could do one of the following:
> - Call corresponding {{Column}} methods as 
> [suggested|https://github.com/apache/spark/pull/30143#discussion_r512366978] 
> by  [~hyukjin.kwon]
> - Add missing signatures on Scala side.
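As a rough sketch of the second option (adding the missing signatures on the Scala side), the new overloads could simply delegate to the existing {{Column}} methods. The placement and exact shape below are assumptions for illustration, not the committed change.

{code:scala}
import org.apache.spark.sql.Column

// Possible Column => Column overloads, delegating to the existing Column methods.
// They mirror the String-based ordering functions in functions.scala.
def asc(col: Column): Column = col.asc
def asc_nulls_first(col: Column): Column = col.asc_nulls_first
def asc_nulls_last(col: Column): Column = col.asc_nulls_last
def desc(col: Column): Column = col.desc
def desc_nulls_first(col: Column): Column = col.desc_nulls_first
def desc_nulls_last(col: Column): Column = col.desc_nulls_last
{code}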



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-33257) Support Column inputs in PySpark ordering functions (asc*, desc*)

2020-11-02 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17224665#comment-17224665
 ] 

Apache Spark commented on SPARK-33257:
--

User 'zero323' has created a pull request for this issue:
https://github.com/apache/spark/pull/30227

> Support Column inputs in PySpark ordering functions (asc*, desc*)
> -
>
> Key: SPARK-33257
> URL: https://issues.apache.org/jira/browse/SPARK-33257
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark, SQL
>Affects Versions: 3.1.0
>Reporter: Maciej Szymkiewicz
>Priority: Major
>
> According to SPARK-26979, PySpark functions should support both {{Column}} 
> and {{str}} arguments, when possible.
> However, the following ordering functions support only {{str}}:
> - {{asc}}
> - {{desc}}
> - {{asc_nulls_first}}
> - {{asc_nulls_last}}
> - {{desc_nulls_first}}
> - {{desc_nulls_last}}
> This is because the Scala side doesn't provide {{Column => 
> Column}} variants.
> To fix this, we could do one of the following:
> - Call corresponding {{Column}} methods as 
> [suggested|https://github.com/apache/spark/pull/30143#discussion_r512366978] 
> by  [~hyukjin.kwon]
> - Add missing signatures on Scala side.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-33257) Support Column inputs in PySpark ordering functions (asc*, desc*)

2020-11-02 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33257?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-33257:


Assignee: (was: Apache Spark)

> Support Column inputs in PySpark ordering functions (asc*, desc*)
> -
>
> Key: SPARK-33257
> URL: https://issues.apache.org/jira/browse/SPARK-33257
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark, SQL
>Affects Versions: 3.1.0
>Reporter: Maciej Szymkiewicz
>Priority: Major
>
> According to SPARK-26979, PySpark functions should support both {{Column}} 
> and {{str}} arguments, when possible.
> However, the following ordering functions support only {{str}}:
> - {{asc}}
> - {{desc}}
> - {{asc_nulls_first}}
> - {{asc_nulls_last}}
> - {{desc_nulls_first}}
> - {{desc_nulls_last}}
> This is because the Scala side doesn't provide {{Column => 
> Column}} variants.
> To fix this, we could do one of the following:
> - Call corresponding {{Column}} methods as 
> [suggested|https://github.com/apache/spark/pull/30143#discussion_r512366978] 
> by  [~hyukjin.kwon]
> - Add missing signatures on Scala side.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-33257) Support Column inputs in PySpark ordering functions (asc*, desc*)

2020-11-02 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33257?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-33257:


Assignee: Apache Spark

> Support Column inputs in PySpark ordering functions (asc*, desc*)
> -
>
> Key: SPARK-33257
> URL: https://issues.apache.org/jira/browse/SPARK-33257
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark, SQL
>Affects Versions: 3.1.0
>Reporter: Maciej Szymkiewicz
>Assignee: Apache Spark
>Priority: Major
>
> According to SPARK-26979, PySpark functions should support both {{Column}} 
> and {{str}} arguments, when possible.
> However, the following ordering functions support only {{str}}:
> - {{asc}}
> - {{desc}}
> - {{asc_nulls_first}}
> - {{asc_nulls_last}}
> - {{desc_nulls_first}}
> - {{desc_nulls_last}}
> This is because the Scala side doesn't provide {{Column => 
> Column}} variants.
> To fix this, we could do one of the following:
> - Call corresponding {{Column}} methods as 
> [suggested|https://github.com/apache/spark/pull/30143#discussion_r512366978] 
> by  [~hyukjin.kwon]
> - Add missing signatures on Scala side.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-33318) Ability to set Dynamodb table name while reading from Kinesis

2020-11-02 Thread chethan gowda (Jira)
chethan gowda created SPARK-33318:
-

 Summary: Ability to set Dynamodb table name while reading from 
Kinesis
 Key: SPARK-33318
 URL: https://issues.apache.org/jira/browse/SPARK-33318
 Project: Spark
  Issue Type: Improvement
  Components: Structured Streaming
Affects Versions: 2.4.7
Reporter: chethan gowda


* Need the ability to set the DynamoDB table name while reading data from Kinesis. The 
KCL library provides the ability to set the DynamoDB table name; see, for example, 
[https://aws.amazon.com/premiumsupport/knowledge-center/kinesis-kcl-apps-dynamodb-table/].
We would like a similar interface for passing the DynamoDB table name.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-26365) spark-submit for k8s cluster doesn't propagate exit code

2020-11-02 Thread Oscar Cassetti (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-26365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17224648#comment-17224648
 ] 

Oscar Cassetti commented on SPARK-26365:


[~itayb] I have a patch [^spark-3.0.0-raise-exception-k8s-failure.patch] which I 
tested for spark-3.0.0. It is not pretty, but it does the job.

I also have one for v2.4.5 [^spark-2.4.5-raise-exception-k8s-failure.patch]; again, 
the code is a bit ugly, but I have been using it in production since June.

> spark-submit for k8s cluster doesn't propagate exit code
> 
>
> Key: SPARK-26365
> URL: https://issues.apache.org/jira/browse/SPARK-26365
> Project: Spark
>  Issue Type: Bug
>  Components: Kubernetes, Spark Core, Spark Submit
>Affects Versions: 2.3.2, 2.4.0
>Reporter: Oscar Bonilla
>Priority: Minor
> Attachments: spark-2.4.5-raise-exception-k8s-failure.patch, 
> spark-3.0.0-raise-exception-k8s-failure.patch
>
>
> When launching apps using spark-submit in a kubernetes cluster, if the Spark 
> application fails (returns exit code = 1 for example), spark-submit will 
> still exit gracefully and return exit code = 0.
> This is problematic, since there's no way to know if there's been a problem 
> with the Spark application.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-26365) spark-submit for k8s cluster doesn't propagate exit code

2020-11-02 Thread Oscar Cassetti (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-26365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Oscar Cassetti updated SPARK-26365:
---
Attachment: spark-2.4.5-raise-exception-k8s-failure.patch

> spark-submit for k8s cluster doesn't propagate exit code
> 
>
> Key: SPARK-26365
> URL: https://issues.apache.org/jira/browse/SPARK-26365
> Project: Spark
>  Issue Type: Bug
>  Components: Kubernetes, Spark Core, Spark Submit
>Affects Versions: 2.3.2, 2.4.0
>Reporter: Oscar Bonilla
>Priority: Minor
> Attachments: spark-2.4.5-raise-exception-k8s-failure.patch, 
> spark-3.0.0-raise-exception-k8s-failure.patch
>
>
> When launching apps using spark-submit in a kubernetes cluster, if the Spark 
> application fails (returns exit code = 1 for example), spark-submit will 
> still exit gracefully and return exit code = 0.
> This is problematic, since there's no way to know if there's been a problem 
> with the Spark application.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-26365) spark-submit for k8s cluster doesn't propagate exit code

2020-11-02 Thread Oscar Cassetti (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-26365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Oscar Cassetti updated SPARK-26365:
---
Attachment: spark-3.0.0-raise-exception-k8s-failure.patch

> spark-submit for k8s cluster doesn't propagate exit code
> 
>
> Key: SPARK-26365
> URL: https://issues.apache.org/jira/browse/SPARK-26365
> Project: Spark
>  Issue Type: Bug
>  Components: Kubernetes, Spark Core, Spark Submit
>Affects Versions: 2.3.2, 2.4.0
>Reporter: Oscar Bonilla
>Priority: Minor
> Attachments: spark-3.0.0-raise-exception-k8s-failure.patch
>
>
> When launching apps using spark-submit in a kubernetes cluster, if the Spark 
> application fails (returns exit code = 1 for example), spark-submit will 
> still exit gracefully and return exit code = 0.
> This is problematic, since there's no way to know if there's been a problem 
> with the Spark application.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-33299) Unify schema parsing in from_json/from_csv across all APIs

2020-11-02 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17224602#comment-17224602
 ] 

Apache Spark commented on SPARK-33299:
--

User 'MaxGekk' has created a pull request for this issue:
https://github.com/apache/spark/pull/30226

> Unify schema parsing in from_json/from_csv across all APIs
> --
>
> Key: SPARK-33299
> URL: https://issues.apache.org/jira/browse/SPARK-33299
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: Maxim Gekk
>Priority: Major
>
> Currently, from_json() has an extra capability in the Scala API: it accepts a schema 
> in JSON format, but the other APIs (SQL, Python, R) lack the feature. This ticket 
> aims to unify all the APIs and support schemas in JSON format everywhere.
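A minimal sketch of the existing Scala-only behaviour the ticket wants to expose in the other APIs: passing a JSON-format schema string (as produced by {{StructType.json}}) to {{from_json}}. Column names and data are made up.

{code:scala}
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.from_json
import org.apache.spark.sql.types.{IntegerType, StringType, StructType}

val spark = SparkSession.builder().appName("from-json-schema-sketch").master("local[*]").getOrCreate()
import spark.implicits._

// A schema in JSON format, e.g. {"type":"struct","fields":[...]}.
val jsonSchema = new StructType().add("a", IntegerType).add("b", StringType).json

val parsed = Seq("""{"a": 1, "b": "x"}""").toDF("value")
  .select(from_json($"value", jsonSchema, Map.empty[String, String]).as("parsed"))

parsed.show(truncate = false)
{code}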



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-28180) Encoding CSV to Pojo works with Encoders.bean on RDD but fails on asserts when attempting it from a Dataset

2020-11-02 Thread Julien (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-28180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17224601#comment-17224601
 ] 

Julien commented on SPARK-28180:


I'm not too sure what your issue is. Is it that a meaningful error message should 
be produced?

BTW, stating that "_Scala_ costs a lot to _Spark_" cannot be correct: Spark is 
implemented in Scala :)

However, the Java API is not at the same level as the Scala API. I myself have 
several issues with the Encoders.bean tool and with not being able to construct 
custom encoders for Java POJOs. Automatic is not always a good idea. In your 
case, the automatic parsing of the POJO getters to list the encoder fields is 
not helpful...
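For readers less familiar with the pattern under discussion, a minimal bean-encoder example (the bean and CSV path are made up, not the reporter's) looks like the sketch below; in the report, it is the {{Encoders.bean(...)}} call itself that fails its internal assertion.

{code:scala}
import scala.beans.BeanProperty

import org.apache.spark.sql.{Encoders, SparkSession}

// A made-up bean; @BeanProperty generates the getters/setters Encoders.bean reflects on.
class SimpleEntreprise extends Serializable {
  @BeanProperty var siren: String = _
  @BeanProperty var denomination: String = _
}

object BeanEncoderSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("bean-encoder-sketch").master("local[*]").getOrCreate()

    // Typical usage: build the encoder from the bean class, then view a DataFrame as a
    // Dataset of beans.
    val beanEncoder = Encoders.bean(classOf[SimpleEntreprise])
    val csv = spark.read.option("header", "true").csv("/path/to/entreprises.csv")
    val entreprises = csv.select("siren", "denomination").as(beanEncoder)
    entreprises.show()
  }
}
{code}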

> Encoding CSV to Pojo works with Encoders.bean on RDD but fails on asserts when 
> attempting it from a Dataset
> ---
>
> Key: SPARK-28180
> URL: https://issues.apache.org/jira/browse/SPARK-28180
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.4.3
> Environment: Debian 9, Java 8.
>Reporter: Marc Le Bihan
>Priority: Major
>
> I am converting an _RDD_ Spark program to a _Dataset_ one.
> Previously, it converted a CSV file, mapped with the help of a Jackson loader, to an 
> RDD of Enterprise objects with Encoders.bean(Entreprise.class); now it does the 
> conversion more simply, by loading the CSV content into a Dataset and applying 
> _Encoders.bean(Entreprise.class)_ on it.
> {code:java}
>   Dataset csv = this.session.read().format("csv")
>  .option("header","true").option("quote", "\"").option("escape", "\"")
>  .load(source.getAbsolutePath())
>  .selectExpr(
> "ActivitePrincipaleUniteLegale as ActivitePrincipale",
> "CAST(AnneeCategorieEntreprise as INTEGER) as 
> AnneeCategorieEntreprise",
> "CAST(AnneeEffectifsUniteLegale as INTEGER) as 
> AnneeValiditeEffectifSalarie",
> "CAST(CaractereEmployeurUniteLegale == 'O' as BOOLEAN) as 
> CaractereEmployeur",
> "CategorieEntreprise", 
> "CategorieJuridiqueUniteLegale as CategorieJuridique",
> "DateCreationUniteLegale as DateCreationEntreprise", "DateDebut 
> as DateDebutHistorisation", "DateDernierTraitementUniteLegale as 
> DateDernierTraitement",
> "DenominationUniteLegale as Denomination", 
> "DenominationUsuelle1UniteLegale as DenominationUsuelle1", 
> "DenominationUsuelle2UniteLegale as DenominationUsuelle2", 
> "DenominationUsuelle3UniteLegale as DenominationUsuelle3",
> "CAST(EconomieSocialeSolidaireUniteLegale == 'O' as BOOLEAN) as 
> EconomieSocialeSolidaire",
> "CAST(EtatAdministratifUniteLegale == 'A' as BOOLEAN) as Active",
> "IdentifiantAssociationUniteLegale as IdentifiantAssociation",
> "NicSiegeUniteLegale as NicSiege", 
> "CAST(NombrePeriodesUniteLegale as INTEGER) as NombrePeriodes",
> "NomenclatureActivitePrincipaleUniteLegale as 
> NomenclatureActivitePrincipale",
> "NomUniteLegale as NomNaissance", "NomUsageUniteLegale as 
> NomUsage",
> "Prenom1UniteLegale as Prenom1", "Prenom2UniteLegale as Prenom2", 
> "Prenom3UniteLegale as Prenom3", "Prenom4UniteLegale as Prenom4", 
> "PrenomUsuelUniteLegale as PrenomUsuel",
> "PseudonymeUniteLegale as Pseudonyme",
> "SexeUniteLegale as Sexe", 
> "SigleUniteLegale as Sigle", 
> "Siren", 
> "TrancheEffectifsUniteLegale as TrancheEffectifSalarie"
>  );
> {code}
> The _Dataset_ is successfully created, but the following call of 
> _Encoders.bean(Enterprise.class)_ fails:
> {code:java}
> java.lang.AssertionError: assertion failed
>   at scala.Predef$.assert(Predef.scala:208)
>   at 
> org.apache.spark.sql.catalyst.encoders.ExpressionEncoder$.javaBean(ExpressionEncoder.scala:87)
>   at org.apache.spark.sql.Encoders$.bean(Encoders.scala:142)
>   at org.apache.spark.sql.Encoders.bean(Encoders.scala)
>   at 
> fr.ecoemploi.spark.entreprise.EntrepriseService.dsEntreprises(EntrepriseService.java:178)
>   at 
> test.fr.ecoemploi.spark.entreprise.EntreprisesIT.datasetEntreprises(EntreprisesIT.java:72)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.junit.platform.commons.util.ReflectionUtils.invokeMethod(ReflectionUtils.java:532)
>   at 
> org.junit.jupiter.engine.execution.ExecutableInvoker.invoke(ExecutableInvoker.java:115)
> 

[jira] [Commented] (SPARK-33060) approxSimilarityJoin in Structured Stream causes state to explode in size

2020-11-02 Thread Bram van den Akker (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33060?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17224589#comment-17224589
 ] 

Bram van den Akker commented on SPARK-33060:


[~tdas] any idea how this could be addressed? 

> approxSimilarityJoin in Structured Stream causes state to explode in size
> -
>
> Key: SPARK-33060
> URL: https://issues.apache.org/jira/browse/SPARK-33060
> Project: Spark
>  Issue Type: Bug
>  Components: ML, PySpark, Structured Streaming
>Affects Versions: 3.0.0
>Reporter: Bram van den Akker
>Priority: Major
> Attachments: Screenshot 2020-10-01 at 16.03.26.png
>
>
> I'm writing a PySpark application that joins a static and streaming dataframe 
> together using the approxSimilarityJoin function from the ML package. Because 
> of the high volume of data, we need to apply a watermark to make sure a 
> minimal amount of state is preserved. However, the [approxSimilarityJoin 
> scala code contains a `distinct` 
> action|https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/ml/feature/LSH.scala#L289]
>   right after it joins the two datasets together. This call results in a 
> state being created to account for late arriving data. 
> Watermarks created in the PySpark code are being ignored and still lead to 
> the state accumulating in size. 
> My expectation is that the watermarking is lost somewhere in the communication 
> between Python and Scala.
> I've created [this Stackoverflow 
> question|https://stackoverflow.com/questions/64157104/stream-static-join-without-aggregation-still-results-in-accumulating-spark-state]
>  earlier this week, but after more investigation this really seems like a bug 
> rather than a user error.
>  
>  
>  
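To make the pattern above concrete, here is a rough Scala sketch of a stream-static {{approxSimilarityJoin}} with a watermark on the streaming side. Schemas, paths and thresholds are invented, and the reporter's job is PySpark, so this only illustrates where the watermark sits relative to the join.

{code:scala}
import org.apache.spark.ml.feature.BucketedRandomProjectionLSH
import org.apache.spark.ml.linalg.SQLDataTypes
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.types.{LongType, StructType, TimestampType}

val spark = SparkSession.builder().appName("lsh-stream-static-sketch").getOrCreate()

// Static side: precomputed feature vectors (columns: id, features).
val staticVectors = spark.read.parquet("/data/static_vectors")

// Streaming side: the same vector column plus an event-time column for the watermark.
val streamSchema = new StructType()
  .add("id", LongType)
  .add("features", SQLDataTypes.VectorType)
  .add("event_time", TimestampType)

val streamVectors = spark.readStream
  .schema(streamSchema)
  .parquet("/data/incoming_vectors")
  .withWatermark("event_time", "10 minutes")

val lsh = new BucketedRandomProjectionLSH()
  .setBucketLength(2.0)
  .setInputCol("features")
  .setOutputCol("hashes")

val model = lsh.fit(staticVectors)

// approxSimilarityJoin calls distinct() on the joined result internally; per the report,
// that is where state keeps accumulating even though the streaming side is watermarked.
val joined = model.approxSimilarityJoin(streamVectors, staticVectors, 1.5, "distCol")
{code}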



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-26365) spark-submit for k8s cluster doesn't propagate exit code

2020-11-02 Thread Itay Bittan (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-26365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17224569#comment-17224569
 ] 

Itay Bittan commented on SPARK-26365:
-

Hi, we are having the same issue.

It's critical in scenarios where another job is triggered based on the first app's 
success/failure.

Any idea for a workaround meanwhile?

> spark-submit for k8s cluster doesn't propagate exit code
> 
>
> Key: SPARK-26365
> URL: https://issues.apache.org/jira/browse/SPARK-26365
> Project: Spark
>  Issue Type: Bug
>  Components: Kubernetes, Spark Core, Spark Submit
>Affects Versions: 2.3.2, 2.4.0
>Reporter: Oscar Bonilla
>Priority: Minor
>
> When launching apps using spark-submit in a kubernetes cluster, if the Spark 
> application fails (returns exit code = 1 for example), spark-submit will 
> still exit gracefully and return exit code = 0.
> This is problematic, since there's no way to know if there's been a problem 
> with the Spark application.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-33187) Add a check on the number of returned partitions in the HiveShim#getPartitionsByFilter method

2020-11-02 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33187?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-33187:


Assignee: (was: Apache Spark)

> Add a check on the number of returned partitions in the 
> HiveShim#getPartitionsByFilter method
> -
>
> Key: SPARK-33187
> URL: https://issues.apache.org/jira/browse/SPARK-33187
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.0.1
>Reporter: jinhai
>Priority: Major
>
> In the method Shim#getPartitionsByFilter, when filter is empty or when the 
> hive table has a large number of partitions, calling getAllPartitionsMethod 
> or getPartitionsByFilterMethod will result in a Driver OOM.
> I think we need to add a check on the number of returned partitions by calling 
> Hive#getNumPartitionsByFilter, and add SQLConf 
> spark.sql.hive.metastorePartitionLimit, default value is 100_000
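A rough sketch of the proposed guard is below; the helper shape, names and message are assumptions for illustration, not the eventual patch. The caller would obtain the count from {{Hive#getNumPartitionsByFilter}} before fetching the partitions themselves.

{code:scala}
// Illustrative only: a standalone version of the check the description proposes.
// A negative limit is treated as "no limit".
def assertWithinPartitionLimit(numPartitions: Int, limit: Int, tableName: String): Unit = {
  if (limit >= 0 && numPartitions > limit) {
    throw new IllegalStateException(
      s"Fetching $numPartitions partitions of table $tableName exceeds " +
      s"spark.sql.hive.metastorePartitionLimit ($limit); narrow the partition filter.")
  }
}

// Hypothetical call site inside HiveShim#getPartitionsByFilter:
//   val count = hive.getNumPartitionsByFilter(table, filter)
//   assertWithinPartitionLimit(count, partitionLimitFromConf, table.getTableName)
{code}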



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-33187) Add a check on the number of returned partitions in the HiveShim#getPartitionsByFilter method

2020-11-02 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33187?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-33187:


Assignee: Apache Spark

> Add a check on the number of returned partitions in the 
> HiveShim#getPartitionsByFilter method
> -
>
> Key: SPARK-33187
> URL: https://issues.apache.org/jira/browse/SPARK-33187
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.0.1
>Reporter: jinhai
>Assignee: Apache Spark
>Priority: Major
>
> In the method Shim#getPartitionsByFilter, when filter is empty or when the 
> hive table has a large number of partitions, calling getAllPartitionsMethod 
> or getPartitionsByFilterMethod will result in a Driver OOM.
> I think we need to add a check on the number of returned partitions by calling 
> Hive#getNumPartitionsByFilter, and add SQLConf 
> spark.sql.hive.metastorePartitionLimit, default value is 100_000



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-33187) Add a check on the number of returned partitions in the HiveShim#getPartitionsByFilter method

2020-11-02 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33187?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17224567#comment-17224567
 ] 

Apache Spark commented on SPARK-33187:
--

User 'manbuyun' has created a pull request for this issue:
https://github.com/apache/spark/pull/30225

> Add a check on the number of returned partitions in the 
> HiveShim#getPartitionsByFilter method
> -
>
> Key: SPARK-33187
> URL: https://issues.apache.org/jira/browse/SPARK-33187
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.0.1
>Reporter: jinhai
>Priority: Major
>
> In the method Shim#getPartitionsByFilter, when filter is empty or when the 
> hive table has a large number of partitions, calling getAllPartitionsMethod 
> or getPartitionsByFilterMethod will result in a Driver OOM.
> I think we need to add a check on the number of returned partitions by calling 
> Hive#getNumPartitionsByFilter, and add SQLConf 
> spark.sql.hive.metastorePartitionLimit, default value is 100_000



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-33317) Spark Hive SQL returning empty dataframe

2020-11-02 Thread Debadutta (Jira)
Debadutta created SPARK-33317:
-

 Summary: Spark Hive SQL returning empty dataframe
 Key: SPARK-33317
 URL: https://issues.apache.org/jira/browse/SPARK-33317
 Project: Spark
  Issue Type: Bug
  Components: Spark Core, Spark Shell
Affects Versions: 2.4.6
Reporter: Debadutta


I am trying to run a SQL query on a Hive table using the Hive connector in Spark, 
but I am getting an empty dataframe. The query I am trying to run is:

{{sparkSession.sql("select fmid from farmers where fmid between ' 1000405134' 
and '1000772585'")}}

This fails, but if I remove the leading whitespace it works.

{{sparkSession.sql("select fmid from farmers where fmid between '1000405134' 
and '1000772585'")}}

Currently, I am removing leading and trailing whitespace as a workaround, but 
the same query with whitespace works fine in the Hive console.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-33316) Support nullable Avro schemas for non-nullable data in Avro writing

2020-11-02 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33316?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-33316:


Assignee: Apache Spark

> Support nullable Avro schemas for non-nullable data in Avro writing
> ---
>
> Key: SPARK-33316
> URL: https://issues.apache.org/jira/browse/SPARK-33316
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.4.0, 3.0.0, 3.0.1
>Reporter: Bo Zhang
>Assignee: Apache Spark
>Priority: Major
>
> Currently when users try to use nullable Avro schemas for non-nullable data 
> in Avro writing, Spark will throw an IncompatibleSchemaException.
> There are some cases when users do not have full control over the nullability 
> of the data, or the nullability of the Avro schemas they have to use. We 
> should support nullable Avro schemas for non-nullable data in Avro writing 
> for better usability.
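A minimal sketch of the case described above, assuming the {{spark-avro}} module is on the classpath (field name, record name and output path are made up): a non-nullable column written with a nullable (union-with-null) Avro schema via the {{avroSchema}} option. On the affected versions this write raises {{IncompatibleSchemaException}}; the ticket proposes accepting it instead.

{code:scala}
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("avro-nullable-sketch").master("local[*]").getOrCreate()
import spark.implicits._

// "id" is non-nullable because it comes from a primitive Long.
val df = Seq(1L, 2L, 3L).toDF("id")

// A nullable (union-with-null) Avro schema for that same field.
val nullableAvroSchema =
  """{"type":"record","name":"topLevelRecord","fields":[{"name":"id","type":["long","null"]}]}"""

df.write
  .format("avro")
  .option("avroSchema", nullableAvroSchema)
  .save("/tmp/avro-nullable-sketch")
{code}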



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-33316) Support nullable Avro schemas for non-nullable data in Avro writing

2020-11-02 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33316?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-33316:


Assignee: (was: Apache Spark)

> Support nullable Avro schemas for non-nullable data in Avro writing
> ---
>
> Key: SPARK-33316
> URL: https://issues.apache.org/jira/browse/SPARK-33316
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.4.0, 3.0.0, 3.0.1
>Reporter: Bo Zhang
>Priority: Major
>
> Currently when users try to use nullable Avro schemas for non-nullable data 
> in Avro writing, Spark will throw an IncompatibleSchemaException.
> There are some cases when users do not have full control over the nullability 
> of the data, or the nullability of the Avro schemas they have to use. We 
> should support nullable Avro schemas for non-nullable data in Avro writing 
> for better usability.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-33316) Support nullable Avro schemas for non-nullable data in Avro writing

2020-11-02 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17224566#comment-17224566
 ] 

Apache Spark commented on SPARK-33316:
--

User 'bozhang2820' has created a pull request for this issue:
https://github.com/apache/spark/pull/30224

> Support nullable Avro schemas for non-nullable data in Avro writing
> ---
>
> Key: SPARK-33316
> URL: https://issues.apache.org/jira/browse/SPARK-33316
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.4.0, 3.0.0, 3.0.1
>Reporter: Bo Zhang
>Priority: Major
>
> Currently when users try to use nullable Avro schemas for non-nullable data 
> in Avro writing, Spark will throw an IncompatibleSchemaException.
> There are some cases when users do not have full control over the nullability 
> of the data, or the nullability of the Avro schemas they have to use. We 
> should support nullable Avro schemas for non-nullable data in Avro writing 
> for better usability.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-33187) Add a check on the number of returned partitions in the HiveShim#getPartitionsByFilter method

2020-11-02 Thread jinhai (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33187?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

jinhai updated SPARK-33187:
---
Description: 
In the method Shim#getPartitionsByFilter, when filter is empty or when the hive 
table has a large number of partitions, calling getAllPartitionsMethod or 
getPartitionsByFilterMethod will result in a Driver OOM.

I think we need to add a check on the number of returned partitions by calling 
Hive#getNumPartitionsByFilter, and add SQLConf 
spark.sql.hive.metastorePartitionLimit, default value is 100_000

  was:
In the method Shim#getPartitionsByFilter, when filter is empty or when the hive 
table has a large number of partitions, calling getAllPartitionsMethod or 
getPartitionsByFilterMethod will result in a Driver OOM.

I think we need to add a check on the number of returned partitions by calling 
Hive#getNumPartitionsByFilter, and add SQLConf 
spark.sql.hive.exceeded.partition.limit, default value is 100_000


> Add a check on the number of returned partitions in the 
> HiveShim#getPartitionsByFilter method
> -
>
> Key: SPARK-33187
> URL: https://issues.apache.org/jira/browse/SPARK-33187
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.0.1
>Reporter: jinhai
>Priority: Major
>
> In the method Shim#getPartitionsByFilter, when filter is empty or when the 
> hive table has a large number of partitions, calling getAllPartitionsMethod 
> or getPartitionsByFilterMethod will result in a Driver OOM.
> I think we need to add a check on the number of returned partitions by calling 
> Hive#getNumPartitionsByFilter, and add SQLConf 
> spark.sql.hive.metastorePartitionLimit, default value is 100_000



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-33306) TimezoneID is needed when casting from Date to String

2020-11-02 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17224525#comment-17224525
 ] 

Apache Spark commented on SPARK-33306:
--

User 'MaxGekk' has created a pull request for this issue:
https://github.com/apache/spark/pull/30223

> TimezoneID is needed when casting from Date to String
> 
>
> Key: SPARK-33306
> URL: https://issues.apache.org/jira/browse/SPARK-33306
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.0, 3.0.1, 3.1.0
>Reporter: EdisonWang
>Assignee: EdisonWang
>Priority: Major
> Fix For: 3.0.2, 3.1.0
>
>
> A simple way to reproduce this is 
> {code}
> spark-shell --conf spark.sql.legacy.typeCoercion.datetimeToString.enabled=true
> scala> sql("""
> select a.d1 from
>  (select to_date(concat('2000-01-0', id)) as d1 from range(1, 2)) a
>  join
>  (select concat('2000-01-0', id) as d2 from range(1, 2)) b
>  on a.d1 = b.d2
> """).show
> {code}
>  
> it will throw
> {code}
> java.util.NoSuchElementException: None.get
>  at scala.None$.get(Option.scala:529)
>  at scala.None$.get(Option.scala:527)
>  at 
> org.apache.spark.sql.catalyst.expressions.TimeZoneAwareExpression.zoneId(datetimeExpressions.scala:56)
>  at 
> org.apache.spark.sql.catalyst.expressions.TimeZoneAwareExpression.zoneId$(datetimeExpressions.scala:56)
>  at 
> org.apache.spark.sql.catalyst.expressions.CastBase.zoneId$lzycompute(Cast.scala:253)
>  at org.apache.spark.sql.catalyst.expressions.CastBase.zoneId(Cast.scala:253)
>  at 
> org.apache.spark.sql.catalyst.expressions.CastBase.dateFormatter$lzycompute(Cast.scala:287)
>  at 
> org.apache.spark.sql.catalyst.expressions.CastBase.dateFormatter(Cast.scala:287)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org


