[jira] [Resolved] (SPARK-30007) Publish snapshot/release with `-Phive-2.3` only

2019-11-23 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-30007?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-30007.
---
Fix Version/s: 3.1.0
   Resolution: Fixed

Issue resolved by pull request 26648
[https://github.com/apache/spark/pull/26648]

> Publish snapshot/release with `-Phive-2.3` only
> ---
>
> Key: SPARK-30007
> URL: https://issues.apache.org/jira/browse/SPARK-30007
> Project: Spark
>  Issue Type: Improvement
>  Components: Project Infra
>Affects Versions: 3.0.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Major
> Fix For: 3.1.0
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-30007) Publish snapshot/release with `-Phive-2.3` only

2019-11-23 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-30007?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-30007:
-

Assignee: Dongjoon Hyun

> Publish snapshot/release with `-Phive-2.3` only
> ---
>
> Key: SPARK-30007
> URL: https://issues.apache.org/jira/browse/SPARK-30007
> Project: Spark
>  Issue Type: Improvement
>  Components: Project Infra
>Affects Versions: 3.0.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-30007) Publish snapshot/release with `-Phive-2.3` only

2019-11-23 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-30007?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-30007:
--
Fix Version/s: (was: 3.1.0)
   3.0.0

> Publish snapshot/release with `-Phive-2.3` only
> ---
>
> Key: SPARK-30007
> URL: https://issues.apache.org/jira/browse/SPARK-30007
> Project: Spark
>  Issue Type: Improvement
>  Components: Project Infra
>Affects Versions: 3.0.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Major
> Fix For: 3.0.0
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-30007) Publish snapshot/release with `-Phive-2.3` only

2019-11-23 Thread Dongjoon Hyun (Jira)
Dongjoon Hyun created SPARK-30007:
-

 Summary: Publish snapshot/release with `-Phive-2.3` only
 Key: SPARK-30007
 URL: https://issues.apache.org/jira/browse/SPARK-30007
 Project: Spark
  Issue Type: Improvement
  Components: Project Infra
Affects Versions: 3.0.0
Reporter: Dongjoon Hyun






--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Reopened] (SPARK-29453) Improve tooltip information for SQL tab

2019-11-23 Thread Ankit Raj Boudh (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-29453?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ankit Raj Boudh reopened SPARK-29453:
-

Tooltips still need to be added for the Duration field.

> Improve tooltip information for SQL tab
> ---
>
> Key: SPARK-29453
> URL: https://issues.apache.org/jira/browse/SPARK-29453
> Project: Spark
>  Issue Type: Sub-task
>  Components: Web UI
>Affects Versions: 3.0.0
>Reporter: Sandeep Katta
>Priority: Minor
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-29992) Make access to getActive public

2019-11-23 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-29992?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-29992.
---
Resolution: Won't Do

I'm resolving this issue as `Won't Do` for now.

> Make access to getActive public
> ---
>
> Key: SPARK-29992
> URL: https://issues.apache.org/jira/browse/SPARK-29992
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 2.3.3, 2.3.4, 2.4.0, 2.4.1, 2.4.2, 2.4.4
>Reporter: Nishkam Ravi
>Priority: Major
>
> Currently, a simple getter for activeContext is missing. getOrCreate creates 
> a new Spark context in the absence of an existing one, but exposes 
> activeContext anyway. Use cases that need access to an existing context (if 
> one exists) without wanting to create a new one would benefit from this 
> being publicly accessible.
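
For illustration, a minimal sketch of the accessor being requested, assuming 
PySpark; the helper name `get_active_context` is hypothetical, and 
`_active_spark_context` is the private attribute PySpark happens to use to 
track the active context:

{code:python}
from pyspark import SparkContext

def get_active_context():
    # Hypothetical public accessor: return the active SparkContext if one
    # exists (or None), without creating a new one the way
    # SparkContext.getOrCreate() does.
    return SparkContext._active_spark_context
{code}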



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-29992) Make access to getActive public

2019-11-23 Thread Dongjoon Hyun (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-29992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16980959#comment-16980959
 ] 

Dongjoon Hyun commented on SPARK-29992:
---

Thank you for filing a JIRA and making a PR, [~nishkamravi2]. Although it's not 
merged because the Apache Spark community recommends `SparkSession` over 
`SparkContext`, this issue is a good reference for others who have similar 
questions.

BTW, there is a community guide for `Fix Version`.
- https://spark.apache.org/contributing.html
{code}
Do not set the following fields:
- Fix Version. This is assigned by committers only when resolved.
- Target Version. This is assigned by committers to indicate a PR has been 
accepted for possible fix by the target version.
{code}

> Make access to getActive public
> ---
>
> Key: SPARK-29992
> URL: https://issues.apache.org/jira/browse/SPARK-29992
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 2.3.3, 2.3.4, 2.4.0, 2.4.1, 2.4.2, 2.4.4
>Reporter: Nishkam Ravi
>Priority: Major
>
> Currently, a simple getter for activeContext is missing. getOrCreate creates 
> a new Spark context in the absence of an existing one, but exposes 
> activeContext anyway. Use cases that need access to an existing context (if 
> one exists) without wanting to create a new one would benefit from this 
> being publicly accessible.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-29992) Make access to getActive public

2019-11-23 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-29992?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-29992:
--
Fix Version/s: (was: 2.4.5)
   (was: 3.0.0)

> Make access to getActive public
> ---
>
> Key: SPARK-29992
> URL: https://issues.apache.org/jira/browse/SPARK-29992
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 2.3.3, 2.3.4, 2.4.0, 2.4.1, 2.4.2, 2.4.4
>Reporter: Nishkam Ravi
>Priority: Major
>
> Currently, a simple getter for activeContext is missing. getOrCreate creates 
> a new Spark context in the absence of an existing one, but exposes 
> activeContext anyway. Use cases that need access to an existing context (if 
> one exists) without wanting to create a new one would benefit from this 
> being publicly accessible.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-28812) Document SHOW PARTITIONS in SQL Reference.

2019-11-23 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-28812?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-28812.
---
Fix Version/s: 3.0.0
   Resolution: Fixed

Issue resolved by pull request 26635
[https://github.com/apache/spark/pull/26635]

> Document SHOW PARTITIONS in SQL Reference.
> --
>
> Key: SPARK-28812
> URL: https://issues.apache.org/jira/browse/SPARK-28812
> Project: Spark
>  Issue Type: Sub-task
>  Components: Documentation, SQL
>Affects Versions: 2.4.3
>Reporter: Dilip Biswal
>Assignee: Dilip Biswal
>Priority: Major
> Fix For: 3.0.0
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-28812) Document SHOW PARTITIONS in SQL Reference.

2019-11-23 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-28812?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-28812:
-

Assignee: Dilip Biswal

> Document SHOW PARTITIONS in SQL Reference.
> --
>
> Key: SPARK-28812
> URL: https://issues.apache.org/jira/browse/SPARK-28812
> Project: Spark
>  Issue Type: Sub-task
>  Components: Documentation, SQL
>Affects Versions: 2.4.3
>Reporter: Dilip Biswal
>Assignee: Dilip Biswal
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-29681) Spark Application UI- environment tab field "value" sort is not working

2019-11-23 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-29681?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-29681.
---
Fix Version/s: 3.0.0
   Resolution: Fixed

Issue resolved by pull request 26638
[https://github.com/apache/spark/pull/26638]

> Spark Application UI- environment tab field "value" sort is not working 
> 
>
> Key: SPARK-29681
> URL: https://issues.apache.org/jira/browse/SPARK-29681
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Affects Versions: 2.4.4, 3.0.0
>Reporter: jobit mathew
>Priority: Major
> Fix For: 3.0.0
>
> Attachments: Ascend.png, DESCEND.png
>
>
> Spark Application UI:
> In the Environment tab, sorting the "Value" field does not work in either 
> ascending or descending order.
> !Ascend.png|width=854,height=475!
>  
> !DESCEND.png|width=854,height=470!
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-30004) Allow UserDefinedType to be merged into a standard DateType

2019-11-23 Thread Takeshi Yamamuro (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-30004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16980941#comment-16980941
 ] 

Takeshi Yamamuro commented on SPARK-30004:
--

I reset the target version; you don't need to set it because it will be filled 
in when the issue is resolved.

> Allow UserDefinedType to be merged into a standard DateType
> ---
>
> Key: SPARK-30004
> URL: https://issues.apache.org/jira/browse/SPARK-30004
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.4.4
>Reporter: Fokko Driesprong
>Priority: Major
>
> I've registered a custom type, namely XMLGregorianCalendar, which is used by 
> Scalaxb, an XML data-binding tool for generating case classes from an XSD. I 
> want to convert the XMLGregorianCalendar to a regular TimestampType. This 
> works, but when I update the table (using Delta), I get an error:
> Failed to merge fields 'START_DATE_MAINTENANCE_FLPL' and 
> 'START_DATE_MAINTENANCE_FLPL'.
> Failed to merge incompatible data types TimestampType and 
> org.apache.spark.sql.types.CustomXMLGregorianCalendarType@5ff12345;;
> There are two ways of fixing this:
> * Adding a rule which compares the sqlType.
> * Changing the compare function so that it checks the rhs against the lhs, 
> allowing me to override the equals function on the UserDefinedType.
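
For illustration, a minimal sketch of such a UDT, assuming PySpark's 
UserDefinedType; the class name and the pass-through serialization are 
placeholders, not the reporter's actual code. The merge failure arises because 
schema merging compares the UDT itself instead of its underlying sqlType:

{code:python}
from pyspark.sql.types import TimestampType, UserDefinedType

class XMLGregorianCalendarType(UserDefinedType):
    # Hypothetical UDT whose storage type is a plain timestamp; sqlType()
    # is what the reporter would like schema merging to compare against.
    @classmethod
    def sqlType(cls):
        return TimestampType()

    @classmethod
    def module(cls):
        return "__main__"

    def serialize(self, obj):
        return obj  # assume obj is already a datetime

    def deserialize(self, datum):
        return datum
{code}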



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-30004) Allow UserDefinedType to be merged into a standard DateType

2019-11-23 Thread Takeshi Yamamuro (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-30004?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takeshi Yamamuro updated SPARK-30004:
-
Affects Version/s: (was: 2.4.4)
   3.0.0

> Allow UserDefinedType to be merged into a standard DateType
> ---
>
> Key: SPARK-30004
> URL: https://issues.apache.org/jira/browse/SPARK-30004
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Fokko Driesprong
>Priority: Major
>
> I've registered a custom type, namely XMLGregorianCalendar, which is used by 
> Scalaxb, an XML data-binding tool for generating case classes from an XSD. I 
> want to convert the XMLGregorianCalendar to a regular TimestampType. This 
> works, but when I update the table (using Delta), I get an error:
> Failed to merge fields 'START_DATE_MAINTENANCE_FLPL' and 
> 'START_DATE_MAINTENANCE_FLPL'.
> Failed to merge incompatible data types TimestampType and 
> org.apache.spark.sql.types.CustomXMLGregorianCalendarType@5ff12345;;
> There are two ways of fixing this:
> * Adding a rule which compares the sqlType.
> * Changing the compare function so that it checks the rhs against the lhs, 
> allowing me to override the equals function on the UserDefinedType.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-30004) Allow UserDefinedType to be merged into a standard DateType

2019-11-23 Thread Takeshi Yamamuro (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-30004?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takeshi Yamamuro updated SPARK-30004:
-
Fix Version/s: (was: 2.4.5)

> Allow UserDefinedType to be merged into a standard DateType
> ---
>
> Key: SPARK-30004
> URL: https://issues.apache.org/jira/browse/SPARK-30004
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.4.4
>Reporter: Fokko Driesprong
>Priority: Major
>
> I've registered a custom type, namely XMLGregorianCalendar, which is used by 
> Scalaxb, an XML data-binding tool for generating case classes from an XSD. I 
> want to convert the XMLGregorianCalendar to a regular TimestampType. This 
> works, but when I update the table (using Delta), I get an error:
> Failed to merge fields 'START_DATE_MAINTENANCE_FLPL' and 
> 'START_DATE_MAINTENANCE_FLPL'.
> Failed to merge incompatible data types TimestampType and 
> org.apache.spark.sql.types.CustomXMLGregorianCalendarType@5ff12345;;
> There are two ways of fixing this:
> * Adding a rule which compares the sqlType.
> * Changing the compare function so that it checks the rhs against the lhs, 
> allowing me to override the equals function on the UserDefinedType.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-29970) open/close state is not preserved for Timelineview

2019-11-23 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-29970?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-29970.
---
Fix Version/s: 3.0.0
   Resolution: Fixed

Issue resolved by pull request 26607
[https://github.com/apache/spark/pull/26607]

> open/close state is not preserved for Timelineview
> --
>
> Key: SPARK-29970
> URL: https://issues.apache.org/jira/browse/SPARK-29970
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Affects Versions: 3.0.0
>Reporter: Kousuke Saruta
>Assignee: Kousuke Saruta
>Priority: Minor
> Fix For: 3.0.0
>
>
> The open/close state for the timeline view is intended to be preserved, as 
> it is for other UI features like the DAG visualization, but it is not. So if 
> we go back to the All Jobs page, Job page, or Stage page from another page, 
> the timeline view is always closed.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-30006) printSchema indeterministic output

2019-11-23 Thread Hasil Sharma (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-30006?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hasil Sharma updated SPARK-30006:
-
Description: 
printSchema doesn't give consistent output in the following example.

 
{code:python}
from pyspark.sql import SparkSession
from pyspark.sql import Row

spark = SparkSession.builder.appName("new-session").getOrCreate()
l = [('Ankit',25),('Jalfaizy',22),('saurabh',20),('Bala',26)]
rdd = spark.sparkContext.parallelize(l)
people = rdd.map(lambda x: Row(name=x[0], age=int(x[1])))

df1 = spark.createDataFrame(people)

print(df1.printSchema())

df2 = df1.select("name", "age")

print(df2.printSchema())
{code}
 

first print outputs
{noformat}
root
|-- age: long (nullable = true)
|-- name: string (nullable = true)
{noformat}
 

second print outputs
{noformat}
root
|-- name: string (nullable = true)
|-- age: long (nullable = true)
{noformat}
Expectation: the output should be the same because the column names are the same.

  was:
printSchema doesn't give consistent output in the following example.

 
{code:python}
from pyspark.sql import SparkSession
from pyspark.sql import Row

spark = SparkSession.builder.appName("new-session").getOrCreate()
 l = [('Ankit',25),('Jalfaizy',22),('saurabh',20),('Bala',26)]
 rdd = spark.sparkContext.parallelize(l)
 people_1 = rdd.map(lambda x: Row(name=x[0], age=int(x[1])))

df1 = spark.createDataFrame(people_1)

print(df1.printSchema())

df2 = df1.select("name", "age")

print(df2.printSchema())
{code}
 

first print outputs
{noformat}
root
|-- age: long (nullable = true)
|-- name: string (nullable = true)
{noformat}
 

second print outputs
{noformat}
root
|-- name: string (nullable = true)
|-- age: long (nullable = true)
{noformat}
Expectation: the output should be the same because the column names are the same.


> printSchema indeterministic output
> --
>
> Key: SPARK-30006
> URL: https://issues.apache.org/jira/browse/SPARK-30006
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.4.4
>Reporter: Hasil Sharma
>Priority: Minor
>
> printSchema doesn't give consistent output in the following example.
>  
> {code:python}
> from pyspark.sql import SparkSession
> from pyspark.sql import Row
> spark = SparkSession.builder.appName("new-session").getOrCreate()
> l = [('Ankit',25),('Jalfaizy',22),('saurabh',20),('Bala',26)]
> rdd = spark.sparkContext.parallelize(l)
> people = rdd.map(lambda x: Row(name=x[0], age=int(x[1])))
> df1 = spark.createDataFrame(people)
> print(df1.printSchema())
> df2 = df1.select("name", "age")
> print(df2.printSchema())
> {code}
>  
> first print outputs
> {noformat}
> root
> |-- age: long (nullable = true)
> |-- name: string (nullable = true)
> {noformat}
>  
> second print outputs
> {noformat}
> root
> |-- name: string (nullable = true)
> |-- age: long (nullable = true)
> {noformat}
> Expectation: the output should be the same because the column names are the same.
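
As a side note: in PySpark 2.x, a Row built from keyword arguments sorts its 
fields alphabetically, which would explain why df1 reports age before name 
while df2 follows the order given to select(). A minimal sketch of an 
order-insensitive comparison, reusing df1 and df2 from the example above:

{code:python}
# Compare the two schemas as sets of (name, type) pairs instead of
# eyeballing the printed trees.
fields1 = {(f.name, f.dataType.simpleString()) for f in df1.schema.fields}
fields2 = {(f.name, f.dataType.simpleString()) for f in df2.schema.fields}
print(fields1 == fields2)  # True: same columns, just in a different order
{code}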



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-30006) printSchema indeterministic output

2019-11-23 Thread Hasil Sharma (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-30006?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hasil Sharma updated SPARK-30006:
-
Description: 
printSchema doesn't give consistent output in the following example.

 
{code:python}
from pyspark.sql import SparkSession
from pyspark.sql import Row

spark = SparkSession.builder.appName("new-session").getOrCreate()
 l = [('Ankit',25),('Jalfaizy',22),('saurabh',20),('Bala',26)]
 rdd = spark.sparkContext.parallelize(l)
 people_1 = rdd.map(lambda x: Row(name=x[0], age=int(x[1])))

df1 = spark.createDataFrame(people_1)

print(df1.printSchema())

df2 = df1.select("name", "age")

print(df2.printSchema())
{code}
 

first print outputs
{noformat}
root
|-- age: long (nullable = true)
|-- name: string (nullable = true)
{noformat}
 

second print outputs
{noformat}
root
|-- name: string (nullable = true)
|-- age: long (nullable = true)
{noformat}
Expectation: the output should be the same because the column names are the same.

  was:
printSchema doesn't give consistent output in the following example.

 

```python

from pyspark.sql import SparkSession
 from pyspark.sql import Row

spark = SparkSession.builder.appName("new-session").getOrCreate()
 l = [('Ankit',25),('Jalfaizy',22),('saurabh',20),('Bala',26)]
 rdd = spark.sparkContext.parallelize(l)
 people_1 = rdd.map(lambda x: Row(name=x[0], age=int(x[1])))

df1 = spark.createDataFrame(people_1)

print(df1.printSchema())

df2 = df1.select("name", "age")

print(df2.printSchema())

```

 

first print outputs

```root
|-- age: long (nullable = true)|
|-- name: string (nullable = true)|```

 

second print outputs

```root
|-- name: string (nullable = true)|
|-- age: long (nullable = true)|```

Expectation: the output should be the same because the column names are the same.


> printSchema indeterministic output
> --
>
> Key: SPARK-30006
> URL: https://issues.apache.org/jira/browse/SPARK-30006
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.4.4
>Reporter: Hasil Sharma
>Priority: Minor
>
> printSchema doesn't give consistent output in the following example.
>  
> {code:python}
> from pyspark.sql import SparkSession
> from pyspark.sql import Row
> spark = SparkSession.builder.appName("new-session").getOrCreate()
>  l = [('Ankit',25),('Jalfaizy',22),('saurabh',20),('Bala',26)]
>  rdd = spark.sparkContext.parallelize(l)
>  people_1 = rdd.map(lambda x: Row(name=x[0], age=int(x[1])))
> df1 = spark.createDataFrame(people_1)
> print(df1.printSchema())
> df2 = df1.select("name", "age")
> print(df2.printSchema())
> {code}
>  
> first print outputs
> {noformat}
> root
> |-- age: long (nullable = true)
> |-- name: string (nullable = true)
> {noformat}
>  
> second print outputs
> {noformat}
> root
> |-- name: string (nullable = true)
> |-- age: long (nullable = true)
> {noformat}
> Expectation: the output should be the same because the column names are the same.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-30006) printSchema indeterministic output

2019-11-23 Thread Hasil Sharma (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-30006?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hasil Sharma updated SPARK-30006:
-
Description: 
printSchema doesn't give consistent output in the following example.

 

```python

from pyspark.sql import SparkSession
 from pyspark.sql import Row

spark = SparkSession.builder.appName("new-session").getOrCreate()
 l = [('Ankit',25),('Jalfaizy',22),('saurabh',20),('Bala',26)]
 rdd = spark.sparkContext.parallelize(l)
 people_1 = rdd.map(lambda x: Row(name=x[0], age=int(x[1])))

df1 = spark.createDataFrame(people_1)

print(df1.printSchema())

df2 = df1.select("name", "age")

print(df2.printSchema())

```

 

first print outputs

```root
|-- age: long (nullable = true)|
|-- name: string (nullable = true)|```

 

second print outputs

```root
|-- name: string (nullable = true)|
|-- age: long (nullable = true)|```

Expectation: the output should be the same because the column names are the same.

  was:
printSchema doesn't give consistent output in the following example.

 

```python

from pyspark.sql import SparkSession
 from pyspark.sql import Row

spark = SparkSession.builder.appName("new-session").getOrCreate()
 l = [('Ankit',25),('Jalfaizy',22),('saurabh',20),('Bala',26)]
 rdd = spark.sparkContext.parallelize(l)
 people_1 = rdd.map(lambda x: Row(name=x[0], age=int(x[1])))

df1 = spark.createDataFrame(people_1)

print(df1.printSchema())

df2 = df1.select("name", "age")

print(df2.printSchema())```

 

first print outputs

```
 
```

 

second print outputs

```

root
|-- name: string (nullable = true)|
|-- age: long (nullable = true)|

```

Expectation: the output should be the same because the column names are the same.


> printSchema indeterministic output
> --
>
> Key: SPARK-30006
> URL: https://issues.apache.org/jira/browse/SPARK-30006
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.4.4
>Reporter: Hasil Sharma
>Priority: Minor
>
> printSchema doesn't give consistent output in the following example.
>  
> ```python
> from pyspark.sql import SparkSession
>  from pyspark.sql import Row
> spark = SparkSession.builder.appName("new-session").getOrCreate()
>  l = [('Ankit',25),('Jalfaizy',22),('saurabh',20),('Bala',26)]
>  rdd = spark.sparkContext.parallelize(l)
>  people_1 = rdd.map(lambda x: Row(name=x[0], age=int(x[1])))
> df1 = spark.createDataFrame(people_1)
> print(df1.printSchema())
> df2 = df1.select("name", "age")
> print(df2.printSchema())
> ```
>  
> first print outputs
> ```root
> |-- age: long (nullable = true)|
> |-- name: string (nullable = true)|```
>  
> second print outputs
> ```root
> |-- name: string (nullable = true)|
> |-- age: long (nullable = true)|```
> Expectation: the output should be the same because the column names are the same.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-30006) printSchema indeterministic output

2019-11-23 Thread Hasil Sharma (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-30006?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hasil Sharma updated SPARK-30006:
-
Description: 
printSchema doesn't give consistent output in the following example.

 

```python

from pyspark.sql import SparkSession
 from pyspark.sql import Row

spark = SparkSession.builder.appName("new-session").getOrCreate()
 l = [('Ankit',25),('Jalfaizy',22),('saurabh',20),('Bala',26)]
 rdd = spark.sparkContext.parallelize(l)
 people_1 = rdd.map(lambda x: Row(name=x[0], age=int(x[1])))

df1 = spark.createDataFrame(people_1)

print(df1.printSchema())

df2 = df1.select("name", "age")

print(df2.printSchema())```

 

first print outputs

```
 
```

 

second print outputs

```

root
|-- name: string (nullable = true)|
|-- age: long (nullable = true)|

```

Expectation: the output should be the same because the column names are the same.

  was:
printSchema doesn't give consistent output in the following example.

 

```python

from pyspark.sql import SparkSession
from pyspark.sql import Row

spark = SparkSession.builder.appName("new-session").getOrCreate()
l = [('Ankit',25),('Jalfaizy',22),('saurabh',20),('Bala',26)]
rdd = spark.sparkContext.parallelize(l)
people_1 = rdd.map(lambda x: Row(name=x[0], age=int(x[1])))

df1 = spark.createDataFrame(people_1)

print(df1.printSchema())

df2 = df1.select("name", "age")

print(df2.printSchema())

```

 

first print outputs

```

root
 |-- age: long (nullable = true)
 |-- name: string (nullable = true)

```

 

second print outputs

```

root
 |-- name: string (nullable = true)
 |-- age: long (nullable = true)

```

Expectation: the output should be the same because the column names are the same.


> printSchema indeterministic output
> --
>
> Key: SPARK-30006
> URL: https://issues.apache.org/jira/browse/SPARK-30006
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.4.4
>Reporter: Hasil Sharma
>Priority: Minor
>
> printSchema doesn't give consistent output in the following example.
>  
> ```python
> from pyspark.sql import SparkSession
>  from pyspark.sql import Row
> spark = SparkSession.builder.appName("new-session").getOrCreate()
>  l = [('Ankit',25),('Jalfaizy',22),('saurabh',20),('Bala',26)]
>  rdd = spark.sparkContext.parallelize(l)
>  people_1 = rdd.map(lambda x: Row(name=x[0], age=int(x[1])))
> df1 = spark.createDataFrame(people_1)
> print(df1.printSchema())
> df2 = df1.select("name", "age")
> print(df2.printSchema())```
>  
> first print outputs
> ```
>  
> ```
>  
> second print outputs
> ```
> root
> |-- name: string (nullable = true)|
> |-- age: long (nullable = true)|
> ```
> Expectation: the output should be the same because the column names are the same.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-30006) printSchema indeterministic output

2019-11-23 Thread Hasil Sharma (Jira)
Hasil Sharma created SPARK-30006:


 Summary: printSchema indeterministic output
 Key: SPARK-30006
 URL: https://issues.apache.org/jira/browse/SPARK-30006
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 2.4.4
Reporter: Hasil Sharma


printSchema doesn't give consistent output in the following example.

 

```python

from pyspark.sql import SparkSession
from pyspark.sql import Row

spark = SparkSession.builder.appName("new-session").getOrCreate()
l = [('Ankit',25),('Jalfaizy',22),('saurabh',20),('Bala',26)]
rdd = spark.sparkContext.parallelize(l)
people_1 = rdd.map(lambda x: Row(name=x[0], age=int(x[1])))

df1 = spark.createDataFrame(people_1)

print(df1.printSchema())

df2 = df1.select("name", "age")

print(df2.printSchema())

```

 

first print outputs

```

root
 |-- age: long (nullable = true)
 |-- name: string (nullable = true)

```

 

second print outputs

```

root
 |-- name: string (nullable = true)
 |-- age: long (nullable = true)

```

Expectation: the output should be the same because the column names are the same.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-29988) Adjust Jenkins jobs for `hive-1.2/2.3` combination

2019-11-23 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-29988?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-29988:
--
Description: 
We need to rename the following Jenkins jobs first.

spark-master-test-sbt-hadoop-2.7 -> spark-master-test-sbt-hadoop-2.7-hive-1.2
spark-master-test-sbt-hadoop-3.2 -> spark-master-test-sbt-hadoop-3.2-hive-2.3
spark-master-test-maven-hadoop-2.7 -> 
spark-master-test-maven-hadoop-2.7-hive-1.2
spark-master-test-maven-hadoop-3.2 -> 
spark-master-test-maven-hadoop-3.2-hive-2.3

Also, we need to add `-Phive-1.2` for the existing `hadoop-2.7` jobs.
{code}
-Phive \
+-Phive-1.2 \
{code}

And, we need to add `-Phive-2.3` for the existing `hadoop-3.2` jobs.
{code}
-Phive \
+-Phive-2.3 \
{code}

For now, I added the above `-Phive-1.2` and `-Phive-2.3` to the Jenkins jobs 
manually. (This should be added to the SCM of AmpLab Jenkins.)

After SPARK-29981, we need to create two new jobs.
- spark-master-test-sbt-hadoop-2.7-hive-2.3
- spark-master-test-maven-hadoop-2.7-hive-2.3

This is in preparation for Apache Spark 3.0.0.
We may drop all `*-hive-1.2` jobs in Apache Spark 3.1.0.

  was:
We need to rename the following Jenkins jobs first.

spark-master-test-sbt-hadoop-2.7 -> spark-master-test-sbt-hadoop-2.7-hive-1.2
spark-master-test-sbt-hadoop-3.2 -> spark-master-test-sbt-hadoop-3.2-hive-2.3
spark-master-test-maven-hadoop-2.7 -> 
spark-master-test-maven-hadoop-2.7-hive-1.2
spark-master-test-maven-hadoop-3.2 -> 
spark-master-test-maven-hadoop-3.2-hive-2.3

After SPARK-29981, we need to create two new jobs.
- spark-master-test-sbt-hadoop-2.7-hive-2.3
- spark-master-test-maven-hadoop-2.7-hive-2.3

This is in preparation for Apache Spark 3.0.0.
We may drop all `*-hive-1.2` jobs in Apache Spark 3.1.0.


> Adjust Jenkins jobs for `hive-1.2/2.3` combination
> --
>
> Key: SPARK-29988
> URL: https://issues.apache.org/jira/browse/SPARK-29988
> Project: Spark
>  Issue Type: Improvement
>  Components: Project Infra
>Affects Versions: 3.0.0
>Reporter: Dongjoon Hyun
>Assignee: Shane Knapp
>Priority: Major
>
> We need to rename the following Jenkins jobs first.
> spark-master-test-sbt-hadoop-2.7 -> spark-master-test-sbt-hadoop-2.7-hive-1.2
> spark-master-test-sbt-hadoop-3.2 -> spark-master-test-sbt-hadoop-3.2-hive-2.3
> spark-master-test-maven-hadoop-2.7 -> 
> spark-master-test-maven-hadoop-2.7-hive-1.2
> spark-master-test-maven-hadoop-3.2 -> 
> spark-master-test-maven-hadoop-3.2-hive-2.3
> Also, we need to add `-Phive-1.2` for the existing `hadoop-2.7` jobs.
> {code}
> -Phive \
> +-Phive-1.2 \
> {code}
> And, we need to add `-Phive-2.3` for the existing `hadoop-3.2` jobs.
> {code}
> -Phive \
> +-Phive-2.3 \
> {code}
> For now, I added the above `-Phive-1.2` and `-Phive-2.3` to the Jenkins jobs 
> manually. (This should be added to the SCM of AmpLab Jenkins.)
> After SPARK-29981, we need to create two new jobs.
> - spark-master-test-sbt-hadoop-2.7-hive-2.3
> - spark-master-test-maven-hadoop-2.7-hive-2.3
> This is in preparation for Apache Spark 3.0.0.
> We may drop all `*-hive-1.2` jobs in Apache Spark 3.1.0.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-30005) Update `test-dependencies.sh` to check `hive-1.2/2.3` profile

2019-11-23 Thread Dongjoon Hyun (Jira)
Dongjoon Hyun created SPARK-30005:
-

 Summary: Update `test-dependencies.sh` to check `hive-1.2/2.3` 
profile
 Key: SPARK-30005
 URL: https://issues.apache.org/jira/browse/SPARK-30005
 Project: Spark
  Issue Type: Improvement
  Components: Project Infra
Affects Versions: 3.0.0
Reporter: Dongjoon Hyun






--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-30004) Allow UserDefinedType to be merged into a standard DateType

2019-11-23 Thread Fokko Driesprong (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-30004?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fokko Driesprong updated SPARK-30004:
-
Component/s: (was: Spark Core)
 SQL

> Allow UserDefinedType to be merged into a standard DateType
> ---
>
> Key: SPARK-30004
> URL: https://issues.apache.org/jira/browse/SPARK-30004
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.4.4
>Reporter: Fokko Driesprong
>Priority: Major
> Fix For: 2.4.5
>
>
> I've registered a custom type, namely XMLGregorianCalendar, which is used by 
> Scalaxb, an XML data-binding tool for generating case classes from an XSD. I 
> want to convert the XMLGregorianCalendar to a regular TimestampType. This 
> works, but when I update the table (using Delta), I get an error:
> Failed to merge fields 'START_DATE_MAINTENANCE_FLPL' and 
> 'START_DATE_MAINTENANCE_FLPL'.
> Failed to merge incompatible data types TimestampType and 
> org.apache.spark.sql.types.CustomXMLGregorianCalendarType@5ff12345;;
> There are two ways of fixing this:
> * Adding a rule which compares the sqlType.
> * Changing the compare function so that it checks the rhs against the lhs, 
> allowing me to override the equals function on the UserDefinedType.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-29931) Declare all SQL legacy configs as will be removed in Spark 4.0

2019-11-23 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-29931?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-29931.
---
Resolution: Later

> Declare all SQL legacy configs as will be removed in Spark 4.0
> --
>
> Key: SPARK-29931
> URL: https://issues.apache.org/jira/browse/SPARK-29931
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Maxim Gekk
>Priority: Minor
>
> Add the following sentence to the descriptions of all legacy SQL configs that 
> existed before Spark 3.0: "This config will be removed in Spark 4.0." Here is 
> the list of such configs:
> * spark.sql.legacy.execution.pandas.groupedMap.assignColumnsByName
> * spark.sql.legacy.literal.pickMinimumPrecision
> * spark.sql.legacy.allowCreatingManagedTableUsingNonemptyLocation
> * spark.sql.legacy.sizeOfNull
> * spark.sql.legacy.replaceDatabricksSparkAvro.enabled
> * spark.sql.legacy.setopsPrecedence.enabled
> * spark.sql.legacy.integralDivide.returnBigint
> * spark.sql.legacy.bucketedTableScan.outputOrdering
> * spark.sql.legacy.parser.havingWithoutGroupByAsWhere
> * spark.sql.legacy.dataset.nameNonStructGroupingKeyAsValue
> * spark.sql.legacy.setCommandRejectsSparkCoreConfs
> * spark.sql.legacy.utcTimestampFunc.enabled
> * spark.sql.legacy.typeCoercion.datetimeToString
> * spark.sql.legacy.looseUpcast
> * spark.sql.legacy.ctePrecedence.enabled
> * spark.sql.legacy.arrayExistsFollowsThreeValuedLogic
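
For reference, a minimal sketch of how one of these legacy flags is toggled at 
runtime, assuming an active SparkSession; the sizeOfNull behavior described in 
the comment reflects my understanding rather than anything stated in this 
ticket:

{code:python}
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
# Legacy flags like these are ordinary runtime SQL configs; this one restores
# the pre-3.0 behavior where size(NULL) returns -1 instead of NULL.
spark.conf.set("spark.sql.legacy.sizeOfNull", "true")
{code}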



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-29981) Add `hive-1.2` and `hive-2.3` profiles

2019-11-23 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-29981?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-29981:
-

Assignee: Dongjoon Hyun

> Add `hive-1.2` and `hive-2.3` profiles
> --
>
> Key: SPARK-29981
> URL: https://issues.apache.org/jira/browse/SPARK-29981
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.0.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-29981) Add `hive-1.2` and `hive-2.3` profiles

2019-11-23 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-29981?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-29981.
---
Fix Version/s: 3.0.0
   Resolution: Fixed

Issue resolved by pull request 26619
[https://github.com/apache/spark/pull/26619]

> Add `hive-1.2` and `hive-2.3` profiles
> --
>
> Key: SPARK-29981
> URL: https://issues.apache.org/jira/browse/SPARK-29981
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.0.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Major
> Fix For: 3.0.0
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-30004) Allow UserDefinedType to be merged into a standard DateType

2019-11-23 Thread Fokko Driesprong (Jira)
Fokko Driesprong created SPARK-30004:


 Summary: Allow UserDefinedType to be merged into a standard 
DateType
 Key: SPARK-30004
 URL: https://issues.apache.org/jira/browse/SPARK-30004
 Project: Spark
  Issue Type: Improvement
  Components: Spark Core
Affects Versions: 2.4.4
Reporter: Fokko Driesprong
 Fix For: 2.4.5


I've registered a custom type, namely XMLGregorianCalendar, which is used by 
Scalaxb, an XML data-binding tool for generating case classes from an XSD. I 
want to convert the XMLGregorianCalendar to a regular TimestampType. This 
works, but when I update the table (using Delta), I get an error:

Failed to merge fields 'START_DATE_MAINTENANCE_FLPL' and 
'START_DATE_MAINTENANCE_FLPL'.
Failed to merge incompatible data types TimestampType and 
org.apache.spark.sql.types.CustomXMLGregorianCalendarType@5ff12345;;

There are two ways of fixing this:

* Adding a rule which compares the sqlType.
* Changing the compare function so that it checks the rhs against the lhs, 
allowing me to override the equals function on the UserDefinedType.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-30003) Prevent stack overflow in unknown hint resolution

2019-11-23 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-30003?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-30003.
--
Fix Version/s: 3.0.0
   Resolution: Fixed

Issue resolved by pull request 26642
[https://github.com/apache/spark/pull/26642]

> Prevent stack overflow in unknown hint resolution
> --
>
> Key: SPARK-30003
> URL: https://issues.apache.org/jira/browse/SPARK-30003
> Project: Spark
>  Issue Type: Test
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Hyukjin Kwon
>Assignee: Hyukjin Kwon
>Priority: Major
> Fix For: 3.0.0
>
>
> This was caused by SPARK-28746. See 
> https://github.com/apache/spark/pull/25464/files#r349543286



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-30003) Prevent stack overflow in unknown hint resolution

2019-11-23 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-30003?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-30003:


Assignee: Hyukjin Kwon

> Prevent stack overflow in unknown hint resolution
> --
>
> Key: SPARK-30003
> URL: https://issues.apache.org/jira/browse/SPARK-30003
> Project: Spark
>  Issue Type: Test
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Hyukjin Kwon
>Assignee: Hyukjin Kwon
>Priority: Major
>
> This was caused by SPARK-28746. See 
> https://github.com/apache/spark/pull/25464/files#r349543286



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org