[jira] [Updated] (SPARK-13410) unionAll AnalysisException with DataFrames containing UDT columns.

2016-02-23 Thread Reynold Xin (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13410?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Reynold Xin updated SPARK-13410:

Fix Version/s: 1.6.1

> unionAll AnalysisException with DataFrames containing UDT columns.
> --
>
> Key: SPARK-13410
> URL: https://issues.apache.org/jira/browse/SPARK-13410
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 1.5.0, 1.6.0
> Reporter: Franklyn Dsouza
> Assignee: Franklyn Dsouza
> Labels: patch
> Fix For: 1.6.1, 2.0.0
>
> Original Estimate: 3h
> Remaining Estimate: 3h
>
> Unioning two DataFrames that contain UDT columns fails with:
> {quote}
> AnalysisException: u"unresolved operator 'Union;"
> {quote}
> I tracked this down to this line:
> https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicOperators.scala#L202
> It compares the data types of the output attributes of the two logical plans.
> For UDT columns, however, each data type is a new instance of UserDefinedType or
> PythonUserDefinedType:
> https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/types/DataType.scala#L158
>
> The equality check therefore tests whether the two instances are the same object,
> and because they are not references to a singleton, the check fails.
> *Note: this works fine if you union a DataFrame with itself.*
> I have a proposed patch that overrides the equality operator on the
> two classes: https://github.com/apache/spark/pull/11279
> Reproduction steps:
> {code}
> from pyspark.sql.tests import PythonOnlyPoint, PythonOnlyUDT
> from pyspark.sql import types
> schema = types.StructType([types.StructField("point", PythonOnlyUDT(), True)])
> # Note: a and b need to be two separate DataFrames.
> a = sqlCtx.createDataFrame([[PythonOnlyPoint(1.0, 2.0)]], schema)
> b = sqlCtx.createDataFrame([[PythonOnlyPoint(3.0, 4.0)]], schema)
> # Fails with the AnalysisException quoted above.
> c = a.unionAll(b)
> {code}
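
The failure mode described above can be sketched in plain Python, without Spark. This is only an illustration of identity-based versus structural equality, not the code from the proposed patch (which changes Scala classes). The names IdentityUDT, StructuralUDT, and schemas_match are hypothetical and do not exist in Spark.

```python
class IdentityUDT:
    """Hypothetical stand-in for a UDT class that does not override equality."""

    def __init__(self, sql_type):
        self.sql_type = sql_type


class StructuralUDT(IdentityUDT):
    """Same stand-in, with equality based on the class and the underlying SQL type."""

    def __eq__(self, other):
        return type(self) is type(other) and self.sql_type == other.sql_type

    def __hash__(self):
        # Keep hashing consistent with the overridden __eq__.
        return hash((type(self).__name__, self.sql_type))


def schemas_match(left, right):
    """Rough analogue of the union check: pairwise-equal column data types."""
    return len(left) == len(right) and all(l == r for l, r in zip(left, right))


# Two separate instances compare by identity, so the check fails.
identity_result = schemas_match([IdentityUDT("array<double>")],
                                [IdentityUDT("array<double>")])

# With structural equality, two separate instances are considered equal.
structural_result = schemas_match([StructuralUDT("array<double>")],
                                  [StructuralUDT("array<double>")])

# Reusing the same instance passes even without the override, which mirrors
# the note that unioning a DataFrame with itself works.
shared = IdentityUDT("array<double>")
self_union_result = schemas_match([shared], [shared])

print(identity_result, structural_result, self_union_result)
```

Under these assumptions, only the comparison between two separate IdentityUDT instances fails, which matches the behaviour reported in the issue.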



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-13410) unionAll AnalysisException with DataFrames containing UDT columns.

2016-02-20 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13410?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen updated SPARK-13410:
--
Target Version/s:   (was: 1.6.0)

> unionAll AnalysisException with DataFrames containing UDT columns.
> --
>
> Key: SPARK-13410
> URL: https://issues.apache.org/jira/browse/SPARK-13410
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 1.5.0, 1.6.0
> Reporter: Franklyn Dsouza
> Labels: patch
> Original Estimate: 3h
> Remaining Estimate: 3h
>






[jira] [Updated] (SPARK-13410) unionAll AnalysisException with DataFrames containing UDT columns.

2016-02-19 Thread Franklyn Dsouza (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13410?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Franklyn Dsouza updated SPARK-13410:

Summary: unionAll AnalysisException with DataFrames containing UDT columns. 
 (was: unionAll throws error with DataFrames containing UDT columns.)

> unionAll AnalysisException with DataFrames containing UDT columns.
> --
>
> Key: SPARK-13410
> URL: https://issues.apache.org/jira/browse/SPARK-13410
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 1.5.0, 1.6.0
> Reporter: Franklyn Dsouza
> Labels: patch
> Original Estimate: 3h
> Remaining Estimate: 3h
>


