[jira] [Commented] (SPARK-13966) Regression using .withColumn() on a parquet

2016-04-07 Thread Federico Ponzi (JIRA)

[ https://issues.apache.org/jira/browse/SPARK-13966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15230128#comment-15230128 ]

Federico Ponzi commented on SPARK-13966:


Seems to be working for me now too. Thanks

> Regression using .withColumn() on a parquet
> ---
>
> Key: SPARK-13966
> URL: https://issues.apache.org/jira/browse/SPARK-13966
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0
> Environment: Linux 3.16.0-4-amd64 #1 SMP Debian 3.16.7-ckt20-1+deb8u3 (2016-01-17) x86_64 GNU/Linux
>Reporter: Federico Ponzi
>Assignee: Davies Liu
>Priority: Critical
>
> If we load a parquet file, add a timestamp-typed column with {{withColumn()}},
> and try to join the table with itself, we get a
> {{java.util.NoSuchElementException: key not found: key#6}}.
> Here is a simple program to reproduce it:
> {code}
> from pyspark.sql import SQLContext, Row
> from pyspark import SparkContext
> from pyspark.sql.functions import from_unixtime, lit
> sc = SparkContext()
> sqlContext = SQLContext(sc)
> df = sqlContext.createDataFrame(sc.parallelize([Row(x=123)]))
> df.write.parquet("/tmp/testcase", mode="overwrite")
> df = sqlContext.read.parquet("/tmp/testcase")
> # df = df.unionAll(df.limit(0)) # WORKAROUND
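> # (why the workaround helps is an assumption: the union changes the logical
> # plan, so the self-join no longer hits the failing attribute lookup)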
> df = df.withColumn("key", from_unixtime(lit(1457650800)))  # also happens with a .cast("timestamp")
> df.registerTempTable("test")
> res = sqlContext.sql("SELECT COUNT(1) from test t1, test t2 where t1.key = t2.key")
> res.show()
> {code}
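> The comment in the script mentions an explicit cast; a minimal sketch of that
> variant (hypothetical code, assuming the same {{sqlContext}} and the parquet
> file written by the script above) fails with the same error:
> {code}
> # variant of the repro: build the key column with an explicit timestamp cast
> # instead of from_unixtime (the column name "key" matches the script above)
> df = sqlContext.read.parquet("/tmp/testcase")
> df = df.withColumn("key", lit(1457650800).cast("timestamp"))
> df.registerTempTable("test")
> sqlContext.sql("SELECT COUNT(1) FROM test t1, test t2 WHERE t1.key = t2.key").show()
> {code}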
> This only occurs when the added column is of type timestamp, and it doesn't
> happen in Spark 1.6.x.
> {noformat}
> Traceback (most recent call last):
>   File "/tmp/bug.py", line 17, in 
> res.show()
>   File "/usr/local/spark/python/lib/pyspark.zip/pyspark/sql/dataframe.py", 
> line 217, in show
>   File "/usr/local/spark/python/lib/py4j-0.9.2-src.zip/py4j/java_gateway.py", 
> line 836, in __call__
>   File "/usr/local/spark/python/lib/pyspark.zip/pyspark/sql/utils.py", line 
> 45, in deco
>   File "/usr/local/spark/python/lib/py4j-0.9.2-src.zip/py4j/protocol.py", 
> line 310, in get_return_value
> py4j.protocol.Py4JJavaError: An error occurred while calling o67.showString.
> : java.util.NoSuchElementException: key not found: key#6
>   at scala.collection.MapLike$class.default(MapLike.scala:228)
>   at org.apache.spark.sql.catalyst.expressions.AttributeMap.default(AttributeMap.scala:38)
>   at scala.collection.MapLike$class.apply(MapLike.scala:141)
>   at org.apache.spark.sql.catalyst.expressions.AttributeMap.apply(AttributeMap.scala:38)
>   at org.apache.spark.sql.execution.datasources.DataSourceStrategy$$anonfun$35$$anonfun$apply$2.applyOrElse(DataSourceStrategy.scala:566)
>   at org.apache.spark.sql.execution.datasources.DataSourceStrategy$$anonfun$35$$anonfun$apply$2.applyOrElse(DataSourceStrategy.scala:565)
>   at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$3.apply(TreeNode.scala:259)
>   at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$3.apply(TreeNode.scala:259)
>   at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:67)
>   at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:258)
>   at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:264)
>   at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:264)
>   at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$5.apply(TreeNode.scala:301)
>   at scala.collection.Iterator$$anon$11.next(Iterator.scala:370)
>   at scala.collection.Iterator$class.foreach(Iterator.scala:742)
>   at scala.collection.AbstractIterator.foreach(Iterator.scala:1194)
>   at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:59)
>   at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:104)
>   at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:48)
>   at scala.collection.TraversableOnce$class.to(TraversableOnce.scala:308)
>   at scala.collection.AbstractIterator.to(Iterator.scala:1194)
>   at scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:300)
>   at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1194)
>   at scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:287)
>   at scala.collection.AbstractIterator.toArray(Iterator.scala:1194)
>   at org.apache.spark.sql.catalyst.trees.TreeNode.transformChildren(TreeNode.scala:350)
>   at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:264)
>   at org.apache.spark.sql.catalyst.trees.TreeNode.transform(Tr

[jira] [Commented] (SPARK-13966) Regression using .withColumn() on a parquet

2016-04-06 Thread Davies Liu (JIRA)

[ https://issues.apache.org/jira/browse/SPARK-13966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15229042#comment-15229042 ]

Davies Liu commented on SPARK-13966:


I checked this on the latest master and it works; could you check again?

> Regression using .withColumn() on a parquet
> ---
>
> Key: SPARK-13966
> URL: https://issues.apache.org/jira/browse/SPARK-13966
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0
> Environment: Linux 3.16.0-4-amd64 #1 SMP Debian 3.16.7-ckt20-1+deb8u3 (2016-01-17) x86_64 GNU/Linux
>Reporter: Federico Ponzi
>Priority: Critical
>
> If we load a parquet file, add a timestamp-typed column with {{withColumn()}},
> and try to join the table with itself, we get a
> {{java.util.NoSuchElementException: key not found: key#6}}.
> Here is a simple program to reproduce it:
> {code}
> from pyspark.sql import SQLContext, Row
> from pyspark import SparkContext
> from pyspark.sql.functions import from_unixtime, lit
> sc = SparkContext()
> sqlContext = SQLContext(sc)
> df = sqlContext.createDataFrame(sc.parallelize([Row(x=123)]))
> df.write.parquet("/tmp/testcase", mode="overwrite")
> df = sqlContext.read.parquet("/tmp/testcase")
> # df = df.unionAll(df.limit(0)) # WORKAROUND
> df = df.withColumn("key", from_unixtime(lit(1457650800)))  # also happens with a .cast("timestamp")
> df.registerTempTable("test")
> res = sqlContext.sql("SELECT COUNT(1) from test t1, test t2 where t1.key = t2.key")
> res.show()
> {code}
> This only occurs when the added column is of type timestamp, and it doesn't
> happen in Spark 1.6.x.
> {noformat}
> Traceback (most recent call last):
>   File "/tmp/bug.py", line 17, in 
> res.show()
>   File "/usr/local/spark/python/lib/pyspark.zip/pyspark/sql/dataframe.py", 
> line 217, in show
>   File "/usr/local/spark/python/lib/py4j-0.9.2-src.zip/py4j/java_gateway.py", 
> line 836, in __call__
>   File "/usr/local/spark/python/lib/pyspark.zip/pyspark/sql/utils.py", line 
> 45, in deco
>   File "/usr/local/spark/python/lib/py4j-0.9.2-src.zip/py4j/protocol.py", 
> line 310, in get_return_value
> py4j.protocol.Py4JJavaError: An error occurred while calling o67.showString.
> : java.util.NoSuchElementException: key not found: key#6
>   at scala.collection.MapLike$class.default(MapLike.scala:228)
>   at org.apache.spark.sql.catalyst.expressions.AttributeMap.default(AttributeMap.scala:38)
>   at scala.collection.MapLike$class.apply(MapLike.scala:141)
>   at org.apache.spark.sql.catalyst.expressions.AttributeMap.apply(AttributeMap.scala:38)
>   at org.apache.spark.sql.execution.datasources.DataSourceStrategy$$anonfun$35$$anonfun$apply$2.applyOrElse(DataSourceStrategy.scala:566)
>   at org.apache.spark.sql.execution.datasources.DataSourceStrategy$$anonfun$35$$anonfun$apply$2.applyOrElse(DataSourceStrategy.scala:565)
>   at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$3.apply(TreeNode.scala:259)
>   at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$3.apply(TreeNode.scala:259)
>   at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:67)
>   at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:258)
>   at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:264)
>   at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:264)
>   at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$5.apply(TreeNode.scala:301)
>   at scala.collection.Iterator$$anon$11.next(Iterator.scala:370)
>   at scala.collection.Iterator$class.foreach(Iterator.scala:742)
>   at scala.collection.AbstractIterator.foreach(Iterator.scala:1194)
>   at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:59)
>   at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:104)
>   at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:48)
>   at scala.collection.TraversableOnce$class.to(TraversableOnce.scala:308)
>   at scala.collection.AbstractIterator.to(Iterator.scala:1194)
>   at scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:300)
>   at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1194)
>   at scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:287)
>   at scala.collection.AbstractIterator.toArray(Iterator.scala:1194)
>   at org.apache.spark.sql.catalyst.trees.TreeNode.transformChildren(TreeNode.scala:350)
>   at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:264)
>   at org.apache.spark.sql.catalyst.trees.TreeNode.transform(TreeNode.