[ https://issues.apache.org/jira/browse/SPARK-9936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14695307#comment-14695307 ]
Tzach Zohar commented on SPARK-9936:
------------------------------------

[~viirya] indeed! I've just located the problematic line, and in master it's [fixed|https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala#L134]. I guess I'll stick to my workaround (sketched below) until 1.5 is released, thanks.

Should I close this issue? Is it a duplicate of an existing issue that I failed to find? I'm not sure I know the procedure here...

> decimal precision lost when loading DataFrame from RDD
> ------------------------------------------------------
>
>                 Key: SPARK-9936
>                 URL: https://issues.apache.org/jira/browse/SPARK-9936
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 1.4.0
>            Reporter: Tzach Zohar
>
> It seems that when converting an RDD that contains BigDecimals into a
> DataFrame (using SQLContext.createDataFrame without specifying a schema),
> the precision info is lost, which means saving as a Parquet file will fail
> (Parquet tries to verify precision < 18, so it fails if the precision is
> unset).
> This seems to be similar to
> [SPARK-7196|https://issues.apache.org/jira/browse/SPARK-7196], which fixed
> the same issue for DataFrames created via JDBC.
> To reproduce:
> {code:none}
> scala> val rdd: RDD[(String, BigDecimal)] = sc.parallelize(Seq(("a", BigDecimal.valueOf(0.234))))
> rdd: org.apache.spark.rdd.RDD[(String, BigDecimal)] = ParallelCollectionRDD[0] at parallelize at <console>:23
>
> scala> val df: DataFrame = new SQLContext(rdd.context).createDataFrame(rdd)
> df: org.apache.spark.sql.DataFrame = [_1: string, _2: decimal(10,0)]
>
> scala> df.write.parquet("/data/parquet-file")
> 15/08/13 10:30:07 ERROR executor.Executor: Exception in task 0.0 in stage 0.0 (TID 0)
> java.lang.RuntimeException: Unsupported datatype DecimalType()
> {code}
> To verify this is indeed caused by the precision being lost, I've tried
> manually changing the schema to include precision (by traversing the
> StructFields and replacing the DecimalTypes with altered DecimalTypes) and
> creating a new DataFrame using this updated schema - and indeed that fixes
> the problem.
> I'm using Spark 1.4.0.
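A minimal sketch of that workaround, assuming the Spark 1.4 APIs: the helper name {{withExplicitDecimals}} and the (18, 4) precision/scale are my own (pick values that fit your data and stay within the precision Parquet accepts, per the error above), and it only patches top-level decimal columns, as in the repro.

{code:scala}
import org.apache.spark.sql.{DataFrame, SQLContext}
import org.apache.spark.sql.types.{DecimalType, StructField, StructType}

// Rebuild the schema so every decimal column carries an explicit
// precision/scale, then re-create the DataFrame with the patched schema.
def withExplicitDecimals(sqlContext: SQLContext, df: DataFrame): DataFrame = {
  val patchedSchema = StructType(df.schema.map {
    // Replace the precision-less DecimalType with an explicit one.
    // (18, 4) is an assumption -- adjust to whatever your data needs.
    case StructField(name, _: DecimalType, nullable, metadata) =>
      StructField(name, DecimalType(18, 4), nullable, metadata)
    case other => other
  })
  sqlContext.createDataFrame(df.rdd, patchedSchema)
}

// Usage, continuing the repro above:
//   val fixed = withExplicitDecimals(new SQLContext(rdd.context), df)
//   fixed.write.parquet("/data/parquet-file")
{code}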