Parquet and repartition

2015-03-16 Thread Masf
Hi all. When I specify the number of partitions and save this RDD in Parquet format, my app fails. For example:

    selectTest.coalesce(28).saveAsParquetFile("hdfs://vm-clusterOutput")

However, it works well if I store the data as text:

    selectTest.coalesce(28).saveAsTextFile("hdfs://vm-clusterOutput")

My …

Re: Parquet and repartition

2015-03-16 Thread Masf
Thanks Sean, I forgot it. The output error is the following:

    java.lang.ClassCastException: scala.math.BigDecimal cannot be cast to org.apache.spark.sql.catalyst.types.decimal.Decimal
        at org.apache.spark.sql.parquet.MutableRowWriteSupport.consumeType(ParquetTableSupport.scala:359)
        at …
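The mechanism behind this exception can be illustrated without Spark at all: the row cell still holds a plain scala.math.BigDecimal, while the Parquet write path casts it to Spark's internal decimal wrapper. A minimal sketch, where InternalDecimal is a hypothetical stand-in for org.apache.spark.sql.catalyst.types.decimal.Decimal:

```scala
object DecimalCastDemo {
  // Hypothetical stand-in for Spark's internal decimal wrapper
  // (org.apache.spark.sql.catalyst.types.decimal.Decimal).
  final class InternalDecimal(val value: scala.math.BigDecimal)

  def main(args: Array[String]): Unit = {
    // The row cell holds a plain scala.math.BigDecimal ...
    val cell: Any = scala.math.BigDecimal("12.34")

    // ... but the writer expects the internal type, so the cast blows up,
    // just like in the stack trace above.
    val failed =
      try { cell.asInstanceOf[InternalDecimal]; false }
      catch { case _: ClassCastException => true }

    println(s"cast failed: $failed")  // cast failed: true
  }
}
```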

Re: Parquet and repartition

2015-03-16 Thread Cheng Lian
Hey Masf, I’ve created SPARK-6360 (https://issues.apache.org/jira/browse/SPARK-6360) to track this issue; a detailed analysis is provided there. The TL;DR is: for Spark 1.1 and 1.2, if a SchemaRDD contains decimal or UDT column(s), after applying any traditional RDD transformation (e.g. …
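Since a traditional RDD transformation such as coalesce drops back to a plain RDD[Row], one possible workaround on Spark 1.2 is to re-attach the schema before writing, so the save goes back through SchemaRDD's conversion path. This is only a hedged sketch: sqlContext and selectTest come from the thread, and whether applySchema actually restores the internal decimal representation in this case is an assumption, not something confirmed in SPARK-6360.

```scala
// Hypothetical workaround sketch for Spark 1.1/1.2 (untested; requires a
// running SQLContext, so it cannot execute standalone).
val coalesced = selectTest.coalesce(28)                       // plain RDD[Row]
val restored  = sqlContext.applySchema(coalesced, selectTest.schema)
restored.saveAsParquetFile("hdfs://vm-clusterOutput")
```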