Awesome! That would be great!!

On Mon, Jan 26, 2015 at 3:18 PM, Michael Armbrust <mich...@databricks.com> wrote:
> I'm aiming for 1.3.
>
> On Mon, Jan 26, 2015 at 3:05 PM, Manoj Samel <manojsamelt...@gmail.com> wrote:
>
>> Thanks Michael. I am sure there have been many requests for this support.
>>
>> Any release targeted for this?
>>
>> Thanks,
>>
>> On Sat, Jan 24, 2015 at 11:47 AM, Michael Armbrust <mich...@databricks.com> wrote:
>>
>>> Those annotations actually don't work, because the timestamp type in SQL has optional nanosecond precision.
>>>
>>> However, there is a PR to add support using Parquet's INT96 type:
>>> https://github.com/apache/spark/pull/3820
>>>
>>> On Fri, Jan 23, 2015 at 12:08 PM, Manoj Samel <manojsamelt...@gmail.com> wrote:
>>>
>>>> Looking further at the trace and ParquetTypes.scala, it seems there is no support for Timestamp and Date in fromPrimitiveDataType(ctype: DataType): Option[ParquetTypeInfo]. Since Parquet supports these types with some decoration over Int (https://github.com/Parquet/parquet-format/blob/master/LogicalTypes.md), is there any reason why Date / Timestamp are not supported right now?
>>>>
>>>> Thanks,
>>>>
>>>> Manoj
>>>>
>>>> On Fri, Jan 23, 2015 at 11:40 AM, Manoj Samel <manojsamelt...@gmail.com> wrote:
>>>>
>>>>> Using Spark 1.2.
>>>>>
>>>>> I read a CSV file, apply a schema to convert it to a SchemaRDD, and then call schemaRdd.saveAsParquetFile.
>>>>>
>>>>> If the schema includes TimestampType, the save fails with the following trace:
>>>>>
>>>>> Exception in thread "main" java.lang.RuntimeException: Unsupported datatype TimestampType
>>>>>   at scala.sys.package$.error(package.scala:27)
>>>>>   at org.apache.spark.sql.parquet.ParquetTypesConverter$$anonfun$fromDataType$2.apply(ParquetTypes.scala:343)
>>>>>   at org.apache.spark.sql.parquet.ParquetTypesConverter$$anonfun$fromDataType$2.apply(ParquetTypes.scala:292)
>>>>>   at scala.Option.getOrElse(Option.scala:120)
>>>>>   at org.apache.spark.sql.parquet.ParquetTypesConverter$.fromDataType(ParquetTypes.scala:291)
>>>>>   at org.apache.spark.sql.parquet.ParquetTypesConverter$$anonfun$4.apply(ParquetTypes.scala:363)
>>>>>   at org.apache.spark.sql.parquet.ParquetTypesConverter$$anonfun$4.apply(ParquetTypes.scala:362)
>>>>>   at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
>>>>>   at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
>>>>>   at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
>>>>>   at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
>>>>>   at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
>>>>>   at scala.collection.AbstractTraversable.map(Traversable.scala:105)
>>>>>   at org.apache.spark.sql.parquet.ParquetTypesConverter$.convertFromAttributes(ParquetTypes.scala:361)
>>>>>   at org.apache.spark.sql.parquet.ParquetTypesConverter$.writeMetaData(ParquetTypes.scala:407)
>>>>>   at org.apache.spark.sql.parquet.ParquetRelation$.createEmpty(ParquetRelation.scala:166)
>>>>>   at org.apache.spark.sql.parquet.ParquetRelation$.create(ParquetRelation.scala:145)
>>>>>   at org.apache.spark.sql.execution.SparkStrategies$ParquetOperations$.apply(SparkStrategies.scala:204)
>>>>>   at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:58)
>>>>>   at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:58)
>>>>>   at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
>>>>>   at org.apache.spark.sql.catalyst.planning.QueryPlanner.apply(QueryPlanner.scala:59)
>>>>>   at org.apache.spark.sql.SQLContext$QueryExecution.sparkPlan$lzycompute(SQLContext.scala:418)
>>>>>   at org.apache.spark.sql.SQLContext$QueryExecution.sparkPlan(SQLContext.scala:416)
>>>>>   at org.apache.spark.sql.SQLContext$QueryExecution.executedPlan$lzycompute(SQLContext.scala:422)
>>>>>   at org.apache.spark.sql.SQLContext$QueryExecution.executedPlan(SQLContext.scala:422)
>>>>>   at org.apache.spark.sql.SQLContext$QueryExecution.toRdd$lzycompute(SQLContext.scala:425)
>>>>>   at org.apache.spark.sql.SQLContext$QueryExecution.toRdd(SQLContext.scala:425)
>>>>>   at org.apache.spark.sql.SchemaRDDLike$class.saveAsParquetFile(SchemaRDDLike.scala:76)
>>>>>   at org.apache.spark.sql.SchemaRDD.saveAsParquetFile(SchemaRDD.scala:108)
>>>>>   at bdrt.MyTest$.createParquetWithDate(MyTest.scala:88)
>>>>>   at bdrt.MyTest$delayedInit$body.apply(MyTest.scala:54)
>>>>>   at scala.Function0$class.apply$mcV$sp(Function0.scala:40)
>>>>>   at scala.runtime.AbstractFunction0.apply$mcV$sp(AbstractFunction0.scala:12)
>>>>>   at scala.App$$anonfun$main$1.apply(App.scala:71)
>>>>>   at scala.App$$anonfun$main$1.apply(App.scala:71)
>>>>>   at scala.collection.immutable.List.foreach(List.scala:318)
>>>>>   at scala.collection.generic.TraversableForwarder$class.foreach(TraversableForwarder.scala:32)
>>>>>   at scala.App$class.main(App.scala:71)
>>>>>   at bdrt.MyTest$.main(MyTest.scala:10)
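To make the nanosecond-precision point in the thread concrete: a minimal stdlib-only Scala sketch (the timestamp value is hypothetical, not from the thread) of why routing a nanosecond-precision timestamp through a millisecond-granularity int64 annotation is lossy.

```scala
// Hypothetical nanosecond-precision timestamp (illustrative value only).
val nanos = 1422046800123456789L

// What a millisecond-granularity int64 annotation could hold:
val millis = nanos / 1000000L

// Reading it back recovers only millisecond precision:
val roundTripped = millis * 1000000L

println(nanos - roundTripped) // 456789 nanoseconds lost in the round trip
```

This is the mismatch Michael describes: the existing Parquet logical-type annotations fix a granularity, so a SQL timestamp with optional nanosecond precision cannot round-trip through them.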
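For background on the INT96 approach in the PR Michael links: Parquet's INT96 carries a timestamp as a Julian day number plus nanoseconds within that day. A rough stdlib-only sketch of that split (this is an illustration of the encoding idea, not Spark's actual implementation; byte layout and ordering are up to the real code). The constant 2440588 is the Julian Day Number of 1970-01-01.

```scala
// Split epoch milliseconds into the (Julian day, nanos-of-day) pair
// that Parquet's INT96 timestamp encoding carries.
val JulianDayOfEpoch = 2440588L                  // Julian Day Number of the Unix epoch
val NanosPerDay = 86400L * 1000L * 1000L * 1000L // nanoseconds in one day

def toJulianDayAndNanos(epochMillis: Long): (Int, Long) = {
  val totalNanos = epochMillis * 1000000L
  val days = Math.floorDiv(totalNanos, NanosPerDay)       // floorDiv handles pre-1970 dates
  val nanosOfDay = Math.floorMod(totalNanos, NanosPerDay) // always in [0, NanosPerDay)
  ((JulianDayOfEpoch + days).toInt, nanosOfDay)
}

println(toJulianDayAndNanos(0L))        // (2440588,0): the Unix epoch itself
println(toJulianDayAndNanos(86400000L)) // (2440589,0): exactly one day later
```

Because the nanos-of-day field is a full 64-bit count, this representation preserves the optional nanosecond precision that the plain int64 annotations cannot.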