Spark version: 1.1.0
Scala version: 2.10.4

I have loaded data of the following form from a Parquet file into a SchemaRDD:

    [7654321,2015-01-01 00:00:00.007,0.49,THU]

Since the Parquet format in Spark 1.1.0 does not support saving timestamp values, I saved the timestamp data as a string. Can you please tell me how to iterate over the data in this SchemaRDD to retrieve the timestamp values, register the mapped RDD as a table, and then be able to run queries like:

    SELECT * FROM table WHERE time >= '2015-01-01 00:00:00.000'

I wrote the following code:

    // Note: pattern corrected to MM (month) and HH (24-hour clock);
    // the original used mm (minutes) and hh (12-hour clock)
    val sdf = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss.SSS")
    val calendar = Calendar.getInstance()
    val iddRDD = intf_ddRDD.map { r =>
      val end_time = sdf.parse(r(1).toString)
      calendar.setTime(end_time)
      val r1 = new java.sql.Timestamp(end_time.getTime)
      val hour: Long = calendar.get(Calendar.HOUR_OF_DAY)
      Row(r(0).toString.toInt, r1, hour, r(2).toString.toInt, r(3).toString)
    }

This gives me:

    org.apache.spark.SparkException: Task not serializable

Please help!

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Iterate-over-contents-of-schemaRDD-loaded-from-parquet-file-to-extract-timestamp-tp22089.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
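For reference, a common cause of `Task not serializable` in closures like the one above is capturing driver-side objects (here `sdf` and `calendar`, or the enclosing class that holds them) inside the `map` function. One usual workaround is to construct those helpers inside the closure, or factor the parsing into a standalone function so nothing from the driver is captured. A minimal sketch of that idea, using only the JDK classes from the original code (the function name `toTimestampFields` is hypothetical, not from the original post):

```scala
import java.text.SimpleDateFormat
import java.util.Calendar

// Hypothetical helper: builds the (non-thread-safe, not-always-serializable)
// SimpleDateFormat and Calendar locally, so a Spark closure calling it
// captures nothing from the driver.
def toTimestampFields(ts: String): (java.sql.Timestamp, Long) = {
  // MM = month, HH = 24-hour clock (mm/hh in the original pattern
  // would parse minutes and a 12-hour clock instead)
  val sdf = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss.SSS")
  val calendar = Calendar.getInstance()
  val endTime = sdf.parse(ts)
  calendar.setTime(endTime)
  (new java.sql.Timestamp(endTime.getTime),
   calendar.get(Calendar.HOUR_OF_DAY).toLong)
}
```

In the Spark job this could then be used as (sketch, assuming the original `intf_ddRDD`):

```scala
// val iddRDD = intf_ddRDD.map { r =>
//   val (r1, hour) = toTimestampFields(r(1).toString)
//   Row(r(0).toString.toInt, r1, hour, r(2).toString, r(3).toString)
// }
```

An alternative with less per-row allocation is `mapPartitions`, creating one `SimpleDateFormat` per partition and reusing it for every row in that partition.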