Spark Version - 1.1.0
Scala - 2.10.4

I have loaded data of the following form from a parquet file into a SchemaRDD:

[7654321,2015-01-01 00:00:00.007,0.49,THU]

Since the parquet format in Spark 1.1.0 doesn't support saving
timestamp values, I have saved the timestamp data as strings. Can you please
tell me how to iterate over the data in this SchemaRDD to retrieve the
timestamp values, register the mapped RDD as a table, and then run queries
like "SELECT * FROM table WHERE time >= '2015-01-01 00:00:00.000'"?
I wrote the following code:

import java.text.SimpleDateFormat
import java.util.Calendar

// Note: the pattern needs MM (month) and HH (24-hour clock);
// lowercase mm is minutes and hh is the 12-hour clock.
val sdf = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss.SSS")
val calendar = Calendar.getInstance()

val iddRDD = intf_ddRDD.map { r =>
  val end_time = sdf.parse(r(1).toString)
  calendar.setTime(end_time)
  val r1 = new java.sql.Timestamp(end_time.getTime)
  val hour: Long = calendar.get(Calendar.HOUR_OF_DAY)
  Row(r(0).toString.toInt, r1, hour, r(2).toString.toInt, r(3).toString)
}

This gives me *org.apache.spark.SparkException: Task not serializable*.
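For what it's worth, my understanding is that this error usually means the closure captured something non-serializable from the enclosing scope (here `sdf` and `calendar`, or the object that holds them), and constructing those helpers inside the `map` (or once per partition via `mapPartitions`) avoids the capture. The per-row parsing logic itself can be checked outside Spark; below is a minimal plain-Java sketch of just that part (the class name `ParseDemo` is mine, not from the original code):

```java
import java.sql.Timestamp;
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Calendar;
import java.util.Date;

public class ParseDemo {
    // Parse one timestamp string into a java.sql.Timestamp.
    // Note MM (month) and HH (24-hour clock), not mm/hh.
    static Timestamp parse(String s) throws ParseException {
        // Constructed locally, so nothing would need to be serialized
        // if this ran inside a Spark closure.
        SimpleDateFormat sdf = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss.SSS");
        Date d = sdf.parse(s);
        return new Timestamp(d.getTime());
    }

    // Extract the hour-of-day (0-23) in the default time zone.
    static int hourOfDay(Timestamp t) {
        Calendar cal = Calendar.getInstance();
        cal.setTime(t);
        return cal.get(Calendar.HOUR_OF_DAY);
    }

    public static void main(String[] args) throws ParseException {
        Timestamp t = parse("2015-01-01 00:00:00.007");
        System.out.println(t);            // 2015-01-01 00:00:00.007
        System.out.println(hourOfDay(t)); // 0
    }
}
```

Since both `parse` and `hourOfDay` use the same default time zone, the extracted hour matches the input string regardless of locale.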

Please help!




--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Iterate-over-contents-of-schemaRDD-loaded-from-parquet-file-to-extract-timestamp-tp22089.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
