Creating a SchemaRDD from RDD of thrift classes

2014-10-30 Thread ankits
I have one Spark job that creates some RDDs of type X and persists them
in memory. The type X is an auto-generated Thrift Java class (not a case
class). In another job, I want to convert the RDD to a SchemaRDD
using sqlContext.applySchema(). Can I derive a schema from the Thrift
definitions to convert RDD[X] to a SchemaRDD?

--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Creating-a-SchemaRDD-from-RDD-of-thrift-classes-tp17766.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Re: Creating a SchemaRDD from RDD of thrift classes

2014-10-30 Thread Michael Armbrust
That should be possible, although I'm not super familiar with Thrift.
You'll probably need access to the generated metadata:
http://people.apache.org/~thejas/thrift-0.9/javadoc/org/apache/thrift/meta_data/package-frame.html
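
To make the idea concrete, here is a rough sketch of the type mapping such a schema derivation would need. It is not tied to the real Thrift or Spark classes: the class name ThriftSchemaSketch, its methods, and the use of plain strings for Spark SQL type names are all stand-ins I made up for illustration. Only the numeric type IDs follow Thrift's TType constants. Mapping SET to an array and ENUM to a string are assumptions on my part, and nested types (list elements, map keys and values, struct fields) would need a recursive walk over the metadata that this sketch leaves out.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class ThriftSchemaSketch {
    // Wire-type IDs with the same values as org.apache.thrift.protocol.TType.
    static final byte BOOL = 2, BYTE = 3, DOUBLE = 4, I16 = 6, I32 = 8,
            I64 = 10, STRING = 11, STRUCT = 12, MAP = 13, SET = 14,
            LIST = 15, ENUM = 16;

    // Map one Thrift type ID to the name of a Spark SQL data type.
    // Hypothetical helper: real code would return Spark DataType objects.
    static String sparkTypeFor(byte thriftType) {
        switch (thriftType) {
            case BOOL:   return "BooleanType";
            case BYTE:   return "ByteType";
            case DOUBLE: return "DoubleType";
            case I16:    return "ShortType";
            case I32:    return "IntegerType";
            case I64:    return "LongType";
            case STRING: return "StringType"; // Thrift binary also uses this ID
            case ENUM:   return "StringType"; // assumption: store the enum name
            case LIST:   return "ArrayType";
            case SET:    return "ArrayType";  // assumption: no set type in Spark SQL
            case MAP:    return "MapType";
            case STRUCT: return "StructType";
            default:
                throw new IllegalArgumentException(
                        "Unsupported Thrift type id: " + thriftType);
        }
    }

    // Build an ordered field-name -> Spark-type-name mapping from a
    // field-name -> Thrift-type-ID mapping (a stand-in for the generated metadata).
    static Map<String, String> deriveSchema(Map<String, Byte> thriftFields) {
        Map<String, String> schema = new LinkedHashMap<>();
        for (Map.Entry<String, Byte> field : thriftFields.entrySet()) {
            schema.put(field.getKey(), sparkTypeFor(field.getValue()));
        }
        return schema;
    }

    public static void main(String[] args) {
        // Hypothetical struct with three fields.
        Map<String, Byte> fields = new LinkedHashMap<>();
        fields.put("id", I64);
        fields.put("name", STRING);
        fields.put("score", DOUBLE);
        System.out.println(deriveSchema(fields));
    }
}
```

In a real job, the field names and type IDs would presumably come from the static metaDataMap that Thrift-generated Java classes normally expose, and the resulting field list would be turned into a Spark StructType and passed to applySchema() along with an RDD of rows built from the same fields in the same order.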

Shameless plug: if you find yourself reading a lot of Thrift data, you
might consider writing a library against the new SQL Data Source
API (https://github.com/apache/spark/pull/2475), which is about to be
merged in. It's essentially applySchema on steroids.

This code for Avro may be a useful reference:
https://github.com/marmbrus/sql-avro

On Thu, Oct 30, 2014 at 2:13 PM, ankits <ankitso...@gmail.com> wrote:

> I have one job with spark that creates some RDDs of type X and persists
> them in memory. The type X is an auto generated Thrift java class (not a
> case class though). Now in another job, I want to convert the RDD to a
> SchemaRDD using sqlContext.applySchema(). Can I derive a schema from the
> thrift definitions to convert RDD[X] to SchemaRDD[X]?