I'm trying to parse a tab-separated file with a JSON section in Spark 1.5 as
efficiently as possible.
The file looks like follows: 

value1<tab>value2<tab>{json}

How can I parse all fields, including the JSON fields, directly into an RDD?

If I use this piece of code:

val jsonCol = sc.textFile("/data/input")
  .map(l => l.split("\t", 3))
  .map(x => x(2).trim())
  .cache()
val json = sqlContext.read.json(jsonCol).rdd

I lose value1 and value2!
I'm open to any ideas!
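One possible approach (a sketch, not a definitive answer): instead of feeding only the third column to sqlContext.read.json, parse the JSON per line inside a single map, so each record keeps all three columns in one pass. This assumes Scala 2.10/2.11 (the Spark 1.5 era), where scala.util.parsing.json ships with the standard library; the path "/data/input" is taken from the question above.

```scala
import scala.util.parsing.json.JSON

// Split each line into exactly three columns, then parse the third
// column as JSON with the standard-library parser. JSON.parseFull
// returns Option[Any] (a Map for a JSON object), so no schema
// inference is done here -- unlike sqlContext.read.json.
val parsed = sc.textFile("/data/input")
  .map { line =>
    val Array(v1, v2, js) = line.split("\t", 3)
    (v1, v2, JSON.parseFull(js))  // (value1, value2, Option[parsed JSON])
  }
  .cache()
```

The trade-off is that you give up the inferred DataFrame schema that read.json provides; if you need that schema, a heavier JSON library such as json4s (bundled with Spark) inside the map is another option.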



-----
I'm using Spark 1.5
--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Parse-tab-seperated-file-inc-json-efficent-tp24691.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org
