Re: reading a csv dynamically

Pankaj Narang Wed, 21 Jan 2015 18:48:21 -0800

Yes, I think you first need a map step that pairs every line with the number
of values it contains. You can then group all the records that have the same
number of values, which tells you how many different array lengths you have.



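val dataRDD = sc.textFile("file.csv")
// Key each line by its number of comma-separated values
val dataLengthRDD = dataRDD.map(line => (line.split(",").length, line))
// Gather all lines that share the same value count under one key
val groupedData = dataLengthRDD.groupByKey()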

Now you can process groupedData: for each key x it holds all the lines that
split into arrays of length x.

groupByKey([numTasks])  When called on a dataset of (K, V) pairs, returns a
dataset of (K, Iterable<V>) pairs. 
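
A slightly fuller sketch of the same idea, in case it is useful. It assumes a
local master, the same "file.csv" path as above, and a plain comma split with
no quoted fields. The object name GroupCsvByFieldCount, the helper fieldCount
and the byLength map are only illustrative names. Note that split(",") drops
trailing empty values, so this version passes a limit of -1 to keep them, and
it filters per length instead of using groupByKey so that each length ends up
in its own RDD.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object GroupCsvByFieldCount {
  // Number of comma-separated values in a line.
  // The -1 limit keeps trailing empty values ("a,b,," counts as 4).
  def fieldCount(line: String): Int = line.split(",", -1).length

  def main(args: Array[String]): Unit = {
    // Assumption: run locally; change the master for a real cluster.
    val conf = new SparkConf()
      .setAppName("GroupCsvByFieldCount")
      .setMaster("local[*]")
    val sc = new SparkContext(conf)
    try {
      val dataRDD = sc.textFile("file.csv")

      // (number of values, original line); cached because it is reused below.
      val keyed = dataRDD.map(line => (fieldCount(line), line)).cache()

      // How many lines exist for each value count (collected to the driver).
      val counts = keyed.countByKey()
      counts.toSeq.sortBy(_._1).foreach { case (n, c) =>
        println(s"$n values: $c lines")
      }

      // One RDD of arrays per value count.
      val byLength = counts.keys.map { n =>
        n -> keyed.filter(_._1 == n).map(_._2.split(",", -1))
      }.toMap

      byLength.foreach { case (n, rdd) =>
        println(s"length $n: first row = ${rdd.first().mkString("|")}")
      }
    } finally {
      sc.stop()
    }
  }
}
```

countByKey brings only the small map of counts back to the driver, so this
should be fine as long as the number of distinct lengths is small.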


I hope this helps.

Regards
Pankaj 
Infoshore Software
India




