Hi,

Is there an optimal way of splitting a dstream into components?

I am doing Spark streaming and this is the dstream I get:

val dstream = KafkaUtils.createDirectStream[String, String, StringDecoder,
StringDecoder](ssc, kafkaParams, topics)


Now that dstream consists of 10,00 price lines per second like below

ID, TIMESTAMP, PRICE
31,20160426-080924,93.53608929178084896656

The columns are separated by commas.

Now a couple of questions:

val lines = dstream.map(_._2)

This maps the record into components? Is that the correct understanding of
it?
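
To make the question concrete, here is a minimal sketch of what I think _._2 does, assuming each record from createDirectStream[String, String, ...] arrives as a (key, message) pair. A plain Scala Seq stands in for the dstream, and the null key and the object name are assumptions for illustration only.

```scala
object KafkaRecordSketch {
  // Stand-in for the records in the dstream: (key, message) pairs.
  // The null key is an assumption; the message is the sample price line.
  val records: Seq[(String, String)] = Seq(
    (null, "31,20160426-080924,93.53608929178084896656")
  )

  // Same shape as dstream.map(_._2): drop the key, keep the message text.
  val lines: Seq[String] = records.map(_._2)
}
```

If that reading is right, _._2 does not split anything yet; it only selects the message part of each pair, and the splitting into columns happens in the next step.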

The following splits the line into comma separated fields.

val words = lines.map(_.split(',').view(2))

I am interested in column three, so view(2) returns the value.
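
As a rough sketch of this step on a single line, assuming view(2) is only being used to pick the element at index 2, the same value can be read by indexing the split array directly. The object and method names are hypothetical, and the conversion to a number is my own addition for illustration.

```scala
object ThirdColumnSketch {
  // Sample line in the ID, TIMESTAMP, PRICE layout.
  val line = "31,20160426-080924,93.53608929178084896656"

  // Split on commas and take the element at index 2 (the third column).
  def priceText(l: String): String = l.split(',')(2)

  // Optional: parse the text as a BigDecimal to keep all the digits.
  def price(l: String): BigDecimal = BigDecimal(priceText(l))
}
```

On a dstream the same function would presumably be applied as lines.map(ThirdColumnSketch.priceText).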

I have also seen other ways like

val words = lines.map(_.split(',').map(line => (line(0), (line(1),line(2)
...

Do line(0), line(1) refer to the positions of the fields?
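
My reading of that second style, as a minimal sketch on one line: the split result is bound to a name first, and then index 0, 1 and 2 pick the ID, TIMESTAMP and PRICE fields to build a keyed pair. The names below are hypothetical and this is not necessarily what the truncated snippet above does in full.

```scala
object KeyedLineSketch {
  // Sample line in the ID, TIMESTAMP, PRICE layout.
  val line = "31,20160426-080924,93.53608929178084896656"

  // Build (ID, (TIMESTAMP, PRICE)) from the positional fields.
  def keyed(l: String): (String, (String, String)) = {
    val fields = l.split(',') // fields(0) = ID, fields(1) = TIMESTAMP, fields(2) = PRICE
    (fields(0), (fields(1), fields(2)))
  }
}
```

One thing I noticed while writing this: calling map directly on the split array would pass each field string to the function, so an index there would select a character rather than a field.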

Which one is the adopted one or the correct one?

Thanks


Dr Mich Talebzadeh



LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw



http://talebzadehmich.wordpress.com
