Hi,

Is there an optimal way of splitting a DStream into its components?
I am doing Spark Streaming and this is the DStream I get:

val dstream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](ssc, kafkaParams, topics)

That dstream consists of 10,00 price lines per second, like the one below:

ID, TIMESTAMP, PRICE
31,20160426-080924,93.53608929178084896656

The columns are separated by commas.

Now a couple of questions:

val lines = dstream.map(_._2)

Does this map each record to its value component? Is that the correct understanding of it?

The following splits each line into comma-separated fields:

val words = lines.map(_.split(',').view(2))

I am interested in column three, so view(2) should return that value.

I have also seen other approaches, like:

val words = lines.map(_.split(',').map(line => (line(0), (line(1),line(2) ...

Do line(0), line(1) refer to the positions of the fields?

Which one is the adopted one, or the correct one?

Thanks

Dr Mich Talebzadeh

LinkedIn https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw

http://talebzadehmich.wordpress.com
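P.S. For reference, outside Spark the splitting and indexing I am asking about behaves as follows on a single sample line. This is only a minimal plain-Scala sketch of String.split and zero-based field indexing on one hard-coded line, not the streaming job itself:

```scala
object SplitSketch {
  def main(args: Array[String]): Unit = {
    // A sample line in the same format as the Kafka payload value
    val line = "31,20160426-080924,93.53608929178084896656"

    // split(',') returns an Array[String]; indices are zero-based
    val fields: Array[String] = line.split(',')

    // fields(2) is the third column, i.e. the price
    val price = fields(2)

    // Building a (key, (timestamp, price)) pair from the fields,
    // as in the second approach quoted above
    val pair = (fields(0), (fields(1), fields(2)))

    println(price)
    println(pair)
  }
}
```

So on a mapped DStream the same per-line logic would run inside the map closure, one line at a time.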