Re: multiple splits fails

2016-04-05 Thread Mich Talebzadeh
line.split("\n,")).map(word => (word, 1)).reduceByKey(_ + _) v: org.apache.spark.streaming.dstream.DStream[(String, Int)] = org.apache.spark.streaming.dstream.ShuffledDStream

Re: multiple splits fails

2016-04-05 Thread Mich Talebzadeh
<console>:43: error: value collect is not a member of org.apache.spark.streaming.dstream.DStream[(String, Int)] val v = lines.filter(_.contains("ASE 15")).filter(_

Re: multiple splits fails

2016-04-05 Thread Sachin Aggarwal
adeh LinkedIn * https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8

Re: multiple splits fails

2016-04-05 Thread Mich Talebzadeh
ril 2016 at 16:01, Ted Yu <yuzhih...@gmail.com> wrote: bq. is not a member of (String, String) As shown above, conta

Re: multiple splits fails

2016-04-05 Thread Sachin Aggarwal
Thank you gents. That should be "\n" as carriage return. OK I am using spark streaming to analyse the mess

Re: multiple splits fails

2016-04-05 Thread Mich Talebzadeh
import org.apache.spark.streaming._ import org.apache.spark.streaming.kafka.KafkaUtils // scala> val sparkConf = new SparkConf().

Re: multiple splits fails

2016-04-03 Thread Mich Talebzadeh
" ) kafkaParams: scala.collection.immutable.Map[String,String] = Map(bootstrap.servers -> rhes564:9092, schema.registry.url -> http://rhes564:8081, zookeeper.connect -> rhes564:2181, group.id ->

Re: multiple splits fails

2016-04-03 Thread Ted Yu
ing, StringDecoder, StringDecoder](ssc, kafkaParams, topic) messages: org.apache.spark.streaming.dstream.InputDStream[(String, String)] = org.apache.spark.streaming.kafka.DirectKafkaInputDStream@5d8ccb6c

Re: multiple splits fails

2016-04-03 Thread Mich Talebzadeh
This part is tricky scala> val showlines = messages.filter(_ contains("ASE 15")).filter(_ contains("UPDATE INDEX STATISTICS")).flatMap(line => line.sp

Re: multiple splits fails

2016-04-03 Thread Ted Yu
")).filter(_ contains("UPDATE INDEX STATISTICS")).flatMap(line => line.split("\n,")).map(word => (word, 1)).reduceByKey(_ + _).collect.foreach(println) How does one refer to the c

Re: multiple splits fails

2016-04-03 Thread Mich Talebzadeh
<https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>* http://talebzadehmich.wordpress.com On 3 April 2016 at 15:32, Ted Yu <yuzhih...@gmail.com> wrote:

Re: multiple splits fails

2016-04-03 Thread Ted Yu
split"\t," splits the filter by carriage return Minor correction: "\t" denotes tab character. On Sun, Apr 3, 2016 at 7:24 AM, Eliran Bivas <elir...@iguaz.io> wrote: Hi Mich, 1. The first undersco

Re: multiple splits fails

2016-04-03 Thread Mich Talebzadeh
e() results in a collection of strings) 2. You're correct. No need for it. 3. Filter is expecting a Boolean result. So you can merge your contains filters to one with AND (&&) statement. 4. Correct. Each character in split() is used as a divider.
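Point 3 can be sketched on a plain Scala collection. This is only an illustration, not the thread's actual code: the `sample` lines are made up, and the same predicate should carry over to an RDD or DStream of strings, since `filter` takes a `String => Boolean` function there as well.

```scala
object MergedFilter {
  // Hypothetical sample lines, invented for illustration only.
  val sample: Seq[String] = Seq(
    "ASE 15 supports UPDATE INDEX STATISTICS",
    "ASE 15 release notes",
    "UPDATE INDEX STATISTICS on older versions",
    ""
  )

  // Two chained filters, one per condition.
  val chained: Seq[String] =
    sample
      .filter(_.contains("ASE 15"))
      .filter(_.contains("UPDATE INDEX STATISTICS"))

  // One filter whose predicate combines both conditions with &&.
  val merged: Seq[String] =
    sample.filter { line =>
      line.contains("ASE 15") && line.contains("UPDATE INDEX STATISTICS")
    }

  def main(args: Array[String]): Unit =
    merged.foreach(println)
}
```

Both forms keep the same lines; the merged predicate simply evaluates the two conditions in a single pass.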

Re: multiple splits fails

2016-04-03 Thread Ted Yu
Eliran Bivas *From:* Mich Talebzadeh <mich.talebza...@gmail.com> *Sent:* Apr 3, 2016 15:06 *To:* Eliran Bivas *Cc:* user @spark *Subject:* Re: multiple splits fails Hi Eliran, Many thanks for your input on this. I thought about

Re: multiple splits fails

2016-04-03 Thread Eliran Bivas
Correct. Each character in split() is used as a divider. Eliran Bivas From: Mich Talebzadeh <mich.talebza...@gmail.com> Sent: Apr 3, 2016 15:06 To: Eliran Bivas Cc: user @spark Subject: Re: multiple splits fails Hi Eliran, Many thanks for your input on this. I thought about wha
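A caveat worth checking against the statement on `split()`: when `split` is given a `String`, Scala delegates to Java's `String.split`, which treats the argument as a regular expression, so `"\t,"` matches a tab immediately followed by a comma rather than either character on its own. To use each character as a separate divider, one option is the `split(Array[Char])` overload, another is a regex character class. The input string below is made up for illustration.

```scala
object SplitBehaviour {
  // Hypothetical input: one tab and one comma, not adjacent to each other.
  val line: String = "a\tb,c"

  // String argument is a regex: it only matches a tab directly followed by a comma.
  val bySequence: Array[String] = line.split("\t,")

  // Array[Char] overload: splits on any one of the listed characters.
  val byEitherChar: Array[String] = line.split(Array('\t', ','))

  // Regex character class: also splits on either character.
  val byCharClass: Array[String] = line.split("[\t,]")

  def main(args: Array[String]): Unit = {
    println(bySequence.length)
    println(byEitherChar.mkString("|"))
    println(byCharClass.mkString("|"))
  }
}
```

The same distinction would apply to `"\n,"` inside a `flatMap`: with a `String` argument, the line is split only where a newline is directly followed by a comma.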

Re: multiple splits fails

2016-04-03 Thread Mich Talebzadeh
Hi Eliran, Many thanks for your input on this. I thought about what I was trying to achieve so I rewrote the logic as follows: 1. Read the text file in 2. Filter out empty lines (well not really needed here) 3. Search for lines that contain "ASE 15" and further have sentence

Re: multiple splits fails

2016-04-03 Thread Eliran Bivas
Hi Mich, A few comments: When doing .filter(_ > "") you’re actually doing a lexicographic comparison and not filtering for empty lines (which could be achieved with _.nonEmpty or _.length > 0). I think that filtering with _.contains should be sufficient and the first filter can be omitted. As
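The difference between these predicates can be checked on a plain Scala collection. This is a minimal sketch with made-up sample lines. `_ > ""` compares strings lexicographically, and because every non-empty string sorts after the empty string it happens to keep the same lines as `_.nonEmpty` (the standard method name on Scala strings and collections), but the latter states the intent directly. A `contains` filter on its own already drops empty lines.

```scala
object EmptyLineFilter {
  // Hypothetical sample lines, invented for illustration only.
  val lines: Seq[String] = Seq("ASE 15 upgrade", "", "  indented", "other text")

  // Lexicographic comparison against the empty string.
  val byComparison: Seq[String] = lines.filter(_ > "")

  // Explicit emptiness check.
  val byNonEmpty: Seq[String] = lines.filter(_.nonEmpty)

  // A contains filter alone: an empty line can never contain "ASE 15".
  val byContains: Seq[String] = lines.filter(_.contains("ASE 15"))

  def main(args: Array[String]): Unit = {
    println(byComparison == byNonEmpty)
    byContains.foreach(println)
  }
}
```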

multiple splits fails

2016-04-03 Thread Mich Talebzadeh
Hi, I am not sure this is the correct approach. Read a text file in: val f = sc.textFile("/tmp/ASE15UpgradeGuide.txt") Now I want to get rid of empty lines and filter only the lines that contain "ASE15": f.filter(_ > "").filter(_ contains("ASE15")). The above works but I am not sure whether I