Sure, that code does roughly what you describe, but it's mixed up in a few ways. It looks like you only want to operate on words that start with SECRETWORD, but the code prepends acct and _ to every word while you seem to expect something appended in the result. You also seem to want to sum by key, so there needs to be a reduceByKeyAndWindow in here somewhere, or else a foreachRDD with reduceByKey. Note too that the result is not a sequence of (word,count), but a sequence of RDDs of (word,count).
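For what it's worth, here is a minimal sketch of the per-line logic in plain Scala, with no Spark dependency, so it can be tried in isolation. The names AcctPairs, pairsForLine and countPairs are hypothetical, and the rule "use the first token in the line that starts with SECRETWORD as the account, and skip that token itself" is my assumption about what you want, not something your code states.

```scala
object AcctPairs {
  // Hypothetical helper: pairs every other word in the line with the
  // first token that starts with `marker`. Lines without such a token
  // produce no pairs (an assumption; adjust if they should be kept).
  def pairsForLine(line: String, marker: String = "SECRETWORD"): Seq[(String, Int)] = {
    val words = line.split(" ").filter(_.nonEmpty).toSeq
    words.find(_.startsWith(marker)) match {
      case Some(acct) => words.filter(_ != acct).map(word => (acct + "_" + word, 1))
      case None       => Seq.empty
    }
  }

  // Local stand-in for the sum-by-key step, using ordinary collections.
  def countPairs(lines: Seq[String]): Map[String, Int] =
    lines
      .flatMap(line => pairsForLine(line))
      .groupBy(_._1)
      .map { case (key, ones) => key -> ones.map(_._2).sum }

  def main(args: Array[String]): Unit = {
    val counts = countPairs(Seq("hello world you are SECRETWORDthebest hello world"))
    counts.toSeq.sortBy(_._1).foreach(println)
  }
}
```

On the streaming side, the same function could presumably be applied as lines.flatMap(line => AcctPairs.pairsForLine(line)) followed by reduceByKey(_ + _) or reduceByKeyAndWindow, which yields counts per batch or per window rather than one global list.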

On Wed, Oct 29, 2014 at 11:40 PM, Harold Nguyen <har...@nexgate.com> wrote:
> Hi Sean,
>
> I'd just like to take the first "word" of every line, and use it as a
> variable for later. Is there a way to do that?
>
> Here's the gist of what I want to do:
>
> val lines = KafkaUtils.createStream(ssc, "localhost:2181", "test",
> Map("test" -> 10)).map(_._2)
> val words = lines.flatMap(_.split(" "))
> val acct = words.filter(word => word.startsWith("SECRETWORD"))
> val pairs = words.map(word => (acct+"_"+word, 1))
>
> Take all lines coming into Kafka, and add the word 'acct' to each word.
>
> As an example, here is a line:
>
> "hello world you are SECRETWORDthebest hello world"
>
> And it should do this:
>
> (SECRETWORDthebest_hello, 2), (SECRETWORDthebest_world, 2),
> (SECRETWORDthebest_you, 1), etc...
>
> Harold
>
>
> On Wed, Oct 29, 2014 at 3:36 PM, Sean Owen <so...@cloudera.com> wrote:
>>
>> What would it mean to make a DStream into a String? it's inherently a
>> sequence of things over time, each of which might be a string but
>> which are usually RDDs of things.
>>
>> On Wed, Oct 29, 2014 at 11:15 PM, Harold Nguyen <har...@nexgate.com>
>> wrote:
>> > Hi all,
>> >
>> > How do I convert a DStream to a string ?
>> >
>> > For instance, I want to be able to:
>> >
>> > val myword = words.filter(word => word.startsWith("blah"))
>> >
>> > And use "myword" in other places, like tacking it onto (key, value)
>> > pairs,
>> > like so:
>> >
>> > val pairs = words.map(word => (myword+"_"+word, 1))
>> >
>> > Thanks for any help,
>> >
>> > Harold