Re: How to restart Twitter spark stream

2015-07-24 Thread Akhil Das
Yes, that is correct, sorry for confusing you. But i guess this is what you are looking for, let me know if that doesn't help: val filtered_statuses = stream.transform(rdd ={ //Instead of hardcoding, you can fetch these from a MySQL or a file or whatever val sampleHashTags =

Re: How to restart Twitter spark stream

2015-07-24 Thread Zoran Jeremic
Hi Akhil, That's exactly what I needed. You saved my day :) Thanks a lot, Best, Zoran On Fri, Jul 24, 2015 at 12:28 AM, Akhil Das ak...@sigmoidanalytics.com wrote: Yes, that is correct, sorry for confusing you. But i guess this is what you are looking for, let me know if that doesn't help:

Re: How to restart Twitter spark stream

2015-07-23 Thread Zoran Jeremic
Hi Akhil, Thank you for sending this code. My apologize if I will ask something that is obvious here, since I'm newbie in Scala, but I still don't see how I can use this code. Maybe my original question was not very clear. What I need is to get each Twitter Status that contains one of the

Re: How to restart Twitter spark stream

2015-07-22 Thread Akhil Das
That was a pseudo code, working version would look like this: val stream = TwitterUtils.createStream(ssc, None) val hashTags = stream.flatMap(status = status.getText.split( ).filter(_.startsWith(#))).map(x = (x.toLowerCase,1)) val topCounts10 = hashTags.map((_,

Re: How to restart Twitter spark stream

2015-07-21 Thread Zoran Jeremic
Hi Akhil and Jorn, I tried as you suggested to create some simple scenario, but I have an error on rdd.join(newRDD): value join is not a member of org.apache.spark.rdd.RDD[twitter4j.Status]. The code looks like: val stream = TwitterUtils.createStream(ssc, auth) val filteredStream=

Re: How to restart Twitter spark stream

2015-07-20 Thread Akhil Das
Jorn meant something like this: val filteredStream = twitterStream.transform(rdd ={ val newRDD = scc.sc.textFile(/this/file/will/be/updated/frequently).map(x = (x,1)) rdd.join(newRDD) }) ​newRDD will work like a filter when you do the join.​ Thanks Best Regards On Sun, Jul 19, 2015 at 9:32

Re: How to restart Twitter spark stream

2015-07-20 Thread Zoran Jeremic
Thanks for explanation. If I understand this correctly, in this approach I would actually stream everything from Twitter, and perform filtering in my application using Spark. Isn't this too much overhead if my application is interested in listening for couple of hundreds or thousands hashtags? On

Re: How to restart Twitter spark stream

2015-07-19 Thread Jörn Franke
Why do you even want to stop it? You can join it with a rdd loading the newest hash tags from disk in a regular interval Le dim. 19 juil. 2015 à 7:40, Zoran Jeremic zoran.jere...@gmail.com a écrit : Hi, I have a twitter spark stream initialized in the following way: val

Re: How to restart Twitter spark stream

2015-07-19 Thread Zoran Jeremic
Hi Jorn, I didn't know that it is possible to change filter without re-opening twitter stream. Actually, I already had that question earlier at the stackoverflow http://stackoverflow.com/questions/30960984/apache-spark-twitter-streaming and I got the answer that it's not possible, but it would be

How to restart Twitter spark stream

2015-07-18 Thread Zoran Jeremic
Hi, I have a twitter spark stream initialized in the following way: val ssc:StreamingContext = SparkLauncher.getSparkScalaStreamingContext() val config = getTwitterConfigurationBuilder.build() val auth: Option[twitter4j.auth.Authorization] = Some(new