Yes, that is correct, sorry for confusing you. But I guess this is what you
are looking for; let me know if that doesn't help:
val filtered_statuses = stream.transform(rdd => {
  // Instead of hardcoding, you can fetch these from MySQL, a file, or whatever
  val sampleHashTags =
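The snippet above breaks off mid-assignment. A minimal sketch of the full idea it describes, where the hashtag list is re-read inside `transform` on every micro-batch; the `loadHashTags` helper, its values, and the matching logic are my assumptions, not from the thread:

```scala
// Hypothetical helper: replace with a MySQL query or file read.
def loadHashTags(): Set[String] = Set("#spark", "#scala")  // assumed values

val filtered_statuses = stream.transform { rdd =>
  // Re-evaluated on every micro-batch, so the list can change without
  // restarting the Twitter stream.
  val sampleHashTags = loadHashTags()
  rdd.filter { status =>
    status.getText.split(" ").exists(w => sampleHashTags.contains(w.toLowerCase))
  }
}
```

Because the body of `transform` runs once per batch on the driver, updating the backing file or table takes effect on the next batch.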
Hi Akhil,
That's exactly what I needed. You saved my day :)
Thanks a lot,
Best,
Zoran
On Fri, Jul 24, 2015 at 12:28 AM, Akhil Das ak...@sigmoidanalytics.com
wrote:
Yes, that is correct, sorry for confusing you. But I guess this is what
you are looking for; let me know if that doesn't help:
Hi Akhil,
Thank you for sending this code. My apologies if I ask something that is
obvious here, since I'm a newbie in Scala, but I still don't see how I can
use this code. Maybe my original question was not very clear.
What I need is to get each Twitter Status that contains one of the
That was pseudo code; a working version would look like this:
val stream = TwitterUtils.createStream(ssc, None)
val hashTags = stream.flatMap(status => status.getText.split(" ")
  .filter(_.startsWith("#"))).map(x => (x.toLowerCase, 1))
val topCounts10 = hashTags.map((_,
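The last line breaks off mid-expression. In the TwitterPopularTags example bundled with Spark, the analogous counting step continues roughly as below; treat the window length and the exact chaining as details recalled from that example, not as text from this thread:

```scala
// Count hashtag occurrences over a sliding window, then order by count
// (adapted from Spark's bundled TwitterPopularTags example; details assumed).
val topCounts10 = hashTags
  .reduceByKeyAndWindow(_ + _, Seconds(10))
  .map { case (topic, count) => (count, topic) }
  .transform(_.sortByKey(ascending = false))
```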
Hi Akhil and Jorn,
I tried, as you suggested, to create a simple scenario, but I get an
error on rdd.join(newRDD): value join is not a member of
org.apache.spark.rdd.RDD[twitter4j.Status]. The code looks like:
val stream = TwitterUtils.createStream(ssc, auth)
val filteredStream =
Jorn meant something like this:
val filteredStream = twitterStream.transform(rdd => {
  val newRDD = ssc.sparkContext.textFile("/this/file/will/be/updated/frequently")
    .map(x => (x, 1))
  rdd.join(newRDD)
})
newRDD will work like a filter when you do the join.
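Note that `join` is only defined on RDDs of key-value pairs (via `PairRDDFunctions`), which is why calling it directly on an `RDD[twitter4j.Status]` fails with the error quoted earlier. A sketch of how the statuses could be keyed by hashtag before the join; the keying scheme is my assumption, not something stated in the thread:

```scala
// Key each status by every hashtag it contains, then join against the
// (hashtag, 1) pairs loaded from the frequently-updated file.
val filteredStream = twitterStream.transform { rdd =>
  val newRDD = ssc.sparkContext
    .textFile("/this/file/will/be/updated/frequently")
    .map(tag => (tag.toLowerCase, 1))

  rdd.flatMap { status =>
      status.getText.split(" ")
        .filter(_.startsWith("#"))
        .map(tag => (tag.toLowerCase, status))
    }
    .join(newRDD)  // inner join: keeps only statuses whose hashtag is in the file
    .map { case (_, (status, _)) => status }
}
```

Since `join` is an inner join, `newRDD` acts as a whitelist: statuses whose hashtags do not appear in the file are dropped.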
Thanks
Best Regards
On Sun, Jul 19, 2015 at 9:32
Thanks for the explanation.
If I understand this correctly, in this approach I would actually stream
everything from Twitter and perform the filtering in my application using
Spark. Isn't that too much overhead if my application is interested in
listening for a couple of hundred or thousand hashtags?
On
Why do you even want to stop it? You can join it with an RDD loading the
newest hashtags from disk at a regular interval
On Sun, Jul 19, 2015 at 7:40, Zoran Jeremic zoran.jere...@gmail.com
wrote:
Hi,
I have a twitter spark stream initialized in the following way:
val
Hi Jorn,
I didn't know that it is possible to change the filter without re-opening
the Twitter stream. Actually, I already asked that question earlier on
Stack Overflow
http://stackoverflow.com/questions/30960984/apache-spark-twitter-streaming
and I got the answer that it's not possible, but it would be
Hi,
I have a twitter spark stream initialized in the following way:
val ssc: StreamingContext = SparkLauncher.getSparkScalaStreamingContext()
val config = getTwitterConfigurationBuilder.build()
val auth: Option[twitter4j.auth.Authorization] = Some(new