Can you send me the subject of that email? I can't find any email that suggests a solution to this problem. There is an email titled "*Twitter4j streaming question*", but it doesn't contain any sample code. It only confirms what I explained earlier: without filtering, Twitter limits the public stream to 1% of tweets, and with the filter API, Twitter limits you to following 400 hashtags.
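For concreteness, here is a minimal sketch of the batching that the 400-hashtag limit implies. It is not code from that thread: the object name `HashtagBatches`, its methods, and the hard-coded limit of 400 are assumptions made for illustration, and it uses only plain Scala so it can run without Spark.

```scala
// Sketch only: split the tracked hashtags into batches that respect the
// 400-hashtag filter limit, so that each batch can back one filtered stream.
// All names here are hypothetical, not part of Spark or twitter4j.
object HashtagBatches {
  // Limit as quoted in this thread; treat it as an assumption that may change.
  val MaxTagsPerStream = 400

  // Normalise tags (trim, lower-case, drop blanks and duplicates),
  // then group them into batches of at most `limit` tags.
  def batches(tags: Seq[String], limit: Int = MaxTagsPerStream): List[Seq[String]] =
    tags.map(_.trim.toLowerCase).filter(_.nonEmpty).distinct.grouped(limit).toList

  // True if the tweet text contains at least one of the (lower-cased) tags.
  def matches(text: String, tags: Seq[String]): Boolean = {
    val lower = text.toLowerCase
    tags.exists(tag => lower.contains(tag))
  }
}
```

Each batch could then be passed as the `filters` argument of `TwitterUtils.createStream(ssc, auth, filters)`, with separate credentials per stream. Whether that works around the limits in practice is untested here.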
Thanks,
Zoran

On Wed, Jul 29, 2015 at 8:40 AM, Peyman Mohajerian <mohaj...@gmail.com> wrote:
> This question was answered with sample code a couple of days ago, please
> look back.
>
> On Sat, Jul 25, 2015 at 11:43 PM, Zoran Jeremic <zoran.jere...@gmail.com>
> wrote:
>
>> Hi,
>>
>> I discovered what the problem is here. The Twitter public stream is
>> limited to 1% of overall tweets (https://goo.gl/kDwnyS), which is why I
>> can't access all the tweets posted with a specific hashtag using the
>> approach I posted in my previous email, so I guess this approach will not
>> work for me. The other problem is that filtering has a limit of 400
>> hashtags (https://goo.gl/BywrAk), so in order to follow more than 400
>> hashtags I need more parallel streams.
>>
>> This brings me back to my previous question (https://goo.gl/bVDkHx). In
>> my application I need to follow more than 400 hashtags, and I need to
>> collect every tweet that has one of these hashtags. Another complication
>> is that users can add new hashtags or remove old ones, so I have to
>> update the stream in real time.
>> My earlier approach, without Apache Spark, was to create a twitter4j user
>> stream with an initial filter and, each time a new hashtag had to be
>> added, stop the stream, add the new hashtag and run it again. When a
>> stream reached 400 hashtags, I initialized a new stream with new
>> credentials. This was really complex, and I was hoping that Apache Spark
>> would make it simpler. However, I have been trying for days to find a
>> solution, with no success.
>>
>> If I have to use the same approach I used with twitter4j, I have to solve
>> 2 problems:
>> - how to run multiple Twitter streams in the same Spark context
>> - how to add new hashtags to the existing filter
>>
>> I hope that somebody will have a more elegant solution or idea, and
>> tell me that I missed something obvious.
>>
>> Thanks,
>> Zoran
>>
>> On Sat, Jul 25, 2015 at 8:44 PM, Zoran Jeremic <zoran.jere...@gmail.com>
>> wrote:
>>
>>> Hi,
>>>
>>> I've implemented Twitter streaming as in the code given at the bottom of
>>> this email. It finds some tweets based on the hashtags I'm following.
>>> However, it seems that a large number of tweets are missing. I tried
>>> posting some tweets with hashtags that I'm following in the application,
>>> and none of them were received by the application. I also checked some
>>> hashtags (e.g. #android) on Twitter using Live, and I could see that
>>> something was posted with that hashtag almost every second, while my
>>> application received only 3-4 posts per minute.
>>>
>>> I didn't have this problem in the earlier non-Spark version of the
>>> application, which used twitter4j to access the user stream API. I guess
>>> this is some trending stream, but I couldn't find anything that explains
>>> which Twitter API is used in Spark Twitter Streaming and how to create a
>>> stream that will access everything posted on Twitter.
>>>
>>> I hope somebody can explain what the problem is and how to solve it.
>>>
>>> Thanks,
>>> Zoran
>>>
>>>
>>> def initializeStreaming(){
>>>>   val config = getTwitterConfigurationBuilder.build()
>>>>   val auth: Option[twitter4j.auth.Authorization] =
>>>>     Some(new twitter4j.auth.OAuthAuthorization(config))
>>>>   val stream: DStream[Status] = TwitterUtils.createStream(ssc, auth)
>>>>   val filtered_statuses = stream.transform(rdd => {
>>>>     val filtered = rdd.filter(status => {
>>>>       var found = false
>>>>       for (tag <- hashTagsList) {
>>>>         if (status.getText.toLowerCase.contains(tag)) {
>>>>           found = true
>>>>         }
>>>>       }
>>>>       found
>>>>     })
>>>>     filtered
>>>>   })
>>>>   filtered_statuses.foreachRDD(rdd => {
>>>>     rdd.collect.foreach(t => {
>>>>       println(t)
>>>>     })
>>>>   })
>>>>   ssc.start()
>>>> }
>>>>
>>>
>>
>

--
*******************************************************************************
Zoran Jeremic, PhD
Senior System Analyst & Programmer

Athabasca University
Tel: +1 604 92 89 944
E-mail: zoran.jere...@gmail.com <zoran.jere...@va.mod.gov.rs>
Homepage: http://zoranjeremic.org
**********************************************************************************