Can you send me the subject of that email? I can't find any email
suggesting solution to that problem. There is email "*Twitter4j streaming
question*", but it doesn't have any sample code. It just confirms what I
explained earlier that without filtering Twitter will limit to 1% of
tweets, and if you use filter API, Twitter limits you to 400 hashtags you
can follow.

Thanks,
Zoran

On Wed, Jul 29, 2015 at 8:40 AM, Peyman Mohajerian <mohaj...@gmail.com>
wrote:

> This question was answered with sample code a couple of days ago, please
> look back.
>
> On Sat, Jul 25, 2015 at 11:43 PM, Zoran Jeremic <zoran.jere...@gmail.com>
> wrote:
>
>> Hi,
>>
>> I discovered what is the problem here. Twitter public stream is limited
>> to 1% of overall tweets (https://goo.gl/kDwnyS), so that's why I can't
>> access all the tweets posted with specific hashtag using approach that I
>> posted in previous email, so I guess this approach would not work for me.
>> The other problem is that filtering has a limit of 400 hashtags (
>> https://goo.gl/BywrAk), so in order to follow more than 400 hashtags I
>> need more parallel streams.
>>
>> This brings me back to my previous question (https://goo.gl/bVDkHx). In
>> my application I need to follow more than 400 hashtags, and I need to
>> collect each tweet having one of these hashtags. Another complication is
>> that users could add new hashtags or remove old hashtags, so I have to
>> update stream in the real-time.
>> My earlier approach without Apache Spark was to create twitter4j user
>> stream with initial filter, and each time new hashtag has to be added, stop
>> stream, add new hashtag and run it again. When stream had 400 hashtags, I
>> initialize new stream with new credentials. This was really complex, and I
>> was hopping that Apache Spark would make it simpler. However, I'm trying
>> for a days to find solution, and had no success.
>>
>> If I have to use the same approach I used with twitter4j, I have to solve
>> 2 problems:
>> - how to run multiple twitter streams in the same spark context
>> - how to add new hashtags to the existing filter
>>
>> I hope that somebody will have some more elegant solution and idea, and
>> tell me that I missed something obvious.
>>
>> Thanks,
>> Zoran
>>
>> On Sat, Jul 25, 2015 at 8:44 PM, Zoran Jeremic <zoran.jere...@gmail.com>
>> wrote:
>>
>>> Hi,
>>>
>>> I've implemented Twitter streaming as in the code given at the bottom of
>>> email. It finds some tweets based on the hashtags I'm following. However,
>>> it seems that a large amount of tweets is missing. I've tried to post some
>>> tweets that I'm following in the application, and none of them was received
>>> in application. I also checked some hashtags (e.g. #android) on Twitter
>>> using Live and I could see that almost each second something was posted
>>> with that hashtag, and my application received only 3-4 posts in one minute.
>>>
>>> I didn't have this problem in earlier non-spark version of application
>>> which used twitter4j to access user stream API. I guess this is some
>>> trending stream, but I couldn't find anything that explains which Twitter
>>> API is used in Spark Twitter Streaming and how to create stream that will
>>> access everything posted on the Twitter.
>>>
>>> I hope somebody could explain what is the problem and how to solve this.
>>>
>>> Thanks,
>>> Zoran
>>>
>>>
>>>  def initializeStreaming(){
>>>>    val config = getTwitterConfigurationBuilder.build()
>>>>    val auth: Option[twitter4j.auth.Authorization] = Some(new
>>>> twitter4j.auth.OAuthAuthorization(config))
>>>>    val stream:DStream[Status]  = TwitterUtils.createStream(ssc, auth)
>>>>    val filtered_statuses = stream.transform(rdd =>{
>>>>     val filtered = rdd.filter(status =>{
>>>>     var found = false
>>>>         for(tag <- hashTagsList){
>>>>           if(status.getText.toLowerCase.contains(tag)) {
>>>>             found = true
>>>>             }
>>>>         }
>>>>         found
>>>>       })
>>>>       filtered
>>>>     })
>>>>     filtered_statuses.foreachRDD(rdd => {
>>>>       rdd.collect.foreach(t => {
>>>>         println(t)
>>>>       })
>>>>    })
>>>>     ssc.start()
>>>>   }
>>>>
>>>
>>
>>
>>
>>
>


-- 
*******************************************************************************
Zoran Jeremic, PhD
Senior System Analyst & Programmer

Athabasca University
Tel: +1 604 92 89 944
E-mail: zoran.jere...@gmail.com <zoran.jere...@va.mod.gov.rs>
Homepage:  http://zoranjeremic.org
**********************************************************************************

Reply via email to