You could get all the tweets in the stream and then apply a filter
transformation on the DStream of tweets to filter out non-English
tweets. The tweets in the DStream are of type twitter4j.Status, which
has a field describing the language. You can use that in the filter.
Though in practice, a lot
Thanks for the response. I tried the following:
tweets.filter(_.getLang()="en")
I get a compilation error:
value getLang is not a member of twitter4j.Status
But getLang() is one of the methods of twitter4j.Status since version 3.0.6
according to the doc at:
Small typo in my code in the previous post. That should be:
tweets.filter(_.getLang() == "en")
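For reference, here is a minimal sketch of the language predicate on its own, runnable without Spark or twitter4j on the classpath. The Tweet case class, the isEnglish helper, and the sample data are hypothetical stand-ins, not part of either library; only the "en" comparison comes from the code above.

```scala
object EnglishFilterSketch {
  // Hypothetical stand-in for the two fields of twitter4j.Status used here.
  final case class Tweet(text: String, lang: String)

  // Putting the literal on the left keeps the comparison safe if lang is null.
  def isEnglish(lang: String): Boolean = "en" == lang

  // Made-up sample data, including a tweet with no language set.
  val sample: Seq[Tweet] = Seq(
    Tweet("hello", "en"),
    Tweet("bonjour", "fr"),
    Tweet("no language", null)
  )

  // Same shape as the DStream call: keep only elements whose language is "en".
  val english: Seq[Tweet] = sample.filter(t => isEnglish(t.lang))

  def main(args: Array[String]): Unit =
    english.foreach(t => println(t.text))
}
```

On the real DStream the same predicate would presumably be applied as tweets.filter(status => isEnglish(status.getLang)), which still requires a twitter4j version that provides Status.getLang (3.0.6 or later, according to the doc cited above).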
Hi,
On Wed, Nov 12, 2014 at 5:42 AM, SK skrishna...@gmail.com wrote:
But getLang() is one of the methods of twitter4j.Status since version 3.0.6
according to the doc at:
http://twitter4j.org/javadoc/twitter4j/Status.html#getLang--
What version of twitter4j does Spark Streaming use?
FWIW, if you do decide to handle language detection on your own machine,
this library works great on tweets: https://github.com/carrotsearch/langid-java
On Tue, Nov 11, 2014, 7:52 PM Tobias Pfeiffer t...@preferred.jp wrote:
Hi,
On Wed, Nov 12, 2014 at 5:42 AM, SK skrishna...@gmail.com wrote:
But