[
https://issues.apache.org/jira/browse/BAHIR-117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16016020#comment-16016020
]
ASF GitHub Bot commented on BAHIR-117:
--------------------------------------
Github user c-w commented on a diff in the pull request:
https://github.com/apache/bahir/pull/43#discussion_r117292632
--- Diff: streaming-twitter/src/main/scala/org/apache/spark/streaming/twitter/TwitterInputDStream.scala ---
@@ -85,10 +85,8 @@ class TwitterReceiver(
}
})
- val query = new FilterQuery
- if (filters.size > 0) {
- query.track(filters.mkString(","))
- newTwitterStream.filter(query)
+ if (query.isDefined) {
+ newTwitterStream.filter(query.get)
} else {
--- End diff --
As I mentioned in the PR description, hiding the FilterQuery from the user
limits us to filtering the Twitter stream via disjunctive keyword queries:
```scala
// This returns any Tweet that contains "foo", "bar" or "baz".
val tweets = TwitterUtils.createStream(ssc, Seq("foo", "bar", "baz"))
```
However, the Twitter stream API also supports many other types of
filtering, including:
- Receive Tweets that are tagged at a particular location (ref: [locations](https://dev.twitter.com/streaming/overview/request-parameters#locations))
- Receive Tweets created by specific users (ref: [follow](https://dev.twitter.com/streaming/overview/request-parameters#follow))
- Receive Tweets that match a conjunction of keywords (ref: [track with spaces](https://dev.twitter.com/streaming/overview/request-parameters#track))
Refer to Twitter's [official documentation](https://dev.twitter.com/streaming/overview/request-parameters)
for a full list of all supported filters.
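To make the difference between a disjunctive keyword query and a "track with spaces" conjunction concrete, here is a rough local model of the matching rule as Twitter's `track` documentation describes it: phrases are combined with OR, and the space-separated terms inside one phrase are combined with AND, ignoring case and order. `TrackModel` and `matches` are hypothetical names used only for illustration, and the tokenization is a simplification, so this is not how the streaming API or twitter4j is implemented.

```scala
// Simplified local model of Twitter's documented `track` matching rule.
// `TrackModel` and `matches` are hypothetical names, not part of twitter4j or Bahir.
object TrackModel {
  // Lower-case the text and split on anything that is not a letter or digit.
  // Assumption: real Twitter tokenization (hashtags, URLs, etc.) is more involved.
  private def tokens(text: String): Set[String] =
    text.toLowerCase.split("[^\\p{L}\\p{N}]+").filter(_.nonEmpty).toSet

  // A phrase matches when all of its space-separated terms occur (AND);
  // a list of phrases matches when any one phrase matches (OR).
  def matches(tweet: String, phrases: Seq[String]): Boolean = {
    val words = tokens(tweet)
    phrases.exists { phrase =>
      val terms = tokens(phrase)
      terms.nonEmpty && terms.subsetOf(words)
    }
  }
}
```

Under this model, `TrackModel.matches("bar before foo", Seq("foo bar"))` is true because both terms are present, while `TrackModel.matches("only foo here", Seq("foo bar"))` is false. With three separate phrases, `Seq("foo", "bar", "baz")`, a Tweet containing any one of them matches.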
By exposing the FilterQuery, we enable users to make use of all of these
powerful filters and any future filters that Twitter may introduce.
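As a sketch of what a user could build once the FilterQuery is exposed, the three filter types listed above map onto twitter4j's `FilterQuery` roughly as follows. This assumes twitter4j 4.x, where `track`, `follow` and `locations` take varargs. The user IDs and coordinates are placeholder values, `FilterQueryExamples` is a hypothetical name, and how the query is then handed to `TwitterUtils` depends on the method signature this pull request settles on.

```scala
import twitter4j.FilterQuery

// Hypothetical example values; the user IDs and coordinates are placeholders.
object FilterQueryExamples {
  // Conjunction: a single phrase containing a space requires both terms.
  val conjunction: FilterQuery = new FilterQuery().track("foo bar")

  // Tweets created by specific users, identified by numeric user ID.
  val byUser: FilterQuery = new FilterQuery().follow(12345L, 67890L)

  // Tweets tagged inside a bounding box: south-west corner first, then
  // north-east, each corner given as (longitude, latitude).
  val byLocation: FilterQuery = new FilterQuery().locations(
    Array(-122.75, 36.8),
    Array(-121.75, 37.8)
  )
}
```

Because each setter returns the `FilterQuery` itself, the calls can also be chained on one object to send several parameters in a single request.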
> Expand filtering options for TwitterInputDStream
> ------------------------------------------------
>
> Key: BAHIR-117
> URL: https://issues.apache.org/jira/browse/BAHIR-117
> Project: Bahir
> Issue Type: Improvement
> Components: Spark Streaming Connectors
> Reporter: Clemens Wolff
>
> Currently, the TwitterInputDStream only supports filtering by keywords [1],
> which corresponds to the "track" option in the Twitter API [2]. The Twitter
> API supports many more ways to receive a filtered stream (e.g. get Tweets in
> a particular location [3]). It would be very useful to expose these
> additional filtering options in this library.
> Proposal: add a new public method to TwitterUtils which follows the same
> interface as createStream [4] but which takes a FilterQuery [5] object as an
> argument. In this way, we give full filtering flexibility to our users.
> I'm currently working on Project Fortis, a social data analysis platform for
> the United Nations [6]. The extra filtering options would be very useful for
> my project so I'm happy to implement this and create a pull request.
> [1]
> https://github.com/apache/bahir/blob/fd4c35fc9f7ebb57464d231cf5d66e7bc4096a1b/streaming-twitter/src/main/scala/org/apache/spark/streaming/twitter/TwitterInputDStream.scala#L44
> [2] https://dev.twitter.com/streaming/overview/request-parameters#track
> [3] https://dev.twitter.com/streaming/overview/request-parameters#locations
> [4]
> https://github.com/apache/bahir/blob/fd4c35fc9f7ebb57464d231cf5d66e7bc4096a1b/streaming-twitter/src/main/scala/org/apache/spark/streaming/twitter/TwitterUtils.scala#L39
> [5] http://twitter4j.org/javadoc/twitter4j/FilterQuery.html
> [6] https://fortis-web.azurewebsites.net/#/site/ocha/