[ https://issues.apache.org/jira/browse/BAHIR-117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16016020#comment-16016020 ]
ASF GitHub Bot commented on BAHIR-117:
--------------------------------------

Github user c-w commented on a diff in the pull request:

https://github.com/apache/bahir/pull/43#discussion_r117292632

--- Diff: streaming-twitter/src/main/scala/org/apache/spark/streaming/twitter/TwitterInputDStream.scala ---
@@ -85,10 +85,8 @@ class TwitterReceiver(
         }
       })

-      val query = new FilterQuery
-      if (filters.size > 0) {
-        query.track(filters.mkString(","))
-        newTwitterStream.filter(query)
+      if (query.isDefined) {
+        newTwitterStream.filter(query.get)
       } else {
--- End diff --

As I mentioned in the PR description, the limitation of hiding the FilterQuery from the user is that we are only able to filter the Twitter stream via disjunctive keyword queries:

```scala
// this will give us any Tweet that contains "foo", "bar" or "baz"
val tweets = TwitterUtils.createStream(ssc, Seq("foo", "bar", "baz"))
```

However, the Twitter stream API also supports many other types of filtering, including:

- Receive Tweets that are tagged at a particular location (ref: [locations](https://dev.twitter.com/streaming/overview/request-parameters#locations))
- Receive Tweets created by specific users (ref: [follow](https://dev.twitter.com/streaming/overview/request-parameters#follow))
- Receive Tweets that match a conjunction of keywords (ref: [track with spaces](https://dev.twitter.com/streaming/overview/request-parameters#track))

Refer to Twitter's [official documentation](https://dev.twitter.com/streaming/overview/request-parameters) for a full list of all supported filters.

By exposing the FilterQuery, we enable users to make use of all of these powerful filters and any future filters that Twitter may introduce.
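To make the filter types listed above concrete, here is a minimal sketch of how each might be expressed with twitter4j's `FilterQuery`. It assumes twitter4j 4.x on the classpath (where `track`, `follow` and `locations` accept varargs); the object name, user ID and bounding box are placeholder values chosen for illustration. How the resulting query is handed to `TwitterUtils` depends on the final signature in this pull request, so that call is deliberately left out.

```scala
import twitter4j.FilterQuery

// Hypothetical holder object; the name is not part of Bahir or twitter4j.
object FilterQueryExamples {
  // Tweets containing "foo" OR "bar" OR "baz": separate phrases are ORed,
  // which should match what the existing keyword-only filter does.
  val anyKeyword: FilterQuery = new FilterQuery().track("foo", "bar", "baz")

  // Tweets containing both "foo" AND "bar": terms separated by a space
  // inside a single phrase are ANDed.
  val allKeywords: FilterQuery = new FilterQuery().track("foo bar")

  // Tweets created by specific users; 12345L is a placeholder user ID.
  val byUser: FilterQuery = new FilterQuery().follow(12345L)

  // Tweets tagged inside a bounding box, given as the south-west corner
  // followed by the north-east corner, each as (longitude, latitude).
  // The values below are placeholders roughly covering New York City.
  val byLocation: FilterQuery =
    new FilterQuery().locations(Array(-74.0, 40.0), Array(-73.0, 41.0))
}
```

Note that when `track`, `follow` and `locations` are set on the same query, Twitter combines them with OR rather than AND, so a query meant to narrow results by both keyword and location still needs client-side filtering.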

> Expand filtering options for TwitterInputDStream
> ------------------------------------------------
>
>                 Key: BAHIR-117
>                 URL: https://issues.apache.org/jira/browse/BAHIR-117
>             Project: Bahir
>          Issue Type: Improvement
>          Components: Spark Streaming Connectors
>            Reporter: Clemens Wolff
>
> Currently, the TwitterInputDStream only supports filtering by keywords [1] which corresponds to the "track" option in the Twitter API [2]. The Twitter API supports many more ways to receive a filtered stream (e.g. get Tweets in a particular location [3]). It would be very useful to expose these additional filtering options in this library.
>
> Proposal: add a new public method to TwitterUtils which follows the same interface as createStream [4] but which takes a FilterQuery [5] object as argument. In this way, we give full filtering flexibility to our users.
>
> I'm currently working on Project Fortis, a social data analysis platform for the United Nations [6]. The extra filtering options would be very useful for my project so I'm happy to implement this and create a pull request.
>
> [1] https://github.com/apache/bahir/blob/fd4c35fc9f7ebb57464d231cf5d66e7bc4096a1b/streaming-twitter/src/main/scala/org/apache/spark/streaming/twitter/TwitterInputDStream.scala#L44
> [2] https://dev.twitter.com/streaming/overview/request-parameters#track
> [3] https://dev.twitter.com/streaming/overview/request-parameters#locations
> [4] https://github.com/apache/bahir/blob/fd4c35fc9f7ebb57464d231cf5d66e7bc4096a1b/streaming-twitter/src/main/scala/org/apache/spark/streaming/twitter/TwitterUtils.scala#L39
> [5] http://twitter4j.org/javadoc/twitter4j/FilterQuery.html
> [6] https://fortis-web.azurewebsites.net/#/site/ocha/

--
This message was sent by Atlassian JIRA (v6.3.15#6346)