[ 
https://issues.apache.org/jira/browse/BAHIR-117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16016020#comment-16016020
 ] 

ASF GitHub Bot commented on BAHIR-117:
--------------------------------------

Github user c-w commented on a diff in the pull request:

    https://github.com/apache/bahir/pull/43#discussion_r117292632
  
    --- Diff: streaming-twitter/src/main/scala/org/apache/spark/streaming/twitter/TwitterInputDStream.scala ---
    @@ -85,10 +85,8 @@ class TwitterReceiver(
             }
           })
     
    -      val query = new FilterQuery
    -      if (filters.size > 0) {
    -        query.track(filters.mkString(","))
    -        newTwitterStream.filter(query)
    +      if (query.isDefined) {
    +        newTwitterStream.filter(query.get)
           } else {
    --- End diff --
    
    As I mentioned in the PR description, hiding the FilterQuery from the user means that we can only filter the Twitter stream via disjunctive keyword queries:
    
    ```scala
    // this will give us any Tweet that contains "foo", "bar" or "baz"
    // (None = fall back to the OAuth credentials in the twitter4j system properties)
    val tweets = TwitterUtils.createStream(ssc, None, Seq("foo", "bar", "baz"))
    ```
    
    However, the Twitter stream API also supports many other types of filtering, including:
    - Receive Tweets that are tagged at a particular location (ref: [locations](https://dev.twitter.com/streaming/overview/request-parameters#locations))
    - Receive Tweets created by specific users (ref: [follow](https://dev.twitter.com/streaming/overview/request-parameters#follow))
    - Receive Tweets that match a conjunction of keywords (ref: [track with spaces](https://dev.twitter.com/streaming/overview/request-parameters#track))
    
    Refer to Twitter's [official documentation](https://dev.twitter.com/streaming/overview/request-parameters) for a full list of all supported filters.
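    
    To make the difference between disjunctive and conjunctive `track` queries concrete, here is a rough local sketch of the matching rules as Twitter documents them: commas between phrases act as OR, spaces between terms inside a phrase act as AND, and case is ignored. `TrackSemantics` and `matchesTrack` are hypothetical names used only for illustration; they are not part of twitter4j or Bahir, and the tokenization is much simpler than what Twitter actually does.
    
    ```scala
    // Local illustration only: mimics the documented "track" rules, not Twitter's real matcher.
    object TrackSemantics {
      def matchesTrack(track: String, tweetText: String): Boolean = {
        // Simplified tokenization (an assumption): lowercase, split on non-alphanumerics.
        val tokens = tweetText.toLowerCase.split("[^a-z0-9]+").filter(_.nonEmpty).toSet
        // Commas separate phrases: any one matching phrase is enough (OR).
        track.toLowerCase.split(",").map(_.trim).filter(_.nonEmpty).exists { phrase =>
          // Spaces separate terms: every term must be present, in any order (AND).
          phrase.split("\\s+").forall(tokens.contains)
        }
      }
    }
    ```
    
    With this sketch, `matchesTrack("foo,bar,baz", "I like bar")` holds because one keyword is enough, while `matchesTrack("foo bar", "only foo here")` does not because both terms are required.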
    
    By exposing the FilterQuery, we enable users to make use of all of these powerful filters and any future filters that Twitter may introduce.
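    
    For reference, a minimal sketch of the kinds of queries a user could build once the FilterQuery is exposed. It assumes a twitter4j 4.x release in which the `track`, `follow` and `locations` setters accept varargs and return the query (older releases take arrays instead). The object name, user IDs and bounding box are placeholders for illustration.
    
    ```scala
    import twitter4j.FilterQuery
    
    // Hypothetical holder object; only the FilterQuery calls come from twitter4j.
    object FilterQueryExamples {
      // Conjunction: a Tweet must contain both "foo" and "bar".
      val conjunction: FilterQuery = new FilterQuery().track("foo bar")
    
      // Tweets created by specific users; these numeric user IDs are placeholders.
      val byUsers: FilterQuery = new FilterQuery().follow(1234L, 5678L)
    
      // Tweets tagged inside a bounding box, given as (longitude, latitude) pairs:
      // south-west corner first, then north-east corner.
      val byLocation: FilterQuery = new FilterQuery().locations(
        Array(-122.75, 36.8), Array(-121.75, 37.8))
    
      // The receiver in this diff checks an Option, so the query is wrapped in Some.
      val query: Option[FilterQuery] = Some(byLocation)
    }
    ```
    
    Leaving the query as `None` would take the `else` branch shown in the diff above.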


> Expand filtering options for TwitterInputDStream
> ------------------------------------------------
>
>                 Key: BAHIR-117
>                 URL: https://issues.apache.org/jira/browse/BAHIR-117
>             Project: Bahir
>          Issue Type: Improvement
>          Components: Spark Streaming Connectors
>            Reporter: Clemens Wolff
>
> Currently, the TwitterInputDStream only supports filtering by keywords [1] 
> which corresponds to the "track" option in the Twitter API [2]. The Twitter 
> API supports many more ways to receive a filtered stream (e.g. get Tweets in 
> a particular location [3]). It would be very useful to expose these 
> additional filtering options in this library.
> Proposal: add a new public method to TwitterUtils which follows the same 
> interface as createStream [4] but which takes a FilterQuery [5] object as 
> argument. In this way, we give full filtering flexibility to our users.
> I'm currently working on Project Fortis, a social data analysis platform for 
> the United Nations [6]. The extra filtering options would be very useful for 
> my project so I'm happy to implement this and create a pull request.
> [1] https://github.com/apache/bahir/blob/fd4c35fc9f7ebb57464d231cf5d66e7bc4096a1b/streaming-twitter/src/main/scala/org/apache/spark/streaming/twitter/TwitterInputDStream.scala#L44
> [2] https://dev.twitter.com/streaming/overview/request-parameters#track
> [3] https://dev.twitter.com/streaming/overview/request-parameters#locations
> [4] https://github.com/apache/bahir/blob/fd4c35fc9f7ebb57464d231cf5d66e7bc4096a1b/streaming-twitter/src/main/scala/org/apache/spark/streaming/twitter/TwitterUtils.scala#L39
> [5] http://twitter4j.org/javadoc/twitter4j/FilterQuery.html
> [6] https://fortis-web.azurewebsites.net/#/site/ocha/



