[jira] [Commented] (SPARK-13065) streaming-twitter pass twitter4j.FilterQuery argument to TwitterUtils.createStream()

2016-02-03 Thread Andrew Davidson (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-13065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15130923#comment-15130923
 ] 

Andrew Davidson commented on SPARK-13065:
-

The code looks really nice.

Well done.

Andy

> streaming-twitter pass twitter4j.FilterQuery argument to 
> TwitterUtils.createStream()
> 
>
> Key: SPARK-13065
> URL: https://issues.apache.org/jira/browse/SPARK-13065
> Project: Spark
> Issue Type: Improvement
> Components: Streaming
> Affects Versions: 1.6.0
> Environment: all
> Reporter: Andrew Davidson
> Priority: Minor
> Labels: twitter
> Attachments: twitterFilterQueryPatch.tar.gz
>
> Original Estimate: 2h
> Remaining Estimate: 2h
>
> The Twitter streaming API is very powerful and provides a lot of support for 
> twitter.com-side filtering of status objects. Whenever possible we want to 
> let Twitter do as much of the work as possible for us.
> Currently the Spark Twitter API only allows you to configure a small subset 
> of the possible filters:
> String[] filters = {"tag1", "tag2"};
> JavaDStream<Status> tweets = TwitterUtils.createStream(ssc, twitterAuth, filters);
> The current implementation does:
> private[streaming]
> class TwitterReceiver(
>     twitterAuth: Authorization,
>     filters: Seq[String],
>     storageLevel: StorageLevel
>   ) extends Receiver[Status](storageLevel) with Logging {
>   . . .
>       val query = new FilterQuery
>       if (filters.size > 0) {
>         query.track(filters.mkString(","))
>         newTwitterStream.filter(query)
>       } else {
>         newTwitterStream.sample()
>       }
>   ...
> Rather than constructing the FilterQuery object in TwitterReceiver.onStart(), we 
> should be able to pass in a FilterQuery object.
> Looks like an easy fix. See the source code links below.
> Kind regards,
> Andy
> https://github.com/apache/spark/blob/master/external/twitter/src/main/scala/org/apache/spark/streaming/twitter/TwitterInputDStream.scala#L60
> https://github.com/apache/spark/blob/master/external/twitter/src/main/scala/org/apache/spark/streaming/twitter/TwitterInputDStream.scala#L89
> $ 2/2/16
> Attached is my Java implementation for this problem. Feel free to reuse it 
> however you like. In my streaming Spark app's main() I have the following code:
> FilterQuery query = config.getFilterQuery().fetch();
> if (query != null) {
>     // TODO https://issues.apache.org/jira/browse/SPARK-13065
>     tweets = TwitterFilterQueryUtils.createStream(ssc, twitterAuth, query);
> } /* else
>     Spark native API:
>     String[] filters = {"tag1", "tag2"};
>     tweets = TwitterUtils.createStream(ssc, twitterAuth, filters);
>
>     see
>     https://github.com/apache/spark/blob/master/external/twitter/src/main/scala/org/apache/spark/streaming/twitter/TwitterInputDStream.scala#L89
>
>     which causes
>     val query = new FilterQuery
>     if (filters.size > 0) {
>       query.track(filters.mkString(","))
>       newTwitterStream.filter(query)
>     } */
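
The difference between the keyword-only receiver and the requested one can be
modelled in a few lines of plain Java. This is only a sketch under stated
assumptions: FilterQuerySketch is a hypothetical stand-in for
twitter4j.FilterQuery (so the example compiles without twitter4j), and the
returned strings stand in for the calls the receiver would make on the stream.
None of the names below come from Spark or from the attached patch.

```java
// Hypothetical stand-in for twitter4j.FilterQuery; only the fields this sketch needs.
final class FilterQuerySketch {
    String[] track = new String[0];   // keywords to match
    long[] follow = new long[0];      // user ids to follow

    FilterQuerySketch track(String... keywords) { this.track = keywords; return this; }
    FilterQuerySketch follow(long... userIds) { this.follow = userIds; return this; }
}

final class ReceiverDispatchSketch {
    // Mirrors the quoted onStart() branch: keywords are the only filter the caller can set.
    static String currentBehavior(String[] filters) {
        if (filters.length > 0) {
            return "filter track=" + String.join(",", filters);
        }
        return "sample";
    }

    // Requested shape: the caller builds the query and the receiver only forwards it.
    static String proposedBehavior(FilterQuerySketch query) {
        if (query == null) {
            return "sample";
        }
        return "filter track=" + String.join(",", query.track)
                + " follow=" + query.follow.length;
    }

    public static void main(String[] args) {
        System.out.println(currentBehavior(new String[] {"tag1", "tag2"}));
        FilterQuerySketch query = new FilterQuerySketch().track("tag1").follow(42L, 43L);
        System.out.println(proposedBehavior(query));
    }
}
```

With the second form, any filter the query type supports reaches the stream
without the receiver having to know about it.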



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-13065) streaming-twitter pass twitter4j.FilterQuery argument to TwitterUtils.createStream()

2016-02-02 Thread sachin aggarwal (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-13065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15129802#comment-15129802
 ] 

sachin aggarwal commented on SPARK-13065:
-

Happy to see it; that's exactly what I have added. Have a look at this file to 
see how to use the new API for the Java use case:
https://github.com/agsachin/spark/blob/SPARK-13065/external/twitter/src/test/java/org/apache/spark/streaming/twitter/JavaTwitterStreamSuite.java
 
For Scala, check this out:
https://github.com/agsachin/spark/blob/SPARK-13065/external/twitter/src/test/scala/org/apache/spark/streaming/twitter/TwitterStreamSuite.scala




[jira] [Commented] (SPARK-13065) streaming-twitter pass twitter4j.FilterQuery argument to TwitterUtils.createStream()

2016-02-02 Thread sachin aggarwal (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-13065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15127925#comment-15127925
 ] 

sachin aggarwal commented on SPARK-13065:
-

List of changes:
1) Added support for passing a FilterQuery object instead of just a Seq of keywords.
2) Java had a more flexible API syntax than Scala, so I added similar API 
syntax for Scala as well.
3) Added test cases for all the new APIs.
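
One way a keyword-based entry point can sit on top of a query-based one is for
the old overload to wrap its keywords in a query and delegate. The sketch below
illustrates that idea only; it is not taken from the pull request.
TwitterStreamApiSketch and its nested Query class are hypothetical stand-ins
(Query for twitter4j.FilterQuery), and the returned string stands in for the
stream that a real createStream would return.

```java
final class TwitterStreamApiSketch {
    // Hypothetical stand-in for twitter4j.FilterQuery, reduced to keyword tracking.
    static final class Query {
        final String[] track;

        Query(String... track) {
            this.track = track.clone();
        }
    }

    // Query-based entry point: the caller decides what is filtered.
    static String createStream(Query query) {
        if (query.track.length == 0) {
            return "sample";
        }
        return "filter:" + String.join(",", query.track);
    }

    // Keyword-based entry point kept for existing callers; it only delegates.
    static String createStream(String[] filters) {
        return createStream(new Query(filters));
    }

    public static void main(String[] args) {
        System.out.println(createStream(new String[] {"tag1", "tag2"}));
        System.out.println(createStream(new Query("tag1")));
    }
}
```

Keeping a single code path behind both overloads means existing keyword callers
and new query callers cannot drift apart in behaviour.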





[jira] [Commented] (SPARK-13065) streaming-twitter pass twitter4j.FilterQuery argument to TwitterUtils.createStream()

2016-02-02 Thread sachin aggarwal (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-13065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15127935#comment-15127935
 ] 

sachin aggarwal commented on SPARK-13065:
-

[~aedwip] 
I have a question after reading your last comment: you mentioned FilterQuery in 
the description, but here you refer to twitter4j.query. Please clarify.




[jira] [Commented] (SPARK-13065) streaming-twitter pass twitter4j.FilterQuery argument to TwitterUtils.createStream()

2016-02-01 Thread Andrew Davidson (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-13065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15126931#comment-15126931
 ] 

Andrew Davidson commented on SPARK-13065:
-

Thanks Sachin

I do not know Scala. As a temporary workaround I wound up rewriting the 
Twitter code in Java so I could pass a twitter4j.query object. It would be much 
better to improve the Spark API.

Andy




[jira] [Commented] (SPARK-13065) streaming-twitter pass twitter4j.FilterQuery argument to TwitterUtils.createStream()

2016-02-01 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-13065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15126227#comment-15126227
 ] 

Apache Spark commented on SPARK-13065:
--

User 'agsachin' has created a pull request for this issue:
https://github.com/apache/spark/pull/11003




[jira] [Commented] (SPARK-13065) streaming-twitter pass twitter4j.FilterQuery argument to TwitterUtils.createStream()

2016-01-31 Thread sachin aggarwal (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-13065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15125834#comment-15125834
 ] 

sachin aggarwal commented on SPARK-13065:
-

I would like to work on this; I will issue a pull request soon.
