[jira] [Commented] (SPARK-13009) spark-streaming-twitter_2.10 does not make it possible to access the raw twitter json

2016-02-02 Thread Andrew Davidson (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-13009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15128843#comment-15128843
 ] 

Andrew Davidson commented on SPARK-13009:
-

Hi Sean

I totally agree with you. The Twitter4J people asked me to file an RFE with Spark.
I agree it is their problem; I am just looking for some sort of workaround. My
downstream systems will not be able to process the data I am capturing.

I guess in the short term I will create the wrapper object and modify the Spark
Twitter source code.
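A minimal sketch of that short-term wrapper, in Java. Because StatusJSONImpl is final, the wrapper has to hold the parsed status rather than extend it. `TweetView`, `ParsedTweet`, and `RawTweet` are hypothetical stand-ins, not Twitter4J or Spark types. In a modified receiver the raw string would come from `TwitterObjectFactory.getRawJSON(status)`, called inside `onStatus` before `store(...)`. The wrapper implements `Serializable` on the assumption that records handed to a Spark receiver may be serialized by the chosen storage level.

```java
import java.io.Serializable;
import java.util.Objects;

// Hypothetical minimal view of a tweet; stands in for twitter4j.Status.
interface TweetView {
    long getId();
    String getText();
}

// Stand-in for a final library class such as StatusJSONImpl: it cannot be subclassed.
final class ParsedTweet implements TweetView, Serializable {
    private final long id;
    private final String text;

    ParsedTweet(long id, String text) {
        this.id = id;
        this.text = text;
    }

    @Override public long getId() { return id; }
    @Override public String getText() { return text; }
}

// Wrapper by composition: delegates the TweetView methods and carries the raw JSON.
final class RawTweet implements TweetView, Serializable {
    private final TweetView status;
    private final String rawJson;

    RawTweet(TweetView status, String rawJson) {
        this.status = Objects.requireNonNull(status, "status");
        this.rawJson = rawJson; // may be null if the raw JSON was not captured
    }

    @Override public long getId() { return status.getId(); }
    @Override public String getText() { return status.getText(); }

    // The original JSON string as received, or null if it was not captured.
    String getRawJson() { return rawJson; }

    // Unwrap for code that still expects the plain status object.
    TweetView unwrap() { return status; }
}
```

Downstream writers can persist `getRawJson()` unchanged, while analysis code keeps calling the `TweetView` methods. The drawback Andy mentions still applies: code typed against the concrete status class has to call `unwrap()`.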

kind regards

Andy

> spark-streaming-twitter_2.10 does not make it possible to access the raw 
> twitter json
> -
>
> Key: SPARK-13009
> URL: https://issues.apache.org/jira/browse/SPARK-13009
> Project: Spark
>  Issue Type: Improvement
>  Components: Streaming
>Affects Versions: 1.6.0
>Reporter: Andrew Davidson
>Priority: Minor
>
> The streaming-twitter package makes it easy for Java programmers to work with 
> Twitter. The implementation receives the raw Twitter data in JSON format and 
> returns it as a Twitter4J StatusJSONImpl object:
> JavaDStream<Status> tweets = TwitterUtils.createStream(ssc, twitterAuth);
> The Status class is different from the raw JSON, i.e. serializing the Status 
> object will not produce the original JSON. I have downstream systems that 
> can only process raw tweets, not Twitter4J Status objects. 
> Here is my bug/RFE request made to Twitter4J. 
> They asked that I create a Spark tracking issue.
> On Thursday, January 21, 2016 at 6:27:25 PM UTC, Andy Davidson wrote:
> Hi All
> Quick problem summary:
> My system uses the Status objects to do some analysis; however, I need to 
> store the raw JSON. Other systems that process that data are 
> not written in Java.
> Currently we are serializing the Status object. That JSON is going to break 
> downstream systems.
> I am using the Apache Spark Streaming spark-streaming-twitter_2.10 package:
> http://spark.apache.org/docs/latest/streaming-programming-guide.html#advanced-sources
> Request For Enhancement:
> I imagine easy access to the raw JSON is a common requirement. Would it be 
> possible to add a member function getRawJson() to StatusJSONImpl? By default 
> the returned value would be null unless jsonStoreEnabled=True is set in the 
> config.
> Alternative implementations:
> It should be possible to modify spark-streaming-twitter_2.10 to provide 
> this support, but the solution is not very clean:
> It would require Apache Spark to define its own Status POJO, because the 
> current StatusJSONImpl class is marked final.
> The wrapper is not going to work nicely with existing code.
> spark-streaming-twitter_2.10 does not expose all of the Twitter streaming 
> API, so many developers are writing their own implementations of 
> org.apache.spark.streaming.twitter.TwitterInputDStream. This makes maintenance 
> difficult: it is not easy to know when the Spark implementation for Twitter has 
> changed.
> Code listing for 
> spark-1.6.0/external/twitter/src/main/scala/org/apache/spark/streaming/twitter/TwitterInputDStream.scala:
> private[streaming]
> class TwitterReceiver(
>     twitterAuth: Authorization,
>     filters: Seq[String],
>     storageLevel: StorageLevel
>   ) extends Receiver[Status](storageLevel) with Logging {
>
>   @volatile private var twitterStream: TwitterStream = _
>   @volatile private var stopped = false
>
>   def onStart() {
>     try {
>       val newTwitterStream = new TwitterStreamFactory().getInstance(twitterAuth)
>       newTwitterStream.addListener(new StatusListener {
>         def onStatus(status: Status): Unit = {
>           store(status)
>         }
>         // ...
> Ref: 
> https://forum.processing.org/one/topic/saving-json-data-from-twitter4j.html
> What do people think?
> Kind regards
> Andy
> From:  on behalf of Igor Brigadir 
> 
> Reply-To: 
> Date: Tuesday, January 19, 2016 at 5:55 AM
> To: Twitter4J 
> Subject: Re: [Twitter4J] trouble writing unit test
> The main issue is that the JSON object is in the wrong JSON format,
> e.g. "createdAt": 1449775664000 should be "created_at": "Thu Dec 10 19:27:44 
> +0000 2015", ...
> It looks like the JSON you have was serialized from a Java Status object, 
> which makes the JSON objects different from what you get from the API. 
> TwitterObjectFactory expects JSON from Twitter (I haven't had any problems 
> using TwitterObjectFactory instead of the deprecated DataObjectFactory).
> You could "fix" it by matching the keys & values you have with the correct 
> Twitter API JSON; it should look like the example here: 
> https://dev.twitter.com/rest/reference/get/statuses/show/%3Aid
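To illustrate the kind of "fix" Igor describes, here is a small Java sketch covering the two differences in his example: a camelCase key becomes a snake_case key, and an epoch-millisecond timestamp becomes Twitter's `created_at` string. `TwitterJsonFix` is a hypothetical helper, and the pattern `EEE MMM dd HH:mm:ss Z yyyy` is inferred from the example value. Mapping keys mechanically is only a heuristic; a full conversion would likely need per-field exceptions checked against the reference JSON.

```java
import java.time.Instant;
import java.time.OffsetDateTime;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

final class TwitterJsonFix {
    // Twitter API date format, e.g. "Thu Dec 10 19:27:44 +0000 2015".
    private static final DateTimeFormatter CREATED_AT =
            DateTimeFormatter.ofPattern("EEE MMM dd HH:mm:ss Z yyyy", Locale.ENGLISH);

    private TwitterJsonFix() {}

    // "createdAt" -> "created_at": insert '_' before each upper-case letter and lower-case it.
    static String toSnakeCase(String camel) {
        StringBuilder sb = new StringBuilder(camel.length() + 4);
        for (char c : camel.toCharArray()) {
            if (Character.isUpperCase(c)) {
                sb.append('_').append(Character.toLowerCase(c));
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    // Epoch milliseconds (as found in a serialized Status) -> created_at string in UTC.
    static String toCreatedAt(long epochMillis) {
        return CREATED_AT.format(Instant.ofEpochMilli(epochMillis).atOffset(ZoneOffset.UTC));
    }

    // Inverse direction: created_at string -> epoch milliseconds.
    static long fromCreatedAt(String createdAt) {
        return OffsetDateTime.parse(createdAt, CREATED_AT).toInstant().toEpochMilli();
    }
}
```

With the values from the example, `toSnakeCase("createdAt")` returns `"created_at"` and `toCreatedAt(1449775664000L)` returns `"Thu Dec 10 19:27:44 +0000 2015"`.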
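> But it might be easier to download the tweets again, this time using 
> TwitterObjectFactory.getRawJSON(status) to get the original JSON from the 
> Twitter API, and save that for later. (You must have jsonStoreEnabled=True in 
> your config, and call getRawJSON in the same thread as .showStatus() or 
> lookup() or whatever you're using to load tweets.)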
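Igor's same-thread rule matters inside a Spark receiver, where the Status object is handed off through `store()` and processed later, potentially on a different thread. The sketch below models that rule with a `ThreadLocal`. `RawJsonCache` is a hypothetical class, and the assumption that Twitter4J keeps raw JSON per thread in a similar way is inferred from Igor's description, not taken from its source.

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical model of a per-thread raw-JSON cache (assumption, not Twitter4J code).
final class RawJsonCache {
    private static final ThreadLocal<String> LAST_RAW_JSON = new ThreadLocal<>();

    private RawJsonCache() {}

    // Called by the (modeled) client on the thread that parsed the response.
    static void remember(String rawJson) {
        LAST_RAW_JSON.set(rawJson);
    }

    // Returns the raw JSON only when called from the same thread; otherwise null.
    static String lastRawJson() {
        return LAST_RAW_JSON.get();
    }

    // Reads the cache from a freshly started thread, showing the value is not visible there.
    static String lastRawJsonFromAnotherThread() {
        AtomicReference<String> seen = new AtomicReference<>("unset");
        Thread reader = new Thread(() -> seen.set(lastRawJson()));
        reader.start();
        try {
            reader.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for reader", e);
        }
        return seen.get();
    }
}
```

The practical consequence for a modified receiver is to read the raw string inside the listener callback, right where `store(...)` is called, rather than trying to recover it later from the Status object.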

[jira] [Commented] (SPARK-13009) spark-streaming-twitter_2.10 does not make it possible to access the raw twitter json

2016-02-01 Thread Mario Briggs (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-13009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15127718#comment-15127718
 ] 

Mario Briggs commented on SPARK-13009:
--

IMHO, even if StatusJSONImpl were not final, this should still be done in Twitter4J.

> spark-streaming-twitter_2.10 does not make it possible to access the raw 
> twitter json
> -
>
> Key: SPARK-13009
> URL: https://issues.apache.org/jira/browse/SPARK-13009
> Project: Spark
>  Issue Type: Improvement
>  Components: Streaming
>Affects Versions: 1.6.0
>Reporter: Andrew Davidson
>Priority: Blocker
>  Labels: twitter
>



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---

[jira] [Commented] (SPARK-13009) spark-streaming-twitter_2.10 does not make it possible to access the raw twitter json

2016-02-01 Thread Andrew Davidson (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-13009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15126936#comment-15126936
 ] 

Andrew Davidson commented on SPARK-13009:
-

Agreed. They asked me to file a Spark RFE. I will post your comment back to them 
and see what they say. 

If the StatusJSONImpl class were not marked final, it would be easy for Spark to 
create the wrapper.

Andy


[jira] [Commented] (SPARK-13009) spark-streaming-twitter_2.10 does not make it possible to access the raw twitter json

2016-02-01 Thread Mario Briggs (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-13009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15125940#comment-15125940
 ] 

Mario Briggs commented on SPARK-13009:
--

Andrew,
 as you yourself noted, I think it is more appropriate for the Twitter4J API to add 
the method to retrieve the raw JSON than for the Spark API to jump through hoops. 
Furthermore, this problem exists for all consumers of Twitter4J (nothing 
specific to Spark).
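For comparison, the library-side change favored here (the getRawJson() accessor from Andy's RFE) might look roughly like the following. Everything in it is hypothetical: `ClientConfig`, `StoredStatus`, and `StatusFactory` are illustrative names, and real JSON parsing is omitted. It only shows the proposed contract: the raw string is kept when jsonStoreEnabled is set, and the accessor returns null otherwise.

```java
// Hypothetical configuration flag mirroring the jsonStoreEnabled setting from the RFE.
final class ClientConfig {
    final boolean jsonStoreEnabled;

    ClientConfig(boolean jsonStoreEnabled) {
        this.jsonStoreEnabled = jsonStoreEnabled;
    }
}

// Hypothetical status type with the proposed accessor.
final class StoredStatus {
    private final String text;
    private final String rawJson; // null unless jsonStoreEnabled was set

    StoredStatus(String text, String rawJson) {
        this.text = text;
        this.rawJson = rawJson;
    }

    String getText() { return text; }

    // Proposed API: the original JSON, or null when storing was disabled.
    String getRawJson() { return rawJson; }
}

final class StatusFactory {
    private final ClientConfig config;

    StatusFactory(ClientConfig config) {
        this.config = config;
    }

    // 'text' stands in for the fields a real parser would extract from rawJson.
    StoredStatus create(String rawJson, String text) {
        return new StoredStatus(text, config.jsonStoreEnabled ? rawJson : null);
    }
}
```

Because the raw string travels with the status object itself, a Spark receiver that simply calls `store(status)` would carry it along with no Spark-specific wrapper.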
