[ 
https://issues.apache.org/jira/browse/SPARK-13009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15126936#comment-15126936
 ] 

Andrew Davidson commented on SPARK-13009:
-----------------------------------------

Agreed. They asked me to file a Spark RFE. I will post your comment back to 
them and see what they say. 

If the StatusJSONImpl class were not marked final, it would be easy for Spark 
to create the wrapper.

Andy

> spark-streaming-twitter_2.10 does not make it possible to access the raw 
> twitter json
> -------------------------------------------------------------------------------------
>
>                 Key: SPARK-13009
>                 URL: https://issues.apache.org/jira/browse/SPARK-13009
>             Project: Spark
>          Issue Type: Improvement
>          Components: Streaming
>    Affects Versions: 1.6.0
>            Reporter: Andrew Davidson
>            Priority: Blocker
>              Labels: twitter
>
> The Streaming-twitter package makes it easy for Java programmers to work with 
> Twitter. The implementation returns the Twitter data as a Twitter4J 
> StatusJSONImpl object rather than as the raw JSON:
> JavaDStream<Status> tweets = TwitterUtils.createStream(ssc, twitterAuth);
> The Status class is different from the raw JSON, i.e. serializing the Status 
> object will not produce the same JSON as the original. I have downstream 
> systems that can only process raw tweets, not Twitter4J Status objects. 
> Here is my bug/RFE request made to Twitter4J <twitte...@googlegroups.com>. 
> They asked that I create a Spark tracking issue.
> On Thursday, January 21, 2016 at 6:27:25 PM UTC, Andy Davidson wrote:
> Hi All
> Quick problem summary:
> My system uses the Status objects to do some analysis; however, I need to 
> store the raw JSON. There are other systems that process that data that are 
> not written in Java.
> Currently we are serializing the Status object. The resulting JSON is going 
> to break downstream systems.
> I am using the Apache Spark Streaming package spark-streaming-twitter_2.10: 
> http://spark.apache.org/docs/latest/streaming-programming-guide.html#advanced-sources
> Request For Enhancement:
> I imagine easy access to the raw JSON is a common requirement. Would it be 
> possible to add a member function getRawJson() to StatusJSONImpl? By default 
> the returned value would be null unless jsonStoreEnabled=True is set in the 
> config.
> Alternative implementations:
>  
> It should be possible to modify spark-streaming-twitter_2.10 to provide 
> this support. The solution is not very clean:
> It would require Apache Spark to define its own Status POJO. The current 
> StatusJSONImpl class is marked final.
> The wrapper is not going to work nicely with existing code.
> spark-streaming-twitter_2.10 does not expose all of the Twitter streaming 
> API, so many developers are writing their own implementations of 
> org.apache.spark.streaming.twitter.TwitterInputDStream. This makes maintenance 
> difficult. It's not easy to know when the Spark implementation for Twitter has 
> changed. 
> Code listing for 
> spark-1.6.0/external/twitter/src/main/scala/org/apache/spark/streaming/twitter/TwitterInputDStream.scala
> private[streaming]
> class TwitterReceiver(
>     twitterAuth: Authorization,
>     filters: Seq[String],
>     storageLevel: StorageLevel
>   ) extends Receiver[Status](storageLevel) with Logging {
>   @volatile private var twitterStream: TwitterStream = _
>   @volatile private var stopped = false
>   def onStart() {
>     try {
>       val newTwitterStream = new TwitterStreamFactory().getInstance(twitterAuth)
>       newTwitterStream.addListener(new StatusListener {
>         def onStatus(status: Status): Unit = {
>           store(status)
>         }
> Ref: 
> https://forum.processing.org/one/topic/saving-json-data-from-twitter4j.html
> What do people think?
> Kind regards
> Andy
> From: <twit...@googlegroups.com> on behalf of Igor Brigadir 
> <igor.b...@ucdconnect.ie>
> Reply-To: <twit...@googlegroups.com>
> Date: Tuesday, January 19, 2016 at 5:55 AM
> To: Twitter4J <twit...@googlegroups.com>
> Subject: Re: [Twitter4J] trouble writing unit test
> Main issue is that the Json object is in the wrong json format.
> eg: "createdAt": 1449775664000 should be "created_at": "Thu Dec 10 19:27:44 
> +0000 2015", ...
> It looks like the json you have was serialized from a java Status object, 
> which makes the json objects different from what you get from the API. 
> TwitterObjectFactory expects json from Twitter (I haven't had any problems 
> using TwitterObjectFactory instead of the deprecated DataObjectFactory).
> You could "fix" it by matching the keys & values you have with the correct, 
> twitter API json - it should look like the example here: 
> https://dev.twitter.com/rest/reference/get/statuses/show/%3Aid
> But it might be easier to download the tweets again, but this time use 
> TwitterObjectFactory.getRawJSON(status) to get the Original Json from the 
> Twitter API, and save that for later. (You must have jsonStoreEnabled=True in 
> your config, and call getRawJSON in the same thread as .showStatus() or 
> lookup() or whatever you're using to load tweets.)
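
One way to get at the raw JSON without changing Twitter4J, following the 
getRawJSON advice quoted above, might be a custom receiver that stores strings 
instead of Status objects. The sketch below is untested and rests on several 
assumptions: RawJsonTwitterReceiver is a hypothetical name and not part of 
Spark; it assumes the Twitter4J 4.0.x classes TwitterObjectFactory and 
StatusAdapter and the ConfigurationBuilder.setJSONStoreEnabled setter are on 
the classpath; and it assumes the listener callback runs in the same thread 
that parsed the tweet, which getRawJSON requires.

```scala
import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming.receiver.Receiver
import twitter4j.{Status, StatusAdapter, TwitterObjectFactory, TwitterStream, TwitterStreamFactory}
import twitter4j.auth.Authorization
import twitter4j.conf.ConfigurationBuilder

// Hypothetical receiver (not part of Spark): emits the raw tweet JSON as a
// String rather than a twitter4j Status object.
class RawJsonTwitterReceiver(
    twitterAuth: Authorization,
    storageLevel: StorageLevel
  ) extends Receiver[String](storageLevel) {

  @volatile private var twitterStream: TwitterStream = _

  def onStart(): Unit = {
    // getRawJSON only returns data when the JSON store is enabled.
    val conf = new ConfigurationBuilder().setJSONStoreEnabled(true).build()
    val newTwitterStream = new TwitterStreamFactory(conf).getInstance(twitterAuth)
    newTwitterStream.addListener(new StatusAdapter {
      override def onStatus(status: Status): Unit = {
        // Assumption: this callback runs in the thread that created `status`,
        // so the raw JSON is still registered for it.
        val rawJson = TwitterObjectFactory.getRawJSON(status)
        if (rawJson != null) {
          store(rawJson)
        }
      }
      override def onException(e: Exception): Unit = {
        restart("Error receiving tweets", e)
      }
    })
    newTwitterStream.sample()
    twitterStream = newTwitterStream
  }

  def onStop(): Unit = {
    if (twitterStream != null) {
      twitterStream.shutdown()
    }
  }
}
```

Under the same assumptions, it could be plugged in with 
ssc.receiverStream(new RawJsonTwitterReceiver(twitterAuth, 
StorageLevel.MEMORY_AND_DISK_SER_2)), which yields a stream of JSON strings. 
Code that still needs Status objects would have to parse those strings again, 
for example with TwitterObjectFactory.createStatus.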



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
