+1

On Tue, Feb 23, 2016 at 8:26 PM, Chen Li wrote:
> If the fields provided by twitter4j are good enough, I prefer option 1. It
> would be good to avoid a separate request to Twitter due to the overhead.
>
> Chen
>
> On Tue, Feb 23, 2016 at 12:13 AM, Jianfeng Jia wrote:
>
> > Good to know there is another request inside twitter4j.
> > Given the popularity of twitter4j, I think that if we can parse all the
> > fields in list 1 into ADM, it will be good enough.
> >
> > > On Feb 23, 2016, at 12:00 AM, abdullah alamoudi wrote:
> > >
> > > Jianfeng,
> > > We are using the twitter4j API to get tweets as Status objects. I believe
> > > that twitter4j itself discards the original JSON when creating Status
> > > objects. They provide a method to get the full JSON:
> > >
> > > String rawJSON = DataObjectFactory.getRawJSON(status);
> > >
> > > This method, however, sends another request to Twitter to get the
> > > original JSON.
> > > We have a few choices:
> > > 1. Be okay with what twitter4j keeps {CreatedAt, Id, Text, Source,
> > > isTruncated, InReplyToStatusId, InReplyToUserId, InReplyToScreenName,
> > > GeoLocation, Place, isFavorited, isRetweeted, FavoriteCount, User,
> > > isRetweet, RetweetedStatus, Contributors, RetweetCount, isRetweetedByMe,
> > > CurrentUserRetweetId, PossiblySensitive, Lang, Scopes, WithheldInCountries}.
> > > However, this means that we will not get additional fields if the actual
> > > data structure changes. We could also turn this into a JSON object using
> > > the method above and then use our ADM parser to parse it.
> > >
> > > 2. Instead of relying on twitter4j, we should be able to get the JSON
> > > objects directly using HTTP requests to Twitter. This way always gives us
> > > the complete JSON object as it comes from twitter.com, and we will get new
> > > fields the moment they are added.
> > >
> > > I think either way should be fine, and I actually think that we should
> > > stick to twitter4j for now and still use a specialized tweet parser, which
> > > will simply transform the object's fields into ADM fields, unless there is
> > > a strong need for fields that are not covered by the list in (1).
> > >
> > > My 2c,
> > > Abdullah.
> > >
> > > On Tue, Feb 23, 2016 at 3:46 AM, Jianfeng Jia wrote:
> > >
> > > > Dear devs,
> > > >
> > > > TwitterFeedAdapter is nice, but the internal TweetParser has some
> > > > limitations.
> > > > 1. We only pick a few JSON fields, e.g. the user, geolocation, and
> > > > message fields. I need the place field, and there are other fields
> > > > that other applications may also be interested in.
> > > > 2. The text fields always call getNormalizedString() to filter out the
> > > > non-ASCII chars, which is a big loss of information. Even in English
> > > > text there are emojis, which are not "normal".
> > > >
> > > > Obviously, we could add the entire Twitter structure to this parser. I'm
> > > > wondering whether the current one-to-one mapping between Adapter and
> > > > Parser is the best design. The Twitter data itself changes. Also, there
> > > > are a lot of interesting open data sources, e.g. Instagram, Facebook,
> > > > Weibo, Reddit… Could we have a general approach for all these data
> > > > sources?
> > > >
> > > > I'm thinking of having some field-level JSON-to-ADM parsers
> > > > (int, double, string, binary, point, time, polygon…). Then, given a
> > > > schema option passed through the Adapter, we can easily assemble the
> > > > fields into one record. The schema option could be a field mapping
> > > > between the original JSON id and the ADM type, e.g.
> > > > { "id": int64, "user": { "userid": int64, ... } }. As such, we don't
> > > > have to write a specific parser for each data source.
> > > >
> > > > Another thought is to just pass the JSON object as it is and rely on
> > > > the user's UDF to parse the data. Again, even in this case, users can
> > > > selectively override the field parsers whose behavior should differ
> > > > from ours.
> > > >
> > > > Any thoughts?
> > > >
> > > > Best,
> > > >
> > > > Jianfeng Jia
> > > > PhD Candidate of Computer Science
> > > > University of California, Irvine
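The field-level parser proposal above (a small set of per-type parsers plus a schema that says which JSON field maps to which type) can be sketched roughly as follows. This is a minimal illustration, not AsterixDB code: it assumes the incoming JSON has already been decoded by some JSON library into nested `Map`/`List` values, it produces plain Java objects rather than ADM values, and the class name `SchemaDrivenParser`, the type tags (`int64`, `double`, `string`, `point`) and the `override` method are all hypothetical. Each schema entry is either a type tag naming a field parser or a nested schema for a sub-record; JSON fields not listed in the schema are dropped.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch: schema-driven assembly of a record from field-level parsers.
public class SchemaDrivenParser {

    // Field-level parsers keyed by type tag; the tag names are illustrative only.
    private final Map<String, Function<Object, Object>> fieldParsers = new LinkedHashMap<>();

    public SchemaDrivenParser() {
        fieldParsers.put("int64", v -> ((Number) v).longValue());
        fieldParsers.put("double", v -> ((Number) v).doubleValue());
        // Strings pass through unchanged, so non-ASCII text such as emojis is kept.
        fieldParsers.put("string", v -> (String) v);
        // Assumption: a point arrives as a two-element JSON array [x, y].
        fieldParsers.put("point", v -> {
            List<?> xy = (List<?>) v;
            return List.of(((Number) xy.get(0)).doubleValue(), ((Number) xy.get(1)).doubleValue());
        });
    }

    // Replaces the parser for one type tag, e.g. with a user-supplied variant.
    public void override(String typeTag, Function<Object, Object> parser) {
        fieldParsers.put(typeTag, parser);
    }

    // A schema value is either a type tag (String) or a nested schema (Map).
    // JSON fields absent from the schema are ignored; schema fields that are
    // missing or null in the JSON are left out of the record.
    @SuppressWarnings("unchecked")
    public Map<String, Object> parse(Map<String, Object> json, Map<String, Object> schema) {
        Map<String, Object> record = new LinkedHashMap<>();
        for (Map.Entry<String, Object> field : schema.entrySet()) {
            Object value = json.get(field.getKey());
            if (value == null) {
                continue;
            }
            Object spec = field.getValue();
            if (spec instanceof Map) {
                record.put(field.getKey(),
                        parse((Map<String, Object>) value, (Map<String, Object>) spec));
            } else {
                Function<Object, Object> parser = fieldParsers.get((String) spec);
                if (parser == null) {
                    throw new IllegalArgumentException("no field parser for type " + spec);
                }
                record.put(field.getKey(), parser.apply(value));
            }
        }
        return record;
    }

    public static void main(String[] args) {
        // Stand-in for a tweet already decoded into maps by some JSON library.
        Map<String, Object> user = new LinkedHashMap<>();
        user.put("userid", 7);
        user.put("name", "jj");
        Map<String, Object> tweet = new LinkedHashMap<>();
        tweet.put("id", 42L);
        tweet.put("text", "hello \uD83C\uDF0D");
        tweet.put("lang", "en");
        tweet.put("user", user);

        // Schema equivalent to { "id": int64, "text": string, "user": { "userid": int64 } }
        Map<String, Object> userSchema = new LinkedHashMap<>();
        userSchema.put("userid", "int64");
        Map<String, Object> schema = new LinkedHashMap<>();
        schema.put("id", "int64");
        schema.put("text", "string");
        schema.put("user", userSchema);

        SchemaDrivenParser parser = new SchemaDrivenParser();
        // "lang" and "user.name" are not in the schema, so they are dropped.
        System.out.println(parser.parse(tweet, schema));
    }
}
```

Because a replacement parser can be registered per type tag with `override`, the same structure would also cover the suggestion that users selectively swap in their own field parsers; a real implementation would presumably emit ADM values and handle type mismatches more gracefully than the casts used here.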
