>> As for the cast-record, if we can add advanced type converting that will be great.
I guess the flow could be: top-level JSON object (tuple) --> fully open Asterix record --> record with a required type. To change the cast-record function, you can take a look at the code here:
https://github.com/apache/incubator-asterixdb/tree/master/asterix-om/src/main/java/org/apache/asterix/om/pointables/cast

Best,
Yingyi

On Mon, Feb 22, 2016 at 10:40 PM, Jianfeng Jia wrote:

> I’ve created an issue 1318 <https://issues.apache.org/jira/browse/ASTERIXDB-1318> wrt recovering the missing fields from the Twitter Stream JSON.
>
> As for the cast-record, if we can add advanced type converting that will be great.
>
> > On Feb 22, 2016, at 10:06 PM, Yingyi Bu wrote:
> >
> > >> Maybe something we'd need for extra credit would be - if the data is targeted at a dataset with "more schema" than the incoming wide open records - the ability to do field level type conversions at the point of entry into a dataset by calling the appropriate constructors with the incoming string values?
> >
> > I guess we can have an enhanced version of the cast-record function to do that? It already considers the combination of complex types, open-closeness, and type promotions. Maybe we can enhance that with temporal/spatial constructors?
> >
> > Best,
> > Yingyi
> >
> > On Mon, Feb 22, 2016 at 8:50 PM, Mike Carey wrote:
> >
> >> We should definitely not be pulling in a subset of fields at the entry point - that's what the UDF is for (it can trim off or add or convert fields) - agreed. Why not have the out-of-the-box adaptor simply keep all of the fields in their incoming form?
> >> Maybe something we'd need for extra credit would be - if the data is targeted at a dataset with "more schema" than the incoming wide open records - the ability to do field level type conversions at the point of entry into a dataset by calling the appropriate constructors with the incoming string values?
> >>
> >> On 2/22/16 4:46 PM, Jianfeng Jia wrote:
> >>
> >>> Dear devs,
> >>>
> >>> TwitterFeedAdapter is nice, but the internal TweetParser has some limitations.
> >>> 1. We only pick a few JSON fields, e.g. the user, geolocation, and message fields. I need the place field, and there are other fields that other applications may also be interested in.
> >>> 2. The text fields always call getNormalizedString() to filter out non-ASCII chars, which is a big loss of information. Even English text contains emojis, which are not “normal”.
> >>>
> >>> Of course, we could add the entire Twitter structure to this parser. But I’m wondering whether the current design, a one-to-one mapping between Adapter and Parser, is the best approach. The Twitter data itself changes. There are also a lot of interesting open data sources, e.g. Instagram, Facebook, Weibo, Reddit, …. Could we have a general approach for all these data sources?
> >>>
> >>> I’m thinking of having field-level JSON-to-ADM parsers (int, double, string, binary, point, time, polygon, …). Then, given a schema option passed through the Adapter, we can easily assemble the fields into one record. The schema option could be a field mapping between the original JSON id and the ADM type, e.g. { “id”:Int64, “user”: { “userid”: int64,..} }. As such, we don’t have to write a specific parser for each data source.
> >>>
> >>> Another thought is to just pass the JSON object through as it is, and rely on the user’s UDF to parse the data.
> >>> Again, even in this case, users can selectively override those field parsers whose behavior should differ from ours.
> >>>
> >>> Any thoughts?
> >>>
> >>> Best,
> >>>
> >>> Jianfeng Jia
> >>> PhD Candidate of Computer Science
> >>> University of California, Irvine
>
> Best,
>
> Jianfeng Jia
> PhD Candidate of Computer Science
> University of California, Irvine
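The field-level conversion idea discussed in this thread (a schema option mapping JSON field names to ADM types, unmapped fields kept in their incoming form, and per-type parsers that users can selectively override) can be sketched in Python as below. This is a hypothetical illustration only: `DEFAULT_PARSERS`, `cast_record`, `parse_point`, the lowercase type names, and the `"x,y"` point format are assumptions made for the example, not AsterixDB's cast-record implementation or any ADM API.

```python
from datetime import datetime
from typing import Any, Callable, Dict, Optional, Tuple

# Hypothetical field-level parser: turns an incoming JSON value
# (often a string) into a Python stand-in for an ADM value.
Parser = Callable[[Any], Any]


def parse_point(value: Any) -> Tuple[float, float]:
    # Assumption: points arrive either as "x,y" strings or as [x, y] lists.
    if isinstance(value, str):
        x, y = value.split(",")
    else:
        x, y = value
    return (float(x), float(y))


# Hypothetical default parsers, keyed by target type name.
DEFAULT_PARSERS: Dict[str, Parser] = {
    "int64": int,
    "double": float,
    "string": str,  # no normalization: non-ASCII text such as emojis is kept
    "point": parse_point,
    "datetime": datetime.fromisoformat,
}


def cast_record(record: Dict[str, Any], schema: Dict[str, Any],
                overrides: Optional[Dict[str, Parser]] = None) -> Dict[str, Any]:
    """Convert the fields named in `schema`; keep every other field as-is (open record)."""
    parsers = {**DEFAULT_PARSERS, **(overrides or {})}
    result: Dict[str, Any] = {}
    for name, value in record.items():
        target = schema.get(name)
        if target is None or value is None:
            # Field not mentioned in the schema (or null): keep its incoming form.
            result[name] = value
        elif isinstance(target, dict):
            # Nested schema, e.g. "user": {"userid": "int64"}.
            result[name] = cast_record(value, target, overrides)
        else:
            # Call the "constructor" for the target type on the incoming value.
            result[name] = parsers[target](value)
    return result


if __name__ == "__main__":
    schema = {"id": "int64", "user": {"userid": "int64"}, "loc": "point"}
    tweet = {"id": "42", "user": {"userid": "7", "name": "Ann"},
             "loc": "1.5,2.0", "text": "hi 😀"}
    print(cast_record(tweet, schema))
```

Keeping unmapped fields untouched mirrors the suggestion that the adaptor not drop any incoming fields, while the `overrides` table stands in for letting a user replace individual type parsers (say, for a different point format) without writing a whole new source-specific parser.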
