Hi Mike,

Thanks for pointing that out. I think I misunderstood the mechanism and misused the terms 'dataset' and 'datatype'. Sorry about that.

Best,
Xikui

On Sat, Apr 30, 2016 at 4:30 PM, Mike Carey <[email protected]> wrote:

> One nit: This has nothing to do with any dataset definition, on the parser
> side of things - it's the type parameter on the create feed DDL statement
> that should be the parser's guide. (In general the optional function on
> the feed may change the type by the time the data reaches a dataset.)
>
> On Apr 30, 2016 3:26 PM, "Xikui Wang" <[email protected]> wrote:
>
> > Hi Abdullah,
> >
> > Actually I also have the concern that adding null-check for general cases
> > will bring extra overheads. Thus I plan to add the checking procedure
> > after parser, but before addTuple, i.e. FeedRecordDataFlowController. But
> > based on what I have seen so far, it seems RecordType is transparent to
> > FeedRecordDataFlowController. So I am still investigating that...
> >
> > I saw the null check in ADM parser. That's actually a viable way to handle
> > that within the parser scope. But I am looking for a slightly different
> > solution. In my perspective, ADM parser assumes the input adm should
> > conform with the dataset definition. Thus it's reasonable for it to throw
> > an exception. For Tweetparser, if I saw null value on non-null attribute,
> > I will discard the whole tweet directly, and may not even log it (as too
> > many tweets with null). That's the reason why I want to put that in
> > FeedRecordDataFlowController, since I didn't see there is a good way to
> > prevent record insert in parser except for throw exception.
> >
> > Not sure my opinion makes sense or not. Feel free to comment. :)
> >
> > Best,
> > Xikui
> >
> > On Sat, Apr 30, 2016 at 1:52 PM, abdullah alamoudi <[email protected]> wrote:
> >
> > > Adding a few points here:
> > >
> > > My feeling is SerializerDeserializer offers another level of abstraction
> > > but with output I can write value directly without construct AType object.
> > > I am wondering if there are any preferences over these two?
> > >
> > > - Using the SerializerDeserializer option, you will only create a single
> > >   object regardless of the number of parsed records, so I wouldn't worry
> > >   about it. Code maintainability takes precedence here IMO.
> > > - In addition to records and lists, UTF8StringSerializerDeserializer can
> > >   be stateful for the same reason (avoid creating lots of un-needed
> > >   objects). In fact, our parsers use the stateful
> > >   UTF8StringSerializerDeserializer since I noticed that using the
> > >   stateless one creates lots of byte[] and triggers GC over and over.
> > > - Right now, we parse missing values as null. Should that change?
> > > - There is definitely a check for nulls on non-nullable values at least
> > >   in the ADM parser. There might be a bug however that makes it accept
> > >   explicit null values and that should be fixed.
> > >
> > > I am for NOT using the cast record solution for the overhead it will add.
> > > but that is just me :)
> > > ~Abdullah.
> > >
> > > On Sat, Apr 30, 2016 at 6:48 AM, Xikui Wang <[email protected]> wrote:
> > >
> > > > Thank you Yingyi. I will try to figure out a solution from that
> > > > direction.
> > > >
> > > > Best,
> > > > Xikui
> > > >
> > > > On Fri, Apr 29, 2016 at 3:48 PM, Yingyi Bu <[email protected]> wrote:
> > > >
> > > > > Yeah, I think so:-)
> > > > >
> > > > > Best,
> > > > > Yingyi
> > > > >
> > > > > On Fri, Apr 29, 2016 at 3:46 PM, Mike Carey <[email protected]> wrote:
> > > > >
> > > > > > This indeed might be cleaner?
> > > > > >
> > > > > > On 4/29/16 3:28 PM, Yingyi Bu wrote:
> > > > > >
> > > > > >>> I'm guessing that you can do similar things to CastRecordDescriptor
> > > > > >>> if you want to handle general cases in that region.
> > > > > >>>
> > > > > >> Or, you can inject a cast-record function in the loading pipeline
> > > > > >> so that you can defer the runtime-type-check/cast to that function
> > > > > >> instead of doing that in the parser.
> > > > > >>
> > > > > >> On Fri, Apr 29, 2016 at 3:25 PM, Yingyi Bu <[email protected]> wrote:
> > > > > >>
> > > > > >>> My answer is inlined.
> > > > > >>>
> > > > > >>>> My feeling is SerializerDeserializer offers another level of
> > > > > >>>> abstraction but with output I can write value directly without
> > > > > >>>> construct AType object.
> > > > > >>>> I am wondering if there are any preferences over these two?
> > > > > >>>
> > > > > >>> I agree with you. However, a SerializerDeserializer has to be
> > > > > >>> stateless, hence it cannot be used at runtime for complex type
> > > > > >>> objects such as records and lists, because it will create a lot of
> > > > > >>> Java objects.
> > > > > >>>
> > > > > >>>> in other words, parser has to guarantee that the processed records
> > > > > >>>> has to match the dataset definition(non-optional attribute cannot
> > > > > >>>> have null value). I tried to assign null value to non-null
> > > > > >>>> attributes. It will be inserted successfully but read records will
> > > > > >>>> have problem.
> > > > > >>>
> > > > > >>> That sounds right to me. Please file a JIRA issue and assign to you
> > > > > >>> (if you're working on that).
> > > > > >>> I'm guessing that you can do similar things to CastRecordDescriptor
> > > > > >>> if you want to handle general cases in that region.
> > > > > >>>
> > > > > >>>> 3. Set to null or skip
> > > > > >>>> For optional(nullable) attributes, if I want to insert a record
> > > > > >>>> with null value on that attribute. Should I assign null value or
> > > > > >>>> should I just skip it? (Probably this is related to the missing
> > > > > >>>> attribute that Yingyi mentioned today?)
> > > > > >>>
> > > > > >>> Assign null value.
> > > > > >>> Missing means the field doesn't exist in a record at all.
> > > > > >>>
> > > > > >>> Best,
> > > > > >>> Yingyi
> > > > > >>>
> > > > > >>> On Fri, Apr 29, 2016 at 2:06 PM, Xikui Wang <[email protected]> wrote:
> > > > > >>>
> > > > > >>>> Hi devs,
> > > > > >>>>
> > > > > >>>> I came across several questions while I was constructing records in
> > > > > >>>> AsterixDB. Hope someone can help me clear the confusion. :)
> > > > > >>>>
> > > > > >>>> 1. Write directly to data output or use SerializerDeserializer
> > > > > >>>> I am working with AbstractDataParser now. I see people using
> > > > > >>>> different ways to append attributes to data output. Either use:
> > > > > >>>> output.write(typetag.serialize());
> > > > > >>>> output.writeInt(0);
> > > > > >>>> to write into data output directly, or
> > > > > >>>> use AInt8SerializerDeserializer.serialize(int8Serde) to serialize
> > > > > >>>> an AINT8 instance to output. *SerializerDeserializer uses writeByte
> > > > > >>>> to write output.
> > > > > >>>>
> > > > > >>>> My feeling is SerializerDeserializer offers another level of
> > > > > >>>> abstraction but with output I can write value directly without
> > > > > >>>> construct AType object.
> > > > > >>>> I am wondering if there are any preferences over these two?
> > > > > >>>>
> > > > > >>>> 2. RecordType validation after parser but before add to frame?
> > > > > >>>> My observation is after parser finish writing the output and pass
> > > > > >>>> it to next level, there is no such validation that checks whether
> > > > > >>>> non-optional field is null or not. In other words, parser has to
> > > > > >>>> guarantee that the processed records has to match the dataset
> > > > > >>>> definition(non-optional attribute cannot have null value). I tried
> > > > > >>>> to assign null value to non-null attributes. It will be inserted
> > > > > >>>> successfully but read records will have problem.
> > > > > >>>>
> > > > > >>>> 3. Set to null or skip
> > > > > >>>> For optional(nullable) attributes, if I want to insert a record
> > > > > >>>> with null value on that attribute. Should I assign null value or
> > > > > >>>> should I just skip it? (Probably this is related to the missing
> > > > > >>>> attribute that Yingyi mentioned today?)
> > > > > >>>>
> > > > > >>>> Thanks for your help.
> > > > > >>>>
> > > > > >>>> Best,
> > > > > >>>> Xikui
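
For readers of this thread, the two ideas discussed above (null versus missing, and discarding a record whose non-nullable field is null before it is added to the frame) can be sketched in plain Java. This is only an illustration under stated assumptions, not AsterixDB code: `FeedNullCheckSketch`, `FieldDecl`, `isMissing`, `isNull`, and `accept` are hypothetical names, the record is modeled as a `Map`, and the declared fields stand in for the type given on the create feed statement. Treating a missing nullable field as acceptable is also an assumption.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: none of these names are AsterixDB classes.
class FeedNullCheckSketch {

    // One declared field of the feed's record type, with a flag saying
    // whether a null value is allowed for it.
    static final class FieldDecl {
        final String name;
        final boolean nullable;

        FieldDecl(String name, boolean nullable) {
            this.name = name;
            this.nullable = nullable;
        }
    }

    // "Missing": the field does not exist in the record at all.
    static boolean isMissing(Map<String, Object> rec, String field) {
        return !rec.containsKey(field);
    }

    // "Null": the field exists and is explicitly set to null.
    static boolean isNull(Map<String, Object> rec, String field) {
        return rec.containsKey(field) && rec.get(field) == null;
    }

    // Returns true when every non-nullable declared field is present and
    // non-null. A caller would skip the record (for example, a tweet)
    // when this returns false, instead of throwing from the parser.
    static boolean accept(List<FieldDecl> type, Map<String, Object> rec) {
        for (FieldDecl f : type) {
            if (!f.nullable && rec.get(f.name) == null) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<FieldDecl> tweetType = Arrays.asList(
                new FieldDecl("id", false),
                new FieldDecl("place", true));

        Map<String, Object> good = new HashMap<>();
        good.put("id", 1L);
        good.put("place", null); // explicit null on a nullable field

        Map<String, Object> bad = new HashMap<>();
        bad.put("id", null);     // null on a non-nullable field

        System.out.println(accept(tweetType, good)); // true
        System.out.println(accept(tweetType, bad));  // false
        System.out.println(isNull(good, "place"));   // true
        System.out.println(isMissing(bad, "place")); // true
    }
}
```

The check runs once per record against the declared fields only, so its cost grows with the number of declared fields rather than with the record size; whether that overhead is acceptable in the feed pipeline is the open question raised in the thread.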
