>> - Using The SerializerDeserializer option, you will only create a single >> object regardless of the number of parsed records, so I wouldn't worry >> about it. Code maintainability takes precedence here IMO.
For scalar types, you're right. But for complex types, there are objects created in each serialize/deserialize call: https://github.com/apache/incubator-asterixdb/blob/master/asterixdb/asterix-om/src/main/java/org/apache/asterix/dataflow/data/nontagged/serde/ARecordSerializerDeserializer.java#L176 https://github.com/apache/incubator-asterixdb/blob/master/asterixdb/asterix-om/src/main/java/org/apache/asterix/dataflow/data/nontagged/serde/AOrderedListSerializerDeserializer.java#L106 https://github.com/apache/incubator-asterixdb/blob/master/asterixdb/asterix-om/src/main/java/org/apache/asterix/dataflow/data/nontagged/serde/AUnorderedListSerializerDeserializer.java#L107 >> - Right now, we parse missing values as null. Should that change? I'll take care of that in my change: -- if missing is in the closed part, it becomes null; -- if missing is in the open part, it does not exist in the record. >> I am for NOT using the cast record solution for the overhead it will add. >> but that is just me :) Yes, there would be some overhead if we parse every record as a type-less open record first. But I think correctness is more important. I'm not sure whether the current AdmParser can handle all general type-castable cases. Best, Yingyi On Sat, Apr 30, 2016 at 6:10 PM, Xikui Wang <[email protected]> wrote: > Hi Mike, > > Thanks for pointing that out. I think I misunderstood the working mechanism > and misused the terms of 'dataset' and 'datatype'. Sorry about that. > > Best, > Xikui > > On Sat, Apr 30, 2016 at 4:30 PM, Mike Carey <[email protected]> wrote: > > > One nit: This has nothing to do with any dataset definition, on the > parser > > side of things - it's the type parameter on the create feed DDL statement > > that should be the parser's guide. (In general the optional function on > > the feed may change the type by the time the data reaches a dataset.) > > On Apr 30, 2016 3:26 PM, "Xikui Wang" <[email protected]> wrote: > > > > > Hi Abdullah, > > > > > > Actually I also have the concern that adding null-check for general > cases > > > will bring extra > > > overheads. Thus I plan to add the checking procedure after parser, but > > > before addTuple, > > > i.e.FeedRecordDataFlowController. But based on what I have seen so far, > > it > > > seems RecordType > > > is transparent to FeedRecordDataFlowController. So I am still > > investigating > > > that... > > > > > > I saw the null check in ADM parser. That's actually a viable way to > > handle > > > that within the > > > parser scope. But I am looking for a slightly different solution. In my > > > perspective, > > > ADM parser assumes the input adm should conform with the dataset > > > definition. > > > Thus it's reasonable for it to throw a exception. For Tweetparser, if I > > saw > > > null value on non-null attribute, I will > > > discard the whole tweet directly, and may not even log it(as too many > > > tweets with null). > > > That's the reason why I want to put that in > FeedRecordDataFlowController, > > > since I didn't see > > > there is a good way to prevent record insert in parser except for throw > > > exception. > > > > > > Not sure my opinion makes sense or not. Feel free to comment. :) > > > > > > Best, > > > Xikui > > > > > > On Sat, Apr 30, 2016 at 1:52 PM, abdullah alamoudi <[email protected] > > > > > wrote: > > > > > > > Adding a few points here: > > > > > > > > My feeling is SerializerDeserializer offers another level of > > abstraction > > > > but with output I can write value directly without construct AType > > > object. > > > > I am wondering if there are any preferences over these two? > > > > > > > > - Using The SerializerDeserializer option, you will only create a > > single > > > > object regardless of the number of parsed records, so I wouldn't > worry > > > > about it. Code maintainability takes precedence here IMO. > > > > - In addition to records and lists, UTF8StringSerializerDeserializer > > can > > > be > > > > stateful for the same reason (avoid creating lost of un-needed > > objects). > > > In > > > > fact, our parsers use the stateful UTF8StringSerializerDeserializer > > > since I > > > > noticed that using the stateless one creates lots of byte[] and > > triggers > > > GC > > > > over and over. > > > > - Right now, we parse missing values as null. Should that change? > > > > - There is definitely a check for nulls on non-nullable values at > least > > > in > > > > the ADM parser. There might be a bug however that makes it accept > > > explicit > > > > null values and that should be fixed. > > > > > > > > I am for NOT using the cast record solution for the overhead it will > > add. > > > > but that is just me :) > > > > ~Abdullah. > > > > > > > > > > > > On Sat, Apr 30, 2016 at 6:48 AM, Xikui Wang <[email protected]> wrote: > > > > > > > > > Thank you Yingyi. I will try to figure out a solution from that > > > > direction. > > > > > > > > > > Best, > > > > > Xikui > > > > > > > > > > On Fri, Apr 29, 2016 at 3:48 PM, Yingyi Bu <[email protected]> > > wrote: > > > > > > > > > > > Yeah, I think so:-) > > > > > > > > > > > > Best, > > > > > > Yingyi > > > > > > > > > > > > On Fri, Apr 29, 2016 at 3:46 PM, Mike Carey <[email protected]> > > > wrote: > > > > > > > > > > > > > This indeed might be cleaner? > > > > > > > > > > > > > > > > > > > > > On 4/29/16 3:28 PM, Yingyi Bu wrote: > > > > > > > > > > > > > >> I'm guessing that you can do similar things to > > > CastRecordDescriptor > > > > > > >>>> if you want to handle general cases in that region. > > > > > > >>>> > > > > > > >>> Or, you can inject a cast-record function in the loading > > pipeline > > > > > > >> so that you can defer the runtime-type-check/cast to that > > function > > > > > > instead > > > > > > >> of doing that in the parser. > > > > > > >> > > > > > > >> > > > > > > >> On Fri, Apr 29, 2016 at 3:25 PM, Yingyi Bu < > [email protected]> > > > > > wrote: > > > > > > >> > > > > > > >> My answer is inlined. > > > > > > >>> > > > > > > >>> My feeling is SerializerDeserializer offers another level of > > > > > > abstraction > > > > > > >>>>> but with output I can write value directly without > construct > > > > AType > > > > > > >>>>> > > > > > > >>>> object. > > > > > > >>> > > > > > > >>>> I am wondering if there are any preferences over these two? > > > > > > >>>>> > > > > > > >>>> I agree with you. However, a SerializerDeserializer has to > be > > > > > > stateless, > > > > > > >>> hence it cannot be used at runtime for complex type objects > > such > > > as > > > > > > >>> records and lists, > > > > > > >>> because it will create a lot Java objects. > > > > > > >>> > > > > > > >>> in other words, parser has to guarantee that the > > > > > > >>>>> processed records has to match the dataset > > > > definition(non-optional > > > > > > >>>>> attribute cannot have null value). I tried to assign null > > value > > > > to > > > > > > >>>>> > > > > > > >>>> non-null > > > > > > >>> > > > > > > >>>> attributes. It will be inserted successfully but read > records > > > will > > > > > > have > > > > > > >>>>> problem. > > > > > > >>>>> > > > > > > >>>> That sounds right to me. Please file a JIRA issue and > assign > > to > > > > > you ( > > > > > > >>> if you're working on that). > > > > > > >>> I'm guessing that you can do similar things to > > > CastRecordDescriptor > > > > > > >>> if you want to handle general cases in that region. > > > > > > >>> > > > > > > >>> 3. Set to null or skip > > > > > > >>>>> For optional(nullable) attributes, if I want to insert a > > record > > > > > with > > > > > > >>>>> > > > > > > >>>> null > > > > > > >>> > > > > > > >>>> value on that attribute. Should I assign null value or > should > > I > > > > just > > > > > > >>>>> > > > > > > >>>> skip > > > > > > >>> > > > > > > >>>> it? (Probably this is related to the missing attribute that > > > Yingyi > > > > > > >>>>> mentioned today?) > > > > > > >>>>> > > > > > > >>>> Assign null value. > > > > > > >>> Missing means the field doesn't exist in a record at all. > > > > > > >>> > > > > > > >>> Best, > > > > > > >>> Yingyi > > > > > > >>> > > > > > > >>> > > > > > > >>> On Fri, Apr 29, 2016 at 2:06 PM, Xikui Wang <[email protected]> > > > > wrote: > > > > > > >>> > > > > > > >>> Hi devs, > > > > > > >>>> > > > > > > >>>> I came across several questions while I was constructing > > records > > > > in > > > > > > >>>> AsterixDB. Hope someone can help me clear the confusion. :) > > > > > > >>>> > > > > > > >>>> 1. Write directly to data output or use > SerializerDeserializer > > > > > > >>>> I am working with AbstractDataParser now. I see people using > > > > > different > > > > > > >>>> ways > > > > > > >>>> to append attributes to data output. Either use: > > > > > > >>>> output.Write(typetag.serialize()); > > > > > > >>>> output.WriteInt(0); > > > > > > >>>> to write into data output directly, or > > > > > > >>>> use AInt8SerializerDeserializer.serialize(int8Serde) to > > > serialize > > > > a > > > > > > >>>> AINT8 > > > > > > >>>> instance to output. *SerializerDeserializer uses writeByte > to > > > > write > > > > > > >>>> output. > > > > > > >>>> > > > > > > >>>> My feeling is SerializerDeserializer offers another level of > > > > > > abstraction > > > > > > >>>> but with output I can write value directly without construct > > > AType > > > > > > >>>> object. > > > > > > >>>> I am wondering if there are any preferences over these two? > > > > > > >>>> > > > > > > >>>> 2. RecordType validation after parser but before add to > frame? > > > > > > >>>> My observation is after parser finish writing the output and > > > pass > > > > it > > > > > > to > > > > > > >>>> next level, there is no such validation that checks whether > > > > > > non-optional > > > > > > >>>> field is null or not. In other words, parser has to > guarantee > > > that > > > > > the > > > > > > >>>> processed records has to match the dataset > > > definition(non-optional > > > > > > >>>> attribute cannot have null value). I tried to assign null > > value > > > to > > > > > > >>>> non-null > > > > > > >>>> attributes. It will be inserted successfully but read > records > > > will > > > > > > have > > > > > > >>>> problem. > > > > > > >>>> > > > > > > >>>> 3. Set to null or skip > > > > > > >>>> For optional(nullable) attributes, if I want to insert a > > record > > > > with > > > > > > >>>> null > > > > > > >>>> value on that attribute. Should I assign null value or > should > > I > > > > just > > > > > > >>>> skip > > > > > > >>>> it? (Probably this is related to the missing attribute that > > > Yingyi > > > > > > >>>> mentioned today?) > > > > > > >>>> > > > > > > >>>> Thanks for your help. > > > > > > >>>> > > > > > > >>>> Best, > > > > > > >>>> Xikui > > > > > > >>>> > > > > > > >>>> > > > > > > >>> > > > > > > > > > > > > > > > > > > > > > > > > > > > >
