Adding a few points here: My feeling is SerializerDeserializer offers another level of abstraction but with output I can write value directly without construct AType object. I am wondering if there are any preferences over these two?
- Using The SerializerDeserializer option, you will only create a single object regardless of the number of parsed records, so I wouldn't worry about it. Code maintainability takes precedence here IMO. - In addition to records and lists, UTF8StringSerializerDeserializer can be stateful for the same reason (avoid creating lost of un-needed objects). In fact, our parsers use the stateful UTF8StringSerializerDeserializer since I noticed that using the stateless one creates lots of byte[] and triggers GC over and over. - Right now, we parse missing values as null. Should that change? - There is definitely a check for nulls on non-nullable values at least in the ADM parser. There might be a bug however that makes it accept explicit null values and that should be fixed. I am for NOT using the cast record solution for the overhead it will add. but that is just me :) ~Abdullah. On Sat, Apr 30, 2016 at 6:48 AM, Xikui Wang <[email protected]> wrote: > Thank you Yingyi. I will try to figure out a solution from that direction. > > Best, > Xikui > > On Fri, Apr 29, 2016 at 3:48 PM, Yingyi Bu <[email protected]> wrote: > > > Yeah, I think so:-) > > > > Best, > > Yingyi > > > > On Fri, Apr 29, 2016 at 3:46 PM, Mike Carey <[email protected]> wrote: > > > > > This indeed might be cleaner? > > > > > > > > > On 4/29/16 3:28 PM, Yingyi Bu wrote: > > > > > >> I'm guessing that you can do similar things to CastRecordDescriptor > > >>>> if you want to handle general cases in that region. > > >>>> > > >>> Or, you can inject a cast-record function in the loading pipeline > > >> so that you can defer the runtime-type-check/cast to that function > > instead > > >> of doing that in the parser. > > >> > > >> > > >> On Fri, Apr 29, 2016 at 3:25 PM, Yingyi Bu <[email protected]> > wrote: > > >> > > >> My answer is inlined. > > >>> > > >>> My feeling is SerializerDeserializer offers another level of > > abstraction > > >>>>> but with output I can write value directly without construct AType > > >>>>> > > >>>> object. > > >>> > > >>>> I am wondering if there are any preferences over these two? > > >>>>> > > >>>> I agree with you. However, a SerializerDeserializer has to be > > stateless, > > >>> hence it cannot be used at runtime for complex type objects such as > > >>> records and lists, > > >>> because it will create a lot Java objects. > > >>> > > >>> in other words, parser has to guarantee that the > > >>>>> processed records has to match the dataset definition(non-optional > > >>>>> attribute cannot have null value). I tried to assign null value to > > >>>>> > > >>>> non-null > > >>> > > >>>> attributes. It will be inserted successfully but read records will > > have > > >>>>> problem. > > >>>>> > > >>>> That sounds right to me. Please file a JIRA issue and assign to > you ( > > >>> if you're working on that). > > >>> I'm guessing that you can do similar things to CastRecordDescriptor > > >>> if you want to handle general cases in that region. > > >>> > > >>> 3. Set to null or skip > > >>>>> For optional(nullable) attributes, if I want to insert a record > with > > >>>>> > > >>>> null > > >>> > > >>>> value on that attribute. Should I assign null value or should I just > > >>>>> > > >>>> skip > > >>> > > >>>> it? (Probably this is related to the missing attribute that Yingyi > > >>>>> mentioned today?) > > >>>>> > > >>>> Assign null value. > > >>> Missing means the field doesn't exist in a record at all. > > >>> > > >>> Best, > > >>> Yingyi > > >>> > > >>> > > >>> On Fri, Apr 29, 2016 at 2:06 PM, Xikui Wang <[email protected]> wrote: > > >>> > > >>> Hi devs, > > >>>> > > >>>> I came across several questions while I was constructing records in > > >>>> AsterixDB. Hope someone can help me clear the confusion. :) > > >>>> > > >>>> 1. Write directly to data output or use SerializerDeserializer > > >>>> I am working with AbstractDataParser now. I see people using > different > > >>>> ways > > >>>> to append attributes to data output. Either use: > > >>>> output.Write(typetag.serialize()); > > >>>> output.WriteInt(0); > > >>>> to write into data output directly, or > > >>>> use AInt8SerializerDeserializer.serialize(int8Serde) to serialize a > > >>>> AINT8 > > >>>> instance to output. *SerializerDeserializer uses writeByte to write > > >>>> output. > > >>>> > > >>>> My feeling is SerializerDeserializer offers another level of > > abstraction > > >>>> but with output I can write value directly without construct AType > > >>>> object. > > >>>> I am wondering if there are any preferences over these two? > > >>>> > > >>>> 2. RecordType validation after parser but before add to frame? > > >>>> My observation is after parser finish writing the output and pass it > > to > > >>>> next level, there is no such validation that checks whether > > non-optional > > >>>> field is null or not. In other words, parser has to guarantee that > the > > >>>> processed records has to match the dataset definition(non-optional > > >>>> attribute cannot have null value). I tried to assign null value to > > >>>> non-null > > >>>> attributes. It will be inserted successfully but read records will > > have > > >>>> problem. > > >>>> > > >>>> 3. Set to null or skip > > >>>> For optional(nullable) attributes, if I want to insert a record with > > >>>> null > > >>>> value on that attribute. Should I assign null value or should I just > > >>>> skip > > >>>> it? (Probably this is related to the missing attribute that Yingyi > > >>>> mentioned today?) > > >>>> > > >>>> Thanks for your help. > > >>>> > > >>>> Best, > > >>>> Xikui > > >>>> > > >>>> > > >>> > > > > > >
