Re: Questions of building record in AsterixDB

abdullah alamoudi Sat, 30 Apr 2016 13:52:53 -0700

Adding a few points here:

> My feeling is SerializerDeserializer offers another level of abstraction
> but with output I can write value directly without construct AType object.
> I am wondering if there are any preferences over these two?


- Using the SerializerDeserializer option, you will only create a single
object regardless of the number of parsed records, so I wouldn't worry
about it. Code maintainability takes precedence here IMO.
- In addition to records and lists, UTF8StringSerializerDeserializer can be
stateful for the same reason (to avoid creating lots of unneeded objects). In
fact, our parsers use the stateful UTF8StringSerializerDeserializer since I
noticed that using the stateless one creates lots of byte[] and triggers GC
over and over.
- Right now, we parse missing values as null. Should that change?
- There is definitely a check for nulls on non-nullable values, at least in
the ADM parser. There might be a bug, however, that makes it accept explicit
null values, and that should be fixed.
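
To illustrate the stateless vs. stateful point above, here is a minimal
sketch. It is NOT the actual UTF8StringSerializerDeserializer code; the class
name StringWriters, the length-prefixed layout, and the allocations counter
are hypothetical and exist only for this example. It assumes well-formed
UTF-16 input. The stateless method allocates a new byte[] per string, while
the stateful writer keeps one scratch buffer and reuses it across records.

```java
import java.io.DataOutput;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

/** Hypothetical illustration only; not AsterixDB code. */
class StringWriters {

    /** Stateless: every call allocates a fresh byte[] that becomes garbage. */
    static void writeStateless(String s, DataOutput out) throws IOException {
        byte[] bytes = s.getBytes(StandardCharsets.UTF_8); // new array per record
        out.writeInt(bytes.length);
        out.write(bytes, 0, bytes.length);
    }

    /** Stateful: create one instance per parser and reuse it for all records. */
    static final class StatefulWriter {
        private byte[] scratch = new byte[64];
        int allocations = 1; // counts scratch buffers created, for illustration

        void write(String s, DataOutput out) throws IOException {
            int len = encode(s);
            out.writeInt(len);
            out.write(scratch, 0, len);
        }

        /** Encodes s as UTF-8 into the scratch buffer, growing it only if needed. */
        private int encode(String s) {
            int max = s.length() * 3; // a UTF-16 char never needs more than 3 bytes
            if (scratch.length < max) {
                scratch = new byte[max];
                allocations++;
            }
            int n = 0;
            for (int i = 0; i < s.length(); ) {
                int cp = s.codePointAt(i);
                i += Character.charCount(cp);
                if (cp < 0x80) {
                    scratch[n++] = (byte) cp;
                } else if (cp < 0x800) {
                    scratch[n++] = (byte) (0xC0 | (cp >> 6));
                    scratch[n++] = (byte) (0x80 | (cp & 0x3F));
                } else if (cp < 0x10000) {
                    scratch[n++] = (byte) (0xE0 | (cp >> 12));
                    scratch[n++] = (byte) (0x80 | ((cp >> 6) & 0x3F));
                    scratch[n++] = (byte) (0x80 | (cp & 0x3F));
                } else {
                    scratch[n++] = (byte) (0xF0 | (cp >> 18));
                    scratch[n++] = (byte) (0x80 | ((cp >> 12) & 0x3F));
                    scratch[n++] = (byte) (0x80 | ((cp >> 6) & 0x3F));
                    scratch[n++] = (byte) (0x80 | (cp & 0x3F));
                }
            }
            return n;
        }
    }
}
```

Both paths write the same bytes for well-formed strings; the difference is
only in how many temporary arrays are created while parsing many records.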

I am for NOT using the cast-record solution because of the overhead it will
add, but that is just me :)
~Abdullah.


On Sat, Apr 30, 2016 at 6:48 AM, Xikui Wang <[email protected]> wrote:

> Thank you Yingyi. I will try to figure out a solution from that direction.
>
> Best,
> Xikui
>
> On Fri, Apr 29, 2016 at 3:48 PM, Yingyi Bu <[email protected]> wrote:
>
> > Yeah, I think so:-)
> >
> > Best,
> > Yingyi
> >
> > On Fri, Apr 29, 2016 at 3:46 PM, Mike Carey <[email protected]> wrote:
> >
> > > This indeed might be cleaner?
> > >
> > >
> > > On 4/29/16 3:28 PM, Yingyi Bu wrote:
> > >
> > >> I'm guessing that you can do similar things to CastRecordDescriptor
> > >>>> if you want to handle general cases in that region.
> > >>>>
> > >>> Or, you can inject a cast-record function in the loading pipeline
> > >> so that you can defer the runtime-type-check/cast to that function
> > instead
> > >> of doing that in the parser.
> > >>
> > >>
> > >> On Fri, Apr 29, 2016 at 3:25 PM, Yingyi Bu <[email protected]>
> wrote:
> > >>
> > >> My answer is inlined.
> > >>>
> > >>> My feeling is SerializerDeserializer offers another level of
> > abstraction
> > >>>>> but with output I can write value directly without construct AType
> > >>>>>
> > >>>> object.
> > >>>
> > >>>> I am wondering if there are any preferences over these two?
> > >>>>>
> > >>>> I agree with you. However, a SerializerDeserializer has to be
> > stateless,
> > >>> hence it cannot be used at runtime for complex type objects such as
> > >>> records and lists,
> > >>> because it will create a lot of Java objects.
> > >>>
> > >>> in other words, parser has to guarantee that the
> > >>>>> processed records has to match the dataset definition(non-optional
> > >>>>> attribute cannot have null value). I tried to assign null value to
> > >>>>>
> > >>>> non-null
> > >>>
> > >>>> attributes. It will be inserted successfully but read records will
> > have
> > >>>>> problem.
> > >>>>>
> > >>>> That sounds right to me.  Please file a JIRA issue and assign to
> you (
> > >>> if you're working on that).
> > >>> I'm guessing that you can do similar things to CastRecordDescriptor
> > >>> if you want to handle general cases in that region.
> > >>>
> > >>> 3. Set to null or skip
> > >>>>> For optional(nullable) attributes, if I want to insert a record
> with
> > >>>>>
> > >>>> null
> > >>>
> > >>>> value on that attribute. Should I assign null value or should I just
> > >>>>>
> > >>>> skip
> > >>>
> > >>>> it? (Probably this is related to the missing attribute that Yingyi
> > >>>>> mentioned today?)
> > >>>>>
> > >>>> Assign null value.
> > >>> Missing means the field doesn't exist in a record at all.
> > >>>
> > >>> Best,
> > >>> Yingyi
> > >>>
> > >>>
> > >>> On Fri, Apr 29, 2016 at 2:06 PM, Xikui Wang <[email protected]> wrote:
> > >>>
> > >>> Hi devs,
> > >>>>
> > >>>> I came across several questions while I was constructing records in
> > >>>> AsterixDB.  Hope someone can help me clear the confusion. :)
> > >>>>
> > >>>> 1. Write directly to data output or use SerializerDeserializer
> > >>>> I am working with AbstractDataParser now. I see people using
> different
> > >>>> ways
> > >>>> to append attributes to data output. Either use:
> > >>>> output.write(typetag.serialize());
> > >>>> output.writeInt(0);
> > >>>> to write into data output directly, or
> > >>>> use AInt8SerializerDeserializer.serialize(int8Serde) to serialize a
> > >>>> AINT8
> > >>>> instance to output. *SerializerDeserializer uses writeByte to write
> > >>>> output.
> > >>>>
> > >>>> My feeling is SerializerDeserializer offers another level of
> > abstraction
> > >>>> but with output I can write value directly without construct AType
> > >>>> object.
> > >>>> I am wondering if there are any preferences over these two?
> > >>>>
> > >>>> 2. RecordType validation after parser but before add to frame?
> > >>>> My observation is after parser finish writing the output and pass it
> > to
> > >>>> next level, there is no such validation that checks whether
> > non-optional
> > >>>> field is null or not. In other words, parser has to guarantee that
> the
> > >>>> processed records has to match the dataset definition(non-optional
> > >>>> attribute cannot have null value). I tried to assign null value to
> > >>>> non-null
> > >>>> attributes. It will be inserted successfully but read records will
> > have
> > >>>> problem.
> > >>>>
> > >>>> 3. Set to null or skip
> > >>>> For optional(nullable) attributes, if I want to insert a record with
> > >>>> null
> > >>>> value on that attribute. Should I assign null value or should I just
> > >>>> skip
> > >>>> it? (Probably this is related to the missing attribute that Yingyi
> > >>>> mentioned today?)
> > >>>>
> > >>>> Thanks for your help.
> > >>>>
> > >>>> Best,
> > >>>> Xikui
> > >>>>
> > >>>>
> > >>>
> > >
> >
>
