Re: Questions of building record in AsterixDB

Xikui Wang Sat, 30 Apr 2016 18:11:00 -0700

Hi Mike,

Thanks for pointing that out. I think I misunderstood the working mechanism
and misused the terms of 'dataset' and 'datatype'. Sorry about that.


Best,
Xikui

On Sat, Apr 30, 2016 at 4:30 PM, Mike Carey <[email protected]> wrote:

> One nit:  This has nothing to do with any dataset definition, on the parser
> side of things - it's the type parameter on the create feed DDL statement
> that should be the parser's guide.  (In general the optional function on
> the feed may change the type by the time the data reaches a dataset.)
> On Apr 30, 2016 3:26 PM, "Xikui Wang" <[email protected]> wrote:
>
> > Hi Abdullah,
> >
> > Actually I also have the concern that adding null-check for general cases
> > will bring extra
> > overheads. Thus I plan to add the checking procedure after parser, but
> > before addTuple,
> > i.e.FeedRecordDataFlowController. But based on what I have seen so far,
> it
> > seems RecordType
> > is transparent to FeedRecordDataFlowController. So I am still
> investigating
> > that...
> >
> > I saw the null check in ADM parser. That's actually a viable way to
> handle
> > that within the
> > parser scope. But I am looking for a slightly different solution. In my
> > perspective,
> > ADM parser assumes the input adm should conform with the dataset
> > definition.
> > Thus it's reasonable for it to throw a exception. For Tweetparser, if I
> saw
> > null value on non-null attribute, I will
> > discard the whole tweet directly, and may not even log it(as too many
> > tweets with null).
> > That's the reason why I want to put that in FeedRecordDataFlowController,
> > since I didn't see
> > there is a good way to prevent record insert in parser except for throw
> > exception.
> >
> > Not sure my opinion makes sense or not. Feel free to comment. :)
> >
> > Best,
> > Xikui
> >
> > On Sat, Apr 30, 2016 at 1:52 PM, abdullah alamoudi <[email protected]>
> > wrote:
> >
> > > Adding a few points here:
> > >
> > > My feeling is SerializerDeserializer offers another level of
> abstraction
> > > but with output I can write value directly without construct AType
> > object.
> > > I am wondering if there are any preferences over these two?
> > >
> > > - Using The SerializerDeserializer option, you will only create a
> single
> > > object regardless of the number of parsed records, so I wouldn't worry
> > > about it. Code maintainability takes precedence here IMO.
> > > - In addition to records and lists, UTF8StringSerializerDeserializer
> can
> > be
> > > stateful for the same reason (avoid creating lost of un-needed
> objects).
> > In
> > > fact, our parsers use the stateful UTF8StringSerializerDeserializer
> > since I
> > > noticed that using the stateless one creates lots of byte[] and
> triggers
> > GC
> > > over and over.
> > > - Right now, we parse missing values as null. Should that change?
> > > - There is definitely a check for nulls on non-nullable values at least
> > in
> > > the ADM parser. There might be a bug however that makes it accept
> > explicit
> > > null values and that should be fixed.
> > >
> > > I am for NOT using the cast record solution for the overhead it will
> add.
> > > but that is just me :)
> > > ~Abdullah.
> > >
> > >
> > > On Sat, Apr 30, 2016 at 6:48 AM, Xikui Wang <[email protected]> wrote:
> > >
> > > > Thank you Yingyi. I will try to figure out a solution from that
> > > direction.
> > > >
> > > > Best,
> > > > Xikui
> > > >
> > > > On Fri, Apr 29, 2016 at 3:48 PM, Yingyi Bu <[email protected]>
> wrote:
> > > >
> > > > > Yeah, I think so:-)
> > > > >
> > > > > Best,
> > > > > Yingyi
> > > > >
> > > > > On Fri, Apr 29, 2016 at 3:46 PM, Mike Carey <[email protected]>
> > wrote:
> > > > >
> > > > > > This indeed might be cleaner?
> > > > > >
> > > > > >
> > > > > > On 4/29/16 3:28 PM, Yingyi Bu wrote:
> > > > > >
> > > > > >> I'm guessing that you can do similar things to
> > CastRecordDescriptor
> > > > > >>>> if you want to handle general cases in that region.
> > > > > >>>>
> > > > > >>> Or, you can inject a cast-record function in the loading
> pipeline
> > > > > >> so that you can defer the runtime-type-check/cast to that
> function
> > > > > instead
> > > > > >> of doing that in the parser.
> > > > > >>
> > > > > >>
> > > > > >> On Fri, Apr 29, 2016 at 3:25 PM, Yingyi Bu <[email protected]>
> > > > wrote:
> > > > > >>
> > > > > >> My answer is inlined.
> > > > > >>>
> > > > > >>> My feeling is SerializerDeserializer offers another level of
> > > > > abstraction
> > > > > >>>>> but with output I can write value directly without construct
> > > AType
> > > > > >>>>>
> > > > > >>>> object.
> > > > > >>>
> > > > > >>>> I am wondering if there are any preferences over these two?
> > > > > >>>>>
> > > > > >>>> I agree with you. However, a SerializerDeserializer has to be
> > > > > stateless,
> > > > > >>> hence it cannot be used at runtime for complex type objects
> such
> > as
> > > > > >>> records and lists,
> > > > > >>> because it will create a lot Java objects.
> > > > > >>>
> > > > > >>> in other words, parser has to guarantee that the
> > > > > >>>>> processed records has to match the dataset
> > > definition(non-optional
> > > > > >>>>> attribute cannot have null value). I tried to assign null
> value
> > > to
> > > > > >>>>>
> > > > > >>>> non-null
> > > > > >>>
> > > > > >>>> attributes. It will be inserted successfully but read records
> > will
> > > > > have
> > > > > >>>>> problem.
> > > > > >>>>>
> > > > > >>>> That sounds right to me.  Please file a JIRA issue and assign
> to
> > > > you (
> > > > > >>> if you're working on that).
> > > > > >>> I'm guessing that you can do similar things to
> > CastRecordDescriptor
> > > > > >>> if you want to handle general cases in that region.
> > > > > >>>
> > > > > >>> 3. Set to null or skip
> > > > > >>>>> For optional(nullable) attributes, if I want to insert a
> record
> > > > with
> > > > > >>>>>
> > > > > >>>> null
> > > > > >>>
> > > > > >>>> value on that attribute. Should I assign null value or should
> I
> > > just
> > > > > >>>>>
> > > > > >>>> skip
> > > > > >>>
> > > > > >>>> it? (Probably this is related to the missing attribute that
> > Yingyi
> > > > > >>>>> mentioned today?)
> > > > > >>>>>
> > > > > >>>> Assign null value.
> > > > > >>> Missing means the field doesn't exist in a record at all.
> > > > > >>>
> > > > > >>> Best,
> > > > > >>> Yingyi
> > > > > >>>
> > > > > >>>
> > > > > >>> On Fri, Apr 29, 2016 at 2:06 PM, Xikui Wang <[email protected]>
> > > wrote:
> > > > > >>>
> > > > > >>> Hi devs,
> > > > > >>>>
> > > > > >>>> I came across several questions while I was constructing
> records
> > > in
> > > > > >>>> AsterixDB.  Hope someone can help me clear the confusion. :)
> > > > > >>>>
> > > > > >>>> 1. Write directly to data output or use SerializerDeserializer
> > > > > >>>> I am working with AbstractDataParser now. I see people using
> > > > different
> > > > > >>>> ways
> > > > > >>>> to append attributes to data output. Either use:
> > > > > >>>> output.Write(typetag.serialize());
> > > > > >>>> output.WriteInt(0);
> > > > > >>>> to write into data output directly, or
> > > > > >>>> use AInt8SerializerDeserializer.serialize(int8Serde) to
> > serialize
> > > a
> > > > > >>>> AINT8
> > > > > >>>> instance to output. *SerializerDeserializer uses writeByte to
> > > write
> > > > > >>>> output.
> > > > > >>>>
> > > > > >>>> My feeling is SerializerDeserializer offers another level of
> > > > > abstraction
> > > > > >>>> but with output I can write value directly without construct
> > AType
> > > > > >>>> object.
> > > > > >>>> I am wondering if there are any preferences over these two?
> > > > > >>>>
> > > > > >>>> 2. RecordType validation after parser but before add to frame?
> > > > > >>>> My observation is after parser finish writing the output and
> > pass
> > > it
> > > > > to
> > > > > >>>> next level, there is no such validation that checks whether
> > > > > non-optional
> > > > > >>>> field is null or not. In other words, parser has to guarantee
> > that
> > > > the
> > > > > >>>> processed records has to match the dataset
> > definition(non-optional
> > > > > >>>> attribute cannot have null value). I tried to assign null
> value
> > to
> > > > > >>>> non-null
> > > > > >>>> attributes. It will be inserted successfully but read records
> > will
> > > > > have
> > > > > >>>> problem.
> > > > > >>>>
> > > > > >>>> 3. Set to null or skip
> > > > > >>>> For optional(nullable) attributes, if I want to insert a
> record
> > > with
> > > > > >>>> null
> > > > > >>>> value on that attribute. Should I assign null value or should
> I
> > > just
> > > > > >>>> skip
> > > > > >>>> it? (Probably this is related to the missing attribute that
> > Yingyi
> > > > > >>>> mentioned today?)
> > > > > >>>>
> > > > > >>>> Thanks for your help.
> > > > > >>>>
> > > > > >>>> Best,
> > > > > >>>> Xikui
> > > > > >>>>
> > > > > >>>>
> > > > > >>>
> > > > > >
> > > > >
> > > >
> > >
> >
>

Re: Questions of building record in AsterixDB

Reply via email to