Re: Including meta data with input tuples

Sandeep Deshmukh Wed, 18 Nov 2015 01:04:31 -0800

   1. Potentially each tuple can have different meta-data, so sending
   meta-data and data tuples separately is not a good idea. An example is
   the tuple's arrival time, which varies for each tuple. In such a case,
   data and meta-data should be *tightly coupled*.
   2. With a separate meta-data tuple mechanism, the schema will be
   different for data tuples and meta-data tuples, which will make things messy.
   3. Partitioning will pose a problem, as data and meta-data tuples need to
   be passed on to the same partition.
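
To illustrate point 3, here is a rough sketch in plain Java (not the
platform's partitioning API). The partitionFor helper and the keys are
made up for illustration, assuming a simple hash-based partitioner. A
data tuple and its separate meta-data tuple are routed by different keys,
so nothing guarantees that they reach the same partition.

```java
final class PartitionSketch {
    // Hypothetical hash partitioner: maps a key to one of N partitions.
    static int partitionFor(Object key, int partitions) {
        return Math.floorMod(key.hashCode(), partitions);
    }

    public static void main(String[] args) {
        int partitions = 4;
        String recordKey = "user-17"; // key of the data tuple (made up)
        String metaKey = "meta-3";    // key of the separate meta-data tuple (made up)

        // Two independent routing decisions: these may or may not agree.
        System.out.println("data tuple -> " + partitionFor(recordKey, partitions));
        System.out.println("meta tuple -> " + partitionFor(metaKey, partitions));

        // With meta-data bundled into the data tuple there is one decision,
        // so data and meta-data always travel together.
        System.out.println("bundled    -> " + partitionFor(recordKey, partitions));
    }
}
```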



I would vote for a mechanism to bundle meta-data in the tuple, and let the
schema worry only about the data.
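
A minimal sketch of what such a bundle could look like, in plain Java.
The class name TupleWithMeta, the meta-data keys and the sample values are
all hypothetical; this is not an existing platform class, just one way to
keep the schema limited to the data part.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical envelope: the payload follows the data schema, meta-data rides alongside it. */
final class TupleWithMeta<T> {
    private final T data;                   // handled by the parser per the data schema
    private final Map<String, Object> meta; // e.g. file name, Kafka topic, arrival time

    TupleWithMeta(T data, Map<String, Object> meta) {
        this.data = data;
        // Defensive copy so later changes by the emitter do not leak into the tuple.
        this.meta = Collections.unmodifiableMap(new HashMap<>(meta));
    }

    T getData() { return data; }

    Map<String, Object> getMeta() { return meta; }
}

final class EnvelopeDemo {
    public static void main(String[] args) {
        Map<String, Object> meta = new HashMap<>();
        meta.put("fileName", "/data/in/part-0001.csv"); // made-up path
        meta.put("arrivalTimeMillis", System.currentTimeMillis());

        TupleWithMeta<String> tuple = new TupleWithMeta<>("42,alice", meta);

        // The parser only looks at the data part; the schema never mentions meta-data.
        String[] fields = tuple.getData().split(",");
        System.out.println(fields[1] + " read from " + tuple.getMeta().get("fileName"));
    }
}
```

Since the meta-data travels inside the same object as the data, whichever
partition receives the tuple receives both.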

Regards,
Sandeep

On Wed, Nov 18, 2015 at 11:14 AM, Gaurav Gupta <[email protected]>
wrote:

> Yes, in the worst case we’ll have meta data followed by data for every tuple.
>
> The data schema will only have the id / reference of the meta data instead
> of the whole meta data.
>
> Thanks
> - Gaurav
>
> > On Nov 17, 2015, at 9:39 PM, Bhupesh Chawda <[email protected]>
> > wrote:
> >
> > Ok, so in the worst case, we'll have meta data followed by data for every
> > tuple.
> > However, in this case we need to include the meta data as part of the data
> > schema itself so as to allow the parser to process data and meta data in a
> > common way. This is similar to option 1 in the first email.
> >
> >
> > Thanks.
> > Bhupesh
> >
> > On Wed, Nov 18, 2015 at 11:02 AM, Gaurav Gupta <[email protected]>
> > wrote:
> >
> >> Bhupesh,
> >>
> >> No, it doesn’t stall anything… Meta data and data tuples go on the same
> >> port. Whenever there is a change in meta data, send the meta data first and
> >> then the tuples following it. So the first tuple that arrives with different
> >> meta data will trigger sending of the new meta data.
> >>
> >> Thanks
> >> - Gaurav
> >>
> >>> On Nov 17, 2015, at 9:28 PM, Bhupesh Chawda <[email protected]>
> >>> wrote:
> >>>
> >>> Depends on how "real time" the scenario is.
> >>> I think sending it only once during a window might work for some use cases.
> >>> If my understanding is correct, this essentially stalls the processing of a
> >>> window until the meta data is available, which is not until end window of
> >>> the upstream operator.
> >>>
> >>> Thanks
> >>> -Bhupesh
> >>>
> >>>
> >>> On Wed, Nov 18, 2015 at 10:54 AM, Gaurav Gupta <[email protected]>
> >>> wrote:
> >>>
> >>>> Bhupesh,
> >>>>
> >>>> If the requirement is to send meta data with every tuple, then it should be
> >>>> sent with the data schema itself.
> >>>> Can sending meta data be optimized the way the platform does with
> >>>> DefaultStatefulStreamCodec? I mean, send the meta data only once in a
> >>>> window, and all the tuples that are associated with this meta data carry
> >>>> this meta data’s id.
> >>>>
> >>>> Thanks
> >>>> - Gaurav
> >>>>
> >>>>> On Nov 17, 2015, at 8:20 PM, Bhupesh Chawda <[email protected]>
> >>>>> wrote:
> >>>>>
> >>>>> Hi All,
> >>>>>
> >>>>> In the design of input modules, we are facing situations where we might
> >>>>> need to pass on some meta data to the downstream modules, in addition to
> >>>>> actual data. Further, this meta data may need to be sent per record. An
> >>>>> example use case is to send a record and additionally send the file name
> >>>>> (as meta data) from which the record was read. Another example is sending
> >>>>> out the Kafka topic information along with the message.
> >>>>>
> >>>>> We are exploring options on:
> >>>>>
> >>>>> 1. Whether to include the meta information in the data schema, so as to
> >>>>> allow the parser to handle this data as regular data. This will involve
> >>>>> changing the schema of the data.
> >>>>> 2. Whether to handle meta data separately and modify the behaviour of
> >>>>> parser / converter to handle meta data separately as well.
> >>>>> 3. Use additional ports to transfer such meta data depending on
> >>>>> different modules.
> >>>>> 4. Any other option
> >>>>>
> >>>>> Please comment.
> >>>>>
> >>>>> Consolidating comments on another thread here:
> >>>>>
> >>>>> 1. Have the tuple contain two parts, with the downstream parser
> >>>>> ignoring the meta data:
> >>>>>    1. Data
> >>>>>    2. Meta-data
> >>>>> 2. Use option 1, but there is a concern regarding how unifiers will treat
> >>>>> meta data, if they need to unify that as well.
> >>>>> 3. Another comment is to have a centralized meta data repo. This may be
> >>>>> in memory as well, maybe as a separate operator which stores and serves
> >>>>> the meta data to other operators.
> >>>>>
> >>>>> Thanks.
> >>>>>
> >>>>> -Bhupesh
> >>>>
> >>>>
> >>
> >>
>
>
