Re: Including meta data with input tuples

Amol Kekre Wed, 18 Nov 2015 07:06:57 -0800

That makes sense. But then this should not be ON by default, as the per-tuple
cost is high. Meta data will also help with the ask from Ilya for the ability
to add latency as meta-data per tuple.
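
For illustration only, bundling meta-data with the tuple while keeping it OFF by default could look roughly like the sketch below. Envelope, Input, setEmitMetaData and the meta-data keys are hypothetical names made up for this example; none of them are existing platform APIs.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

public class TupleEnvelopeSketch {

    /** Hypothetical wrapper: the data tuple plus its optional meta-data. */
    static final class Envelope<T> {
        final T data;                    // what the schema and parser see
        final Map<String, Object> meta;  // carried along, ignored by the parser

        Envelope(T data, Map<String, Object> meta) {
            this.data = data;
            this.meta = meta;
        }
    }

    /** Hypothetical input module with the meta-data switch OFF by default. */
    static final class Input {
        private boolean emitMetaData = false;

        void setEmitMetaData(boolean emitMetaData) {
            this.emitMetaData = emitMetaData;
        }

        Envelope<String> wrap(String record, String fileName) {
            if (!emitMetaData) {
                // Feature off: share one immutable empty map, no per-tuple allocation.
                return new Envelope<>(record, Collections.<String, Object>emptyMap());
            }
            Map<String, Object> meta = new HashMap<>();
            meta.put("fileName", fileName);                            // source of the record
            meta.put("arrivalTimeMillis", System.currentTimeMillis()); // varies per tuple
            return new Envelope<>(record, meta);
        }
    }

    public static void main(String[] args) {
        Input input = new Input();
        System.out.println(input.wrap("r1", "a.txt").meta.isEmpty());
        input.setEmitMetaData(true);
        System.out.println(input.wrap("r2", "a.txt").meta.get("fileName"));
    }
}
```

With the switch off, the only extra per-tuple cost in this sketch is the wrapper object itself; the arrival timestamp is the kind of field a per-tuple latency measurement could build on.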


Thks,
Amol


On Wed, Nov 18, 2015 at 1:03 AM, Sandeep Deshmukh <[email protected]>
wrote:

>    1. Potentially each tuple can have different meta-data, and hence sending
>    meta-data and data tuples separately is not a good idea. An example is the
>    tuple incoming time, which will vary for each tuple. In such a case, data
>    and meta-data should be *tightly coupled*.
>    2. In case of a separate meta-data tuple mechanism, the schema will be
>    different for the data tuple and the meta-data tuple, which will make
>    things messy.
>    3. Partitioning will pose a problem, as data and meta-data tuples need to
>    be passed on to the same partition.
>
>
> I would vote for a mechanism to bundle meta-data in the tuple, and for the
> schema to worry only about the data.
>
> Regards,
> Sandeep
>
> On Wed, Nov 18, 2015 at 11:14 AM, Gaurav Gupta <[email protected]>
> wrote:
>
> > Yes, in the worst case we’ll have meta data followed by data for every tuple.
> >
> > The data schema will only have the id / reference of the meta data instead
> > of the whole meta data.
> >
> > Thanks
> > - Gaurav
> >
> > > On Nov 17, 2015, at 9:39 PM, Bhupesh Chawda <[email protected]>
> > > wrote:
> > >
> > > Ok, so in the worst case, we'll have meta data followed by data for every
> > > tuple.
> > > However, in this case we need to include the meta data as part of the data
> > > schema itself so as to allow the parser to process data and meta data in a
> > > common way. This is similar to option 1 in the first email.
> > >
> > >
> > > Thanks.
> > > Bhupesh
> > >
> > > On Wed, Nov 18, 2015 at 11:02 AM, Gaurav Gupta <[email protected]>
> > > wrote:
> > >
> > >> Bhupesh,
> > >>
> > >> No, it doesn’t stall anything. Meta data and data tuples go on the same
> > >> port. Whenever there is a change in meta data, send the meta data first
> > >> and then the tuples following it. So the first tuple that arrives with
> > >> different meta data will trigger sending of the new meta data.
> > >>
> > >> Thanks
> > >> - Gaurav
> > >>
> > >>> On Nov 17, 2015, at 9:28 PM, Bhupesh Chawda <[email protected]>
> > >>> wrote:
> > >>>
> > >>> Depends on how "real time" the scenario is.
> > >>> I think sending it only once during a window might work for some use
> > >>> cases.
> > >>> If my understanding is correct, this essentially stalls the processing
> > >>> of a window until the meta data is available, which is not until end
> > >>> window of the upstream operator.
> > >>>
> > >>> Thanks
> > >>> -Bhupesh
> > >>>
> > >>>
> > >>> On Wed, Nov 18, 2015 at 10:54 AM, Gaurav Gupta <[email protected]>
> > >>> wrote:
> > >>>
> > >>>> Bhupesh,
> > >>>>
> > >>>> If the requirement is to send meta data with every tuple, then it
> > >>>> should be sent with the data schema itself.
> > >>>> Can sending meta data be optimized the way the platform does with
> > >>>> DefaultStatefulStreamCodec? I mean, send the meta data only once in a
> > >>>> window, and all the tuples that are associated with this meta data
> > >>>> carry this meta data’s id.
> > >>>>
> > >>>> Thanks
> > >>>> - Gaurav
> > >>>>
> > >>>>> On Nov 17, 2015, at 8:20 PM, Bhupesh Chawda <[email protected]>
> > >>>>> wrote:
> > >>>>>
> > >>>>> Hi All,
> > >>>>>
> > >>>>> In the design of input modules, we are facing situations where we
> > >>>>> might need to pass on some meta data to the downstream modules, in
> > >>>>> addition to the actual data. Further, this meta data may need to be
> > >>>>> sent per record. An example use case is to send a record and
> > >>>>> additionally send the file name (as meta data) from which the record
> > >>>>> was read. Another example is sending out the Kafka topic information
> > >>>>> along with the message.
> > >>>>>
> > >>>>> We are exploring options on:
> > >>>>>
> > >>>>> 1. Whether to include the meta information in the data schema, so as
> > >>>>>    to allow the parser to handle this data as regular data. This will
> > >>>>>    involve changing the schema of the data.
> > >>>>> 2. Whether to handle meta data separately and modify the behaviour of
> > >>>>>    parser / converter to handle meta data separately as well.
> > >>>>> 3. Use additional ports to transfer such meta data depending on
> > >>>>>    different modules.
> > >>>>> 4. Any other option
> > >>>>>
> > >>>>> Please comment.
> > >>>>>
> > >>>>> Consolidating comments on another thread here:
> > >>>>>
> > >>>>> 1. Have the tuple contain two parts, with the downstream parser
> > >>>>>    ignoring the meta data:
> > >>>>>    1. Data
> > >>>>>    2. Meta-data
> > >>>>> 2. Use option 1, but there is a concern regarding how unifiers will
> > >>>>>    treat meta data, if they need to unify that as well.
> > >>>>> 3. Another comment is to have a centralized meta data repo. This may
> > >>>>>    be in memory as well, maybe as a separate operator which stores and
> > >>>>>    serves the meta data to other operators.
> > >>>>>
> > >>>>> Thanks.
> > >>>>>
> > >>>>> -Bhupesh
> > >>>>
> > >>>>
> > >>
> > >>
> >
> >
>
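
The id-based scheme Gaurav describes in the quoted thread (meta data and data tuples on the same port, meta data sent once per window and referenced by id afterwards) can be sketched roughly as follows. This is only an illustration under assumptions: Message, Sender and Receiver are hypothetical names, the port is modelled as a plain list, and nothing here reflects how DefaultStatefulStreamCodec is actually implemented.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class MetaDataOnChangeSketch {

    /** Hypothetical wire message: either a meta-data definition or a data tuple. */
    static final class Message {
        final boolean isMeta;
        final int metaId;   // id being defined (meta) or referenced (data)
        final String body;  // meta-data value or record payload

        Message(boolean isMeta, int metaId, String body) {
            this.isMeta = isMeta;
            this.metaId = metaId;
            this.body = body;
        }
    }

    /** Sends meta data only the first time it is seen within a window. */
    static final class Sender {
        final List<Message> port = new ArrayList<>();  // stand-in for an output port
        private final Map<String, Integer> ids = new HashMap<>();

        void beginWindow() {
            ids.clear();  // assumption: ids are only valid within one window
        }

        void emit(String record, String metaData) {
            Integer id = ids.get(metaData);
            if (id == null) {
                // First tuple with this meta data triggers sending it ahead of the data.
                id = ids.size();
                ids.put(metaData, id);
                port.add(new Message(true, id, metaData));
            }
            port.add(new Message(false, id, record));
        }
    }

    /** Resolves the id on each data tuple back to the full meta data. */
    static final class Receiver {
        private final Map<Integer, String> known = new HashMap<>();

        /** Returns the meta data for a data tuple, or null for a meta-data message. */
        String metaFor(Message m) {
            if (m.isMeta) {
                known.put(m.metaId, m.body);
                return null;
            }
            return known.get(m.metaId);
        }
    }

    public static void main(String[] args) {
        Sender sender = new Sender();
        sender.beginWindow();
        sender.emit("r1", "a.txt");
        sender.emit("r2", "a.txt");
        sender.emit("r3", "b.txt");

        Receiver receiver = new Receiver();
        for (Message m : sender.port) {
            String meta = receiver.metaFor(m);
            if (meta != null) {
                System.out.println(m.body + " <- " + meta);
            }
        }
    }
}
```

In the worst case discussed in the thread, where every tuple has different meta data, this degenerates to one meta-data message per data tuple. It also does not address the partitioning concern Sandeep raises, since a meta-data message and the tuples that reference it would still have to reach the same partition.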
