Re: Including meta data with input tuples

Gaurav Gupta Tue, 17 Nov 2015 21:44:42 -0800

Yes, in the worst case we’ll have meta data followed by data for every tuple.

The data schema will only carry the id / reference of the meta data instead of
the whole meta data.
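
To make the idea concrete, here is a rough sketch of what I mean. All names
(MetaAwareEmitter, MetaData, DataTuple, the "port" list) are made up for
illustration and are not platform API; the list simply stands in for the
single output port that both kinds of messages share.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

// Hypothetical sketch only: none of these classes exist in the platform.
final class MetaAwareEmitter {

    /** Meta data message, emitted only when the meta data changes. */
    static final class MetaData {
        final int id;
        final String value; // e.g. a file name or a Kafka topic

        MetaData(int id, String value) {
            this.id = id;
            this.value = value;
        }
    }

    /** Data tuple: its schema holds only the id of the meta data. */
    static final class DataTuple {
        final int metaId;
        final String record;

        DataTuple(int metaId, String record) {
            this.metaId = metaId;
            this.record = record;
        }
    }

    /** Stand-in for the single output port; holds messages in emit order. */
    final List<Object> port = new ArrayList<>();

    private String currentMeta; // null until the first tuple arrives
    private int currentId = -1;

    /** Emits the record, preceded by its meta data if that has changed. */
    void emit(String meta, String record) {
        if (currentId < 0 || !Objects.equals(meta, currentMeta)) {
            currentMeta = meta;
            currentId++;
            port.add(new MetaData(currentId, meta));
        }
        port.add(new DataTuple(currentId, record));
    }
}
```

With this, records r1 and r2 from a.txt followed by r3 from b.txt go out as
five messages: meta 0, r1, r2, meta 1, r3. If the meta data changes on every
tuple, we get one meta data message per data tuple, which is the worst case
above.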


Thanks
- Gaurav

> On Nov 17, 2015, at 9:39 PM, Bhupesh Chawda <[email protected]> wrote:
> 
> Ok, so in the worst case, we'll have meta data followed by data for every
> tuple.
> However, in this case we need to include the meta data as part of the data
> schema itself so as to allow the parser to process data and meta data in a
> common way. This is similar to option 1 in the first email.
> 
> 
> Thanks.
> Bhupesh
> 
> On Wed, Nov 18, 2015 at 11:02 AM, Gaurav Gupta <[email protected]>
> wrote:
> 
>> Bhupesh,
>> 
>> No, it doesn’t stall anything. Meta data and data tuples go on the same
>> port. Whenever there is a change in meta data, send the meta data first and
>> then the tuples following it. So the first tuple that arrives with different
>> meta data will trigger sending of the new meta data.
>> 
>> Thanks
>> - Gaurav
>> 
>>> On Nov 17, 2015, at 9:28 PM, Bhupesh Chawda <[email protected]> wrote:
>>> 
>>> Depends on how "real time" the scenario is.
>>> I think sending it only once during a window might work for some use
>>> cases. If my understanding is correct, this essentially stalls the
>>> processing of a window until the meta data is available, which is not
>>> until the end window of the upstream operator.
>>> 
>>> Thanks
>>> -Bhupesh
>>> 
>>> 
>>> On Wed, Nov 18, 2015 at 10:54 AM, Gaurav Gupta <[email protected]>
>>> wrote:
>>> 
>>>> Bhupesh,
>>>> 
>>>> If the requirement is to send meta data with every tuple, then it should
>>>> be sent with the data schema itself.
>>>> Can sending meta data be optimized the way the platform does with
>>>> DefaultStatefulStreamCodec? I mean, send the meta data only once in a
>>>> window, and all the tuples that are associated with this meta data carry
>>>> this meta data’s id.
>>>> 
>>>> Thanks
>>>> - Gaurav
>>>> 
>>>>> On Nov 17, 2015, at 8:20 PM, Bhupesh Chawda <[email protected]> wrote:
>>>>> 
>>>>> Hi All,
>>>>> 
>>>>> In the design of input modules, we are facing situations where we might
>>>>> need to pass on some meta data to the downstream modules, in addition to
>>>>> the actual data. Further, this meta data may need to be sent per record.
>>>>> An example use case is to send a record and additionally send the file
>>>>> name (as meta data) from which the record was read. Another example is
>>>>> sending out the Kafka topic information along with the message.
>>>>> 
>>>>> We are exploring options on:
>>>>> 
>>>>> 1. Whether to include the meta information in the data schema, so as to
>>>>>    allow the parser to handle this data as regular data. This will
>>>>>    involve changing the schema of the data.
>>>>> 2. Whether to handle meta data separately and modify the behaviour of the
>>>>>    parser / converter to handle meta data separately as well.
>>>>> 3. Whether to use additional ports to transfer such meta data, depending
>>>>>    on the module.
>>>>> 4. Any other option.
>>>>> 
>>>>> Please comment.
>>>>> 
>>>>> Consolidating comments on another thread here:
>>>>> 
>>>>> 1. Have the tuple contain two parts, with the downstream parser
>>>>>    ignoring the meta data:
>>>>>    1. Data
>>>>>    2. Meta data
>>>>> 2. Use option 1, but there is a concern about how unifiers will treat
>>>>>    meta data, if they need to unify that as well.
>>>>> 3. Have a centralized meta data repo. This may be in memory as well,
>>>>>    maybe as a separate operator which stores and serves the meta data to
>>>>>    other operators.
>>>>> 
>>>>> Thanks.
>>>>> 
>>>>> -Bhupesh
>>>> 
>>>> 
>> 
>>
