Yes, in the worst case we'll have meta data followed by data for every tuple. The data schema will then carry only the id / reference of the meta data instead of the whole meta data.
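
To make the scheme concrete, here is a minimal sketch of sending meta data only when it changes, with each data tuple carrying just the meta data id. It is written in plain Python for brevity and does not use any platform API: `MetaTuple`, `DataTuple`, `emit` and `resolve` are hypothetical names introduced for illustration, and window boundaries are ignored.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Iterator, Optional, Tuple, Union


@dataclass(frozen=True)
class MetaTuple:
    """Hypothetical tuple announcing a piece of meta data and its id."""
    meta_id: int
    meta: str  # e.g. a file name or a Kafka topic


@dataclass(frozen=True)
class DataTuple:
    """Hypothetical data tuple; it carries only the meta data id."""
    meta_id: int
    payload: str


def emit(records: Iterable[Tuple[str, str]]) -> Iterator[Union[MetaTuple, DataTuple]]:
    """Upstream side: send meta data first whenever it changes, then the data."""
    ids: Dict[str, int] = {}
    current: Optional[str] = None
    for meta, payload in records:
        if meta != current:
            # The first tuple with different meta data triggers a new MetaTuple.
            meta_id = ids.setdefault(meta, len(ids))
            yield MetaTuple(meta_id, meta)
            current = meta
        yield DataTuple(ids[meta], payload)


def resolve(stream: Iterable[Union[MetaTuple, DataTuple]]) -> Iterator[Tuple[str, str]]:
    """Downstream side: remember meta data by id and re-attach it to each record."""
    known: Dict[int, str] = {}
    for item in stream:
        if isinstance(item, MetaTuple):
            known[item.meta_id] = item.meta
        else:
            yield known[item.meta_id], item.payload


if __name__ == "__main__":
    records = [("a.txt", "r1"), ("a.txt", "r2"), ("b.txt", "r3")]
    for item in emit(records):
        print(item)
```

When the meta data alternates on every record, `emit` produces one `MetaTuple` per `DataTuple`, which is the worst case mentioned above. Both kinds of tuple travel on the same stream, so the downstream side never waits for a separate port.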

Thanks
- Gaurav

> On Nov 17, 2015, at 9:39 PM, Bhupesh Chawda <[email protected]> wrote:
>
> Ok, so in the worst case, we'll have meta data followed by data for every
> tuple.
> However, in this case we need to include the meta data as part of the data
> schema itself so as to allow the parser to process data and meta data in a
> common way. This is similar to option 1 in the first email.
>
> Thanks.
> Bhupesh
>
> On Wed, Nov 18, 2015 at 11:02 AM, Gaurav Gupta <[email protected]> wrote:
>
>> Bhupesh,
>>
>> No, it doesn't stall anything. Meta data and data tuples go on the same
>> port. Whenever there is a change in meta data, send the meta data first
>> and then the tuples following it. So the first tuple that arrives with
>> different meta data will trigger sending of the new meta data.
>>
>> Thanks
>> - Gaurav
>>
>>> On Nov 17, 2015, at 9:28 PM, Bhupesh Chawda <[email protected]> wrote:
>>>
>>> Depends on how "real time" the scenario is.
>>> I think sending it only once during a window might work for some use
>>> cases.
>>> If my understanding is correct, this essentially stalls the processing
>>> of a window until the meta data is available, which is not until the
>>> end window of the upstream operator.
>>>
>>> Thanks
>>> -Bhupesh
>>>
>>> On Wed, Nov 18, 2015 at 10:54 AM, Gaurav Gupta <[email protected]> wrote:
>>>
>>>> Bhupesh,
>>>>
>>>> If the requirement is to send meta data with every tuple, then it
>>>> should be sent with the data schema itself.
>>>> Can sending meta data be optimized the way the platform does with
>>>> DefaultStatefulStreamCodec? I mean, send the meta data only once in a
>>>> window, and have all the tuples associated with this meta data carry
>>>> its id.
>>>>
>>>> Thanks
>>>> - Gaurav
>>>>
>>>>> On Nov 17, 2015, at 8:20 PM, Bhupesh Chawda <[email protected]> wrote:
>>>>>
>>>>> Hi All,
>>>>>
>>>>> In the design of input modules, we are facing situations where we
>>>>> might need to pass on some meta data to the downstream modules, in
>>>>> addition to actual data. Further, this meta data may need to be sent
>>>>> per record. An example use case is to send a record and additionally
>>>>> send the file name (as meta data) from which the record was read.
>>>>> Another example is sending out the Kafka topic information along with
>>>>> the message.
>>>>>
>>>>> We are exploring options on:
>>>>>
>>>>> 1. Whether to include the meta information in the data schema, so as
>>>>>    to allow the parser to handle this data as regular data. This will
>>>>>    involve changing the schema of the data.
>>>>> 2. Whether to handle meta data separately and modify the behaviour of
>>>>>    parser / converter to handle meta data separately as well.
>>>>> 3. Use additional ports to transfer such meta data depending on
>>>>>    different modules.
>>>>> 4. Any other option
>>>>>
>>>>> Please comment.
>>>>>
>>>>> Consolidating comments on another thread here:
>>>>>
>>>>> 1. Have the tuple contain two parts, with the downstream parser
>>>>>    ignoring the meta data:
>>>>>    1. Data
>>>>>    2. Meta-data
>>>>> 2. Use option 1, but there is a concern regarding how unifiers will
>>>>>    treat meta data, if they need to unify that as well.
>>>>> 3. Another comment is to have a centralized meta data repo. This may
>>>>>    be in memory as well, maybe as a separate operator which stores and
>>>>>    serves the meta data to other operators.
>>>>>
>>>>> Thanks.
>>>>>
>>>>> -Bhupesh
