1. Each tuple can potentially have different meta-data, so sending meta-data and data tuples separately is not a good idea. An example is the tuple's incoming time, which varies for each tuple. In such a case, data and meta-data should be *tightly coupled*.
2. With a separate meta-data tuple mechanism, the schema will differ between data tuples and meta-data tuples, which will make things messy.
3. Partitioning will pose a problem, as data and meta-data tuples need to be passed on to the same partition.

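To make the coupling and partitioning points concrete, here is a minimal sketch of a tuple that carries its own meta-data. The names `MetaTuple`, `partition` and the meta-data keys are hypothetical, chosen only for illustration; none of them are platform APIs. The partition is computed from the data alone, so a record and its meta-data always land on the same partition.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical envelope; illustrative only, not part of any platform API.
final class MetaTuple<T> {
    private final T data;
    private final Map<String, Object> meta;

    MetaTuple(T data, Map<String, Object> meta) {
        this.data = data;
        // Defensive copy so later changes by the caller do not leak in.
        this.meta = Collections.unmodifiableMap(new HashMap<>(meta));
    }

    T getData() { return data; }

    Map<String, Object> getMeta() { return meta; }

    // Routing depends on the data only, so the bundled meta-data
    // travels with the record and never changes its partition.
    int partition(int numPartitions) {
        return Math.floorMod(data.hashCode(), numPartitions);
    }
}

final class MetaTupleDemo {
    public static void main(String[] args) {
        Map<String, Object> meta = new HashMap<>();
        meta.put("incomingTimeMillis", System.currentTimeMillis()); // varies per tuple
        meta.put("fileName", "input-0001.txt");                     // assumed key name
        MetaTuple<String> t = new MetaTuple<>("record-a", meta);
        System.out.println(t.getData() + " -> partition " + t.partition(4));
    }
}
```

With this shape the schema only has to describe the type of `data`; the meta-data map sits beside it and a downstream parser can ignore it.
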
I would vote for mechanism to bundle meta-data in the tuple, and schema to worry only about the data.

Regards,
Sandeep

On Wed, Nov 18, 2015 at 11:14 AM, Gaurav Gupta <[email protected]> wrote:

> Yes in worst case we’ll have meta data followed by data for every tuple.
>
> Data schema will only have id / reference of meta data instead of whole
> meta data
>
> Thanks
> - Gaurav
>
> > On Nov 17, 2015, at 9:39 PM, Bhupesh Chawda <[email protected]> wrote:
> >
> > Ok, so in the worst case, we'll have meta data followed by data for
> > every tuple.
> > However, in this case we need to include the meta data as part of the
> > data schema itself so as to allow the parser to process data and meta
> > data in a common way. This is similar to option 1 in the first email.
> >
> > Thanks.
> > Bhupesh
> >
> > On Wed, Nov 18, 2015 at 11:02 AM, Gaurav Gupta <[email protected]> wrote:
> >
> >> Bhupesh,
> >>
> >> No it doesn’t stall anything… Meta data and data tuples go on same
> >> port. Whenever there is a change in meta data, send the meta data
> >> first and then tuples following it. So the first tuple that arrives
> >> which has different meta data, will trigger sending of new meta data.
> >>
> >> Thanks
> >> - Gaurav
> >>
> >>> On Nov 17, 2015, at 9:28 PM, Bhupesh Chawda <[email protected]> wrote:
> >>>
> >>> Depends on how "real time" the scenario is.
> >>> I think sending it only once during a window might work for some use
> >>> cases.
> >>> If my understanding is correct, this essentially stalls the
> >>> processing of a window until the meta data is available which is not
> >>> until end window of the upstream operator.
> >>>
> >>> Thanks
> >>> -Bhupesh
> >>>
> >>> On Wed, Nov 18, 2015 at 10:54 AM, Gaurav Gupta <[email protected]> wrote:
> >>>
> >>>> Bhupesh,
> >>>>
> >>>> If the requirement is to send meta data with every tuple then it
> >>>> should be send with data schema itself.
> >>>>
> >>>> Can sending meta data be optimized the way platform does with
> >>>> DefaultStatefulStreamCodec. I mean send the meta data only once in
> >>>> a window and all the tuples that are associated with this meta data
> >>>> have this meta data’s id.
> >>>>
> >>>> Thanks
> >>>> - Gaurav
> >>>>
> >>>>> On Nov 17, 2015, at 8:20 PM, Bhupesh Chawda <[email protected]> wrote:
> >>>>>
> >>>>> Hi All,
> >>>>>
> >>>>> In the design of input modules, we are facing situations where we
> >>>>> might need to pass on some meta data to the downstream modules, in
> >>>>> addition to actual data. Further, this meta data may need to be
> >>>>> sent per record. An example use case is to send a record and
> >>>>> additionally send the file name (as meta data) from which the
> >>>>> record was read. Another example is sending out the kafka topic
> >>>>> information along with the message.
> >>>>>
> >>>>> We are exploring options on:
> >>>>>
> >>>>> 1. Whether to include the meta information in the data schema, so
> >>>>>    as to allow the parser to handle this data as regular data.
> >>>>>    This will involve changing the schema of the data.
> >>>>> 2. Whether to handle meta data separately and modify the behaviour
> >>>>>    of parser / converter to handle meta data separately as well.
> >>>>> 3. Use additional ports to transfer such meta data depending on
> >>>>>    different modules.
> >>>>> 4. Any other option
> >>>>>
> >>>>> Please comment.
> >>>>>
> >>>>> Consolidating comments on another thread here:
> >>>>>
> >>>>> 1. Have the tuple containing two parts, with the downstream parser
> >>>>>    ignoring the meta data
> >>>>>    1. Data
> >>>>>    2. Meta-data
> >>>>> 2. Use option 1, but concern regarding how unifiers will treat
> >>>>>    meta data, if they need to unify that as well.
> >>>>> 3. Another comment is to have a centralized meta data repo. This
> >>>>>    may be in memory as well, may be as a separate operator which
> >>>>>    stores and serves the meta data to other operators.
> >>>>>
> >>>>> Thanks.
> >>>>>
> >>>>> -Bhupesh
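
For comparison, the alternative discussed in the quoted thread (send meta data on the same port only when it changes, and let data tuples carry just its id) could be sketched roughly as below. This is an illustration under assumptions: `MetaOnChangeEmitter`, the string-encoded `META`/`DATA` records and the per-window reset are all hypothetical, and it is not a description of how DefaultStatefulStreamCodec is implemented.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

// Hypothetical emitter; names and record encoding are illustrative only.
final class MetaOnChangeEmitter {
    private final List<String> out = new ArrayList<>();
    private String lastMeta;
    private boolean metaSent = false;
    private int metaId = 0;

    // Emit a meta record only when the meta data differs from the last
    // one sent, then emit the data record referencing the current id.
    void emit(String data, String meta) {
        if (!metaSent || !Objects.equals(meta, lastMeta)) {
            metaId++;
            lastMeta = meta;
            metaSent = true;
            out.add("META " + metaId + " " + meta);
        }
        out.add("DATA " + metaId + " " + data);
    }

    // Assumed policy: forget the last meta data at a window boundary,
    // so each window re-sends it before its first data record.
    void beginWindow() {
        metaSent = false;
    }

    List<String> emitted() {
        return out;
    }
}
```

If the meta data changes on every tuple (incoming time, for example), this degenerates to one meta record per data record, which is the worst case noted in the thread.
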
