That makes sense. But then this should not be ON by default, as the per-tuple cost is high. Meta data will also help with the ask from Ilya for the ability to add latency as per-tuple meta-data.
Thks,
Amol

On Wed, Nov 18, 2015 at 1:03 AM, Sandeep Deshmukh <[email protected]> wrote:

> 1. Potentially each tuple can have different meta-data and hence sending
>    meta-data and data tuples separately is not a good idea. Example could be
>    tuple incoming time which will vary for each tuple. In such a case, data
>    and meta-data should be *tightly coupled*.
> 2. In case of separate meta-data tuple mechanism, schema will be
>    different for data tuple and meta-data tuple, which will make things
>    messy.
> 3. Partitioning will pose a problem as data & meta-data tuples need to
>    be passed on to the same partition
>
> I would vote for mechanism to bundle meta-data in the tuple, and schema to
> worry only about the data.
>
> Regards,
> Sandeep
>
> On Wed, Nov 18, 2015 at 11:14 AM, Gaurav Gupta <[email protected]> wrote:
>
> > Yes in worst case we’ll have meta data followed by data for every tuple.
> >
> > Data schema will only have id / reference of meta data instead of whole
> > meta data
> >
> > Thanks
> > - Gaurav
> >
> > > On Nov 17, 2015, at 9:39 PM, Bhupesh Chawda <[email protected]> wrote:
> > >
> > > Ok, so in the worst case, we'll have meta data followed by data for every
> > > tuple.
> > > However, in this case we need to include the meta data as part of the data
> > > schema itself so as to allow the parser to process data and meta data in a
> > > common way. This is similar to option 1 in the first email.
> > >
> > > Thanks.
> > > Bhupesh
> > >
> > > On Wed, Nov 18, 2015 at 11:02 AM, Gaurav Gupta <[email protected]> wrote:
> > >
> > >> Bhupesh,
> > >>
> > >> No it doesn’t stall anything… Meta data and data tuples go on same port.
> > >> Whenever there is a change in meta data, send the meta data first and then
> > >> tuples following it. So the first tuple that arrives which has different
> > >> meta data, will trigger sending of new meta data.
> > >>
> > >> Thanks
> > >> - Gaurav
> > >>
> > >>> On Nov 17, 2015, at 9:28 PM, Bhupesh Chawda <[email protected]> wrote:
> > >>>
> > >>> Depends on how "real time" the scenario is.
> > >>> I think sending it only once during a window might work for some use cases.
> > >>> If my understanding is correct, this essentially stalls the processing of a
> > >>> window until the meta data is available which is not until end window of
> > >>> the upstream operator.
> > >>>
> > >>> Thanks
> > >>> -Bhupesh
> > >>>
> > >>> On Wed, Nov 18, 2015 at 10:54 AM, Gaurav Gupta <[email protected]> wrote:
> > >>>
> > >>>> Bhupesh,
> > >>>>
> > >>>> If the requirement is to send meta data with every tuple then it should be
> > >>>> send with data schema itself.
> > >>>> Can sending meta data be optimized the way platform does with
> > >>>> DefaultStatefulStreamCodec. I mean send the meta data only once in a window
> > >>>> and all the tuples that are associated with this meta data have this meta
> > >>>> data’s id.
> > >>>>
> > >>>> Thanks
> > >>>> - Gaurav
> > >>>>
> > >>>>> On Nov 17, 2015, at 8:20 PM, Bhupesh Chawda <[email protected]> wrote:
> > >>>>>
> > >>>>> Hi All,
> > >>>>>
> > >>>>> In the design of input modules, we are facing situations where we might
> > >>>>> need to pass on some meta data to the downstream modules, in addition to
> > >>>>> actual data. Further, this meta data may need to be sent per record. An
> > >>>>> example use case is to send a record and additionally send the file name
> > >>>>> (as meta data) from which the record was read. Another example is sending
> > >>>>> out the kafka topic information along with the message.
> > >>>>>
> > >>>>> We are exploring options on:
> > >>>>>
> > >>>>> 1. Whether to include the meta information in the data schema, so as to
> > >>>>>    allow the parser to handle this data as regular data. This will involve
> > >>>>>    changing the schema of the data.
> > >>>>> 2. Whether to handle meta data separately and modify the behaviour of
> > >>>>>    parser / converter to handle meta data separately as well.
> > >>>>> 3. Use additional ports to transfer such meta data depending on
> > >>>>>    different modules.
> > >>>>> 4. Any other option
> > >>>>>
> > >>>>> Please comment.
> > >>>>>
> > >>>>> Consolidating comments on another thread here:
> > >>>>>
> > >>>>> 1. Have the tuple containing two parts, with the downstream parser
> > >>>>>    ignoring the meta data
> > >>>>>    1. Data
> > >>>>>    2. Meta-data
> > >>>>> 2. Use option 1, but concern regarding how unifiers will treat meta
> > >>>>>    data, if they need to unify that as well.
> > >>>>> 3. Another comment is to have a centralized meta data repo. This may be
> > >>>>>    in memory as well, may be as a separate operator which stores and serves
> > >>>>>    the meta data to other operators.
> > >>>>>
> > >>>>> Thanks.
> > >>>>>
> > >>>>> -Bhupesh
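
For reference, Gaurav's suggestion in the thread (send a meta data tuple on the same port only when the meta data changes, and let each data tuple carry just the meta data id) could be sketched roughly as follows. This is a minimal illustration in plain Java, not Apex code and not how DefaultStatefulStreamCodec is implemented: the class name MetaDataStream, the "META|id|value" and "DATA|id|value" message layout, and the encode/decode helpers are all made up for this example. Resetting the id table at window boundaries is left out to keep it short.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: meta data and data travel on one ordered stream.
final class MetaDataStream {

    // Each input record is {metaData, data}. A META message is emitted the
    // first time a meta data value is seen; DATA messages only carry its id.
    static List<String> encode(List<String[]> records) {
        Map<String, Integer> idByMeta = new HashMap<>();
        List<String> wire = new ArrayList<>();
        for (String[] record : records) {
            String meta = record[0];
            String data = record[1];
            Integer id = idByMeta.get(meta);
            if (id == null) {
                id = idByMeta.size();
                idByMeta.put(meta, id);
                wire.add("META|" + id + "|" + meta);
            }
            wire.add("DATA|" + id + "|" + data);
        }
        return wire;
    }

    // Downstream side: remember META messages and use the id in each DATA
    // message to re-attach the meta data, giving back {metaData, data} pairs.
    static List<String[]> decode(List<String> wire) {
        Map<Integer, String> metaById = new HashMap<>();
        List<String[]> records = new ArrayList<>();
        for (String message : wire) {
            String[] parts = message.split("\\|", 3);
            int id = Integer.parseInt(parts[1]);
            if (parts[0].equals("META")) {
                metaById.put(id, parts[2]);
            } else {
                records.add(new String[] {metaById.get(id), parts[2]});
            }
        }
        return records;
    }

    public static void main(String[] args) {
        // File name as meta data, as in the use case from the first email.
        List<String[]> records = Arrays.asList(
            new String[] {"a.txt", "r1"},
            new String[] {"a.txt", "r2"},
            new String[] {"b.txt", "r3"});
        List<String> wire = encode(records);
        for (String message : wire) {
            System.out.println(message);
        }
        for (String[] record : decode(wire)) {
            System.out.println(record[1] + " came from " + record[0]);
        }
    }
}
```

With meta data that differs on every tuple (for example arrival time), this degenerates to one META message per DATA message, which is the worst case Bhupesh and Gaurav mention and the per-tuple cost Amol is concerned about. It also assumes both messages reach the same downstream partition, which is Sandeep's point 3.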
