Re: Merging Two KTables
Hi Sameer, Dmitry:

Just a side note: KStream.merge() does not guarantee timestamp ordering, so the resulting KStream will likely contain records that are out of order with respect to their timestamps.

If you want a merge operation that respects the timestamps of the input streams because you believe they are well aligned, you need to either assume that none of the input streams contains out-of-order data, so that an online merge-sort can be applied, or assume that the out-of-orderness has some upper bound in practice, so that you can buffer records and wait.

As said, there is no gold-standard rule for merging, so we leave it to users to customize via the "process(Processor)" API, or to use "merge" if they can tolerate out-of-order timestamps in the resulting stream.

Guozhang

On Tue, Jan 23, 2018 at 1:12 PM, Matthias J. Sax wrote:
> Well. That is one possibility I guess. But some other way might be to
> "merge both values" into a single one... There is no "straight forward"
> best semantics IMHO.
>
> If you really need this, you can build it via Processor API.
>
> -Matthias
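Editor's sketch of the second option in the message above (assume the out-of-orderness has an upper bound, buffer, and wait). This is plain Java and not Kafka Streams code: the class BoundedReorderBuffer, the Rec type, and the maxLatenessMs parameter are hypothetical names introduced only for illustration. In a real topology, logic of this kind would presumably live inside a custom processor.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

/** Hypothetical helper (not a Kafka Streams class): re-orders merged records by timestamp. */
final class BoundedReorderBuffer {

    /** Minimal stand-in for a timestamped record. */
    static final class Rec {
        public final String key;
        public final String value;
        public final long timestamp;

        public Rec(String key, String value, long timestamp) {
            this.key = key;
            this.value = value;
            this.timestamp = timestamp;
        }
    }

    // Assumption: a record never arrives more than this many ms behind the newest one seen.
    private final long maxLatenessMs;
    private long maxSeenTs = Long.MIN_VALUE;
    // Min-heap ordered by timestamp, so the oldest buffered record is always at the head.
    private final PriorityQueue<Rec> buffer =
            new PriorityQueue<Rec>(Comparator.comparingLong((Rec r) -> r.timestamp));

    public BoundedReorderBuffer(long maxLatenessMs) {
        this.maxLatenessMs = maxLatenessMs;
    }

    /** Accepts a record from any input and returns the records that are now safe to emit, oldest first. */
    public List<Rec> add(Rec rec) {
        buffer.add(rec);
        maxSeenTs = Math.max(maxSeenTs, rec.timestamp);
        long watermark = maxSeenTs - maxLatenessMs;
        List<Rec> ready = new ArrayList<Rec>();
        // Under the lateness assumption, nothing older than the watermark can still arrive.
        while (!buffer.isEmpty() && buffer.peek().timestamp <= watermark) {
            ready.add(buffer.poll());
        }
        return ready;
    }

    /** Drains everything still buffered, oldest first (for example at shutdown). */
    public List<Rec> flush() {
        List<Rec> rest = new ArrayList<Rec>();
        while (!buffer.isEmpty()) {
            rest.add(buffer.poll());
        }
        return rest;
    }
}
```

The trade-off is latency: every record is held back until the newest timestamp seen is at least maxLatenessMs ahead of it, and a record that violates the assumed bound is still emitted out of order.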
Re: Merging Two KTables
Well, that is one possibility, I guess. But another option might be to "merge both values" into a single one... There is no "straightforward" best semantics, IMHO.

If you really need this, you can build it via the Processor API.

-Matthias

On 1/23/18 7:46 AM, Dmitry Minkovsky wrote:
>> Merging two tables does not make too much sense because each table might
>> contain an entry for the same key. So it's unclear, which of both values
>> the merged table should contain.
>
> Which of both values should the table contain? Seems straightforward: it
> should contain the value with the highest timestamp, with non-deterministic
> behavior when two timestamps are the same.
Re: Merging Two KTables
> Merging two tables does not make too much sense because each table might
> contain an entry for the same key. So it's unclear, which of both values
> the merged table should contain.

Which of both values should the table contain? Seems straightforward: it should contain the value with the highest timestamp, with non-deterministic behavior when two timestamps are the same.

On Wed, Jul 26, 2017 at 9:42, Matthias J. Sax wrote:
> Merging two tables does not make too much sense because each table might
> contain an entry for the same key. So it's unclear, which of both values
> the merged table should contain.
>
> KTable.toStream() is just a semantic change and has no runtime overhead.
>
> -Matthias
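Editor's sketch of the "highest timestamp wins" semantics proposed in the message above, applied to two table snapshots held as plain Java maps. This is not a Kafka Streams API: LatestWinsMerge and Versioned are hypothetical names, and the tie-break on equal timestamps (keep the left value) is an arbitrary choice made here, where the message only says the result may be non-deterministic.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical illustration (not a Kafka Streams class): merge two table snapshots by timestamp. */
final class LatestWinsMerge {

    /** A table value together with the timestamp of the record that produced it. */
    static final class Versioned {
        public final String value;
        public final long timestamp;

        public Versioned(String value, long timestamp) {
            this.value = value;
            this.timestamp = timestamp;
        }
    }

    /** Returns a new map holding, for each key, the entry with the higher timestamp. */
    public static Map<String, Versioned> merge(Map<String, Versioned> left,
                                               Map<String, Versioned> right) {
        Map<String, Versioned> merged = new HashMap<String, Versioned>(left);
        for (Map.Entry<String, Versioned> e : right.entrySet()) {
            // Map.merge inserts the right entry if the key is absent; otherwise it
            // resolves the conflict. On a tie the existing (left) value is kept.
            merged.merge(e.getKey(), e.getValue(),
                    (fromLeft, fromRight) ->
                            fromRight.timestamp > fromLeft.timestamp ? fromRight : fromLeft);
        }
        return merged;
    }
}
```

Swapping the lambda for a function that combines both values would give the other semantics mentioned in the thread, which is one way to see why no single merge rule is the obvious default.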
Re: Merging Two KTables
Merging two tables does not make much sense, because each table might contain an entry for the same key, so it's unclear which of the two values the merged table should contain.

KTable.toStream() is just a semantic change and has no runtime overhead.

-Matthias

On 7/26/17 1:34 PM, Sameer Kumar wrote:
> Hi,
>
> Is there a way I can merge two KTables just like I have in KStreams api.
> KBuilder.merge().
>
> I understand I can use KTable.toStream(), if I choose to use it, is there
> any performance cost associated with this conversion or is it just a API
> conversion.
>
> -Sameer.
Merging Two KTables
Hi,

Is there a way I can merge two KTables, just like I can with the KStreams API (KBuilder.merge())?

I understand I can use KTable.toStream(). If I choose to use it, is there any performance cost associated with this conversion, or is it just an API conversion?

-Sameer.