Re: Merging Two KTables

2018-01-23 Thread Guozhang Wang
Hi Sameer, Dmitry:

Just a side note: KStream.merge() does not guarantee timestamp ordering,
so the resulting KStream may well contain records that are out of order
with respect to their timestamps. If you do want a merge operation that
respects the timestamps of the input streams, because you believe they
are well aligned, you need to assume one of two things: either none of
the input streams contains any out-of-order data, so that an online
merge-sort can be applied, or the maximum delay of out-of-order records
has some upper bound in practice, so that you can buffer records and
wait. As said, there is no golden rule for merging, hence we leave it to
users to customize the behavior via the Processor API
("process(Processor)"), or to use "merge" if they can tolerate
out-of-order timestamps in the resulting stream.
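The two approaches mentioned above (an online merge-sort of inputs that are each already in order, and buffering under a bounded delay) can be sketched as follows. This is a minimal, library-free Java simulation under stated assumptions, not Kafka Streams code: Rec, OrderedMerger and BoundedDelayMerger are hypothetical names, OrderedMerger assumes every input is already in timestamp order, and BoundedDelayMerger assumes no record is more than maxDelay behind the highest timestamp seen so far (anything later is set aside as too late). In a real application, the same logic would presumably live in a custom Processor with its buffer kept in a state store.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.Deque;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical record type for this sketch (not a Kafka class).
final class Rec {
    final String key;
    final String value;
    final long ts;

    Rec(String key, String value, long ts) {
        this.key = key;
        this.value = value;
        this.ts = ts;
    }
}

// Option 1: every input is assumed to be in timestamp order already,
// so an online two-way merge-sort produces a timestamp-ordered output.
final class OrderedMerger {
    private final Deque<Rec> left = new ArrayDeque<>();
    private final Deque<Rec> right = new ArrayDeque<>();
    final List<Rec> emitted = new ArrayList<>();

    void offerLeft(Rec r) { left.addLast(r); drain(); }
    void offerRight(Rec r) { right.addLast(r); drain(); }

    // The smaller head is only safe to emit while the other input also has a
    // pending record; otherwise an older record could still arrive there.
    private void drain() {
        while (!left.isEmpty() && !right.isEmpty()) {
            emitted.add(left.peekFirst().ts <= right.peekFirst().ts
                    ? left.pollFirst() : right.pollFirst());
        }
    }
}

// Option 2: records may be out of order, but (by assumption) never by more
// than maxDelay. Records from all inputs are buffered and released once the
// highest timestamp seen so far is at least maxDelay past them.
final class BoundedDelayMerger {
    private static final class Pending {
        final Rec rec;
        final long seq; // arrival order, breaks timestamp ties
        Pending(Rec rec, long seq) { this.rec = rec; this.seq = seq; }
    }

    private final long maxDelay;
    private final PriorityQueue<Pending> buffer = new PriorityQueue<>(
            Comparator.<Pending>comparingLong(p -> p.rec.ts).thenComparingLong(p -> p.seq));
    private long nextSeq = 0;
    private long streamTime = Long.MIN_VALUE;
    private long lastEmittedTs = Long.MIN_VALUE;
    final List<Rec> emitted = new ArrayList<>();
    final List<Rec> tooLate = new ArrayList<>(); // records that violated the bound

    BoundedDelayMerger(long maxDelay) { this.maxDelay = maxDelay; }

    // Records from either input are offered here in arrival order.
    void offer(Rec r) {
        if (r.ts < lastEmittedTs) { tooLate.add(r); return; }
        buffer.add(new Pending(r, nextSeq++));
        streamTime = Math.max(streamTime, r.ts);
        while (!buffer.isEmpty() && buffer.peek().rec.ts <= streamTime - maxDelay) {
            release();
        }
    }

    // Releases everything still buffered, e.g. when the inputs end.
    void flush() {
        while (!buffer.isEmpty()) release();
    }

    private void release() {
        Rec out = buffer.poll().rec;
        lastEmittedTs = out.ts;
        emitted.add(out);
    }
}
```

With maxDelay = 0 the bounded-delay variant releases every record as soon as it arrives, so it only helps if the interleaved input is already in order; a larger bound trades latency for tolerance of disorder.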


Guozhang


On Tue, Jan 23, 2018 at 1:12 PM, Matthias J. Sax 
wrote:

> Well. That is one possibility I guess. But some other way might be to
> "merge both values" into a single one... There is no "straightforward"
> best semantics IMHO.
>
> If you really need this, you can build it via the Processor API.
>
>
> -Matthias


-- 
-- Guozhang


Re: Merging Two KTables

2018-01-23 Thread Matthias J. Sax
Well. That is one possibility I guess. But some other way might be to
"merge both values" into a single one... There is no "straightforward"
best semantics IMHO.

If you really need this, you can build it via the Processor API.
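One reading of "merge both values into a single one" is an outer join on the key: each side keeps its latest value, and every update recomputes the merged value from both sides (in Kafka Streams, KTable#outerJoin with a ValueJoiner behaves along these lines, passing null for a missing side). The sketch below only simulates that idea in plain Java; CombiningTableMerge and its null-means-delete convention are illustrative assumptions, not a Kafka API.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.BiFunction;

// Plain-Java simulation of merging two tables by combining their values.
final class CombiningTableMerge<K, V, R> {
    private final Map<K, V> left = new HashMap<>();
    private final Map<K, V> right = new HashMap<>();
    private final Map<K, R> merged = new HashMap<>();
    // Receives (leftValue, rightValue); either may be null if that side has no entry.
    private final BiFunction<V, V, R> combiner;

    CombiningTableMerge(BiFunction<V, V, R> combiner) { this.combiner = combiner; }

    void updateLeft(K key, V value) { applyUpdate(left, key, value); }
    void updateRight(K key, V value) { applyUpdate(right, key, value); }

    // Assumption: a null value deletes the key from that side (tombstone).
    // The merged entry is recomputed from both sides after every update.
    private void applyUpdate(Map<K, V> side, K key, V value) {
        if (value == null) side.remove(key); else side.put(key, value);
        V l = left.get(key);
        V r = right.get(key);
        if (l == null && r == null) merged.remove(key);
        else merged.put(key, combiner.apply(l, r));
    }

    R get(K key) { return merged.get(key); }
}
```

The combiner is where the "best semantics" question gets answered per application: concatenation, summing, or picking one side are all valid choices.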


-Matthias


On 1/23/18 7:46 AM, Dmitry Minkovsky wrote:
>> Merging two tables does not make too much sense because each table might
>> contain an entry for the same key. So it's unclear which of the two values
>> the merged table should contain.
>
> Which of the two values should the table contain? It seems straightforward:
> it should contain the value with the highest timestamp, with
> non-deterministic behavior when two timestamps are the same.





Re: Merging Two KTables

2018-01-23 Thread Dmitry Minkovsky
> Merging two tables does not make too much sense because each table might
> contain an entry for the same key. So it's unclear which of the two values
> the merged table should contain.

Which of the two values should the table contain? It seems straightforward:
it should contain the value with the highest timestamp, with
non-deterministic behavior when two timestamps are the same.
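The "highest timestamp wins" rule can be sketched as a last-writer-wins map that remembers, per key, the timestamp and source table of the current value. This is a plain-Java simulation, not a Kafka Streams API; LatestWinsTableMerge is a hypothetical name, deletes are not handled, and instead of leaving ties non-deterministic it assumes an explicit tie-break (the left table wins), which makes the result independent of how the two inputs interleave.

```java
import java.util.HashMap;
import java.util.Map;

// Plain-Java simulation of a "latest timestamp wins" merge of two tables.
final class LatestWinsTableMerge<K, V> {
    enum Side { LEFT, RIGHT }

    private static final class Versioned<T> {
        final T value;
        final long ts;
        final Side side;
        Versioned(T value, long ts, Side side) { this.value = value; this.ts = ts; this.side = side; }
    }

    private final Map<K, Versioned<V>> state = new HashMap<>();

    // Applies an update from one input table; returns true if it replaced the merged value.
    boolean update(Side side, K key, V value, long ts) {
        Versioned<V> cur = state.get(key);
        boolean wins = cur == null
                || ts > cur.ts
                // Assumed tie-break: on equal timestamps LEFT beats RIGHT, so the
                // outcome does not depend on which input's update arrives first.
                || (ts == cur.ts && (side == Side.LEFT || cur.side == Side.RIGHT));
        if (wins) state.put(key, new Versioned<>(value, ts, side));
        return wins;
    }

    V get(K key) {
        Versioned<V> v = state.get(key);
        return v == null ? null : v.value;
    }
}
```

An update that loses (older timestamp, or a right-side tie) is simply dropped; which tie-break is appropriate is the kind of application-specific choice discussed in this thread.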


On Wed, 26 Jul 2017 at 9:42, Matthias J. Sax wrote:

> Merging two tables does not make too much sense because each table might
> contain an entry for the same key. So it's unclear which of the two values
> the merged table should contain.
>
> KTable.toStream() is just a semantic change and has no runtime overhead.
>
> -Matthias


Re: Merging Two KTables

2017-07-26 Thread Matthias J. Sax
Merging two tables does not make too much sense because each table might
contain an entry for the same key. So it's unclear which of the two values
the merged table should contain.

KTable.toStream() is just a semantic change and has no runtime overhead.
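Since toStream() only reinterprets a table as its changelog, merging two tables this way amounts to merging two changelog streams and re-materializing the result; in recent Kafka Streams versions that would be something like table1.toStream().merge(table2.toStream()) followed by groupByKey().reduce((oldValue, newValue) -> newValue). The plain-Java simulation below (ChangelogMergeDemo and its methods are hypothetical, not Kafka code) illustrates why the result is ambiguous: for a key present in both tables, the final value depends only on which update happens to arrive last.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class ChangelogMergeDemo {
    // One changelog record: "key was updated to value".
    static final class Update {
        final String key;
        final String value;
        Update(String key, String value) { this.key = key; this.value = value; }
    }

    // Simulates merging two changelog streams: the output preserves each
    // input's own order, but the interleaving is arbitrary. Here `schedule`
    // fixes it explicitly ('a' = next record from a, otherwise next from b)
    // and must contain exactly a.size() 'a's and b.size() other characters.
    static List<Update> merge(List<Update> a, List<Update> b, String schedule) {
        List<Update> out = new ArrayList<>();
        int i = 0, j = 0;
        for (char c : schedule.toCharArray()) {
            out.add(c == 'a' ? a.get(i++) : b.get(j++));
        }
        return out;
    }

    // Re-materializes the merged changelog as a table: the last update to
    // arrive for a key wins, whichever input it came from.
    static Map<String, String> materialize(List<Update> changelog) {
        Map<String, String> table = new LinkedHashMap<>();
        for (Update u : changelog) {
            table.put(u.key, u.value);
        }
        return table;
    }
}
```

Both schedules "aab" and "baa" are legitimate merge outputs for a two-record and a one-record changelog, yet they leave different values for a shared key, which is the ambiguity described above.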

-Matthias


On 7/26/17 1:34 PM, Sameer Kumar wrote:
> Hi,
> 
> Is there a way I can merge two KTables, just like I can merge KStreams
> with KBuilder.merge() in the KStreams API?
>
> I understand I can use KTable.toStream(). If I choose to use it, is there
> any performance cost associated with this conversion, or is it just an API
> conversion?
> 
> -Sameer.
> 





Merging Two KTables

2017-07-26 Thread Sameer Kumar
Hi,

Is there a way I can merge two KTables, just like I can merge KStreams
with KBuilder.merge() in the KStreams API?

I understand I can use KTable.toStream(). If I choose to use it, is there
any performance cost associated with this conversion, or is it just an API
conversion?

-Sameer.