Re: Are RocksDBWindowStore windows hopping or sliding?

2020-03-30 Thread Matthias J. Sax
`windowSize` has not impact on used segments: it is only used to compute
the window end timestamp when you fetch a window, because the end-time
is not stored explicitly (as all windows are assumed to have the same
window size; i.e., it's a storage optimization to safe the bytes for
window end timestamp but re-compute it only-the-fly on read).

`retainDuplicates` should be set to `false` for your use case. If you
set it to `true`, you effectively make the store "append only" and each
key will internally be extended with a counter to create a unique
primary key for RocksDB to allow storing multiple record with the same
key (i.e., to store duplicates :)). This feature is used by Kafka
Streams to implement stream-stream joins, i.e., instead of storing
windowed data, the store is used to store raw records for a certain
period of time. Note that for KStreams there is no notion of a primary
key and record keys are not unique and the stored timestamp will just be
be the record timestamp.

Does this help?


-Matthias



On 3/30/20 2:58 AM, Sachin Mittal wrote:
> Hi,
> I understood how window stores are implemented using rocksdb.
> When creating an instance of RocksDBWindowStore we pass two additional
> arguments:
> retainDuplicates
> windowSize
>
> I have not clearly understood the purpose of these two.
> Like say in my application I just create one windowed store of a given
size
> say 10 minutes and retention of 30 minutes.
>
> Does this mean internally it will create a one rocksdb segment for every
> record within 10 minutes boundary and retain it for 30 minutes?
> If a new record arrives beyond that 10 minutes a new segment gets created?
>
> How does retainDuplicates comes into play here?
>
> Thanks
> Sachin
>
>
>
> On Mon, Mar 2, 2020 at 12:49 AM Matthias J. Sax  wrote:
>
> If you want to put a record into multiple window, you can do a `put()`
> for each window.
> 
> The DSL uses the store in the exact same manner for hopping window
> (compare the code I shared in the last reply). Even if windows are
> overlapping, the grouping-key+window-start-timestamp is a unique
> primary key for each window.
> 
> -Matthias
> 
> On 2/27/20 9:26 AM, Sachin Mittal wrote:
 Hi, Yes I get that when I am using the apis provided by kstream I
 can basically use both: -  Tumbling time window (non-overlapping,
 gap-less windows) -  Hopping time window (Time-based Fixed-size,
 overlapping windows)

 I wanted to know if I am using state store directly when created
 using a RocksDbWindowBytesStoreSupplier. In that case the
 RocksDBWindowStore created will always be of type Tumbling. ie any
 record put into that store will be part of one window only.

 Thanks Sachin






 On Thu, Feb 27, 2020 at 1:09 PM Matthias J. Sax 
 wrote:

> What you call "sliding window" is called "hopping window" in
> Kafka Streams.
>
> And yes, you can use a windowed-store for this case: In fact, a
> non-overlapping tumbling window is just a special case of a
> hopping window with advance == window-size.
>
> In Kafka Streams we have a single implementation for hopping
> windows (that we use for tumbling windows, too):
>
> https://github.com/apache/kafka/blob/trunk/streams/src/main/java/org/
> apache/kafka/streams/kstream/internals/KStreamWindowAggregate.java
> 
>
>
>
>
> -Matthias
>
> On 2/26/20 9:47 PM, Sachin Mittal wrote:
>> Hi, So far how I have understood is that when we create a
>> rocksdb window
> store;
>> we specify a window size and retention period.
>>
>> So windows are created from epoch time based on size, say size
>> if 100
> then
>> windows are: [0, 100), [100, 200), [200, 300) ...
>>
>> Windows are retained based on retention period and after which
>> it is dropped.
>>
>> Also a window is divided into segments which is implemented
>> using a
> treemap.
>>
>> Please confirm if my understanding is correct.
>>
>> Also looks from all this is that windows are always hopping.
>>
>> Is there a case of sliding windows that can be created? If yes
>> how? Example of sliding window would be: [0, 100), [75, 175),
>> [150, 250) ...
>>
>> Thanks Sachin
>>
>
>

>>
>



signature.asc
Description: OpenPGP digital signature


Re: Are RocksDBWindowStore windows hopping or sliding?

2020-03-30 Thread Sachin Mittal
Hi,
I understood how window stores are implemented using rocksdb.
When creating an instance of RocksDBWindowStore we pass two additional
arguments:
retainDuplicates
windowSize

I have not clearly understood the purpose of these two.
Like say in my application I just create one windowed store of a given size
say 10 minutes and retention of 30 minutes.

Does this mean internally it will create a one rocksdb segment for every
record within 10 minutes boundary and retain it for 30 minutes?
If a new record arrives beyond that 10 minutes a new segment gets created?

How does retainDuplicates comes into play here?

Thanks
Sachin



On Mon, Mar 2, 2020 at 12:49 AM Matthias J. Sax  wrote:

> -BEGIN PGP SIGNED MESSAGE-
> Hash: SHA512
>
> If you want to put a record into multiple window, you can do a `put()`
> for each window.
>
> The DSL uses the store in the exact same manner for hopping window
> (compare the code I shared in the last reply). Even if windows are
> overlapping, the grouping-key+window-start-timestamp is a unique
> primary key for each window.
>
> - -Matthias
>
> On 2/27/20 9:26 AM, Sachin Mittal wrote:
> > Hi, Yes I get that when I am using the apis provided by kstream I
> > can basically use both: -  Tumbling time window (non-overlapping,
> > gap-less windows) -  Hopping time window (Time-based Fixed-size,
> > overlapping windows)
> >
> > I wanted to know if I am using state store directly when created
> > using a RocksDbWindowBytesStoreSupplier. In that case the
> > RocksDBWindowStore created will always be of type Tumbling. ie any
> > record put into that store will be part of one window only.
> >
> > Thanks Sachin
> >
> >
> >
> >
> >
> >
> > On Thu, Feb 27, 2020 at 1:09 PM Matthias J. Sax 
> > wrote:
> >
> >> What you call "sliding window" is called "hopping window" in
> >> Kafka Streams.
> >>
> >> And yes, you can use a windowed-store for this case: In fact, a
> >> non-overlapping tumbling window is just a special case of a
> >> hopping window with advance == window-size.
> >>
> >> In Kafka Streams we have a single implementation for hopping
> >> windows (that we use for tumbling windows, too):
> >>
> >> https://github.com/apache/kafka/blob/trunk/streams/src/main/java/org/
> apache/kafka/streams/kstream/internals/KStreamWindowAggregate.java
> 
> >>
> >>
> >>
> >>
> - -Matthias
> >>
> >> On 2/26/20 9:47 PM, Sachin Mittal wrote:
> >>> Hi, So far how I have understood is that when we create a
> >>> rocksdb window
> >> store;
> >>> we specify a window size and retention period.
> >>>
> >>> So windows are created from epoch time based on size, say size
> >>> if 100
> >> then
> >>> windows are: [0, 100), [100, 200), [200, 300) ...
> >>>
> >>> Windows are retained based on retention period and after which
> >>> it is dropped.
> >>>
> >>> Also a window is divided into segments which is implemented
> >>> using a
> >> treemap.
> >>>
> >>> Please confirm if my understanding is correct.
> >>>
> >>> Also looks from all this is that windows are always hopping.
> >>>
> >>> Is there a case of sliding windows that can be created? If yes
> >>> how? Example of sliding window would be: [0, 100), [75, 175),
> >>> [150, 250) ...
> >>>
> >>> Thanks Sachin
> >>>
> >>
> >>
> >
> -BEGIN PGP SIGNATURE-
>
> iQIzBAEBCgAdFiEEI8mthP+5zxXZZdDSO4miYXKq/OgFAl5cCqcACgkQO4miYXKq
> /OiDDhAAp6tUdk97THqCODZ2bUPl2TpRcnyeL9uvGoThlQoNNczf0c4E33tRqGlB
> 9mNSXCsO5xfUEX2SS77pXSXIcTDeEDlV0u+rKG6dgul9ENo9pc7G431HPTf868LS
> UGUXC5TaoslpUmyc/Ig9eY9FNLWBUeB6rGQacTEPRReL3xufjiD3Af/PwXwZEAty
> yGfZ143MLl7a694m+y2lHkbdRsoYyQCMXOC09v34cm47EHxtAaAyXkC9zizKAS/T
> D4JQgI4zA1Xa8JDoDSHCxl/HWQWJspIpd0xoAPqBnAQ0pz4kb57bQsoZQq79uJOo
> UUvEV9wFALIbzdOhb247LsfqNe9oBtaqVkYloZI7T7wADG/Po8QhLEO6mUqN3hcb
> WgZQ+JhaWImbYpZT1kYq/xUnLP9fKhRiNsHNEsXWl7WZ68pYCFuKncUV9dxDtvd+
> KUiTzpAciP7cGi6wM4SyvCbzagYLqbLZpr6vE6s5uRexvAb/LZnGnUUYyozxuDKp
> YtG3ceUtJs62JrYB8/UF3ohYIODiXxW1TlkGRoKf0ydDZN2FTER8v3rrQov9PfFn
> J/0HBNAa+UU7DZJcx4YaxiQMmLPJgxLk2iQ0z60Q43xXwjI9lHjQgqbKn0XaVWUD
> lxnpXF9teT7UpxCs4p4pVqId5R7W29ryu6jjCrR87fgN8cFRF+s=
> =IkD6
> -END PGP SIGNATURE-
>


Re: Are RocksDBWindowStore windows hopping or sliding?

2020-03-01 Thread Matthias J. Sax
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA512

If you want to put a record into multiple window, you can do a `put()`
for each window.

The DSL uses the store in the exact same manner for hopping window
(compare the code I shared in the last reply). Even if windows are
overlapping, the grouping-key+window-start-timestamp is a unique
primary key for each window.

- -Matthias

On 2/27/20 9:26 AM, Sachin Mittal wrote:
> Hi, Yes I get that when I am using the apis provided by kstream I
> can basically use both: -  Tumbling time window (non-overlapping,
> gap-less windows) -  Hopping time window (Time-based Fixed-size,
> overlapping windows)
>
> I wanted to know if I am using state store directly when created
> using a RocksDbWindowBytesStoreSupplier. In that case the
> RocksDBWindowStore created will always be of type Tumbling. ie any
> record put into that store will be part of one window only.
>
> Thanks Sachin
>
>
>
>
>
>
> On Thu, Feb 27, 2020 at 1:09 PM Matthias J. Sax 
> wrote:
>
>> What you call "sliding window" is called "hopping window" in
>> Kafka Streams.
>>
>> And yes, you can use a windowed-store for this case: In fact, a
>> non-overlapping tumbling window is just a special case of a
>> hopping window with advance == window-size.
>>
>> In Kafka Streams we have a single implementation for hopping
>> windows (that we use for tumbling windows, too):
>>
>> https://github.com/apache/kafka/blob/trunk/streams/src/main/java/org/
apache/kafka/streams/kstream/internals/KStreamWindowAggregate.java
>>
>>
>>
>>
- -Matthias
>>
>> On 2/26/20 9:47 PM, Sachin Mittal wrote:
>>> Hi, So far how I have understood is that when we create a
>>> rocksdb window
>> store;
>>> we specify a window size and retention period.
>>>
>>> So windows are created from epoch time based on size, say size
>>> if 100
>> then
>>> windows are: [0, 100), [100, 200), [200, 300) ...
>>>
>>> Windows are retained based on retention period and after which
>>> it is dropped.
>>>
>>> Also a window is divided into segments which is implemented
>>> using a
>> treemap.
>>>
>>> Please confirm if my understanding is correct.
>>>
>>> Also looks from all this is that windows are always hopping.
>>>
>>> Is there a case of sliding windows that can be created? If yes
>>> how? Example of sliding window would be: [0, 100), [75, 175),
>>> [150, 250) ...
>>>
>>> Thanks Sachin
>>>
>>
>>
>
-BEGIN PGP SIGNATURE-

iQIzBAEBCgAdFiEEI8mthP+5zxXZZdDSO4miYXKq/OgFAl5cCqcACgkQO4miYXKq
/OiDDhAAp6tUdk97THqCODZ2bUPl2TpRcnyeL9uvGoThlQoNNczf0c4E33tRqGlB
9mNSXCsO5xfUEX2SS77pXSXIcTDeEDlV0u+rKG6dgul9ENo9pc7G431HPTf868LS
UGUXC5TaoslpUmyc/Ig9eY9FNLWBUeB6rGQacTEPRReL3xufjiD3Af/PwXwZEAty
yGfZ143MLl7a694m+y2lHkbdRsoYyQCMXOC09v34cm47EHxtAaAyXkC9zizKAS/T
D4JQgI4zA1Xa8JDoDSHCxl/HWQWJspIpd0xoAPqBnAQ0pz4kb57bQsoZQq79uJOo
UUvEV9wFALIbzdOhb247LsfqNe9oBtaqVkYloZI7T7wADG/Po8QhLEO6mUqN3hcb
WgZQ+JhaWImbYpZT1kYq/xUnLP9fKhRiNsHNEsXWl7WZ68pYCFuKncUV9dxDtvd+
KUiTzpAciP7cGi6wM4SyvCbzagYLqbLZpr6vE6s5uRexvAb/LZnGnUUYyozxuDKp
YtG3ceUtJs62JrYB8/UF3ohYIODiXxW1TlkGRoKf0ydDZN2FTER8v3rrQov9PfFn
J/0HBNAa+UU7DZJcx4YaxiQMmLPJgxLk2iQ0z60Q43xXwjI9lHjQgqbKn0XaVWUD
lxnpXF9teT7UpxCs4p4pVqId5R7W29ryu6jjCrR87fgN8cFRF+s=
=IkD6
-END PGP SIGNATURE-


Re: Are RocksDBWindowStore windows hopping or sliding?

2020-02-27 Thread Sachin Mittal
Hi,
Yes I get that when I am using the apis provided by kstream I can basically
use both:
-  Tumbling time window (non-overlapping, gap-less windows)
-  Hopping time window (Time-based Fixed-size, overlapping windows)

I wanted to know if I am using state store directly when created using a
RocksDbWindowBytesStoreSupplier.
In that case the RocksDBWindowStore created will always be of type Tumbling.
ie any record put into that store will be part of one window only.

Thanks
Sachin






On Thu, Feb 27, 2020 at 1:09 PM Matthias J. Sax  wrote:

> What you call "sliding window" is called "hopping window" in Kafka Streams.
>
> And yes, you can use a windowed-store for this case: In fact, a
> non-overlapping tumbling window is just a special case of a hopping
> window with advance == window-size.
>
> In Kafka Streams we have a single implementation for hopping windows
> (that we use for tumbling windows, too):
>
> https://github.com/apache/kafka/blob/trunk/streams/src/main/java/org/apache/kafka/streams/kstream/internals/KStreamWindowAggregate.java
>
>
> -Matthias
>
> On 2/26/20 9:47 PM, Sachin Mittal wrote:
> > Hi,
> > So far how I have understood is that when we create a rocksdb window
> store;
> > we specify a window size and retention period.
> >
> > So windows are created from epoch time based on size, say size if 100
> then
> > windows are:
> > [0, 100), [100, 200), [200, 300) ...
> >
> > Windows are retained based on retention period and after which it is
> > dropped.
> >
> > Also a window is divided into segments which is implemented using a
> treemap.
> >
> > Please confirm if my understanding is correct.
> >
> > Also looks from all this is that windows are always hopping.
> >
> > Is there a case of sliding windows that can be created? If yes how?
> > Example of sliding window would be:
> > [0, 100), [75, 175), [150, 250) ...
> >
> > Thanks
> > Sachin
> >
>
>


Re: Are RocksDBWindowStore windows hopping or sliding?

2020-02-26 Thread Matthias J. Sax
What you call "sliding window" is called "hopping window" in Kafka Streams.

And yes, you can use a windowed-store for this case: In fact, a
non-overlapping tumbling window is just a special case of a hopping
window with advance == window-size.

In Kafka Streams we have a single implementation for hopping windows
(that we use for tumbling windows, too):
https://github.com/apache/kafka/blob/trunk/streams/src/main/java/org/apache/kafka/streams/kstream/internals/KStreamWindowAggregate.java


-Matthias

On 2/26/20 9:47 PM, Sachin Mittal wrote:
> Hi,
> So far how I have understood is that when we create a rocksdb window store;
> we specify a window size and retention period.
> 
> So windows are created from epoch time based on size, say size if 100 then
> windows are:
> [0, 100), [100, 200), [200, 300) ...
> 
> Windows are retained based on retention period and after which it is
> dropped.
> 
> Also a window is divided into segments which is implemented using a treemap.
> 
> Please confirm if my understanding is correct.
> 
> Also looks from all this is that windows are always hopping.
> 
> Is there a case of sliding windows that can be created? If yes how?
> Example of sliding window would be:
> [0, 100), [75, 175), [150, 250) ...
> 
> Thanks
> Sachin
> 



signature.asc
Description: OpenPGP digital signature