[jira] [Updated] (KAFKA-3595) Add capability to specify replication compact option for stream store

2016-09-07 Thread Guozhang Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3595?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guozhang Wang updated KAFKA-3595:
-
Resolution: Fixed
Status: Resolved  (was: Patch Available)

Issue resolved by pull request 1792
[https://github.com/apache/kafka/pull/1792]

> Add capability to specify replication compact option for stream store
> -
>
> Key: KAFKA-3595
> URL: https://issues.apache.org/jira/browse/KAFKA-3595
> Project: Kafka
>  Issue Type: Improvement
>  Components: streams
>Affects Versions: 0.10.1.0
>Reporter: Henry Cai
>Assignee: Damian Guy
>Priority: Minor
>  Labels: user-experience
> Fix For: 0.10.1.0
>
>
> Currently, state store replication always goes through a compacted Kafka 
> topic. For some state stores, e.g. JoinWindows, there are no duplicate keys 
> in the store, so a compacted topic provides little benefit.
> The problem with a compacted topic is that records can stay on the Kafka 
> broker forever. In my use case, the key is ad_id, which keeps incrementing 
> and is not bounded, so I am worried the broker's disk usage for that topic 
> will grow forever.
> I think we either need the capability to purge compacted records on the 
> broker, or the ability to specify a different compaction option for state 
> store replication.
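The ask above amounts to per-changelog-topic overrides of `cleanup.policy` and retention. As a rough illustration only (the topic name and retention value below are hypothetical, and in 0.10.x the tool talks to ZooKeeper), a similar effect can be applied by hand to an existing changelog topic with the stock `kafka-configs.sh` tool:

```shell
# Hypothetical example: switch a Streams changelog topic from compaction to
# time-based deletion so records for unbounded keys are eventually dropped.
# Topic name, ZooKeeper address, and retention value are illustrative.
bin/kafka-configs.sh --zookeeper localhost:2181 \
  --alter --entity-type topics \
  --entity-name my-streams-app-my-store-changelog \
  --add-config cleanup.policy=delete,retention.ms=604800000
```

This is a manual workaround sketch, not the fix merged in the pull request above.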



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-3595) Add capability to specify replication compact option for stream store

2016-08-31 Thread Damian Guy (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3595?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Damian Guy updated KAFKA-3595:
--
Status: Patch Available  (was: In Progress)

> Add capability to specify replication compact option for stream store
> -
>
> Key: KAFKA-3595
> URL: https://issues.apache.org/jira/browse/KAFKA-3595
> Project: Kafka
>  Issue Type: Improvement
>  Components: streams
>Affects Versions: 0.10.1.0
>Reporter: Henry Cai
>Assignee: Damian Guy
>Priority: Minor
>  Labels: user-experience
> Fix For: 0.10.1.0
>
>
> Currently, state store replication always goes through a compacted Kafka 
> topic. For some state stores, e.g. JoinWindows, there are no duplicate keys 
> in the store, so a compacted topic provides little benefit.
> The problem with a compacted topic is that records can stay on the Kafka 
> broker forever. In my use case, the key is ad_id, which keeps incrementing 
> and is not bounded, so I am worried the broker's disk usage for that topic 
> will grow forever.
> I think we either need the capability to purge compacted records on the 
> broker, or the ability to specify a different compaction option for state 
> store replication.





[jira] [Updated] (KAFKA-3595) Add capability to specify replication compact option for stream store

2016-06-15 Thread Guozhang Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3595?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guozhang Wang updated KAFKA-3595:
-
Assignee: (was: Guozhang Wang)

> Add capability to specify replication compact option for stream store
> -
>
> Key: KAFKA-3595
> URL: https://issues.apache.org/jira/browse/KAFKA-3595
> Project: Kafka
>  Issue Type: Improvement
>  Components: streams
>Affects Versions: 0.10.1.0
>Reporter: Henry Cai
>Priority: Minor
>  Labels: user-experience
> Fix For: 0.10.1.0
>
>
> Currently, state store replication always goes through a compacted Kafka 
> topic. For some state stores, e.g. JoinWindows, there are no duplicate keys 
> in the store, so a compacted topic provides little benefit.
> The problem with a compacted topic is that records can stay on the Kafka 
> broker forever. In my use case, the key is ad_id, which keeps incrementing 
> and is not bounded, so I am worried the broker's disk usage for that topic 
> will grow forever.
> I think we either need the capability to purge compacted records on the 
> broker, or the ability to specify a different compaction option for state 
> store replication.





[jira] [Updated] (KAFKA-3595) Add capability to specify replication compact option for stream store

2016-04-20 Thread Guozhang Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3595?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guozhang Wang updated KAFKA-3595:
-
Fix Version/s: 0.10.1.0

> Add capability to specify replication compact option for stream store
> -
>
> Key: KAFKA-3595
> URL: https://issues.apache.org/jira/browse/KAFKA-3595
> Project: Kafka
>  Issue Type: Improvement
>  Components: streams
>Affects Versions: 0.10.1.0
>Reporter: Henry Cai
>Assignee: Guozhang Wang
>Priority: Minor
>  Labels: user-experience
> Fix For: 0.10.1.0
>
>
> Currently, state store replication always goes through a compacted Kafka 
> topic. For some state stores, e.g. JoinWindows, there are no duplicate keys 
> in the store, so a compacted topic provides little benefit.
> The problem with a compacted topic is that records can stay on the Kafka 
> broker forever. In my use case, the key is ad_id, which keeps incrementing 
> and is not bounded, so I am worried the broker's disk usage for that topic 
> will grow forever.
> I think we either need the capability to purge compacted records on the 
> broker, or the ability to specify a different compaction option for state 
> store replication.





[jira] [Updated] (KAFKA-3595) Add capability to specify replication compact option for stream store

2016-04-20 Thread Guozhang Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3595?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guozhang Wang updated KAFKA-3595:
-
Labels: user-experience  (was: )

> Add capability to specify replication compact option for stream store
> -
>
> Key: KAFKA-3595
> URL: https://issues.apache.org/jira/browse/KAFKA-3595
> Project: Kafka
>  Issue Type: Improvement
>  Components: streams
>Affects Versions: 0.10.1.0
>Reporter: Henry Cai
>Assignee: Guozhang Wang
>Priority: Minor
>  Labels: user-experience
> Fix For: 0.10.1.0
>
>
> Currently, state store replication always goes through a compacted Kafka 
> topic. For some state stores, e.g. JoinWindows, there are no duplicate keys 
> in the store, so a compacted topic provides little benefit.
> The problem with a compacted topic is that records can stay on the Kafka 
> broker forever. In my use case, the key is ad_id, which keeps incrementing 
> and is not bounded, so I am worried the broker's disk usage for that topic 
> will grow forever.
> I think we either need the capability to purge compacted records on the 
> broker, or the ability to specify a different compaction option for state 
> store replication.





[jira] [Updated] (KAFKA-3595) Add capability to specify replication compact option for stream store

2016-04-20 Thread Henry Cai (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3595?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Henry Cai updated KAFKA-3595:
-
Description: 
Currently, state store replication always goes through a compacted Kafka topic. 
For some state stores, e.g. JoinWindows, there are no duplicate keys in the 
store, so a compacted topic provides little benefit.
The problem with a compacted topic is that records can stay on the Kafka broker 
forever. In my use case, the key is ad_id, which keeps incrementing and is not 
bounded, so I am worried the broker's disk usage for that topic will grow forever.
I think we either need the capability to purge compacted records on the broker, 
or the ability to specify a different compaction option for state store replication.

  was:
Currently in Kafka Streams, window expiry in RocksDB is triggered by new event 
insertion. When a window is created at T0 with a 10-minute retention, and a new 
record arrives with event timestamp T0 + 10 + 1 (minutes), that window is 
expired (removed) from RocksDB.

In the real world, it is easy to see events arriving with future timestamps (or 
out-of-order events with big time gaps between them), so retiring a window based 
on a single event's timestamp is dangerous. I think we need to consider at least 
both the event's event time and the elapsed server/stream time.


> Add capability to specify replication compact option for stream store
> -
>
> Key: KAFKA-3595
> URL: https://issues.apache.org/jira/browse/KAFKA-3595
> Project: Kafka
>  Issue Type: Improvement
>  Components: streams
>Affects Versions: 0.10.1.0
>Reporter: Henry Cai
>Assignee: Guozhang Wang
>Priority: Minor
>
> Currently, state store replication always goes through a compacted Kafka 
> topic. For some state stores, e.g. JoinWindows, there are no duplicate keys 
> in the store, so a compacted topic provides little benefit.
> The problem with a compacted topic is that records can stay on the Kafka 
> broker forever. In my use case, the key is ad_id, which keeps incrementing 
> and is not bounded, so I am worried the broker's disk usage for that topic 
> will grow forever.
> I think we either need the capability to purge compacted records on the 
> broker, or the ability to specify a different compaction option for state 
> store replication.





[jira] [Updated] (KAFKA-3595) Add capability to specify replication compact option for stream store

2016-04-20 Thread Henry Cai (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3595?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Henry Cai updated KAFKA-3595:
-
Description: 
Currently in Kafka Streams, window expiry in RocksDB is triggered by new event 
insertion. When a window is created at T0 with a 10-minute retention, and a new 
record arrives with event timestamp T0 + 10 + 1 (minutes), that window is 
expired (removed) from RocksDB.

In the real world, it is easy to see events arriving with future timestamps (or 
out-of-order events with big time gaps between them), so retiring a window based 
on a single event's timestamp is dangerous. I think we need to consider at least 
both the event's event time and the elapsed server/stream time.

  was:
Currently, state store replication always goes through a compacted Kafka topic. 
For some state stores, e.g. JoinWindows, there are no duplicate keys in the 
store, so a compacted topic provides little benefit.

The problem with a compacted topic is that records can stay on the Kafka broker 
forever. In my use case, the key is ad_id, which keeps incrementing and is not 
bounded, so I am worried the broker's disk usage for that topic will grow forever.

I think we either need the capability to purge compacted records on the broker, 
or the ability to specify a different compaction option for state store replication.


> Add capability to specify replication compact option for stream store
> -
>
> Key: KAFKA-3595
> URL: https://issues.apache.org/jira/browse/KAFKA-3595
> Project: Kafka
>  Issue Type: Improvement
>  Components: streams
>Affects Versions: 0.10.1.0
>Reporter: Henry Cai
>Assignee: Guozhang Wang
>Priority: Minor
>
> Currently in Kafka Streams, window expiry in RocksDB is triggered by new 
> event insertion. When a window is created at T0 with a 10-minute retention, 
> and a new record arrives with event timestamp T0 + 10 + 1 (minutes), that 
> window is expired (removed) from RocksDB.
> In the real world, it is easy to see events arriving with future timestamps 
> (or out-of-order events with big time gaps between them), so retiring a 
> window based on a single event's timestamp is dangerous. I think we need to 
> consider at least both the event's event time and the elapsed server/stream 
> time.
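The quoted concern can be made concrete with a toy sketch (not Kafka's actual implementation; all names and the exact rule are illustrative): expire a window only once both the highest event time seen so far and the stream's own elapsed time have passed the retention deadline, so a single record with a future timestamp cannot wipe out live windows.

```java
// Toy model of the suggested expiry rule. Names and logic are illustrative,
// not Kafka Streams internals.
public class WindowExpirySketch {

    /**
     * A window starting at windowStartMs may be expired only when BOTH the
     * maximum observed event time and the stream/processing time have moved
     * past the retention deadline.
     */
    public static boolean shouldExpire(long windowStartMs,
                                       long retentionMs,
                                       long maxObservedEventTimeMs,
                                       long streamTimeMs) {
        final long deadline = windowStartMs + retentionMs;
        return maxObservedEventTimeMs > deadline && streamTimeMs > deadline;
    }

    public static void main(String[] args) {
        final long tenMinutes = 10 * 60 * 1000L;
        // One future-stamped record alone no longer expires the window:
        System.out.println(shouldExpire(0L, tenMinutes, tenMinutes + 1, 5_000L));   // false
        // Expiry happens once stream time has also passed the deadline:
        System.out.println(shouldExpire(0L, tenMinutes, tenMinutes + 1, tenMinutes + 1)); // true
    }
}
```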





[jira] [Updated] (KAFKA-3595) Add capability to specify replication compact option for stream store

2016-04-20 Thread Henry Cai (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3595?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Henry Cai updated KAFKA-3595:
-
Description: 
Currently state store replication always go through a compact kafka topic.  For 
some state stores, e.g. JoinWindow, there are no duplicates in the store, there 
is not much benefit using a compacted topic.

The problem of using compacted topic is the records can stay in kafka broker 
forever.  In my use case, my key is ad_id, it's incrementing all the time, not 
bounded, I am worried the disk space on broker for that topic will go forever.

I think we either need the capability to purge the compacted records on broker, 
or allow us to specify different compact option for state store replication.

  was:Add the ability to record metrics in the serializer/deserializer 
components. As it stands, I cannot record latency/sensor metrics because the API 
does not expose the necessary context at the serde level. Exposing the 
ProcessorContext at this level may not be the right solution; perhaps the 
configure method could instead take a different config or init context that 
makes StreamMetrics available in that context along with the config information.


> Add capability to specify replication compact option for stream store
> -
>
> Key: KAFKA-3595
> URL: https://issues.apache.org/jira/browse/KAFKA-3595
> Project: Kafka
>  Issue Type: Improvement
>  Components: streams
>Affects Versions: 0.10.1.0
>Reporter: Henry Cai
>Assignee: Guozhang Wang
>Priority: Minor
>
> Currently, state store replication always goes through a compacted Kafka 
> topic. For some state stores, e.g. JoinWindows, there are no duplicate keys 
> in the store, so a compacted topic provides little benefit.
> The problem with a compacted topic is that records can stay on the Kafka 
> broker forever. In my use case, the key is ad_id, which keeps incrementing 
> and is not bounded, so I am worried the broker's disk usage for that topic 
> will grow forever.
> I think we either need the capability to purge compacted records on the 
> broker, or the ability to specify a different compaction option for state 
> store replication.





[jira] [Updated] (KAFKA-3595) Add capability to specify replication compact option for stream store

2016-04-20 Thread Henry Cai (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3595?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Henry Cai updated KAFKA-3595:
-
Affects Version/s: 0.10.1.0  (was: 0.9.0.1)

> Add capability to specify replication compact option for stream store
> -
>
> Key: KAFKA-3595
> URL: https://issues.apache.org/jira/browse/KAFKA-3595
> Project: Kafka
>  Issue Type: Improvement
>  Components: streams
>Affects Versions: 0.10.1.0
>Reporter: Henry Cai
>Assignee: Guozhang Wang
>Priority: Minor
>
> Add the ability to record metrics in the serializer/deserializer components. 
> As it stands, I cannot record latency/sensor metrics because the API does not 
> expose the necessary context at the serde level. Exposing the ProcessorContext 
> at this level may not be the right solution; perhaps the configure method 
> could instead take a different config or init context that makes StreamMetrics 
> available in that context along with the config information.





[jira] [Updated] (KAFKA-3595) Add capability to specify replication compact option for stream store

2016-04-20 Thread Henry Cai (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3595?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Henry Cai updated KAFKA-3595:
-
Issue Type: Improvement  (was: New Feature)

> Add capability to specify replication compact option for stream store
> -
>
> Key: KAFKA-3595
> URL: https://issues.apache.org/jira/browse/KAFKA-3595
> Project: Kafka
>  Issue Type: Improvement
>  Components: streams
>Affects Versions: 0.10.1.0
>Reporter: Henry Cai
>Assignee: Guozhang Wang
>Priority: Minor
>
> Add the ability to record metrics in the serializer/deserializer components. 
> As it stands, I cannot record latency/sensor metrics because the API does not 
> expose the necessary context at the serde level. Exposing the ProcessorContext 
> at this level may not be the right solution; perhaps the configure method 
> could instead take a different config or init context that makes StreamMetrics 
> available in that context along with the config information.


