[jira] [Updated] (KAFKA-3595) Add capability to specify replication compact option for stream store
[ https://issues.apache.org/jira/browse/KAFKA-3595?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Guozhang Wang updated KAFKA-3595:
---------------------------------
    Resolution: Fixed
        Status: Resolved  (was: Patch Available)

Issue resolved by pull request 1792
[https://github.com/apache/kafka/pull/1792]

> Add capability to specify replication compact option for stream store
> ---------------------------------------------------------------------
>
>                 Key: KAFKA-3595
>                 URL: https://issues.apache.org/jira/browse/KAFKA-3595
>             Project: Kafka
>          Issue Type: Improvement
>          Components: streams
>    Affects Versions: 0.10.1.0
>            Reporter: Henry Cai
>            Assignee: Damian Guy
>            Priority: Minor
>              Labels: user-experience
>             Fix For: 0.10.1.0
>
> Currently, state store replication always goes through a compacted Kafka topic.
> For some state stores, e.g. JoinWindow, there are no duplicates in the store,
> so there is little benefit to using a compacted topic.
> The problem with a compacted topic is that records can stay on the Kafka broker
> forever. In my use case, the key is an ad_id that keeps incrementing and is not
> bounded, so I am worried the broker's disk usage for that topic will grow forever.
> I think we either need the capability to purge compacted records on the broker,
> or the ability to specify a different cleanup policy for state store replication.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
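The core concern in the description, that compaction only removes superseded values per key and never bounds the number of distinct keys, can be illustrated with a minimal stand-alone simulation (plain Python, not Kafka code; `compacted_size` and `retained_size` are illustrative helpers that mirror the broker's `cleanup.policy` values, not Kafka APIs):

```python
# Minimal model of broker log cleanup, illustrating the reporter's concern:
# compaction keeps only the latest record per key, so a stream of ever-new
# keys (like an incrementing ad_id) never shrinks, while time-based
# retention stays bounded.

def compacted_size(records):
    """Records surviving 'cleanup.policy=compact': the latest value per key."""
    latest = {}
    for key, value, ts in records:
        latest[key] = (value, ts)
    return len(latest)

def retained_size(records, now, retention_ms):
    """Records surviving 'cleanup.policy=delete': only those newer than the cutoff."""
    return sum(1 for _, _, ts in records if now - ts <= retention_ms)

# A changelog whose keys are an incrementing ad_id: every record has a new key.
records = [(ad_id, b"impression", ad_id * 10) for ad_id in range(10_000)]

print(compacted_size(records))                                   # 10000: every key survives compaction
print(retained_size(records, now=100_000, retention_ms=5_000))   # 500: only recent records survive
```

With unique keys, compaction degenerates into "keep everything", which is exactly why the reporter wants a retention-based option for such stores.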
[jira] [Updated] (KAFKA-3595) Add capability to specify replication compact option for stream store
[ https://issues.apache.org/jira/browse/KAFKA-3595?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Damian Guy updated KAFKA-3595:
------------------------------
    Status: Patch Available  (was: In Progress)
[jira] [Updated] (KAFKA-3595) Add capability to specify replication compact option for stream store
[ https://issues.apache.org/jira/browse/KAFKA-3595?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Guozhang Wang updated KAFKA-3595:
---------------------------------
    Assignee:  (was: Guozhang Wang)
[jira] [Updated] (KAFKA-3595) Add capability to specify replication compact option for stream store
[ https://issues.apache.org/jira/browse/KAFKA-3595?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Guozhang Wang updated KAFKA-3595:
---------------------------------
    Fix Version/s: 0.10.1.0
[jira] [Updated] (KAFKA-3595) Add capability to specify replication compact option for stream store
[ https://issues.apache.org/jira/browse/KAFKA-3595?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Guozhang Wang updated KAFKA-3595:
---------------------------------
    Labels: user-experience  (was: )
[jira] [Updated] (KAFKA-3595) Add capability to specify replication compact option for stream store
[ https://issues.apache.org/jira/browse/KAFKA-3595?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Henry Cai updated KAFKA-3595:
-----------------------------
    Description:
Currently, state store replication always goes through a compacted Kafka topic. For some state stores, e.g. JoinWindow, there are no duplicates in the store, so there is little benefit to using a compacted topic.

The problem with a compacted topic is that records can stay on the Kafka broker forever. In my use case, the key is an ad_id that keeps incrementing and is not bounded, so I am worried the broker's disk usage for that topic will grow forever.

I think we either need the capability to purge compacted records on the broker, or the ability to specify a different cleanup policy for state store replication.

    was:
Currently in Kafka Streams, window expiry in RocksDB is triggered by new event insertion. When a window is created at T0 with 10 minutes of retention and we then see a new record with event timestamp T0 + 10 + 1, we expire (remove) that window from RocksDB.

In the real world it is very easy to see events with future timestamps (or out-of-order events separated by big time gaps), so retiring a window based on a single event's timestamp is dangerous. I think we need to consider at least both the event's event time and the elapsed server/stream time.
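For reference, the cleanup behavior the description asks for can be composed from standard broker topic-level settings (these are regular Kafka topic configs, not something introduced by this ticket): a changelog topic can be both compacted and bounded by time-based retention.

```properties
# Standard Kafka topic-level configs: compact superseded keys, but also
# delete segments older than the retention window, so the topic cannot
# grow forever even when keys are never repeated.
cleanup.policy=compact,delete
retention.ms=604800000
```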
[jira] [Updated] (KAFKA-3595) Add capability to specify replication compact option for stream store
[ https://issues.apache.org/jira/browse/KAFKA-3595?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Henry Cai updated KAFKA-3595:
-----------------------------
    Description:
Currently in Kafka Streams, window expiry in RocksDB is triggered by new event insertion. When a window is created at T0 with 10 minutes of retention and we then see a new record with event timestamp T0 + 10 + 1, we expire (remove) that window from RocksDB.

In the real world it is very easy to see events with future timestamps (or out-of-order events separated by big time gaps), so retiring a window based on a single event's timestamp is dangerous. I think we need to consider at least both the event's event time and the elapsed server/stream time.

    was:
Currently, state store replication always goes through a compacted Kafka topic. For some state stores, e.g. JoinWindow, there are no duplicates in the store, so there is little benefit to using a compacted topic.

The problem with a compacted topic is that records can stay on the Kafka broker forever. In my use case, the key is an ad_id that keeps incrementing and is not bounded, so I am worried the broker's disk usage for that topic will grow forever.

I think we either need the capability to purge compacted records on the broker, or the ability to specify a different cleanup policy for state store replication.
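The hazard in the window-expiry description above can be sketched with a toy model (plain Python, not the actual RocksDB window store; the expiry rule shown is the insertion-driven, event-time-only rule the description criticizes):

```python
# Toy model of insertion-driven window expiry: a window is dropped as soon
# as any arriving record's event timestamp exceeds window start + retention.
# One record with a bogus far-future timestamp wipes out every live window.

RETENTION = 10 * 60 * 1000  # 10 minutes of retention, in ms

def insert(windows, record_ts):
    """Open a window for record_ts, then expire purely by that event's time."""
    windows.add(record_ts)
    return {start for start in windows if record_ts <= start + RETENTION}

windows = set()
for ts in (0, 60_000, 120_000):       # three well-behaved events
    windows = insert(windows, ts)
assert len(windows) == 3               # all three windows still live

windows = insert(windows, 10**12)      # one event with a far-future timestamp
print(len(windows))                    # 1: every previously live window was expired
```

Gating expiry on stream time (e.g. the maximum timestamp seen so far, advanced monotonically) rather than on a single record's timestamp would avoid this failure mode, which is what the description argues for.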
[jira] [Updated] (KAFKA-3595) Add capability to specify replication compact option for stream store
[ https://issues.apache.org/jira/browse/KAFKA-3595?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Henry Cai updated KAFKA-3595:
-----------------------------
    Description:
Currently, state store replication always goes through a compacted Kafka topic. For some state stores, e.g. JoinWindow, there are no duplicates in the store, so there is little benefit to using a compacted topic.

The problem with a compacted topic is that records can stay on the Kafka broker forever. In my use case, the key is an ad_id that keeps incrementing and is not bounded, so I am worried the broker's disk usage for that topic will grow forever.

I think we either need the capability to purge compacted records on the broker, or the ability to specify a different cleanup policy for state store replication.

    was:
Add the ability to record metrics in the serializer/deserializer components. As it stands, I cannot record latency/sensor metrics, since the API does not provide the context at the serde level. Exposing the ProcessorContext at this level may not be the solution; perhaps change the configure method to take a different config or init context, and make the StreamMetrics available in that context along with the config information.
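The request in the old serde-metrics description above can be sketched outside of Kafka (plain Python; `LatencyRecorder` and `MeteredSerializer` are hypothetical illustrations, not Kafka classes): if `configure()` receives a context that carries the metrics object, the serde can record its own latency sensors.

```python
# Hypothetical sketch (not the Kafka API) of what the old description asks
# for: a serializer whose configure() receives enough context to record
# per-call latency metrics itself.

import time

class LatencyRecorder:
    """Stand-in for StreamMetrics: collects per-call latencies in nanoseconds."""
    def __init__(self):
        self.latencies = []

    def record(self, nanos):
        self.latencies.append(nanos)

class MeteredSerializer:
    def configure(self, configs):
        # The proposal: the config map (or an init context) carries the
        # metrics object, so the serde can create its own sensors.
        self.metrics = configs["metrics"]

    def serialize(self, value):
        start = time.perf_counter_ns()
        data = str(value).encode("utf-8")
        self.metrics.record(time.perf_counter_ns() - start)
        return data

recorder = LatencyRecorder()
ser = MeteredSerializer()
ser.configure({"metrics": recorder})
payload = ser.serialize(42)
print(payload, len(recorder.latencies))   # b'42' 1
```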
[jira] [Updated] (KAFKA-3595) Add capability to specify replication compact option for stream store
[ https://issues.apache.org/jira/browse/KAFKA-3595?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Henry Cai updated KAFKA-3595:
-----------------------------
    Affects Version/s: 0.10.1.0  (was: 0.9.0.1)
[jira] [Updated] (KAFKA-3595) Add capability to specify replication compact option for stream store
[ https://issues.apache.org/jira/browse/KAFKA-3595?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Henry Cai updated KAFKA-3595:
-----------------------------
    Issue Type: Improvement  (was: New Feature)