[jira] [Commented] (KAFKA-6035) Avoid creating changelog topics for state stores that are directly piped to a sink topic

2023-03-07 Thread Andy Coates (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-6035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17697387#comment-17697387
 ] 

Andy Coates commented on KAFKA-6035:


I've now run into these "dual changelog topics" a couple of times. In a few of 
those cases the topic in question was busy, and keeping two copies was 
expensive in terms of cluster load and storage.

Consider a Kafka Streams (KS) based microservice architecture, where each 
service defines sets of static input and output topics, using sensible naming 
conventions where the name of the output topic should be any of the following:
 # static, i.e. not dependent on something that can be changed in config, e.g. 
application.id
 # data-centric, i.e. based on the data set it contains, not the service that 
happens to be generating it
 # hierarchical, i.e. the topic prefix should conform to some org-wide data 
model
 # etc

Any of the above means a changelog topic name of 
"--changelog" is going to be problematic.

Either avoiding the internal changelog (as covered by this issue), or allowing 
full control of the internal topic names (as covered by 
https://issues.apache.org/jira/browse/KAFKA-5386), would work as a solution.
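
To make the naming problem concrete: Kafka Streams derives internal changelog 
topic names from application.id and the store name, following the pattern 
<application.id>-<storeName>-changelog. The sketch below is a plain-Java 
illustration of that convention, not Kafka Streams code. The class 
ChangelogNaming, its method, and the example application ids are hypothetical. 
It shows that redeploying the same topology under a different application.id 
yields a different changelog topic name, which conflicts with the "static" and 
"data-centric" rules listed above.

```java
import java.util.Objects;

// Hypothetical helper, for illustration only; not part of Kafka Streams.
final class ChangelogNaming {

    // Mirrors the naming convention <application.id>-<storeName>-changelog.
    static String changelogTopic(String applicationId, String storeName) {
        Objects.requireNonNull(applicationId, "applicationId");
        Objects.requireNonNull(storeName, "storeName");
        return applicationId + "-" + storeName + "-changelog";
    }

    public static void main(String[] args) {
        // Same store, two deployments that differ only in application.id
        // (both ids are assumed values).
        System.out.println(changelogTopic("orders-service-v1", "state1"));
        System.out.println(changelogTopic("orders-service-v2", "state1"));
    }
}
```

Because the application id is part of the name, the topic is tied to the 
service that writes it rather than to the data set it holds.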

> Avoid creating changelog topics for state stores that are directly piped to a 
> sink topic
> 
>
> Key: KAFKA-6035
> URL: https://issues.apache.org/jira/browse/KAFKA-6035
> Project: Kafka
>  Issue Type: Sub-task
>  Components: streams
>Reporter: Guozhang Wang
>Assignee: Jeyhun Karimov
>Priority: Major
>
> Today Streams makes all state stores backed by a changelog topic by 
> default, unless the user overrides this with {{disableLogging}} when creating 
> the state store / materializing the KTable. However, there are a few cases 
> where a separate changelog topic is not required because we can re-use an 
> existing topic for that. This ticket summarizes one specific case that can be 
> optimized:
> Consider the case when a KTable is materialized and then sent directly into a 
> sink topic with the same key, e.g.
> {code}
> table1 = stream.groupBy(...).aggregate("state1").to("topic2");
> {code}
> Then we do not need to create a {{state1-changelog}} but can just use 
> {{topic2}} as its changelog.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (KAFKA-6035) Avoid creating changelog topics for state stores that are directly piped to a sink topic

2022-03-08 Thread Matthias J. Sax (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-6035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17503213#comment-17503213
 ] 

Matthias J. Sax commented on KAFKA-6035:


Even if this ticket is finished, if you call `withLoggingDisabled` you tell 
Kafka Streams that the store is ephemeral and does not need to be recovered 
after a failure.

This feature will only avoid creating a separate changelog topic, and reuse the 
output topic instead, if logging is enabled. We might do the same thing as for 
input tables and only do the merge if topology optimization is enabled, but I 
guess that's TBD.
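
The rules described above can be summarised as a small decision function. The 
following is a plain-Java sketch, not Kafka Streams internals: the class 
RestoreSource, the method restoreTopic, and all of its parameters are 
hypothetical names, and the reuse branch models the optimization proposed in 
this ticket, which is not merged. It assumes the changelog naming pattern 
<application.id>-<storeName>-changelog.

```java
import java.util.Optional;

// Hypothetical model of the rules discussed in this thread; not Kafka Streams code.
final class RestoreSource {

    /**
     * Returns the topic a store would be restored from, or empty if the store
     * is ephemeral and cannot be recovered.
     *
     * @param loggingEnabled   false models a store built with withLoggingDisabled
     * @param sinkTopic        topic the table is piped to directly with the same key, or null
     * @param reuseSinkEnabled models the proposed (unmerged) optimization being switched on
     */
    static Optional<String> restoreTopic(String applicationId, String storeName,
                                         boolean loggingEnabled, String sinkTopic,
                                         boolean reuseSinkEnabled) {
        if (!loggingEnabled) {
            // Logging disabled: no changelog, and the sink topic is not used for recovery.
            return Optional.empty();
        }
        if (reuseSinkEnabled && sinkTopic != null) {
            // Proposed behaviour: reuse the output topic instead of a separate changelog.
            return Optional.of(sinkTopic);
        }
        // Default: a dedicated internal changelog topic.
        return Optional.of(applicationId + "-" + storeName + "-changelog");
    }

    public static void main(String[] args) {
        System.out.println(restoreTopic("app", "state1", false, "topic2", true));
        System.out.println(restoreTopic("app", "state1", true, "topic2", true));
        System.out.println(restoreTopic("app", "state1", true, "topic2", false));
    }
}
```

The first branch is the important one for the question in this thread: 
disabling logging never turns the sink topic into a restore source.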



[jira] [Commented] (KAFKA-6035) Avoid creating changelog topics for state stores that are directly piped to a sink topic

2022-03-04 Thread Bruno Cadonna (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-6035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17501214#comment-17501214
 ] 

Bruno Cadonna commented on KAFKA-6035:
--

In general, if you use `withLoggingDisabled`, no changelog topic will be created 
for a state store, and consequently the local state store cannot be rebuilt from 
a changelog topic. Since this ticket is still open and the PR is not merged, 
I assume that this behavior also applies to state stores right before the 
sink.



[jira] [Commented] (KAFKA-6035) Avoid creating changelog topics for state stores that are directly piped to a sink topic

2022-03-03 Thread Mohammad Yousuf Minhaj Zia (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-6035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17500901#comment-17500901
 ] 

Mohammad Yousuf Minhaj Zia commented on KAFKA-6035:
---

Hey guys, just wanted to confirm: if we use `withLoggingDisabled` right now, 
without this PR being merged, for the above case, would the output topic be used 
for rebuilding instead? Or is this PR needed to allow that to happen?



[jira] [Commented] (KAFKA-6035) Avoid creating changelog topics for state stores that are directly piped to a sink topic

2018-01-29 Thread Matthias J. Sax (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-6035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16343820#comment-16343820
 ] 

Matthias J. Sax commented on KAFKA-6035:


Thanks [~Yohan123] for being a team player! Really appreciate it!



[jira] [Commented] (KAFKA-6035) Avoid creating changelog topics for state stores that are directly piped to a sink topic

2018-01-27 Thread Richard Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-6035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16342323#comment-16342323
 ] 

Richard Yu commented on KAFKA-6035:
---

I got it. I have unassigned the JIRA.

> Avoid creating changelog topics for state stores that are directly piped to a 
> sink topic
> 
>
> Key: KAFKA-6035
> URL: https://issues.apache.org/jira/browse/KAFKA-6035
> Project: Kafka
>  Issue Type: Sub-task
>  Components: streams
>Reporter: Guozhang Wang
>Priority: Major


[jira] [Commented] (KAFKA-6035) Avoid creating changelog topics for state stores that are directly piped to a sink topic

2018-01-26 Thread Matthias J. Sax (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-6035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16341648#comment-16341648
 ] 

Matthias J. Sax commented on KAFKA-6035:


It is best practice to assign JIRAs before starting to work on them (it's 
actually OK to assign one if you plan to work on it). If you start working, you 
should update the ticket to "work in progress". If you open a PR, update it to 
"patch available". This helps to avoid conflicts.

[~Yohan123]: Did you start to work on a PR already? If not, it might be most 
efficient for the project as a whole, to not repeat the work [~jeyhunkarimov] 
put into the existing PR already.

But I leave it up to you to settle on something.

> Avoid creating changelog topics for state stores that are directly piped to a 
> sink topic
> 
>
> Key: KAFKA-6035
> URL: https://issues.apache.org/jira/browse/KAFKA-6035
> Project: Kafka
>  Issue Type: Sub-task
>  Components: streams
>Reporter: Guozhang Wang
>Assignee: Richard Yu
>Priority: Major


[jira] [Commented] (KAFKA-6035) Avoid creating changelog topics for state stores that are directly piped to a sink topic

2018-01-25 Thread Jeyhun Karimov (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-6035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16339090#comment-16339090
 ] 

Jeyhun Karimov commented on KAFKA-6035:
---

It is already attached to the issue.



[jira] [Commented] (KAFKA-6035) Avoid creating changelog topics for state stores that are directly piped to a sink topic

2018-01-25 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-6035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16339071#comment-16339071
 ] 

Ted Yu commented on KAFKA-6035:
---

Can you post the link to the pull request?



[jira] [Commented] (KAFKA-6035) Avoid creating changelog topics for state stores that are directly piped to a sink topic

2018-01-25 Thread Jeyhun Karimov (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-6035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16338981#comment-16338981
 ] 

Jeyhun Karimov commented on KAFKA-6035:
---

[~Yohan123] I don't know why Jira didn't automatically link the PR to the 
issue, but there is already an open PR for this.



[jira] [Commented] (KAFKA-6035) Avoid creating changelog topics for state stores that are directly piped to a sink topic

2018-01-23 Thread Richard Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-6035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16336798#comment-16336798
 ] 

Richard Yu commented on KAFKA-6035:
---

[~guozhang] Currently, I am unsure which class represents the changelog, 
because the Javadoc in {{KGroupedStream.java}} reads:

{code:java}
/**
 * Aggregate the values of records in this stream by the grouped key.
 * Records with {@code null} key or value are ignored.
 * Aggregating is a generalization of {@link #reduce(Reducer) combining via reduce(...)} as it, for example,
 * allows the result to have a different type than the input values.
 * The result is written into a local {@link KeyValueStore} (which is basically an ever-updating materialized view)
 * that can be queried using the provided {@code queryableStoreName}.
 * Furthermore, updates to the store are sent downstream into a {@link KTable} changelog stream.
 * ...
 */
<VR> KTable<K, VR> aggregate(final Initializer<VR> initializer,
                             final Aggregator<? super K, ? super V, VR> aggregator,
                             final Materialized<K, VR, KeyValueStore<Bytes, byte[]>> materialized);
{code}
In the last line of the Javadoc excerpt above, KTable appears to be referred to 
as a changelog stream. Could you please help clarify this for me?

Thanks.

 



[jira] [Commented] (KAFKA-6035) Avoid creating changelog topics for state stores that are directly piped to a sink topic

2018-01-23 Thread Richard Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-6035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16336667#comment-16336667
 ] 

Richard Yu commented on KAFKA-6035:
---

If it is ok, I will be taking this one. Thanks.
