[
https://issues.apache.org/jira/browse/KAFKA-19935?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
sanghyeok An updated KAFKA-19935:
---------------------------------
Description:
If a developer does not explicitly specify a name for a {{{}StateStore{}}}, its
name is generated using an incremental number. Consequently, the corresponding
Changelog topic is created using this same generated name.
Let's assume a scenario where the application evolves and a new Source Node is
added. Since the new Source Node is typically built at the beginning of the
topology, the incremental numbers for all subsequent nodes shift by 1. As a
result, the names of the {{{}StateStore{}}}s change as well.
For example, the change might look like this:
* *Previous:* {{MY-APP-KSTREAM-AGGREGATE-STATE-STORE-0000000001-changelog}}
* *New:* {{MY-APP-KSTREAM-AGGREGATE-STATE-STORE-0000000002-changelog}}
This shift can lead to significant operational issues. Simply replaying the
Source Topic is often insufficient to make the new {{StateStore}} identical to
the previous one.
There are several scenarios where this is impossible. For instance, if the
application only processes messages after the last commit(most case) and the
data includes keys that appear infrequently, the state will never match.
(Consider a restaurant with very few orders compared to one with many; the
"sparse" orders might have existed in the previous StateStore but will likely
be missing in the new one after the shift).
While this might be dismissed as a minor issue by some, it can be a critical
problem for organizations where data consistency is paramount.
A clear solution to this problem is using {*}Named StateStores{*}. However,
some users may not be aware of this feature or may not feel the need for it.
Furthermore, they might not realize that the IDs of the StateStore and
Changelog topic have incremented, leading to the unintentional creation of a
new store and potential data loss.
To address this, I propose introducing a {{TopologyValidator}} as a utility
class. Ideally, the usage would look something like this:
{code:java}
TopologyValidator.of(prevTopology, newTopology).diff(); {code}
* When {{.diff()}} is called, the {{TopologyValidator}} would identify changes
in the topology and issue a warning if {{StateStore}} IDs have shifted.
* The {{TopologyValidator}} could be included in the Kafka Streams application
code or integrated into CI pipelines. Since it essentially requires only the
code to build the topology, it does not need a connection to an actual broker.
Therefore, we can verify the diff using just the previous and new topology
definitions.
By introducing {{TopologyValidator}} under the {{streams/test-utils}} module
and encouraging users to utilize it, I believe we can promote greater stability
from the perspective of {*}Topology Evolution Compatibility{*}.
Please give your opinion.
was:
If a developer does not explicitly specify a name for a {{{}StateStore{}}}, its
name is generated using an incremental number. Consequently, the corresponding
Changelog topic is created using this same generated name.
Let's assume a scenario where the application evolves and a new Source Node is
added. Since the new Source Node is typically built at the beginning of the
topology, the incremental numbers for all subsequent nodes shift by 1. As a
result, the names of the {{{}StateStore{}}}s change as well.
For example, the change might look like this:
* *Previous:* {{MY-APP-KSTREAM-AGGREGATE-STATE-STORE-0000000001-changelog}}
* *New:* {{MY-APP-KSTREAM-AGGREGATE-STATE-STORE-0000000002-changelog}}
This shift can lead to significant operational issues. Simply replaying the
Source Topic is often insufficient to make the new {{StateStore}} identical to
the previous one.
There are several scenarios where this is impossible. For instance, if the
application only processes messages after the last commit(most case) and the
data includes keys that appear infrequently, the state will never match.
(Consider a restaurant with very few orders compared to one with many; the
"sparse" orders might have existed in the previous StateStore but will likely
be missing in the new one after the shift).
While this might be dismissed as a minor issue by some, it can be a critical
problem for organizations where data consistency is paramount.
A clear solution to this problem is using {*}Named StateStores{*}. However,
some users may not be aware of this feature or may not feel the need for it.
Furthermore, they might not realize that the IDs of the StateStore and
Changelog topic have incremented, leading to the unintentional creation of a
new store and potential data loss.
To address this, I propose introducing a {{TopologyValidator}} as a utility
class. Ideally, the usage would look something like this:
{code:java}
TopologyValidator.of(prevTopology, newTopology).diff(); {code}
* When {{.diff()}} is called, the {{TopologyValidator}} would identify
changes in the topology and issue a warning if {{StateStore}} IDs have shifted.
* The {{TopologyValidator}} could be included in the Kafka Streams application
code or integrated into CI pipelines. Since it essentially requires only the
code to build the topology, it does not need a connection to an actual broker.
Therefore, we can verify the diff using just the previous and new topology
definitions.
By introducing {{TopologyValidator}} under the {{streams/test-utils}} module
and encouraging users to utilize it, I believe we can promote greater stability
from the perspective of {*}Topology Evolution Compatibility{*}.
Please give your opinion.
> Introduce TopologyValidator Utils for improving topology evolution
> comptiability.
> ---------------------------------------------------------------------------------
>
> Key: KAFKA-19935
> URL: https://issues.apache.org/jira/browse/KAFKA-19935
> Project: Kafka
> Issue Type: Improvement
> Components: streams, streams-test-utils
> Reporter: sanghyeok An
> Assignee: sanghyeok An
> Priority: Minor
>
> If a developer does not explicitly specify a name for a {{{}StateStore{}}},
> its name is generated using an incremental number. Consequently, the
> corresponding Changelog topic is created using this same generated name.
> Let's assume a scenario where the application evolves and a new Source Node
> is added. Since the new Source Node is typically built at the beginning of
> the topology, the incremental numbers for all subsequent nodes shift by 1. As
> a result, the names of the {{{}StateStore{}}}s change as well.
> For example, the change might look like this:
> * *Previous:* {{MY-APP-KSTREAM-AGGREGATE-STATE-STORE-0000000001-changelog}}
> * *New:* {{MY-APP-KSTREAM-AGGREGATE-STATE-STORE-0000000002-changelog}}
>
> This shift can lead to significant operational issues. Simply replaying the
> Source Topic is often insufficient to make the new {{StateStore}} identical
> to the previous one.
>
> There are several scenarios where this is impossible. For instance, if the
> application only processes messages after the last commit(most case) and the
> data includes keys that appear infrequently, the state will never match.
> (Consider a restaurant with very few orders compared to one with many; the
> "sparse" orders might have existed in the previous StateStore but will likely
> be missing in the new one after the shift).
> While this might be dismissed as a minor issue by some, it can be a critical
> problem for organizations where data consistency is paramount.
>
> A clear solution to this problem is using {*}Named StateStores{*}. However,
> some users may not be aware of this feature or may not feel the need for it.
> Furthermore, they might not realize that the IDs of the StateStore and
> Changelog topic have incremented, leading to the unintentional creation of a
> new store and potential data loss.
> To address this, I propose introducing a {{TopologyValidator}} as a utility
> class. Ideally, the usage would look something like this:
> {code:java}
> TopologyValidator.of(prevTopology, newTopology).diff(); {code}
> * When {{.diff()}} is called, the {{TopologyValidator}} would identify
> changes in the topology and issue a warning if {{StateStore}} IDs have
> shifted.
> * The {{TopologyValidator}} could be included in the Kafka Streams
> application code or integrated into CI pipelines. Since it essentially
> requires only the code to build the topology, it does not need a connection
> to an actual broker. Therefore, we can verify the diff using just the
> previous and new topology definitions.
>
> By introducing {{TopologyValidator}} under the {{streams/test-utils}} module
> and encouraging users to utilize it, I believe we can promote greater
> stability from the perspective of {*}Topology Evolution Compatibility{*}.
>
> Please give your opinion.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)