[ 
https://issues.apache.org/jira/browse/KAFKA-7930?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16767880#comment-16767880
 ] 

Murad M edited comment on KAFKA-7930 at 2/14/19 5:14 AM:
---------------------------------------------------------

Historically, those topics were auto-created following the GlobalKTable naming 
pattern. Now they are used for other purposes, such as cached lookups, and are 
populated from various other places, so they have effectively become "input" 
topics represented as a GlobalKTable. While there are mechanisms to re-populate 
them, that does not mean it has to be done every time the application is 
reset. For an application reset it is enough to shift the offsets to earliest, 
so that the GlobalKTable materializes again from the existing topic. It is true 
that, from a replay point of view, we would get different results, but the same 
is true for any other topic with 'cleanup.policy=compact,delete'. Some use cases:
- static / managed configuration data (routes, flags, etc.)
- caches pulled from external systems
- caches built from data pushed by external systems
- a sequence of services, where the first service builds state, represented by 
a GlobalKTable, from an event stream, and a number of other applications make 
use of that state. For the first application it is an "internal", deletable 
topic, while for the other applications it is an "input" topic. An attempt 
to reset any of the services that use the topic as "input" ends in losing the 
topic, so all services in the sequence have to be reset.

Once we hit that limitation, the first idea was to drop the historical naming 
convention, but that is not an option either, as there is no such thing as 
"renaming" a topic. It is possible to "copy" the data with a tool like 
mirror-maker, but that is a whole different story. 



> StreamsResetter makes "changelog" topic naming assumptions
> ----------------------------------------------------------
>
>                 Key: KAFKA-7930
>                 URL: https://issues.apache.org/jira/browse/KAFKA-7930
>             Project: Kafka
>          Issue Type: Improvement
>          Components: streams, tools
>    Affects Versions: 2.1.0
>            Reporter: Murad M
>            Priority: Major
>              Labels: features, needs-kip, patch-available, usability
>
> StreamsResetter deletes the topics it considers internal. Currently it just 
> checks the topic name, as per 
> [code|https://github.com/apache/kafka/blob/1aae604861068bb7337d4972c9dcc0c0a99c374d/core/src/main/scala/kafka/tools/StreamsResetter.java#L660].
>  If the assumption is wrong (either the topic prefix or the suffix), the tool 
> becomes useless if you are aware of it, and even dangerous if you are not. 
> Probably better options would be any of:
>  * make the naming assumption optional and supply the internal topics with an 
> argument (--internal-topics)
>  * make deletion optional (--no-delete-internal)
>  * ignore topics which are included in the list of --input-topics
> I ran into this when trying to reset applications with GlobalKTable topics 
> named *-changelog. Deleting such topics is sometimes not desirable.
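
To make the naming assumption concrete, here is a minimal sketch in Java. It assumes the check amounts to a prefix match on "<application.id>-" plus a "-changelog" or "-repartition" suffix, which is my reading of the linked code rather than a copy of it. The class ResetTopicFilter and both method names are hypothetical, and the second method only illustrates the third proposal above (skip anything listed in --input-topics). It is not an existing option of the tool.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper, not part of Kafka: illustrates the naming-based check
// and one possible safeguard for topics that are also used as inputs.
public class ResetTopicFilter {

    // Assumed naming-only check: "<applicationId>-" prefix and a
    // "-changelog" or "-repartition" suffix.
    static boolean looksInternal(String applicationId, String topic) {
        return topic.startsWith(applicationId + "-")
                && (topic.endsWith("-changelog") || topic.endsWith("-repartition"));
    }

    // Sketch of the proposal: a topic that matches the naming pattern is still
    // kept if the user explicitly listed it as an input topic.
    static List<String> topicsToDelete(String applicationId,
                                       List<String> allTopics,
                                       Set<String> inputTopics) {
        List<String> result = new ArrayList<>();
        for (String topic : allTopics) {
            if (looksInternal(applicationId, topic) && !inputTopics.contains(topic)) {
                result.add(topic);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> topics = Arrays.asList(
                "my-app-lookup-changelog", "my-app-agg-repartition", "orders");
        Set<String> inputs = new HashSet<>(Arrays.asList("my-app-lookup-changelog"));

        // Naming check alone: the GlobalKTable topic is scheduled for deletion.
        System.out.println(topicsToDelete("my-app", topics, new HashSet<>()));
        // With the input-topic exclusion: only the repartition topic remains.
        System.out.println(topicsToDelete("my-app", topics, inputs));
    }
}
```

With the naming check alone, "my-app-lookup-changelog" is treated as internal even though it is populated from elsewhere, which is the situation described in the comment. With the exclusion it survives the reset, and only its offsets need to be rewound.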



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
