[ 
https://issues.apache.org/jira/browse/KAFKA-4107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16159461#comment-16159461
 ] 

Randall Hauch commented on KAFKA-4107:
--------------------------------------

Not sure if anyone had any thoughts on how this might work, but the challenge 
is that source connectors can define partitions and offsets as maps with any 
key/value pairs. Yes, we could make a fairly complex tool that could read and 
apply some transformation to an existing offset, but would it be sufficient to 
have a simpler tool that could:

* output the array of existing partition-offset pairs as JSON (to standard 
out or to a file?)
* read (from standard in or a file?) a JSON document with an array of 
partition-offset pairs that should be written as-is to the offsets topic. A 
partition-offset pair with a null offset doc could be used to "remove" the 
existing offset.
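
To make the shape of that JSON document concrete, here is a rough sketch. The comment above does not define a schema, so the field names `partition` and `offset` and the sample keys (`server`, `table`, `position`) are assumptions made purely for illustration:

```python
import json

# Hypothetical layout: "partition" and "offset" are assumed field names,
# and the map contents are made-up examples of connector-defined keys.
pairs = [
    {
        "partition": {"server": "db1", "table": "customers"},
        "offset": {"position": 1042},
    },
    {
        # A null offset doc marks this partition's existing offset for removal.
        "partition": {"server": "db1", "table": "orders"},
        "offset": None,
    },
]

# Serialize the array the way an export might write it to a file or stdout.
document = json.dumps(pairs, indent=2)
print(document)

# Reading the document back, the partitions flagged for removal are the
# ones whose offset is null.
to_remove = [p["partition"] for p in json.loads(document) if p["offset"] is None]
```

Because partitions and offsets are arbitrary maps, the tool would treat them as opaque JSON objects and would not need to understand any connector's keys.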

Main options:
* --group (required): the group identifier of the worker cluster
* --bootstrap-server (required): the address of the initial brokers to connect to
* --topic (required): the name of the offset topic

Export options:
* --export (required): used to specify that the partition-offset pairs are to be 
read from the topic and exported to a JSON document/array
* --to-file (optional): the name of the file where the JSON document/array is 
to be written; if not provided, it would be written to standard output.

Update options:
* --update (required): used to specify that the supplied partition-offset 
pairs are to be written to the topic.
* --from-file (optional): the name of the file where the JSON document/array is 
to be read; if not provided, it would be read from standard input.
* --dry-run (optional): used to signal that the tool should output what it 
would change, but should not actually change anything
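
As a rough illustration of how these options fit together, the following sketch models them with Python's `argparse`. It is not a proposed implementation. Treating `--export` and `--update` as mutually exclusive modes, and rejecting `--to-file`, `--from-file`, and `--dry-run` outside the mode they are listed under, is my reading of the grouping above rather than something spelled out:

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser(prog="kafka-connect-source-offset-reset")
    # Main options, required in both modes.
    parser.add_argument("--group", required=True)
    parser.add_argument("--bootstrap-server", required=True)
    parser.add_argument("--topic", required=True)
    # Exactly one of --export / --update must be given (assumed semantics).
    mode = parser.add_mutually_exclusive_group(required=True)
    mode.add_argument("--export", action="store_true")
    mode.add_argument("--update", action="store_true")
    # Mode-specific options; a missing file means stdout / stdin.
    parser.add_argument("--to-file")
    parser.add_argument("--from-file")
    parser.add_argument("--dry-run", action="store_true")
    return parser


def parse_args(argv):
    parser = build_parser()
    args = parser.parse_args(argv)
    # Reject options that belong to the other mode.
    if args.export and (args.from_file or args.dry_run):
        parser.error("--from-file and --dry-run apply only to --update")
    if args.update and args.to_file:
        parser.error("--to-file applies only to --export")
    return args
```

With this sketch, `parse_args(["--export", "--group=my-group", "--bootstrap-server=localhost:9092", "--topic=offset-topic"])` would select export mode and write to standard output, since no `--to-file` was given.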

For example, the following would export the current source partition-offset 
pairs:
{code}
bin/kafka-connect-source-offset-reset.sh --export --group=my-group \
  --bootstrap-server=localhost:9092 --topic=offset-topic --to-file=my-offsets.json
{code}

The user can then edit the file as needed, including changing to null any of 
the offset doc values that are to be removed. To apply the changes, the user 
would then run the following command to read in the file and update source 
partition-offset pairs in the topic:
{code}
bin/kafka-connect-source-offset-reset.sh --update --group=my-group \
  --bootstrap-server=localhost:9092 --topic=offset-topic \
  --from-file=my-offsets.json
{code}
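
To sketch what the update step (and `--dry-run` in particular) might compute, the snippet below compares the pairs currently in the topic with the pairs read from the edited file and lists only the differences. It assumes each pair is a JSON object with hypothetical `partition` and `offset` fields, and it leaves out reading from and writing to Kafka entirely. `plan_changes` and `partition_key` are names invented for this example:

```python
import json


def partition_key(partition):
    # Partitions are arbitrary maps, so use a canonical JSON string as the key.
    return json.dumps(partition, sort_keys=True)


def plan_changes(existing, requested):
    """Return the changes an update would make, without applying them."""
    current = {partition_key(p["partition"]): p["offset"] for p in existing}
    plan = []
    for pair in requested:
        old = current.get(partition_key(pair["partition"]))
        new = pair["offset"]
        if new == old:
            continue  # unchanged, or removal of an offset that does not exist
        plan.append({
            "action": "remove" if new is None else "set",
            "partition": pair["partition"],
            "old": old,
            "new": new,
        })
    return plan
```

A dry run would print this plan and stop. A real update would write each entry to the offsets topic, using a null value for the `remove` entries.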

This tool would only work if the messages in the Kafka Connect offset topic 
were serialized with the JSON converter (as configured by the worker's 
`internal.key.converter` and `internal.value.converter` properties).

> Support offset reset capability in Kafka Connect
> ------------------------------------------------
>
>                 Key: KAFKA-4107
>                 URL: https://issues.apache.org/jira/browse/KAFKA-4107
>             Project: Kafka
>          Issue Type: Improvement
>          Components: KafkaConnect
>            Reporter: Jason Gustafson
>
> It would be useful in some cases to be able to reset connector offsets. For 
> example, if a topic in Kafka corresponding to a source database is 
> accidentally deleted (or deleted because of corrupt data), an administrator 
> may want to reset offsets and reproduce the log from the beginning. It may 
> also be useful to have support for overriding offsets, but that seems like a 
> less likely use case.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
