[
https://issues.apache.org/jira/browse/SAMZA-354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14115297#comment-14115297
]
Chris Riccomini commented on SAMZA-354:
---------------------------------------
One thing that might be nice with this tool is to keep it consuming from the
0.7 checkpoint topic even after it has read up to the latest offset, rather
than having it shut itself down once it's caught up. This would somewhat
decouple the checkpoint migration from the moment a Samza job is upgraded to
0.8: you can start the tool, leave it running, and upgrade the job to 0.8 at
any point after that. If the tool were a one-time tool that migrates up to the
latest 0.7 checkpoint and then shuts down, the 0.7 job would have to be stopped
first so that no checkpoint is written after the migration finishes. You'd have
to do these steps in order:
# Shut down the 0.7 Samza job.
# Run the one-time migration tool.
# Bring up the 0.8 Samza job.
If the tool were allowed to continue running, you could do:
# Start the continuous migration tool.
# Shut down the 0.7 Samza job.
# Bring up the 0.8 Samza job.
# Shut down the continuous migration tool.
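A minimal sketch of what the continuous mode could look like. It models the
old and new checkpoint topics as in-memory lists (a stand-in for real Kafka
consumers and producers, not Samza's API); `convert`, `stop_requested`, and
`poll_interval_s` are hypothetical parameters. When a stop is requested, the
loop drains the old topic once more before exiting, so a checkpoint written just
before the 0.7 job shut down is not dropped.

```python
import time
from typing import Any, Callable, List


def migrate_pending(old_log: List[Any], position: int, new_log: List[Any],
                    convert: Callable[[Any], Any]) -> int:
    """Convert every old entry from `position` to the current end; return the new position."""
    end = len(old_log)  # snapshot the end so entries appended later wait for the next poll
    for entry in old_log[position:end]:
        new_log.append(convert(entry))
    return end


def run_continuously(old_log: List[Any], new_log: List[Any],
                     convert: Callable[[Any], Any],
                     stop_requested: Callable[[], bool],
                     poll_interval_s: float = 1.0) -> int:
    """Keep tailing `old_log` until asked to stop, then drain once more and exit."""
    position = 0
    while True:
        # Read the stop flag *before* draining, so the final pass still
        # picks up anything written up to the moment the stop was observed.
        stopping = stop_requested()
        position = migrate_pending(old_log, position, new_log, convert)
        if stopping:
            return position
        time.sleep(poll_interval_s)
```

With this shape, "shut down the continuous migration tool" in step 4 amounts to
flipping whatever `stop_requested` checks once the 0.8 job is up.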
It'd also be nice if this tool could handle multiple checkpoint topics at once.
In a shared grid, this would let us run a single instance of the checkpoint
migration tool for all 0.7 checkpoint topics, and stop it once we've verified
that all Samza jobs have been upgraded to 0.8. This is just a convenience, since
we could run one instance of the tool per topic, but it would make life a bit
easier.
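A rough sketch of a single instance serving several topics, assuming again that
topics can be modeled as in-memory lists keyed by topic name. `rename` (old topic
name to new topic name) and `convert` are hypothetical callables supplied by the
caller; the real naming of the new checkpoint topics is not specified here. The
tool keeps one read position per old topic, and a topic that appears later simply
starts at position 0. `caught_up` only reports whether everything seen so far has
been copied; deciding that all jobs are on 0.8 and stopping the tool stays a
manual call.

```python
from typing import Any, Callable, Dict, List


class MultiTopicMigrator:
    """One migrator instance that tracks a read position per old checkpoint topic."""

    def __init__(self, rename: Callable[[str], str], convert: Callable[[Any], Any]):
        self.rename = rename      # hypothetical: old topic name -> new topic name
        self.convert = convert    # hypothetical: old entry -> new entry
        self.positions: Dict[str, int] = {}

    def poll_once(self, old_topics: Dict[str, List[Any]],
                  new_topics: Dict[str, List[Any]]) -> int:
        """Copy newly appended entries from every old topic; return how many were migrated."""
        migrated = 0
        for old_name, log in old_topics.items():
            start = self.positions.get(old_name, 0)
            end = len(log)
            target = new_topics.setdefault(self.rename(old_name), [])
            for entry in log[start:end]:
                target.append(self.convert(entry))
            migrated += end - start
            self.positions[old_name] = end
        return migrated

    def caught_up(self, old_topics: Dict[str, List[Any]]) -> bool:
        """True if every old topic has been read to its current end."""
        return all(self.positions.get(name, 0) == len(log)
                   for name, log in old_topics.items())
```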
One thing I'm not sure of here is how to handle the TaskName-to-changelog-
partition mapping messages for Samza jobs that have state. I *think* the Samza
job should take care of this itself if no changelog partition mapping exists
when it starts, but we should verify this.
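To make "the job takes care of this itself" concrete, here is a hypothetical
sketch of what filling in a missing mapping at startup might involve. It is not
Samza's actual logic: `ensure_changelog_mapping` is an illustrative name, and the
rule (existing tasks keep their partition, new tasks get the lowest unused
partition numbers, in sorted TaskName order) is an assumption. The property
worth verifying in the real job is the same one the tests check here: running it
again against its own output changes nothing, so restarts don't reshuffle state.

```python
from typing import Dict, Iterable


def ensure_changelog_mapping(task_names: Iterable[str],
                             existing: Dict[str, int]) -> Dict[str, int]:
    """Return a TaskName -> changelog partition mapping that covers every task.

    Tasks already in `existing` keep their partition; any task without one is
    assigned the lowest partition number not yet in use.
    """
    mapping = dict(existing)
    used = set(mapping.values())
    next_free = 0
    for task in sorted(task_names):  # sorted for a deterministic assignment order
        if task in mapping:
            continue
        while next_free in used:
            next_free += 1
        mapping[task] = next_free
        used.add(next_free)
    return mapping
```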
> Write tool to convert old-style checkpoint log to post-SAMZA-123 format
> -----------------------------------------------------------------------
>
> Key: SAMZA-354
> URL: https://issues.apache.org/jira/browse/SAMZA-354
> Project: Samza
> Issue Type: Task
> Affects Versions: 0.8.0
> Reporter: Jakob Homan
> Assignee: David Chen
>
> After SAMZA-123, the checkpoint log has a new format (keyed entries
> interspersed with statelog-partition mapping) and a new name. It would be
> simple to write a tool that would consume an old-style log and write out a
> new-style log, using the GroupByPartition strategy. This would let existing
> jobs keep their checkpoints.
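A minimal sketch of the conversion the issue describes, under assumed record
shapes that are not Samza's real classes: an old entry is
`(partition_id, {stream_name: offset})`, and a new entry is keyed by a task name
built with a hypothetical GroupByPartition-style rule, `"Partition <n>"`. The
interspersed changelog partition mapping entries are left out. Because the new
log is keyed, a later entry for the same task supersedes an earlier one, which
`latest_per_task` mimics.

```python
from typing import Dict, List, Tuple

# Assumed shapes, for illustration only:
#   old 0.7 entry: (partition_id, {stream_name: offset})
#   new entry:     (task_name, {(stream_name, partition_id): offset})
OldEntry = Tuple[int, Dict[str, str]]
NewEntry = Tuple[str, Dict[Tuple[str, int], str]]


def group_by_partition_task_name(partition: int) -> str:
    # Hypothetical naming rule: one task per input partition number.
    return f"Partition {partition}"


def convert_log(old_log: List[OldEntry]) -> List[NewEntry]:
    """Rewrite an old-style log as keyed entries, preserving the original order."""
    new_log: List[NewEntry] = []
    for partition, offsets in old_log:
        task = group_by_partition_task_name(partition)
        new_log.append((task, {(stream, partition): offset
                               for stream, offset in offsets.items()}))
    return new_log


def latest_per_task(new_log: List[NewEntry]) -> Dict[str, Dict[Tuple[str, int], str]]:
    """Later entries for the same key win, as they would in a keyed log."""
    latest: Dict[str, Dict[Tuple[str, int], str]] = {}
    for task, offsets in new_log:
        latest[task] = offsets
    return latest
```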