[ 
https://issues.apache.org/jira/browse/SAMZA-354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14115297#comment-14115297
 ] 

Chris Riccomini commented on SAMZA-354:
---------------------------------------

One thing that might be nice with this tool is to leave it consuming from the 
0.7 checkpoint topic even after it's read up to the "latest" offset, rather 
than having it shut itself down once it's caught up. This would allow the 
checkpoint migration to be somewhat decoupled from when a Samza job is upgraded 
to 0.8, since you can just leave the tool running, and then upgrade the Samza 
job to 0.8 at any point after that. If the tool were to be a one-time tool that 
migrates up to the latest 0.7 checkpoint, and then shuts down, you'd have to do 
these steps:

# Shut down 0.7 Samza job.
# Run one-time migration tool.
# Bring up 0.8 Samza job.

If the tool were allowed to continue running, you could do:

# Run continuous migration tool.
# Shut down 0.7 Samza job.
# Bring up 0.8 Samza job.
# Shut down continuous migration tool.
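
To make the difference between the two modes concrete, here is a minimal in-memory sketch. It is not the actual tool: CheckpointMigrator, poll, and run_continuous are hypothetical names, and plain Python lists stand in for the 0.7 and 0.8 checkpoint topics. The one-time mode corresponds to calling poll() once and exiting; the continuous mode keeps polling until an operator stops it, so checkpoints written by the 0.7 job after the first catch-up are still migrated.

```python
import time
from dataclasses import dataclass, field


@dataclass
class CheckpointMigrator:
    """Hypothetical migrator; lists stand in for the checkpoint topics."""
    old_log: list                                # stand-in for the 0.7 topic
    new_log: list = field(default_factory=list)  # stand-in for the 0.8 topic
    position: int = 0                            # next offset to read from old_log

    def poll(self) -> int:
        """Copy everything written since the last poll; return the count."""
        latest = len(self.old_log)
        for offset in range(self.position, latest):
            self.new_log.append(self.old_log[offset])
        migrated = latest - self.position
        self.position = latest
        return migrated


def run_continuous(migrator, should_stop, idle_sleep_s=1.0):
    """Keep migrating until should_stop() returns True (operator shutdown)."""
    total = 0
    while not should_stop():
        migrated = migrator.poll()
        total += migrated
        if migrated == 0:
            # Caught up: wait for the 0.7 job to write more checkpoints
            # instead of shutting down.
            time.sleep(idle_sleep_s)
    return total
```

With this shape, a single poll() call is the one-time tool, and run_continuous is the variant that can be left running across the job upgrade.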

It'd also be nice if this tool could migrate multiple checkpoint topics at 
once. In a shared grid, this would allow us to run a single instance of the 
checkpoint migration tool for all 0.7 checkpoint topics, and then stop the tool 
once we've verified that all Samza jobs have upgraded to 0.8. This is just a 
convenience, since we could obviously run one instance of the tool for every 
topic, but it could make life a bit easier.
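
A rough sketch of what handling several topics in one instance might look like is below. It assumes each topic can be modeled as an in-memory list. The function name migrate_all and the topic names in the usage note are made up for illustration. The only point is that a single process tracks one read position per old topic.

```python
def migrate_all(old_topics, new_topics, positions):
    """Run one migration pass over every old topic.

    old_topics: dict of topic name -> list of checkpoint entries (0.7 side)
    new_topics: dict of topic name -> list of migrated entries (0.8 side)
    positions:  dict of topic name -> next offset to read, updated in place
    Returns a dict of topic name -> number of entries copied in this pass.
    """
    copied = {}
    for name, old_log in old_topics.items():
        start = positions.get(name, 0)
        pending = old_log[start:]  # entries written since the last pass
        new_topics.setdefault(name, []).extend(pending)
        positions[name] = start + len(pending)
        copied[name] = len(pending)
    return copied
```

Calling migrate_all repeatedly with the same positions dict, for example with hypothetical topics "job-a-ckpt" and "job-b-ckpt", copies only the entries that are new for each topic since the previous pass.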

One thing that I'm not sure of here is how to handle the TaskName-to-changelog 
partition mapping messages for Samza jobs that have state. I *think* that the 
Samza job should take care of this itself if no changelog partition mapping 
exists when it starts, but we should verify this.

> Write tool to convert old-style checkpoint log to post-SAMZA-123 format
> -----------------------------------------------------------------------
>
>                 Key: SAMZA-354
>                 URL: https://issues.apache.org/jira/browse/SAMZA-354
>             Project: Samza
>          Issue Type: Task
>    Affects Versions: 0.8.0
>            Reporter: Jakob Homan
>            Assignee: David Chen
>
> After SAMZA-123, the checkpoint log has a new format (keyed entries 
> interspersed with statelog-partition mapping) and a new name.  It would be 
> simple to write a tool that would consume an old-style log and write out a 
> new-style log, using the GroupByPartition strategy.  This would allow 
> existing jobs to not lose checkpointing.



