Hi colleagues,

If I understand the Samza source code correctly, without the migration code
we would not lose offsets, but we would lose the
TaskName-to-ChangelogPartition mapping. State restore for TaskStorage starts
from the beginning of the ChangelogSystemStreamPartition (if we don't use
data locality; with data locality, the offsets to start from are stored
locally in a file)
https://github.com/apache/samza/blob/0.12.0/samza-core/src/main/scala/org/apache/samza/storage/TaskStorageManager.scala#L161-L161
.

Hence, if we lose the TaskName-to-ChangelogPartition mapping (for example,
if we migrate from Samza 0.9 to Samza 0.11 or 0.12 without an intermediate
migration to Samza 0.10) and we still have data in our
ChangelogSystemStream, Samza will recreate the
TaskName-to-ChangelogPartition mapping and restore state from the newly
selected ChangelogSystemStreamPartition. Samza 0.12 sorts the collection by
TaskName when re-creating this mapping (
https://github.com/apache/samza/blob/0.12.0/samza-core/src/main/scala/org/apache/samza/coordinator/JobModelManager.scala#L259-L259),
but Samza 0.9 does not (
https://github.com/apache/samza/blob/0.9.1/samza-core/src/main/scala/org/apache/samza/coordinator/JobCoordinator.scala#L142).
Hence, when migrating from Samza 0.9, we can end up with the wrong
state restored for TaskStorages, because a Map gives no guarantee about
its iteration order.
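
To make the concern concrete, here is a minimal sketch in Python (not
Samza's actual code). The function name, the task names and the two
"iteration orders" are hypothetical. They stand in for a Map whose
iteration order is not guaranteed, and the partition ids are simply
assigned in iteration order:

```python
def assign_changelog_partitions(task_names, sort_by_name):
    """Assign changelog partition ids 0..n-1 to tasks in iteration order.

    Hypothetical helper for illustration only; it mimics rebuilding a
    TaskName-to-ChangelogPartition mapping from scratch.
    """
    ordered = sorted(task_names) if sort_by_name else list(task_names)
    return {name: partition for partition, name in enumerate(ordered)}


# The same three tasks, seen in two different iteration orders.
first_run = ["Partition 1", "Partition 0", "Partition 2"]
second_run = ["Partition 2", "Partition 0", "Partition 1"]

# Without sorting, the rebuilt mapping depends on iteration order.
unsorted_a = assign_changelog_partitions(first_run, sort_by_name=False)
unsorted_b = assign_changelog_partitions(second_run, sort_by_name=False)

# With sorting by task name, both runs agree.
sorted_a = assign_changelog_partitions(first_run, sort_by_name=True)
sorted_b = assign_changelog_partitions(second_run, sort_by_name=True)

print(unsorted_a == unsorted_b)  # False
print(sorted_a == sorted_b)      # True
```

If the old mapping was built without sorting, a rebuilt mapping (sorted or
not) may point a task at a changelog partition that holds another task's
state.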

Please correct me if I'm wrong.

Best regards,
Maxim Logvinenko

On 27 March 2017 at 20:58:24, Navina Ramesh (Apache) (nav...@apache.org)
wrote:

@Jake: Yes. We removed the migration code (for 0.9 to 0.10) in the 0.11
release, I believe.

@XiaoChuan: As per Jagadish's recommendation, if you have changelog-backed
stores, you should upgrade from 0.9.1 to 0.10.0 before upgrading to Samza
0.12.0.

I checked with LinkedIn's internal release notes. The most significant
change listed is adding a new configuration *job.coordinator.system*. This
system can be the same as your currently configured checkpoint system
(task.checkpoint.system). I am assuming you are using
KafkaCheckpointManagerFactory. If you are using other custom checkpoint
managers, the migration may be more involved. Please let us know and we can
try to help you out.
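
For illustration, the setting above might look like this in a job's
properties file. This is only a sketch: it assumes a system named "kafka"
is already defined elsewhere in the job config, and that name is an
assumption, not something taken from your setup.

```properties
# Assumption: a system called "kafka" is defined elsewhere in this config.
task.checkpoint.factory=org.apache.samza.checkpoint.kafka.KafkaCheckpointManagerFactory
task.checkpoint.system=kafka
# The coordinator stream can use the same system as the checkpoints.
job.coordinator.system=kafka
```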

Feel free to email us if you have more questions.

Cheers!
Navina

On Mon, Mar 27, 2017 at 10:07 AM, Jagadish Venkatraman <
jagadish1...@gmail.com> wrote:

> Good observation Jake!
>
> The code for migration was removed in Samza 0.11. The migration would read
> change-log offsets from the checkpoint topic and write them to the
> coordinator stream.
>
> If you're using change-logged stores, I'd recommend upgrading from 0.9.1
> to 0.10.0 first.
> Otherwise, you will lose offsets for change-logged stores.
>
> I suspect you should be okay for 0.10.0 to 0.12 upgrade.
>
> On Mon, Mar 27, 2017 at 9:30 AM, Jacob Maes <jacob.m...@gmail.com> wrote:
>
> > As I recall, Samza 0.10 introduced the coordinator stream and there was
> > code to do an automatic migration to use that feature. @navina, @yi, do
> > you know if that migration code is still in Samza 0.12?
> >
> > If not, then it's probably better to update from 0.9.1 to 0.10.0 and
> > then to 0.12.0. I don't think there were any changes requiring migration
> > between 0.10 and 0.12, so upgrading directly from 0.10 to 0.12 is
> > probably less of an issue.
> >
> > On Fri, Mar 24, 2017 at 11:05 PM, Jagadish Venkatraman <
> > jagadish1...@gmail.com> wrote:
> >
> > > Hi Xiaochuan,
> > >
> > > >> Do I need to upgrade Kafka and/or YARN?
> > >
> > > *Yarn version:*
> > >
> > > - Samza 0.12 supports Yarn 2.6.1 and 2.7.1.
> > > - If you already have 2.6.0 installed (as you have said), I believe
> > > you will be fine (but I'm not sure).
> > >
> > > *Kafka version: *
> > >
> > > - Samza 0.12 upgraded the version of Kafka to 0.10.
> > > - If your Kafka brokers are on an older version of Kafka, you should
> > > upgrade them to use at least 0.10. Kafka clients are usually
> > > incompatible with older versions of brokers.
> > >
> > > *Java version: *
> > >
> > > - Samza 0.12 binaries are compiled using Java 8. Hence, they cannot
> > > be run on older versions of the Java run-time.
> > >
> > >
> > > >> I'm extremely new to Samza in terms of operations aspect. I'm not
> > > >> sure what information would be relevant in this case so please ask
> > > >> away.
> > >
> > > I'd first start by upgrading the Kafka brokers (assuming you're on
> > > Java 8+ already).
> > > Let us know how the migration goes!
> > >
> > > Thanks,
> > > Jagadish
> > >
> > >
> > > On Fri, Mar 24, 2017 at 8:23 PM, XiaoChuan Yu <xiaochuan...@kik.com>
> > > wrote:
> > >
> > > > Hi,
> > > >
> > > > What are the general steps for upgrading Samza from 0.9 to 0.12?
> > > > Do I need to upgrade Kafka and/or YARN?
> > > >
> > > > I don't know how Samza was set up initially but we currently have
> > > > the following setup:
> > > >
> > > > Samza version: 0.9.1
> > > > YARN version: Hadoop 2.6.0-cdh5.4.8
> > > > Kafka version: 0.9.0.1
> > > >
> > > > I think installation of Kafka and YARN were managed through Puppet.
> > > > I'm extremely new to Samza in terms of operations aspect. I'm not
> > > > sure what information would be relevant in this case so please ask
> > > > away.
> > > >
> > > > Thanks,
> > > > Xiaochuan Yu
> > > >
> > >
> > >
> > >
> > > --
> > > Jagadish V,
> > > Graduate Student,
> > > Department of Computer Science,
> > > Stanford University
> > >
> >
>
>
>
> --
> Jagadish V,
> Graduate Student,
> Department of Computer Science,
> Stanford University
>
