Hi colleagues,

If I understand the Samza source code correctly, without the migration code we would not lose offsets, but we would lose the TaskName-to-ChangelogPartition mapping. State restore for a TaskStorage starts from the beginning of its ChangelogSystemStreamPartition if we don't use data locality (with data locality, the offsets to start from are stored locally in a file): https://github.com/apache/samza/blob/0.12.0/samza-core/src/main/scala/org/apache/samza/storage/TaskStorageManager.scala#L161-L161 .
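To make the concern about the mapping concrete, here is a small Python sketch. It is not Samza code: the function `assign_changelog_partitions`, the task names, and the rule of handing out partition ids in iteration order are assumptions made purely for illustration.

```python
# Illustrative only: this is not Samza code. The task names and the
# assignment rule below are made up for the example.

def assign_changelog_partitions(task_names, sort_names):
    """Give each task the next free changelog partition id, in iteration order."""
    ordered = sorted(task_names) if sort_names else list(task_names)
    return {name: partition for partition, name in enumerate(ordered)}


# Hypothetical task names, listed in an arbitrary (unsorted) order.
tasks = ["Partition 1", "Partition 0", "Partition 2"]

# Mapping that was in effect while the changelog was being written.
original = assign_changelog_partitions(tasks, sort_names=False)

# Mapping rebuilt after the original was lost, this time sorted by task name.
rebuilt = assign_changelog_partitions(tasks, sort_names=True)

# Tasks whose changelog partition changed would restore another task's state.
moved = sorted(name for name in original if original[name] != rebuilt[name])
print(moved)
```

In this sketch two of the three tasks end up pointing at a different changelog partition after the mapping is rebuilt, so they would restore state that another task wrote.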
Hence, if we lose the TaskName-to-ChangelogPartition mapping, or if we migrate from Samza 0.9 to Samza 0.11 or 0.12 without an intermediate migration to Samza 0.10, and we still have data in our ChangelogSystemStream, Samza will recreate the TaskName-to-ChangelogPartition mapping and restore state from the newly selected ChangelogSystemStreamPartition.

Samza 0.12 sorts the collection by TaskName when re-creating this mapping (https://github.com/apache/samza/blob/0.12.0/samza-core/src/main/scala/org/apache/samza/coordinator/JobModelManager.scala#L259-L259), but Samza 0.9 does not (https://github.com/apache/samza/blob/0.9.1/samza-core/src/main/scala/org/apache/samza/coordinator/JobCoordinator.scala#L142). Hence, when migrating from Samza 0.9, we can end up with the wrong state restored for TaskStorages, because the iteration order of a Map is not guaranteed.

Please correct me if I'm wrong.

Best regards,
Maxim Logvinenko

On 27 March 2017 at 20:58:24, Navina Ramesh (Apache) (nav...@apache.org) wrote:

@Jake: Yes. We removed the migration code (for 0.9 to 0.10) in the 0.11 release, I believe.

@XiaoChuan: As per Jagadish's recommendation, if you have changelog-backed stores, you should upgrade from 0.9.1 to 0.10.0 before upgrading to Samza 0.12.0.

I checked LinkedIn's internal release notes. The most significant change listed is the addition of a new configuration, *job.coordinator.system*. This system can be the same as your currently configured checkpoint system (task.checkpoint.system). I am assuming you are using KafkaCheckpointManagerFactory. If you are using other custom checkpoint managers, the migration may be more involved. Please let us know and we can try to help you out.

Feel free to email us if you have more questions.

Cheers!
Navina

On Mon, Mar 27, 2017 at 10:07 AM, Jagadish Venkatraman <jagadish1...@gmail.com> wrote:

> Good observation Jake!
>
> The code for migration was removed in Samza 0.11.
> The migration would read change-log offsets from the checkpoint topic and
> write them to the coordinator stream.
>
> If you're using change-logged stores, I'd recommend upgrading from 0.9.1 to
> 0.10.0 first.
> Otherwise, you will lose offsets for change-logged stores.
>
> I suspect you should be okay for the 0.10.0 to 0.12 upgrade.
>
> On Mon, Mar 27, 2017 at 9:30 AM, Jacob Maes <jacob.m...@gmail.com> wrote:
>
> > As I recall, Samza 0.10 introduced the coordinator stream and there was
> > code to do an automatic migration to use that feature. @navina, @yi, do you
> > know if that migration code is still in Samza 0.12?
> >
> > If not, then it's probably better to update from 0.9.1 to 0.10.0 and then
> > to 0.12.0. I don't think there were any changes requiring migration between
> > 0.10 and 0.12, so upgrading directly from 0.10 to 0.12 is probably less of
> > an issue.
> >
> > On Fri, Mar 24, 2017 at 11:05 PM, Jagadish Venkatraman <jagadish1...@gmail.com> wrote:
> >
> > > Hi Xiaochuan,
> > >
> > > >> Do I need to upgrade Kafka and/or YARN?
> > >
> > > *Yarn version:*
> > >
> > > - Samza 0.12 supports Yarn 2.6.1 and 2.7.1.
> > > - If you already have 2.6.0 installed (as you have said), I believe you
> > >   will be fine. (but I'm not sure)
> > >
> > > *Kafka version:*
> > >
> > > - Samza 0.12 upgraded the version of Kafka to 0.10.
> > > - If your Kafka brokers are on an older version of Kafka, you should
> > >   upgrade them to use at least 0.10. Kafka clients are usually
> > >   incompatible with older versions of brokers.
> > >
> > > *Java version:*
> > >
> > > - Samza 0.12 binaries are compiled using Java 8. Hence, they cannot be
> > >   run on older versions of the Java run-time.
> > >
> > > >> I'm extremely new to Samza in terms of operations aspect. I'm not sure what
> > > >> information would be relevant in this case so please ask away.
> > >
> > > I'd first start by upgrading the Kafka brokers (assuming you're on Java 8+
> > > already).
> > > Let us know how the migration goes!
> > >
> > > Thanks,
> > > Jagadish
> > >
> > > On Fri, Mar 24, 2017 at 8:23 PM, XiaoChuan Yu <xiaochuan...@kik.com> wrote:
> > >
> > > > Hi,
> > > >
> > > > What are the general steps for upgrading Samza from 0.9 to 0.12?
> > > > Do I need to upgrade Kafka and/or YARN?
> > > >
> > > > I don't know how Samza was set up initially but we currently have the
> > > > following setup:
> > > >
> > > > Samza version: 0.9.1
> > > > YARN version: Hadoop 2.6.0-cdh5.4.8
> > > > Kafka version: 0.9.0.1
> > > >
> > > > I think installation of Kafka and YARN was managed through Puppet.
> > > > I'm extremely new to Samza in terms of operations aspect. I'm not sure what
> > > > information would be relevant in this case so please ask away.
> > > >
> > > > Thanks,
> > > > Xiaochuan Yu
> > >
> > > --
> > > Jagadish V,
> > > Graduate Student,
> > > Department of Computer Science,
> > > Stanford University
>
> --
> Jagadish V,
> Graduate Student,
> Department of Computer Science,
> Stanford University