Hey guys,

Samza originally had support for compiling against multiple versions of YARN, but this caused a lot of problems. We kept running into source compatibility issues between different versions of YARN. Once Hortonworks made the commitment that YARN 2.2+ would be backwards compatible, we eliminated multi-YARN compilation and just stuck with the latest version.
At LI, we have already upgraded all of our Samza grids to YARN 2.4.1 (RC) as of this morning, even though we're running off of the Samza 0.7.0 branch with YARN 2.2 on the client side. If we see any issues, I'll let you know, but so far, so good.

Garry, regarding your comment on the 2.4 upgrade, I believe master is already upgraded as of SAMZA-186. In an effort to stabilize 0.7.0, I opted not to upgrade it to 2.3 or 2.4. We could always bring it up for discussion, but my inclination is to leave 0.7.0 as it is and focus on the 0.8 release. :)

Cheers,
Chris

On 6/30/14 12:39 PM, "Garry Turkington" <[email protected]> wrote:

>Returning to this topic, also salient given the recent Jiras re binary releases:
>
>Not sure of the best approach, but considering that version conflicts with YARN are likely to be the most common impediment for new users trying to integrate with existing infrastructure, I think we need an approach.
>
>Especially as I just went and looked at the YARN versions in the current releases of the big 3 distros. CDH5 is on 2.4, while HDP2.1 and MapR4 are on YARN 2.3. So anyone trying to push a job to an existing cluster is likely to find it go bang. And considering the increasing arms race between the distros to get stuff out quicker and quicker, it's a problem that won't likely fix itself.
>
>I'd suggest we look to upgrade master to 2.3 as a new baseline, though I'm not sure whether that will work on the 2.4-based CDH; it needs to be tested. Beyond that, I'm uncertain how to proceed, as the obvious options of (a) say "tough, use a dedicated grid", (b) pick a version and if your distro isn't compatible then sorry, or (c) try and munge multi-version support all feel fugly.
>
>Thoughts?
>Garry
>
>-----Original Message-----
>From: Jakob Homan [mailto:[email protected]]
>Sent: 25 June 2014 21:18
>To: [email protected]
>Subject: Re: Do we want to provide different hadoop versions for 0.7.0 release?
>
>Is Spark doing this by only supporting the intersection of available APIs, or via some type of munging? We did the latter in Giraph and it was a nightmare...
>
>
>On Wed, Jun 25, 2014 at 11:40 AM, Yan Fang <[email protected]> wrote:
>
>> Hi guys,
>>
>> I am thinking of this because of Dotan's email (thanks, Dotan). Currently, people are using different versions of Hadoop. They will definitely have problems if their Hadoop server runs a different version from the one Samza is compiled against. That hurts the user experience, whether the user is a veteran or a newbie.
>>
>> In Spark, they provide a way to configure different Hadoop versions at compile time, in both their latest and previous releases:
>> http://spark.apache.org/docs/latest/building-with-maven.html
>> http://spark.apache.org/docs/0.9.0/
>>
>> Maybe we should consider this as an add-on for our 0.7.0 release too. Since we are already able to switch the Scala version, it should not pose any technical difficulty. The risk may come from Samza not being extensively tested on other Hadoop versions. If that risk is the concern, we can at least provide simple instructions to help users build Samza with other Hadoop versions.
>>
>> What do you think?
>>
>> Thanks,
>>
>> Fang, Yan
>> [email protected]
>> +1 (206) 849-4108
>>
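The distro version matrix in Garry's message can be checked against the compatibility rule Chris describes. The Python sketch below is a hypothetical model, not Samza or YARN tooling. It assumes the 2.2+ backwards-compatibility commitment means a cluster accepts jobs built against any YARN release from 2.2 up to its own version, and it treats a client built against a newer YARN than the cluster as unsupported, since it may rely on APIs the cluster lacks. The names BASELINE, parse_version, client_runs_on and DISTROS are made up for illustration; the distro versions are the ones quoted in the thread.

```python
"""Hypothetical model of the YARN client/cluster compatibility rule from this thread.

Assumption (not an official YARN guarantee as stated here): a cluster accepts jobs
built against any YARN release from 2.2 up to the cluster's own version; a client
built against a newer YARN than the cluster is treated as unsupported.
"""

BASELINE = (2, 2)  # first release covered by the backwards-compatibility commitment


def parse_version(version: str) -> tuple:
    """Turn '2.4.1' into (2, 4); the patch level is ignored for this check."""
    major, minor = version.split(".")[:2]
    return (int(major), int(minor))


def client_runs_on(client: str, cluster: str) -> bool:
    """True if a job compiled against `client` YARN should run on a `cluster` YARN."""
    c, s = parse_version(client), parse_version(cluster)
    return BASELINE <= c <= s


# YARN versions of the big 3 distros as reported in Garry's message (June 2014).
DISTROS = {"CDH5": "2.4", "HDP2.1": "2.3", "MapR4": "2.3"}

if __name__ == "__main__":
    for samza_yarn in ("2.2", "2.3", "2.4"):
        ok = [name for name, v in DISTROS.items() if client_runs_on(samza_yarn, v)]
        print(f"built against YARN {samza_yarn}: {', '.join(ok) or 'none'}")
```

Under these assumptions, building against 2.2 (as the 0.7.0 branch does) or 2.3 covers all three distros, while a 2.4 client only matches CDH5, which lines up with Garry's suggestion of 2.3 as a baseline. It only encodes the stated rule; whether a given build actually works on a given distro still needs to be tested.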
