Thanks for chiming in. Note that an organization's agility in Spark
upgrades can be very different from Hadoop upgrades.

For many orgs, Hadoop is responsible for cluster resource scheduling (YARN)
and data storage (HDFS). These two are notoriously difficult to upgrade: it
is all or nothing for a cluster. (You can't have a subset of the nodes
running Hadoop 2.2 and the rest running Hadoop 2.6.) For Spark, it
is a very different story: it is pretty easy to run multiple different
versions of Spark in different applications, even though they are all
running on a single cluster.

As a result, you might see a lot of orgs with really old Hadoop versions
that are nonetheless willing to upgrade to Spark 2.x.
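For what it's worth, the multi-version story above can be sketched like
this (the paths, class names, and jar names are made up for illustration;
it assumes a YARN cluster and two Spark distributions unpacked side by side):

```shell
# Both submissions point at the same YARN/HDFS cluster.
export HADOOP_CONF_DIR=/etc/hadoop/conf

# App A keeps running on an older Spark build...
/opt/spark-1.6.0/bin/spark-submit --master yarn \
  --class com.example.AppA app-a.jar

# ...while App B uses Spark 2.x on the very same cluster. Each
# application ships its own Spark jars to YARN, so the two versions
# never conflict with each other.
/opt/spark-2.0.0/bin/spark-submit --master yarn \
  --class com.example.AppB app-b.jar
```

The point is that Spark versioning is per-application, while the Hadoop
version is per-cluster.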

On Thu, Jan 14, 2016 at 11:26 AM, Steve Loughran <ste...@hortonworks.com>
wrote:

>
> > On 14 Jan 2016, at 09:28, Steve Loughran <ste...@hortonworks.com> wrote:
> >>
> >
> > 2.6.x is still having active releases, likely through 2016. It'll be the
> only Hadoop version where problems Spark encounters will get fixed
>
> Correction: minimum Hadoop version
>
> Any problem reported against older versions will probably get a message
> saying "upgrade"
>
> >
> > It's also the last iteration of interesting API features —especially in
> YARN: timeline server, registry, various other things
> >
> > And it has s3a, which, for anyone using S3 for storage, is the only S3
> filesystem binding I'd recommend. Hadoop 2.4 not only has s3n, it's got a
> broken one at that (HADOOP-10589)
> >
> > I believe 2.6 supports recent Guava versions, even if it is frozen on
> 11.0 to avoid surprising people (i.e. all deprecated/removed classes should
> have been stripped)
> >
> > Finally: it's the only version of Hadoop that works on Java 7 and has
> patches to support Java 8 + Kerberos (in fact, Java 7u80+ and Kerberos).
> >
> > For the reason of JVMs and Guava alone, I'd abandon Hadoop < 2.6. Those
> versions won't work on secure Java 7 clusters or with recent Guava versions,
> and have lots of uncorrected issues.
> >
> > Oh, and did I mention the test matrix? The later the version of Hadoop
> you use, the fewer versions you have to test against.
> >
> >> My general position is that backwards-compatibility and supporting
> >> older platforms needs to be a low priority in a major release; it's a
> >> decision about what to support for users in the next couple years, not
> >> the preceding couple years. Users on older technologies simply stay on
> >> the older Spark until ready to update; they are in no sense suddenly
> >> left behind otherwise.
> >
> >
> > If they are running older versions of Hadoop, they generally have stable
> apps which they don't bother upgrading. New clusters => new versions => new
> apps.
> >
> >
> >
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
> For additional commands, e-mail: dev-h...@spark.apache.org
>
>
