Yeah so Steve, hopefully it's self-evident, but that is a perfect
example of the kind of annoying stuff we don't want to force users to
deal with by forcing an upgrade to 2.X. Compare the pain for Spark
users of trying to reason about what to do (and btw it seems like the
answer is simply that
On 12 Jun 2015, at 17:12, Patrick Wendell pwend...@gmail.com wrote:
For instance at Databricks we use
the FileSystem library for talking to S3... every time we've tried to
upgrade to Hadoop 2.X there have been significant regressions in
performance and we've had to downgrade. That's
+1 for 2.2+
Not only are the APIs in Hadoop 2 better, there's more people testing Hadoop
2.x Spark, and bugs in Hadoop itself being fixed.
(usual disclaimers, I work off branch-2.7 snapshots I build nightly, etc)
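(For anyone wanting to reproduce that kind of setup: the Spark 1.x Maven build selects its Hadoop version via profiles. A rough sketch follows — profile names and versions are taken from the "Building Spark" docs of that era and may differ per release, so double-check against the docs for your version:)

```shell
# Sketch: building Spark against a chosen Hadoop 2.x version.
# Profile names/versions are from the Spark 1.x "Building Spark" docs
# and may vary by release -- verify before relying on them.
./build/mvn -Pyarn -Phadoop-2.4 -Dhadoop.version=2.4.0 -DskipTests clean package

# Against a locally built Hadoop snapshot (e.g. a nightly branch-2.7 build):
./build/mvn -Pyarn -Phadoop-2.4 -Dhadoop.version=2.7.0-SNAPSHOT -DskipTests clean package
```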
On 12 Jun 2015, at 11:09, Sean Owen so...@cloudera.com wrote:
How does the
I feel this is quite different from the Java 6 decision and personally
I don't see sufficient cause to do it.
I would like to understand though Sean - what is the proposal exactly?
Hadoop 2 itself supports all of the Hadoop 1 APIs, so things like
removing the Hadoop 1 variant of sc.hadoopFile,
+1 for Hadoop 2.2+
On Fri, Jun 12, 2015 at 8:45 AM, Nicholas Chammas
nicholas.cham...@gmail.com wrote:
I'm personally in favor, but I don't have a sense of how many people still
rely on Hadoop 1.
Nick
On Fri, 12 Jun 2015 at 09:13, Steve Loughran
ste...@hortonworks.com wrote:
+1 for 2.2+
My 2 cents: the biggest reason from my view for keeping Hadoop 1 support
was that our EC2 scripts, which launch an environment for benchmarking /
testing / research, only supported Hadoop 1 variants until very recently. We
did add Hadoop 2.4 support a few weeks back but it is still not the
How does the idea of removing support for Hadoop 1.x for Spark 1.5
strike everyone? Really, I mean, Hadoop 2.2, as 2.2 seems to me more
consistent with the modern 2.x line than 2.1 or 2.0.
The arguments against are simply, well, someone out there might be
using these versions.
The arguments for
On Fri, Jun 12, 2015 at 5:12 PM, Patrick Wendell pwend...@gmail.com wrote:
I would like to understand though Sean - what is the proposal exactly?
Hadoop 2 itself supports all of the Hadoop 1 APIs, so things like
removing the Hadoop 1 variant of sc.hadoopFile, etc, I don't think
Not entirely;
I don't imagine that can be guaranteed to be supported anyway... the
0.x branch has never necessarily worked with Spark, even if it might
happen to. Is this really something you would veto for everyone
because of your deployment?
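For context on the sc.hadoopFile point: Spark exposes both Hadoop input APIs side by side, so the question is about build and runtime support rather than source compatibility. A simplified sketch of the two entry points, with signatures abbreviated from memory of the Spark 1.x SparkContext scaladoc — check the real API before use:

```scala
// Simplified sketch of SparkContext's two input-format entry points.
// The "old" org.apache.hadoop.mapred API (Hadoop 1 style) and the
// "new" org.apache.hadoop.mapreduce API both exist in Hadoop 2, which
// is why dropping Hadoop 1 builds need not remove either method.

// Old mapred API:
def hadoopFile[K, V](
    path: String,
    inputFormatClass: Class[_ <: org.apache.hadoop.mapred.InputFormat[K, V]],
    keyClass: Class[K],
    valueClass: Class[V]): RDD[(K, V)]

// New mapreduce API:
def newAPIHadoopFile[K, V, F <: org.apache.hadoop.mapreduce.InputFormat[K, V]](
    path: String,
    fClass: Class[F],
    kClass: Class[K],
    vClass: Class[V]): RDD[(K, V)]
```

Both return an RDD of key/value pairs; the difference is only which Hadoop InputFormat hierarchy the caller codes against.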
On Fri, Jun 12, 2015 at 7:18 PM, Thomas Dudziak tom...@gmail.com wrote:
I don't like the idea of removing Hadoop 1 unless it becomes a significant
maintenance burden, which I don't think it is. You'll always be surprised how
many people use old software, even though various companies may no longer
support them.
With Hadoop 2 in particular, I may be misremembering,
0.23 (and Hive 0.12) code base in Spark works well from our perspective, so
not sure what you are referring to. As I said, I'm happy to maintain my own
plugins, but as it stands there is no sane way to do so in Spark because
there is no clear separation / developer APIs for these.
cheers,
Tom
-1 to this, we use it with an old Hadoop version (well, a fork of an old
version, 0.23). That being said, if there were a nice developer API that
separated Spark from Hadoop (or rather, two APIs, one for scheduling and
one for HDFS), then we'd be happy to maintain our own plugins for those.
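To make the plugin idea concrete, here is a purely hypothetical sketch of what such a separation might look like. None of these traits existed in Spark; the names and signatures are invented for illustration only:

```scala
// Hypothetical, invented interfaces -- nothing like this exists in Spark.
// The idea: isolate the two places Spark touches Hadoop (storage and
// scheduling) behind small traits a deployment could reimplement.
import java.io.InputStream

trait StorageBackend {
  def open(path: String): InputStream
  def listFiles(dir: String): Seq[String]
}

trait SchedulingBackend {
  def submit(appName: String, resources: Map[String, Int]): Unit
}
```

With a boundary like this, a site running a Hadoop 0.23 fork could ship its own implementations instead of depending on the Hadoop versions Spark builds against.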