Re: Remove Hadoop 1 support (Hadoop 2.2) for Spark 1.5?

2015-06-13 Thread Patrick Wendell
Yeah so Steve, hopefully it's self-evident, but that is a perfect example of the kind of annoying stuff we don't want to force users to deal with by forcing an upgrade to 2.X. Compare the pain from Spark users of trying to reason about what to do (and btw it seems like the answer is simply that

Re: Remove Hadoop 1 support (Hadoop 2.2) for Spark 1.5?

2015-06-13 Thread Steve Loughran
On 12 Jun 2015, at 17:12, Patrick Wendell pwend...@gmail.com wrote: For instance at Databricks we use the FileSystem library for talking to S3... every time we've tried to upgrade to Hadoop 2.X there have been significant regressions in performance and we've had to downgrade. That's

Re: Remove Hadoop 1 support (Hadoop 2.2) for Spark 1.5?

2015-06-12 Thread Steve Loughran
+1 for 2.2+ Not only are the APIs in Hadoop 2 better, there are more people testing Spark on Hadoop 2.x, and bugs in Hadoop itself are being fixed. (usual disclaimers, I work off branch-2.7 snapshots I build nightly, etc) On 12 Jun 2015, at 11:09, Sean Owen so...@cloudera.com wrote: How does the

Re: Remove Hadoop 1 support (Hadoop 2.2) for Spark 1.5?

2015-06-12 Thread Patrick Wendell
I feel this is quite different from the Java 6 decision and personally I don't see sufficient cause to do it. I would like to understand though Sean - what is the proposal exactly? Hadoop 2 itself supports all of the Hadoop 1 APIs, so things like removing the Hadoop 1 variant of sc.hadoopFile,

Re: Remove Hadoop 1 support (Hadoop 2.2) for Spark 1.5?

2015-06-12 Thread Ram Sriharsha
+1 for Hadoop 2.2+ On Fri, Jun 12, 2015 at 8:45 AM, Nicholas Chammas nicholas.cham...@gmail.com wrote: I'm personally in favor, but I don't have a sense of how many people still rely on Hadoop 1. Nick On Fri, Jun 12, 2015 at 9:13 AM, Steve Loughran ste...@hortonworks.com wrote: +1 for 2.2+

Re: Remove Hadoop 1 support (Hadoop 2.2) for Spark 1.5?

2015-06-12 Thread Shivaram Venkataraman
My 2 cents: The biggest reason from my view for keeping Hadoop 1 support was that our EC2 scripts, which launch an environment for benchmarking / testing / research, only supported Hadoop 1 variants till very recently. We did add Hadoop 2.4 support a few weeks back but it is still not the

Remove Hadoop 1 support (Hadoop 2.2) for Spark 1.5?

2015-06-12 Thread Sean Owen
How does the idea of removing support for Hadoop 1.x for Spark 1.5 strike everyone? Really, I mean, Hadoop 2.2, as 2.2 seems to me more consistent with the modern 2.x line than 2.1 or 2.0. The arguments against are simply, well, someone out there might be using these versions. The arguments for

Re: Remove Hadoop 1 support (Hadoop 2.2) for Spark 1.5?

2015-06-12 Thread Nicholas Chammas
I'm personally in favor, but I don't have a sense of how many people still rely on Hadoop 1. Nick On Fri, Jun 12, 2015 at 9:13 AM, Steve Loughran ste...@hortonworks.com wrote: +1 for 2.2+ Not only are the APIs in Hadoop 2 better, there are more people testing Spark on Hadoop 2.x, and bugs in Hadoop

Re: Remove Hadoop 1 support (Hadoop 2.2) for Spark 1.5?

2015-06-12 Thread Sean Owen
On Fri, Jun 12, 2015 at 5:12 PM, Patrick Wendell pwend...@gmail.com wrote: I would like to understand though Sean - what is the proposal exactly? Hadoop 2 itself supports all of the Hadoop 1 APIs, so things like removing the Hadoop 1 variant of sc.hadoopFile, etc, I don't think Not entirely;

Re: Remove Hadoop 1 support (Hadoop 2.2) for Spark 1.5?

2015-06-12 Thread Sean Owen
I don't imagine that can be guaranteed to be supported anyway... the 0.x branch has never necessarily worked with Spark, even if it might happen to. Is this really something you would veto for everyone because of your deployment? On Fri, Jun 12, 2015 at 7:18 PM, Thomas Dudziak tom...@gmail.com

Re: Remove Hadoop 1 support (Hadoop 2.2) for Spark 1.5?

2015-06-12 Thread Matei Zaharia
I don't like the idea of removing Hadoop 1 unless it becomes a significant maintenance burden, which I don't think it is. You'll always be surprised how many people use old software, even though various companies may no longer support them. With Hadoop 2 in particular, I may be misremembering,

Re: Remove Hadoop 1 support (Hadoop 2.2) for Spark 1.5?

2015-06-12 Thread Thomas Dudziak
0.23 (and Hive 0.12) code base in Spark works well from our perspective, so not sure what you are referring to. As I said, I'm happy to maintain my own plugins but as it stands there is no sane way to do so in Spark because there is no clear separation/developer APIs for these. cheers, Tom On

Re: Remove Hadoop 1 support (Hadoop 2.2) for Spark 1.5?

2015-06-12 Thread Thomas Dudziak
-1 to this, we use it with an old Hadoop version (well, a fork of an old version, 0.23). That being said, if there were a nice developer API that separates Spark from Hadoop (or rather, two APIs, one for scheduling and one for HDFS), then we'd be happy to maintain our own plugins for those.