On 19 Nov 2015, at 22:14, Reynold Xin <r...@databricks.com> wrote:
> I proposed dropping support for Hadoop 1.x in the Spark 2.0 email, and I think everybody is for that. https://issues.apache.org/jira/browse/SPARK-11807
>
> Sean suggested also dropping support for Hadoop 2.2, 2.3, and 2.4. That is to say, keep only Hadoop 2.6 and greater. What are the community's thoughts on that?

+1

Hadoop 2.6 is the common API level across pretty much everything shipping: EMR, CDH and HDP. There are no significant API changes between it and 2.7. (2.7 does add a couple of extra fields to job submissions, for the AM failure reset window and rolling log capture patterns, which you can get at with reflection.) It's also still getting ongoing maintenance, with 2.6.3 planned for December.

It's not perfect. If I were to list the trouble spots as I see them: s3a isn't ready for use, and there's better logging and tracing in later versions. But those aren't at the API level.
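As a sketch of the reflection pattern mentioned above (not from the original mail): code compiled against Hadoop 2.6 can probe for a setter that only exists in a later release and call it when present, skipping it otherwise. The helper below demonstrates the pattern against `StringBuilder` so it runs standalone; the method name `setRolledLogsIncludePattern` is used purely as a stand-in for a newer-version-only API, and the real target class and method names would need checking against the YARN javadocs.

```java
import java.lang.reflect.Method;

public class ReflectiveSetter {

    // Invoke target.methodName(value) only if the method exists in the
    // version of the library on the classpath. Returns true if the call
    // was actually made, false if the method is absent (older version).
    static boolean setIfPresent(Object target, String methodName,
                                Class<?> argType, Object value) {
        try {
            Method m = target.getClass().getMethod(methodName, argType);
            m.invoke(target, value);
            return true;
        } catch (NoSuchMethodException e) {
            // Older library version: feature not available, skip quietly.
            return false;
        } catch (ReflectiveOperationException e) {
            // The method exists but the call failed: surface the error.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        StringBuilder sb = new StringBuilder();
        // "append" exists on StringBuilder, so the call is made.
        System.out.println(setIfPresent(sb, "append", String.class, "hi"));
        // A method this version doesn't have is skipped gracefully.
        // (Name is a hypothetical stand-in for a 2.7-only YARN setter.)
        System.out.println(setIfPresent(sb, "setRolledLogsIncludePattern",
                                        String.class, "spark*"));
    }
}
```

The same shape works for optional YARN submission-context fields: probe once at startup, cache the `Method` if found, and degrade to the 2.6 behaviour when it isn't.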