Sean Busbey wrote:
On Wed, Oct 8, 2014 at 11:12 AM, Christopher <[email protected]> wrote:
On Wed, Oct 8, 2014 at 9:47 AM, Sean Busbey <[email protected]> wrote:
On Wed, Oct 8, 2014 at 12:13 AM, Josh Elser <[email protected]> wrote:
Forgot one:
*Drop Hadoop 1 support*
- We would no longer care about maintaining Hadoop 1 APIs (get rid of
crappy reflection; see the sketch after this list)
- 2.2.0 (Hadoop 2 "stable") came out just under 1 year ago
- Can be done for 1.7 or reconsidered for 2.0
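
For context, the reflection in question looks roughly like the sketch
below. The SyncShim class is hypothetical, but sync() and hsync() are the
real Hadoop method names: Hadoop 1 only offers the weaker sync(), while
Hadoop 2 adds hsync() with an actual on-disk durability guarantee.

import java.io.IOException;
import java.lang.reflect.Method;

import org.apache.hadoop.fs.FSDataOutputStream;

// Hypothetical shim illustrating the dual-version reflection we carry
// today: hsync() only exists on Hadoop 2, sync() is all Hadoop 1 offers.
public class SyncShim {
  private static final Method HSYNC = find("hsync"); // Hadoop 2 only
  private static final Method SYNC = find("sync");   // Hadoop 1 (deprecated in 2)

  private static Method find(String name) {
    try {
      return FSDataOutputStream.class.getMethod(name);
    } catch (NoSuchMethodException e) {
      return null; // method not present in this Hadoop version
    }
  }

  public static void durableSync(FSDataOutputStream out) throws IOException {
    try {
      if (HSYNC != null) {
        HSYNC.invoke(out); // Hadoop 2: flushes data to disk on each datanode
      } else if (SYNC != null) {
        SYNC.invoke(out);  // Hadoop 1: best effort, no on-disk guarantee
      } else {
        throw new IOException("no sync method available on FSDataOutputStream");
      }
    } catch (ReflectiveOperationException e) {
      throw new IOException(e);
    }
  }
}
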
Do we already know if focusing on just Hadoop 2.2.0+ support will result
in any API impact?
I can't imagine it will impact our client API much, if at all, but it'll
certainly help us simplify our use of Hadoop and fix bugs there when they
appear, and it may result in fewer compatibility issues in MapReduce and
elsewhere (dynamic class loading, reflection workarounds). It should also
allow us to provide some guarantees in durability-related features,
because we'll know the underlying sync calls exist, and it should help
simplify documentation and example configs, as well as reduce the testing
burden.
+1 to dropping Hadoop <2.2.0 in 1.7.0.
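
(To make the simplification concrete: with a 2.2.0+ floor, the reflective
shim sketched earlier collapses to a direct, compile-time-checked call.
Again, just an illustrative sketch, not actual Accumulo code.)

import java.io.IOException;

import org.apache.hadoop.fs.FSDataOutputStream;

public class SyncDirect {
  // With Hadoop 2.2.0+ guaranteed, no reflection is needed and the
  // durability guarantee is known to exist at compile time.
  public static void durableSync(FSDataOutputStream out) throws IOException {
    out.hsync(); // persists data to disk on each datanode before returning
  }
}
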
If it doesn't impact API or usage I'd also be +1. If it ends up leaking out
somewhere, I'd want to take the opportunity of 1.7.0 to flag deprecations
so the transition can be made less severe (and remove in 2.0).
Frankly, given the general expectations on durability we formed in 1.4,
I'd say it's dangerous for people to be running 1.5+ on Hadoop 1.
Yeah, that's definitely what has to be weighed. Are we so aggressive as to
prevent users from even running on Hadoop 1 because it inherently lacks
some of the sync mechanics needed to ensure no data loss, even with
redundant power?
We warn users when they don't have the necessary options set at the HDFS
level to properly sync, but we don't keep them from running like that.
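
(For reference, that warning amounts to a startup check like the sketch
below. The class and method names here are hypothetical, but
dfs.datanode.synconclose is the real HDFS property involved.)

import org.apache.hadoop.conf.Configuration;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

// Sketch of the kind of startup check described above: warn, but don't
// refuse to run, when HDFS is not configured to sync closed blocks to disk.
public class DurabilityCheck {
  private static final Logger log = LoggerFactory.getLogger(DurabilityCheck.class);

  public static void warnIfSyncDisabled(Configuration hadoopConf) {
    if (!hadoopConf.getBoolean("dfs.datanode.synconclose", false)) {
      log.warn("dfs.datanode.synconclose is false in hdfs-site.xml: "
          + "data loss is possible on system reset or power loss");
    }
  }
}
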
The more I think about it, the more I'm in favor of continuing to strongly
encourage users to adopt Hadoop 2 and revisiting the question of dropping
Hadoop 1 for 2.0.0.