Aggression being "do we go out of our way to prevent Hadoop 1 from
working". Like you said, it works now and we have nothing in the code
that doesn't reasonably support Hadoop 1 and 2.
I agree with reconsidering this again when 2.0.0 circles around once more.
Thanks for everyone's input.
Christopher wrote:
I don't think it's about aggression... it's about progress. We can continue
to support a version with bugfixes that does work on Hadoop 1 for some
time. If it's likely to cause problems in the bleeding edge, though... we
can drop it there. In any case, I'm fine with doing this in 2.0.0, instead,
if it makes it easier to leave a support window in 1.x for Hadoop 1.
--
Christopher L Tubbs II
http://gravatar.com/ctubbsii
On Wed, Oct 8, 2014 at 12:43 PM, Josh Elser <[email protected]> wrote:
Sean Busbey wrote:
On Wed, Oct 8, 2014 at 11:12 AM, Christopher <[email protected]> wrote:
On Wed, Oct 8, 2014 at 9:47 AM, Sean Busbey <[email protected]> wrote:
On Wed, Oct 8, 2014 at 12:13 AM, Josh Elser <[email protected]> wrote:
Forgot one:
*Drop Hadoop 1 support*
- We would no longer care about maintaining Hadoop 1 APIs (get rid of
crappy reflection)
- 2.2.0 (Hadoop 2 "stable") came out just under 1 year ago
- Can be done for 1.7 or reconsidered for 2.0
Do we already know if focusing on just Hadoop 2.2.0+ support will result
in any API impact?
I can't imagine it will impact our client API much, if any, but it'll
certainly help simplify and fix bugs in our use of Hadoop (when they
appear), and may result in fewer compatibility issues in MapReduce and
elsewhere (dynamic class loading, reflection workarounds). It should also
allow us to provide some guarantees in durability-related features (because
we'll know they exist). It should also help simplify documentation and
example configs, as well as reduce testing burdens.
+1 to dropping Hadoop <2.2.0 in 1.7.0.
If it doesn't impact API or usage I'd also be +1. If it ends up leaking out
somewhere, I'd want to take the opportunity of 1.7.0 to flag deprecations
so the transition can be made less severe (and remove in 2.0).
Frankly, given the general expectations on durability we formed in 1.4 I'd
say it's dangerous for people to be running 1.5+ on Hadoop 1.
Yeah, that's definitely what has to be weighed. Are we so aggressive as to
prevent users from even running on Hadoop 1 because it is inherently
missing some of the sync mechanics needed to ensure no data loss regardless
of redundant power?
We warn users when they don't have the necessary options set at the HDFS
level to properly sync, but we don't keep them from running like that.
The more I think about it, the more I'd be in favor of continuing to
strongly encourage users to adopt Hadoop 2 and revisit dropping Hadoop 1
for 2.0.0.