Hi Xiao Chen,

I am unaffected by this change either way.  If this change saves people time, 
then we should include it.  The voting outcome for 3.0.1 release determines if 
this should be addressed by the community.  I am merely bringing up the 
potential risk of the change.  With proper communication, this should not be an 
issue.

Regards,
Eric

On 1/22/18, 2:37 PM, "Xiao Chen" <x...@cloudera.com> wrote:

    Thanks all for the comments, and ATM for initiating the discussion thread.
    (I have just returned from a 2-week PTO).
    
    Reading up all the comments here and from HDFS-12990, I think we all agree
    having different default NN ports will be inconvenient for all, and
    problematic for several cases - ranging from rolling upgrade to various
    downstream use cases. In CDH, this was initially reported from downstream
    (Impala) testing when the scripts there try to do RPC with 8020 but NN is
    running on 9820. The intuitive fix was 'change CM to match it'. More cases
    popped up later, including the table location in Hive metastore and custom
    scripts (including Oozie WFs). The only other real world example we have
    heard so far is Anu's comment on HDFS-12990, where he did not enjoy keeping
    separate scripts for hadoop 2 / 3.
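
    For stored locations that carry an explicit port (the Hive metastore case
    above), the kind of rewrite involved could look roughly like the Python
    sketch below. It is a hypothetical helper, not code from CM, Hive, or
    HDFS-12990; the name `rewrite_nn_port` and the sample URIs are assumptions
    made for illustration.

```python
from urllib.parse import urlsplit, urlunsplit


def rewrite_nn_port(uri, old_port, new_port):
    """Return `uri` with an explicit NN RPC port `old_port` replaced by `new_port`.

    URIs that are not hdfs://, or that carry no port or a different port,
    are returned unchanged.
    """
    parts = urlsplit(uri)
    if parts.scheme != "hdfs" or parts.port != old_port:
        return uri
    # parts.port == old_port, so the netloc ends with ":<old_port>".
    authority = parts.netloc.rsplit(":", 1)[0]
    netloc = f"{authority}:{new_port}"
    return urlunsplit((parts.scheme, netloc, parts.path, parts.query, parts.fragment))


if __name__ == "__main__":
    # Hypothetical table locations, for illustration only.
    for location in ("hdfs://nn1:8020/warehouse/t1", "hdfs://nn1/warehouse/t2"):
        print(rewrite_nn_port(location, 8020, 9820))
```

    A location without a port is left alone on purpose: it follows whatever
    default the client has, which is exactly the part a rewrite cannot fix.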
    
    Note that this is limited to the NN RPC port (8020 <-> 9820), because the
    other port changes in HDFS-9427 do move the defaults out of the ephemeral
    range.
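
    As a quick check of that claim, here is a small Python sketch using the
    old/new port list from Nicholas's mail quoted further down. The ephemeral
    range is an assumption (the common Linux default of 32768-60999); other
    systems may use a different range.

```python
# Assumption: the common Linux default for net.ipv4.ip_local_port_range
# (32768-60999). Other operating systems use different ranges.
EPHEMERAL_PORTS = range(32768, 61000)

# Old -> new default ports, as listed in the thread for HDFS-9427.
PORT_CHANGES = {
    "namenode": {50470: 9871, 50070: 9870, 8020: 9820},
    "secondary namenode": {50091: 9869, 50090: 9868},
    "datanode": {50020: 9867, 50010: 9866, 50475: 9865, 50075: 9864},
}


def old_ports_outside_ephemeral_range():
    """Old defaults that were never in the ephemeral range to begin with."""
    return [
        old
        for changes in PORT_CHANGES.values()
        for old in changes
        if old not in EPHEMERAL_PORTS
    ]


def new_ports_inside_ephemeral_range():
    """New defaults that would still collide with the ephemeral range."""
    return [
        new
        for changes in PORT_CHANGES.values()
        for new in changes.values()
        if new in EPHEMERAL_PORTS
    ]


if __name__ == "__main__":
    print(old_ports_outside_ephemeral_range())  # prints [8020]
    print(new_ports_inside_ephemeral_range())   # prints []
```

    Under that assumption, 8020 is the only old default that was already
    outside the ephemeral range.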
    
    The disagreement so far is how to proceed from here.
    1. Not fix it at all.
    
    This means everyone on 2.x will run into this issue when they upgrade.
    
    2. Make NN RPC listen to both 8020 and 9820
    
    Nicholas came up with this idea, which by itself smartly solves the
    compatibility problems.
    
    The downside is that, even though things work during/after an upgrade,
    people will still have to whack-a-mole their existing 8020's. I agree
    adding this will have the side effect of giving NN more flexibility in the
    future. We can do this with or without the port change.
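
    To make option 2 concrete, here is a minimal Python sketch of one process
    accepting connections on two ports at once. It only illustrates the
    general technique with plain TCP sockets; it is not how the NameNode RPC
    server is implemented, and the names `open_listeners` and `serve_once` are
    hypothetical.

```python
import selectors
import socket


def open_listeners(ports, host="127.0.0.1"):
    """Open one listening TCP socket per port; port 0 lets the OS pick one."""
    listeners = []
    for port in ports:
        sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        sock.bind((host, port))
        sock.listen()
        listeners.append(sock)
    return listeners


def serve_once(listeners, reply=b"ok\n", timeout=5.0):
    """Accept a single connection on whichever listener becomes ready first.

    Returns the local port the client connected to, or None on timeout.
    """
    with selectors.DefaultSelector() as sel:
        for sock in listeners:
            sel.register(sock, selectors.EVENT_READ)
        events = sel.select(timeout)
        if not events:
            return None
        key, _mask = events[0]
        conn, _addr = key.fileobj.accept()
        with conn:
            conn.sendall(reply)
        return key.fileobj.getsockname()[1]
```

    Calling `open_listeners([8020, 9820])` would stand in for the old and new
    defaults. Clients on either port get the same answer, but every
    hard-coded 8020 stays where it is.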
    
    3. Change back to 8020
    
    This will make all upgrades from 2.x -> 3.0.1 (if this goes in) free of
    this problem, because the original 8020->9820 switch doesn't appear to have
    been a mature move.
    
    The downsides, as I summarize them, are: a) what about 3.0.0 users b) compat
    
    For a), since we have *just* released 3.0.0, it's safe to say we have
    tremendously more users on 2.x than 3.0.0 now. If we make the release notes
    clear, this will benefit far more users than it harms.
    For b), as various others commented, this can be a special case where a
    by-definition incompatible change actually fixes a previously problematic
    incompatible change. If we can have consensus, and also notify users from
    mailing list + release notes, it doesn't weaken our compatibility
    guidelines nor surprise the community.
    
    
    Eric and Nicholas, does this address your concerns?
    
    
    -Xiao
    
    On Sun, Jan 21, 2018 at 8:27 PM, Akira Ajisaka <aajis...@apache.org> wrote:
    
    > Thanks Chris and Daryn for the replies.
    >
    > First of all, I missed why NN RPC port was moved to 9820.
    > HDFS-9427 is to avoid ephemeral ports; however, NN RPC port (8020)
    > is already out of the range. The change is only to move
    > all the ports into the same range, so the change is not really necessary.
    >
    > I agree the change is disastrous for many users, however,
    > reverting the change is also disastrous for 3.0.0 users.
    > Therefore, if we are to revert the change, we must notify
    > the users of the incompatibility. Adding the notification in the
    > release announcement seems to be a good choice. Users probably
    > do not read the change logs carefully, and they can
    > easily miss it, as we missed it in the release process.
    >
    > Cancelling my -1.
    >
    > -Akira
    >
    > On 2018/01/20 7:17, Daryn Sharp wrote:
    >
    >>  > I'm -1 for reverting HDFS-9427 in 3.x.
    >>
    >> I'm -1 on not reverting.  If yahoo/oath had the cycles to begin testing
    >> 3.0 prior to release, I would have -1'ed this change immediately.  It's
    >> already broken our QE testing pipeline.
    >>
    >>  > The port number is configurable, so if you want to use 8020 for NN RPC
    >> port in Hadoop 3.x, you configure this to 8020.
    >>
    >> No, it's not that easy.  THE DEFAULT IS HARDCODED.  You can only
    >> "configure" the port via hardcoding it into all paths.   Which ironically
    >> multiple people think shouldn't be done?  Let's start thinking about the
    >> impact to those not running just 1 isolated cluster with fully managed
    >> services that can take downtime and be fully upgraded in sync.
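
    The point about the hardcoded default can be illustrated with a short
    Python sketch. It only mimics the fallback behaviour described above (use
    the port in the URI if present, otherwise the client's built-in default);
    `effective_port` and the sample URIs are hypothetical, not Hadoop client
    code.

```python
from urllib.parse import urlsplit

# Client-side defaults discussed in this thread.
DEFAULT_NN_RPC_PORT_2X = 8020
DEFAULT_NN_RPC_PORT_300 = 9820


def effective_port(uri, client_default):
    """Port a client would contact: the explicit one, else its built-in default."""
    port = urlsplit(uri).port
    return client_default if port is None else port


if __name__ == "__main__":
    path = "hdfs://nn1/user/alice"  # no port, as most users write it
    print(effective_port(path, DEFAULT_NN_RPC_PORT_2X))   # prints 8020
    print(effective_port(path, DEFAULT_NN_RPC_PORT_300))  # prints 9820
    # Only an explicit port gives the same answer on both clients.
    print(effective_port("hdfs://nn1:8020/user/alice", DEFAULT_NN_RPC_PORT_300))  # prints 8020
```

    The same port-less path resolves differently depending on which client
    version opens it, which is why a mixed 2.x / 3.0 environment ends up
    needing the port in every path.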
    >>
    >> If the community doesn't revert, I'm not going to tell users to put the
    >> port in all their paths.  I'll hack the default back to 8020.  Then I'll
    >> have to deal with other users or closed software stacks bundled with a
    >> stock 3.0 hadoop client, or using a different 3.0 distro, using the wrong
    >> port.   They will break unless they hardcode 8020 port into paths.
    >>
    >> Let's say I do change to the new port, I still have to tell all my users
    >> with 2.x client to hardcode the new port but only after the upgrade.  If
    >> the "solution" is listening on the old and new port, it only proves that a
    >> port change is frivolous with zero added value.
    >>
    >> Someone please explain to me how any heterogeneous multi-cluster
    >> environment benefits from this change?  How does a single cluster
    >> environment benefit from this change?  If there are no benefits to anyone,
    >> why are we even debating a revert?  Taking a hardline, under the guise of
    >> worrying about compatibility for tiny number of users, is either naive or
    >> political because this will potentially be disastrous for existing
    >> deployments.
    >>
    >> Daryn
    >>
    >> On Fri, Jan 19, 2018 at 3:06 AM, Akira Ajisaka <aajis...@apache.org> wrote:
    >>
    >>     I'm -1 for reverting HDFS-9427 in 3.x.
    >>
    >>     The port number is configurable, so if you want to use 8020 for
    >>     NN RPC port in Hadoop 3.x, you configure this to 8020. That's fine.
    >>     I don't think it is a critical problem.
    >>
    >>     If we are to revert this in 3.x, it causes an additional incompatible
    >>     change.
    >>
    >>     -Akira
    >>
    >>
    >>     On 2018/01/18 11:03, Tsz Wo (Nicholas), Sze wrote:
    >>
    >>            (Re-sent. Just found that my previous email seems not
    >> delivered to common-dev.)
    >>
    >>         > The question is: how are we going to fix it?
    >>         What do you propose? -C
    >>
    >>         First of all, let's state clearly what the problem is about.
    >>         Please help me out if I have missed anything.
    >>
    >>         The problem reported by HDFS-12990 is that HDFS-9427 has changed
    >>         the NN default RPC port from 8020 to 9820.  HDFS-12990 claimed, “the
    >>         NN RPC port change is painful for downstream on migrating to
    >>         Hadoop 3.”
    >>
    >>         Note 1: This isn't a problem for HA clusters.
    >>         Note 2: The port is configurable.  Users can set it to any value.
    >>         Note 3: HDFS-9427 has also changed many other HTTP/RPC ports, as
    >>         shown below.
    >>
    >>         Namenode ports: 50470 --> 9871, 50070 --> 9870, 8020 --> 9820
    >>         Secondary NN ports: 50091 --> 9869, 50090 --> 9868
    >>         Datanode ports: 50020 --> 9867, 50010 --> 9866, 50475 --> 9865,
    >>         50075 --> 9864
    >>
    >>         The other port changes probably also affect downstream projects
    >>         and give them a “painful” experience.  For example, NN UI and
    >>         WebHDFS use a different port.
    >>
    >>         The problem is related to convenience, not anything serious like
    >>         a security bug.
    >>
    >>         There are a few possible solutions:
    >>
    >>         1) Consider that the port changes are not limited to NN RPC and
    >>         that the default port value should not be hardcoded.  Also,
    >>         downstream projects probably need to fix other hardcoded ports
    >>         (e.g. WebHDFS) anyway.  Let’s just keep all the port changes and
    >>         document them clearly (we may throw an exception if some
    >>         applications try to connect to the old ports).  In this way, 3.0.1
    >>         is compatible with 3.0.0.
    >>
    >>         2) Further change the NN RPC so that NN listens to both 8020 and
    >>         9820 by default.  It is a new feature that NN listens to two ports
    >>         simultaneously.  The feature has other benefits, e.g. one of the
    >>         ports can be reserved for some high priority applications so that
    >>         they can have a better response time.  It is compatible with both
    >>         2.x and 3.0.0.  Of course, users could choose to set it back to one
    >>         of the ports in the conf.
    >>
    >>         3) Revert the NN RPC port back to 8020.  We need to ask where the
    >>         revert should happen.
    >>
    >>         3.1) Revert it in 3.0.1 as proposed by HDFS-12990.  However, this
    >>         is an incompatible change between dot releases 3.0.0 and 3.0.1 and
    >>         it violates our policy.  Being compatible is very important.  Users
    >>         expect 3.0.0 and 3.0.1 to be compatible.  How could we explain that
    >>         3.0.0 and 3.0.1 are incompatible due to convenience?
    >>
    >>         3.2) Revert it in 4.0.0.  There is no compatibility issue since
    >>         3.0.0 and 4.0.0 are allowed to have incompatible changes according
    >>         to our policy.
    >>
    >>         Since compatibility is more important than convenience, Solution
    >>         #3.1 is impermissible.  For the remaining solutions, both #1 and #2
    >>         are fine to me.
    >>
    >>         Thanks.
    >>         Tsz-Wo
    >>
    >>
    >>         On Friday, January 12, 2018, 12:26:47 PM GMT+8, Chris Douglas
    >>         <cdoug...@apache.org> wrote:
    >>         On Thu, Jan 11, 2018 at 6:34 PM Tsz Wo Sze <szets...@yahoo.com> wrote:
    >>
    >>            The question is: how are we going to fix it?
    >>
    >>
    >>         What do you propose? -C
    >>
    >>
    >>
    >>
    >>             No incompatible changes are allowed between 3.0.0 and 3.0.1.
    >> Dot releases only allow bug fixes.
    >>
    >>
    >>         We may not like the statement above but it is our compatibility
    >> policy.  We should either follow the policy or revise it.
    >>
    >>         Some more questions:
    >>           - What if someone is already using 3.0.0 and has changed all the
    >>             scripts to 9820?  Just let them fail?
    >>           - Compared to 2.x, 3.0.0 has many incompatible changes. Are we
    >>             going to have other incompatible changes in future minor and dot
    >>             releases? What are the criteria to decide which incompatible
    >>             changes are allowed?
    >>           - I hate that we have prematurely released 3.0.0 and are making
    >>             3.0.1 incompatible with 3.0.0. If the "bug" is that serious, why
    >>             not fix it in 4.0.0 and declare 3.x as dead?
    >>           - It seems obvious that no one has seriously tested it, so the
    >>             problem was not uncovered until now. Are there bugs in our
    >>             current release procedure?
    >>
    >>         Thanks
    >>         Tsz-Wo
    >>
    >>
    >>         On Thursday, January 11, 2018, 11:36:33 AM GMT+8, Chris Douglas
    >>         <cdoug...@apache.org> wrote:
    >>         Isn't this limited to reverting the 8020 -> 9820 change? -C
    >>
    >>         On Wed, Jan 10, 2018 at 6:13 PM Eric Yang <ey...@hortonworks.com> wrote:
    >>
    >>             The fix in HDFS-9427 can potentially bring in new customers
    >>             because there is less chance of a newcomer encountering the
    >>             “port already in use” problem.  If we make the change according
    >>             to HDFS-12990, then this incompatible change does not make the
    >>             earlier incompatible change compatible.  Other ports are not
    >>             reverted according to HDFS-12990.  Users will encounter the bad
    >>             taste in the mouth that HDFS-9427 attempted to solve.  Please do
    >>             consider both the negative side effects of reverting as well as
    >>             the incompatible minor release change.  Thanks
    >>
    >>             Regards,
    >>             Eric
    >>
    >>             From: larry mccay <lmc...@apache.org>
    >>             Date: Wednesday, January 10, 2018 at 10:53 AM
    >>             To: Daryn Sharp <da...@oath.com>
    >>             Cc: "Aaron T. Myers" <a...@apache.org>,
    >>             Eric Yang <ey...@hortonworks.com>,
    >>             Chris Douglas <cdoug...@apache.org>,
    >>             Hadoop Common <common-dev@hadoop.apache.org>
    >>             Subject: Re: When are incompatible changes acceptable
    >>             (HDFS-12990)
    >>
    >>             On Wed, Jan 10, 2018 at 1:34 PM, Daryn Sharp <da...@oath.com> wrote:
    >>
    >>             I fully agree the port changes should be reverted.  Although
    >>             "incompatible", the potential impact to existing 2.x deploys is
    >>             huge.  I'd rather inconvenience 3.0 deploys that comprise <1% of
    >>             customers.  An incompatible change to revert an incompatible
    >>             change is called compatibility.
    >>
    >>             +1
    >>
    >>
    >>
    >>
    >>             Most importantly, consider that there is no good upgrade path
    >>             for existing deploys, esp. large and/or multi-cluster
    >>             environments.  It’s only feasible for first-time deploys or
    >>             simple single-cluster upgrades willing to take downtime.  Let's
    >>             consider a few reasons why:
    >>
    >>             1. RU is completely broken.  Running jobs will fail.  If MR on
    >>             hdfs bundles the configs, there's no way to transparently
    >>             coordinate the switch to the new bundle with the port changed.
    >>             Job submissions will fail.
    >>
    >>             2. Users generally do not add the rpc port number to uris so
    >>             unless their configs are updated they will contact the wrong
    >>             port.  Seamlessly coordinating the conf change without massive
    >>             failures is impossible.
    >>
    >>             3. Even if client confs are updated, they will break in a
    >>             multi-cluster env with NNs using different ports.
    >>             Users/services will be forced to add the port.  The cited hive
    >>             "issue" is not a bug since it's the only way to work in a
    >>             multi-port env.
    >>
    >>             4. Coordinating the port add/change of uris in systems
    >>             everywhere (you know something will be missed), updating of
    >>             confs, restarting all services, requiring customers to redeploy
    >>             their workflows in sync with the NN upgrade, will cause mass
    >>             disruption and downtime that will be unacceptable for
    >>             production environments.
    >>
    >>             This is a solution to a non-existent problem.  Ports can be
    >>             bound by multiple processes but only 1 can listen.  Maybe
    >>             multiple listeners is an issue for compute nodes but not
    >>             responsibly managed service nodes.  Ie. Who runs arbitrary
    >>             services on the NNs that bind to random ports?  Besides, the
    >>             default port is and was ephemeral so it solved nothing.
    >>
    >>             This either standardizes ports to a particular customer's ports
    >>             or is a poorly thought out whim.  In either case, the needs of
    >>             the many outweigh the needs of the few/none (3.0 users).  The
    >>             only logical conclusion is revert.  If a particular site wants
    >>             to change default ports and deal with the massive fallout, they
    >>             can explicitly change the ports themselves.
    >>
    >>
    >>
    >>             Daryn
    >>
    >>             On Tue, Jan 9, 2018 at 11:22 PM, Aaron T. Myers <a...@apache.org> wrote:
    >>             On Tue, Jan 9, 2018 at 3:15 PM, Eric Yang <ey...@hortonworks.com> wrote:
    >>
    >>                 While I agree the original port change was unnecessary, I
    >>                 don’t think a Hadoop NN port change is a bad thing.
    >>
    >>                 I worked on a Hadoop distro whose NN RPC port defaulted to
    >>                 port 9000.  When we migrated from BigInsights to IOP and now
    >>                 to HDP, we had to move customer Hive metadata to the new NN
    >>                 RPC port.  It only took one developer (myself) to write the
    >>                 tool for the migration.  The incurred workload was not as
    >>                 bad as most people anticipated because Hadoop depends on
    >>                 the configuration file for referencing the namenode.  Most
    >>                 of the code can work transparently.  It helped to harden the
    >>                 downstream testing tools to be more robust.
    >>
    >>
    >>             While there are of course ways to deal with this, the question
    >>             really should be whether or not it's a desirable thing to do to
    >>             our users.
    >>
    >>
    >>                 We will never know how many people are actively working on
    >>                 Hadoop 3.0.0.  Perhaps a couple hundred developers, or
    >>                 thousands.
    >>
    >>
    >>             You're right that we can't know for sure, but I strongly suspect
    >>             that this is a substantial overestimate. Given how conservative
    >>             Hadoop operators tend to be, I view it as exceptionally unlikely
    >>             that many deployments have been created on or upgraded to Hadoop
    >>             3.0.0 since it was released less than a month ago.
    >>
    >>             Further, I hope you'll agree that the number of
    >>             users/developers/deployments/applications which are currently on
    >>             Hadoop 2.x is *vastly* greater than anyone who might have jumped
    >>             on Hadoop 3.0.0 so quickly. When all of those users upgrade to
    >>             any 3.x version, they will encounter this needless incompatible
    >>             change and be forced to work around it.
    >>
    >>
    >>                 I think the switch back may save a few developers work, but
    >>                 more people could be impacted by an unexpected change in a
    >>                 minor release in the future.  I recommend keeping the
    >>                 current values to avoid rule bending and future
    >>                 frustrations.
    >>
    >>
    >>             That we allow this incompatible change now does not mean that we
    >>             are categorically allowing more incompatible changes in the
    >>             future. My point is that we should in all instances evaluate the
    >>             merit of any incompatible change on a case-by-case basis. This
    >>             is not an exceptional circumstance - we've made incompatible
    >>             changes in the past when appropriate, e.g. breaking some clients
    >>             to address a security issue. I and others believe that in this
    >>             case the benefits greatly outweigh the downsides of changing
    >>             this back to what it has always been.
    >>
    >>             Best,
    >>             Aaron
    >>
    >>
    >>
    >>                 Regards,
    >>                 Eric
    >>
    >>                 On 1/9/18, 11:21 AM, "Chris Douglas" <cdoug...@apache.org> wrote:
    >>
    >>
    >>                       Particularly since 9820 isn't in the contiguous range of
    >>                       ports in HDFS-9427, is there any value in this change?
    >>
    >>                       Let's change it back to prevent the disruption to users,
    >>                       but downstream projects should treat this as a bug in
    >>                       their tests. Please open JIRAs in affected projects. -C
    >>
    >>
    >>                       On Tue, Jan 9, 2018 at 5:18 AM, larry mccay <lmc...@apache.org> wrote:
    >>
    >>                       > On Mon, Jan 8, 2018 at 11:28 PM, Aaron T. Myers <a...@apache.org> wrote:
    >>                       >
    >>                       >> Thanks a lot for the response, Larry. Comments inline.
    >>                       >>
    >>                       >> On Mon, Jan 8, 2018 at 6:44 PM, larry mccay <lmc...@apache.org> wrote:
    >>                       >>
    >>                       >>> Question...
    >>                       >>>
    >>                       >>> Can this be addressed in some way during or before upgrade
    >>                       >>> that allows it to only affect new installs?
    >>                       >>> Even a config based workaround prior to upgrade might make
    >>                       >>> this change less disruptive.
    >>                       >>>
    >>                       >>> If part of the upgrade process includes a step (maybe even
    >>                       >>> a script) to set the NN RPC port explicitly beforehand then
    >>                       >>> it would allow existing deployments and related clients to
    >>                       >>> remain whole - otherwise it will uptake the new default
    >>                       >>> port.
    >>                       >>>
    >>                       >>
    >>                       >> Perhaps something like this could be done, but I think there
    >>                       >> are downsides to anything like this. For example, I'm sure
    >>                       >> there are plenty of applications written on top of Hadoop
    >>                       >> that have tests which hard-code the port number. Nothing we
    >>                       >> do in a setup script will help here. If we don't change the
    >>                       >> default port back to what it was, these tests will likely
    >>                       >> all have to be updated.
    >>                       >>
    >>                       >>
    >>                       >
    >>                       > I may not have made my point clear enough.
    >>                       > What I meant to say is to fix the default port but direct
    >>                       > folks to explicitly set the port they are using in a
    >>                       > deployment (the current default) so that it doesn't change
    >>                       > out from under them - unless they are fine with it changing.
    >>                       >
    >>                       >
    >>                       >>
    >>                       >>> Meta note: we shouldn't be so pedantic about policy that we
    >>                       >>> can't back out something that is considered a bug or even a
    >>                       >>> mistake.
    >>                       >>>
    >>                       >>
    >>                       >> This is my bigger point. Rigidly adhering to the compat
    >>                       >> guidelines in this instance helps almost no one, while
    >>                       >> hurting many folks.
    >>                       >>
    >>                       >> We basically made a mistake when we decided to change the
    >>                       >> default NN port with little upside, even between major
    >>                       >> versions. We discovered this very quickly, and we have an
    >>                       >> opportunity to fix it now and in so doing likely disrupt
    >>                       >> very, very few users and downstream applications. If we
    >>                       >> don't change it, we'll be causing difficulty for our users,
    >>                       >> downstream developers, and ourselves, potentially for years.
    >>                       >>
    >>                       >
    >>                       > Agreed.
    >>                       >
    >>                       >
    >>                       >>
    >>                       >> Best,
    >>                       >> Aaron
    >>                       >>
    >>
    >>                       ---------------------------------------------------------------------
    >>                       To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org
    >>                       For additional commands, e-mail: common-dev-h...@hadoop.apache.org
    >>
    


