On Wed, Jan 10, 2018 at 1:34 PM, Daryn Sharp <da...@oath.com> wrote:

> I fully agree the port changes should be reverted.  Although
> "incompatible", the potential impact to existing 2.x deploys is huge.  I'd
> rather inconvenience 3.0 deploys that comprise <1% of customers.  An
> incompatible change to revert an incompatible change is called
> compatibility.
>

+1


>
> Most importantly, consider that there is no good upgrade path for existing
> deploys, esp. large and/or multi-cluster environments.  It’s only feasible
> for first-time deploys or simple single-cluster upgrades willing to take
> downtime.  Let's consider a few reasons why:
>
>
> 1. RU is completely broken.  Running jobs will fail.  If MR on hdfs
> bundles the configs, there's no way to transparently coordinate the switch
> to the new bundle with the port changed.  Job submissions will fail.
>
>
> 2. Users generally do not add the rpc port number to uris so unless their
> configs are updated they will contact the wrong port.  Seamlessly
> coordinating the conf change without massive failures is impossible.
>
>
> 3. Even if client confs are updated, they will break in a multi-cluster
> env with NNs using different ports.  Users/services will be forced to add
> the port.  The cited hive "issue" is not a bug since it's the only way to
> work in a multi-port env.
>
>
> 4. Coordinating the port add/change of uris in systems everywhere (you
> know something will be missed), updating of confs, restarting all services,
> requiring customers to redeploy their workflows in sync with the NN
> upgrade, will cause mass disruption and downtime that will be unacceptable
> for production environments.
>
>
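
To make points 2 and 3 above concrete, here is a minimal sketch (plain Python,
not Hadoop code) of how a URI without an explicit port falls back to whatever
default the client happens to have. The function name and hostnames are
hypothetical; 8020 is the long-standing NN RPC default and 9820 is the value
that 3.0.0 shipped with.

```python
from urllib.parse import urlsplit

OLD_DEFAULT_PORT = 8020  # NN RPC default in 2.x
NEW_DEFAULT_PORT = 9820  # NN RPC default shipped in 3.0.0


def resolve_nn_address(uri, default_port):
    """Return (host, port) for an hdfs:// URI.

    Hypothetical helper: if the URI carries no port, the client's
    default is used, which is why a changed default silently
    redirects every port-less URI.
    """
    parts = urlsplit(uri)
    if parts.hostname is None:
        raise ValueError("URI has no host: %r" % uri)
    port = parts.port if parts.port is not None else default_port
    return parts.hostname, port


if __name__ == "__main__":
    for uri in ("hdfs://nn1.example.com/user/data",
                "hdfs://nn1.example.com:8020/user/data"):
        # Same URI, two clients with different defaults.
        print(uri,
              resolve_nn_address(uri, OLD_DEFAULT_PORT),
              resolve_nn_address(uri, NEW_DEFAULT_PORT))
```

Only the URI with the explicit port resolves to the same address under both
defaults, which is why in a mixed-port environment users end up having to add
the port.
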
> This is a solution to a non-existent problem.  Ports can be bound by
> multiple processes but only 1 can listen.  Maybe multiple listeners is an
> issue for compute nodes but not responsibly managed service nodes.  I.e., who
> runs arbitrary services on the NNs that bind to random ports?  Besides, the
> default port is and was ephemeral so it solved nothing.
>
>
> This either standardizes ports to a particular customer's ports or is a
> poorly thought out whim.  In either case, the needs of the many outweigh
> the needs of the few/none (3.0 users).  The only logical conclusion is
> revert.  If a particular site wants to change default ports and deal with
> the massive fallout, they can explicitly change the ports themselves.
>
>
> Daryn
>
> On Tue, Jan 9, 2018 at 11:22 PM, Aaron T. Myers <a...@apache.org> wrote:
>
>> On Tue, Jan 9, 2018 at 3:15 PM, Eric Yang <ey...@hortonworks.com> wrote:
>>
>> > While I agree the original port change was unnecessary, I don’t think
>> > Hadoop NN port change is a bad thing.
>> >
>> > I worked for a Hadoop distro whose NN RPC port defaulted to 9000.
>> > When we migrated from BigInsights to IOP and now to HDP, we had to move
>> > customer Hive metadata to the new NN RPC port.  It only took one
>> > developer (myself) to write the tool for the migration.  The workload
>> > incurred was not as bad as most people anticipated because Hadoop
>> > depends on configuration files for referencing the namenode.  Most of
>> > the code can work transparently.  It helped to harden the downstream
>> > testing tools to be more robust.
>> >
>>
>> While there are of course ways to deal with this, the question really
>> should be whether or not it's a desirable thing to do to our users.
>>
>>
>> >
>> > We will never know how many people are actively working on Hadoop 3.0.0.
>> > Perhaps a couple hundred developers, or thousands.
>>
>>
>> You're right that we can't know for sure, but I strongly suspect that this
>> is a substantial overestimate. Given how conservative Hadoop operators tend
>> to be, I view it as exceptionally unlikely that many deployments have been
>> created on or upgraded to Hadoop 3.0.0 since it was released less than a
>> month ago.
>>
>> Further, I hope you'll agree that the number of
>> users/developers/deployments/applications which are currently on Hadoop 2.x
>> is *vastly* greater than the number who might have jumped on Hadoop 3.0.0
>> so quickly. When all of those users upgrade to any 3.x version, they will
>> encounter this needless incompatible change and be forced to work around it.
>>
>>
>> > I think the switch back may have saved a few developers work, but there
>> > could be more people impacted by an unexpected change in a minor
>> > release in the future.  I recommend keeping the current values to avoid
>> > rule bending and future frustrations.
>> >
>>
>> That we allow this incompatible change now does not mean that we are
>> categorically allowing more incompatible changes in the future. My point is
>> that we should in all instances evaluate the merit of any incompatible
>> change on a case-by-case basis. This is not an exceptional circumstance -
>> we've made incompatible changes in the past when appropriate, e.g. breaking
>> some clients to address a security issue. I and others believe that in this
>> case the benefits greatly outweigh the downsides of changing this back to
>> what it has always been.
>>
>> Best,
>> Aaron
>>
>>
>> >
>> > Regards,
>> > Eric
>> >
>> > On 1/9/18, 11:21 AM, "Chris Douglas" <cdoug...@apache.org> wrote:
>> >
>> >     Particularly since 9820 isn't in the contiguous range of ports in
>> >     HDFS-9427, is there any value in this change?
>> >
>> >     Let's change it back to prevent the disruption to users, but
>> >     downstream projects should treat this as a bug in their tests.
>> >     Please open JIRAs in affected projects. -C
>> >
>> >
>> >     On Tue, Jan 9, 2018 at 5:18 AM, larry mccay <lmc...@apache.org> wrote:
>> >     > On Mon, Jan 8, 2018 at 11:28 PM, Aaron T. Myers <a...@apache.org> wrote:
>> >     >
>> >     >> Thanks a lot for the response, Larry. Comments inline.
>> >     >>
>> >     >> On Mon, Jan 8, 2018 at 6:44 PM, larry mccay <lmc...@apache.org> wrote:
>> >     >>
>> >     >>> Question...
>> >     >>>
>> >     >>> Can this be addressed in some way during or before upgrade that
>> >     >>> allows it to only affect new installs?
>> >     >>> Even a config based workaround prior to upgrade might make this
>> >     >>> change less disruptive.
>> >     >>>
>> >     >>> If part of the upgrade process includes a step (maybe even a
>> >     >>> script) to set the NN RPC port explicitly beforehand then it would
>> >     >>> allow existing deployments and related clients to remain whole -
>> >     >>> otherwise they will pick up the new default port.
>> >     >>>
>> >     >>
>> >     >> Perhaps something like this could be done, but I think there are
>> >     >> downsides to anything like this. For example, I'm sure there are
>> >     >> plenty of applications written on top of Hadoop that have tests
>> >     >> which hard-code the port number. Nothing we do in a setup script
>> >     >> will help here. If we don't change the default port back to what it
>> >     >> was, these tests will likely all have to be updated.
>> >     >>
>> >     >>
>> >     >
>> >     > I may not have made my point clear enough.
>> >     > What I meant to say is to fix the default port but direct folks to
>> >     > explicitly set the port they are using in a deployment (the current
>> >     > default) so that it doesn't change out from under them - unless they
>> >     > are fine with it changing.
>> >     >
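
The "explicitly set the port before upgrading" idea could look roughly like the
sketch below: take a filesystem URI (for example the value of fs.defaultFS)
and, if it has no port, append the default that is currently in effect so that
a later change of the default no longer matters. This is plain Python rather
than Hadoop tooling, and the function name and hostnames are hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit


def pin_port(uri, current_default_port):
    """Return uri with the port made explicit.

    Hypothetical helper: URIs that already name a port are returned
    unchanged; port-less ones get current_default_port appended to
    the authority.
    """
    parts = urlsplit(uri)
    if not parts.netloc:
        raise ValueError("URI has no authority: %r" % uri)
    if parts.port is not None:
        return uri
    pinned = "%s:%d" % (parts.netloc, current_default_port)
    return urlunsplit(parts._replace(netloc=pinned))


if __name__ == "__main__":
    print(pin_port("hdfs://nn1.example.com", 8020))
    print(pin_port("hdfs://nn2.example.com:9000/warehouse", 8020))
```

As noted in the reply quoted above, this only covers configuration that such a
step can reach; port numbers hard-coded in tests or stored elsewhere would
still need separate handling.
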
>> >     >
>> >     >>
>> >     >>> Meta note: we shouldn't be so pedantic about policy that we can't
>> >     >>> back out something that is considered a bug or even a mistake.
>> >     >>>
>> >     >>
>> >     >> This is my bigger point. Rigidly adhering to the compat guidelines
>> >     >> in this instance helps almost no one, while hurting many folks.
>> >     >>
>> >     >> We basically made a mistake when we decided to change the default
>> >     >> NN port with little upside, even between major versions. We
>> >     >> discovered this very quickly, and we have an opportunity to fix it
>> >     >> now and in so doing likely disrupt very, very few users and
>> >     >> downstream applications. If we don't change it, we'll be causing
>> >     >> difficulty for our users, downstream developers, and ourselves,
>> >     >> potentially for years.
>> >     >>
>> >     >
>> >     > Agreed.
>> >     >
>> >     >
>> >     >>
>> >     >> Best,
>> >     >> Aaron
>> >     >>
>> >
>> >
>> >
>> >
>> >
>>
>
>
