On Wed, Jan 10, 2018 at 1:34 PM, Daryn Sharp <da...@oath.com> wrote:

> I fully agree the port changes should be reverted. Although "incompatible", the potential impact to existing 2.x deploys is huge. I'd rather inconvenience 3.0 deploys that comprise <1% of customers. An incompatible change to revert an incompatible change is called compatibility.
>
+1

> Most importantly, consider that there is no good upgrade path for existing deploys, esp. large and/or multi-cluster environments. It’s only feasible for first-time deploys or simple single-cluster upgrades willing to take downtime. Let's consider a few reasons why:
>
> 1. RU is completely broken. Running jobs will fail. If MR on hdfs bundles the configs, there's no way to transparently coordinate the switch to the new bundle with the port changed. Job submissions will fail.
>
> 2. Users generally do not add the rpc port number to uris, so unless their configs are updated they will contact the wrong port. Seamlessly coordinating the conf change without massive failures is impossible.
>
> 3. Even if client confs are updated, they will break in a multi-cluster env with NNs using different ports. Users/services will be forced to add the port. The cited hive "issue" is not a bug since it's the only way to work in a multi-port env.
>
> 4. Coordinating the port add/change of uris in systems everywhere (you know something will be missed), updating of confs, restarting all services, and requiring customers to redeploy their workflows in sync with the NN upgrade will cause mass disruption and downtime that will be unacceptable for production environments.
>
> This is a solution to a non-existent problem. Ports can be bound by multiple processes but only 1 can listen. Maybe multiple listeners is an issue for compute nodes but not responsibly managed service nodes. I.e., who runs arbitrary services on the NNs that bind to random ports? Besides, the default port is and was ephemeral so it solved nothing.
>
> This either standardizes ports to a particular customer's ports or is a poorly thought out whim. In either case, the needs of the many outweigh the needs of the few/none (3.0 users). The only logical conclusion is revert.
> If a particular site wants to change default ports and deal with the massive fallout, they can explicitly change the ports themselves.
>
> Daryn
>
> On Tue, Jan 9, 2018 at 11:22 PM, Aaron T. Myers <a...@apache.org> wrote:
>
>> On Tue, Jan 9, 2018 at 3:15 PM, Eric Yang <ey...@hortonworks.com> wrote:
>>
>> > While I agree the original port change was unnecessary, I don’t think the Hadoop NN port change is a bad thing.
>> >
>> > I worked for a Hadoop distro whose NN RPC port defaulted to 9000. When we migrated from BigInsights to IOP and now to HDP, we had to move customer Hive metadata to the new NN RPC port. It only took one developer (myself) to write the tool for the migration. The incurred workload is not as bad as most people anticipated because Hadoop depends on the configuration file for referencing the namenode. Most of the code can work transparently. It helped to harden the downstream testing tools to be more robust.
>> >
>>
>> While there are of course ways to deal with this, the question really should be whether or not it's a desirable thing to do to our users.
>>
>> > We will never know how many people are actively working on Hadoop 3.0.0. Perhaps a couple hundred developers, or thousands.
>>
>> You're right that we can't know for sure, but I strongly suspect that this is a substantial overestimate. Given how conservative Hadoop operators tend to be, I view it as exceptionally unlikely that many deployments have been created on or upgraded to Hadoop 3.0.0 since it was released less than a month ago.
>>
>> Further, I hope you'll agree that the number of users/developers/deployments/applications which are currently on Hadoop 2.x is *vastly* greater than the number who might have jumped on Hadoop 3.0.0 so quickly. When all of those users upgrade to any 3.x version, they will encounter this needless incompatible change and be forced to work around it.
>>
>> > I think the switch back may save a few developers work, but there could be more people getting impacted by an unexpected minor release change in the future. I recommend keeping current values to avoid rule bending and future frustrations.
>> >
>>
>> That we allow this incompatible change now does not mean that we are categorically allowing more incompatible changes in the future. My point is that we should in all instances evaluate the merit of any incompatible change on a case-by-case basis. This is not an exceptional circumstance - we've made incompatible changes in the past when appropriate, e.g. breaking some clients to address a security issue. I and others believe that in this case the benefits greatly outweigh the downsides of changing this back to what it has always been.
>>
>> Best,
>> Aaron
>>
>> > Regards,
>> > Eric
>> >
>> > On 1/9/18, 11:21 AM, "Chris Douglas" <cdoug...@apache.org> wrote:
>> >
>> >     Particularly since 9820 isn't in the contiguous range of ports in HDFS-9427, is there any value in this change?
>> >
>> >     Let's change it back to prevent the disruption to users, but downstream projects should treat this as a bug in their tests. Please open JIRAs in affected projects. -C
>> >
>> >     On Tue, Jan 9, 2018 at 5:18 AM, larry mccay <lmc...@apache.org> wrote:
>> >     > On Mon, Jan 8, 2018 at 11:28 PM, Aaron T. Myers <a...@apache.org> wrote:
>> >     >
>> >     >> Thanks a lot for the response, Larry. Comments inline.
>> >     >>
>> >     >> On Mon, Jan 8, 2018 at 6:44 PM, larry mccay <lmc...@apache.org> wrote:
>> >     >>
>> >     >>> Question...
>> >     >>>
>> >     >>> Can this be addressed in some way during or before upgrade that allows it to only affect new installs?
>> >     >>> Even a config based workaround prior to upgrade might make this change less disruptive.
>> >     >>>
>> >     >>> If part of the upgrade process includes a step (maybe even a script) to set the NN RPC port explicitly beforehand then it would allow existing deployments and related clients to remain whole - otherwise it will pick up the new default port.
>> >     >>>
>> >     >>
>> >     >> Perhaps something like this could be done, but I think there are downsides to anything like this. For example, I'm sure there are plenty of applications written on top of Hadoop that have tests which hard-code the port number. Nothing we do in a setup script will help here. If we don't change the default port back to what it was, these tests will likely all have to be updated.
>> >     >>
>> >     >
>> >     > I may not have made my point clear enough.
>> >     > What I meant to say is to fix the default port but direct folks to explicitly set the port they are using in a deployment (the current default) so that it doesn't change out from under them - unless they are fine with it changing.
>> >     >
>> >     >>
>> >     >>> Meta note: we shouldn't be so pedantic about policy that we can't back out something that is considered a bug or even a mistake.
>> >     >>>
>> >     >>
>> >     >> This is my bigger point. Rigidly adhering to the compat guidelines in this instance helps almost no one, while hurting many folks.
>> >     >>
>> >     >> We basically made a mistake when we decided to change the default NN port with little upside, even between major versions. We discovered this very quickly, and we have an opportunity to fix it now and in so doing likely disrupt very, very few users and downstream applications. If we don't change it, we'll be causing difficulty for our users, downstream developers, and ourselves, potentially for years.
>> >     >>
>> >     >
>> >     > Agreed.
>> >     >
>> >     >>
>> >     >> Best,
>> >     >> Aaron
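
Editor's illustration of Daryn's points 2 and 3: a minimal Python sketch of why a URI that omits the RPC port silently changes meaning when the default changes, while a URI with an explicit port does not. It is not Hadoop's client code. The helper `resolve_nn_endpoint`, the host `nn1.example.com`, and the paths are hypothetical. 9820 is the 3.0.0 value mentioned in the thread; 8020 is assumed here as the 2.x default, which the thread itself does not state.

```python
from typing import Tuple
from urllib.parse import urlsplit

# Assumption: 8020 is the 2.x NameNode RPC default (not stated in the thread).
DEFAULT_PORT_2X = 8020
# 9820 is the 3.0.0 value mentioned in the thread.
DEFAULT_PORT_3_0_0 = 9820


def resolve_nn_endpoint(uri: str, default_port: int) -> Tuple[str, int]:
    """Return the (host, port) a client would contact for an hdfs:// URI.

    Hypothetical helper for illustration: it only mimics the rule
    "use the port in the URI, else fall back to the configured default".
    """
    parts = urlsplit(uri)
    if parts.scheme != "hdfs" or parts.hostname is None:
        raise ValueError(f"not an hdfs:// URI with a host: {uri!r}")
    # parts.port is None when the URI carries no explicit port.
    port = parts.port if parts.port is not None else default_port
    return parts.hostname, port


if __name__ == "__main__":
    implicit = "hdfs://nn1.example.com/user/alice/data"
    explicit = "hdfs://nn1.example.com:8020/user/alice/data"
    for default in (DEFAULT_PORT_2X, DEFAULT_PORT_3_0_0):
        print(
            "default", default,
            "implicit ->", resolve_nn_endpoint(implicit, default),
            "explicit ->", resolve_nn_endpoint(explicit, default),
        )
```

Under these assumptions, the implicit URI follows whatever default the client has configured, so a client and a NameNode that disagree on the default miss each other. The explicit URI always resolves to the same endpoint, which is why multi-cluster environments with differing ports end up having to spell the port out.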