Re: AWS contact?
Could have been, but that's why I tested with a lower MTU sending back ICMP packet too big to prove that the packet sizes from the server decrease as a result.

Andras

> On 21 Feb 2021, at 08:27, William Herrin wrote:
>
> On Fri, Feb 19, 2021 at 4:18 PM Andras Toth wrote:
>> Given the fact that the TCP 3-way handshake is established, sounds like some
>> Path MTU blackholing happening. Due to it happening during TLS handshake
>> it's likely from the server towards you.
>
> Could also be another case of botched anycast TCP where packet #2
> arrived at a different server than packet #1.
>
> -Bill
>
> --
> William Herrin
> b...@herrin.us
> https://bill.herrin.us/
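As a rough illustration of the test described in this message: when a hop reports a smaller MTU via ICMP "packet too big", the sender is expected to shrink its TCP segments to fit, and a blackhole is what you see when that feedback never arrives. The sketch below shows only the arithmetic, assuming IPv4/IPv6 and TCP headers without options. The function names are hypothetical and this is not the tool used in the thread.

```python
# Minimal sketch of the Path MTU arithmetic; all names here are illustrative.
IPV4_HEADER = 20  # bytes, assuming no IP options
IPV6_HEADER = 40  # bytes, fixed IPv6 header only
TCP_HEADER = 20   # bytes, assuming no TCP options


def max_tcp_payload(path_mtu, ipv6=False):
    """Largest TCP payload that fits in one unfragmented packet."""
    ip_header = IPV6_HEADER if ipv6 else IPV4_HEADER
    return path_mtu - ip_header - TCP_HEADER


def is_blackholed(packet_size, path_mtu, icmp_delivered):
    """An oversized packet is dropped at the narrow hop. If the ICMP
    'packet too big' never reaches the sender, it keeps retransmitting
    at the same size and the connection stalls."""
    return packet_size > path_mtu and not icmp_delivered


print(max_tcp_payload(1500))  # payload on a standard Ethernet path
print(max_tcp_payload(1400))  # payload after the sender learns a lower MTU
```

If the server's segments shrink after the lower MTU is signalled, as observed here, its PMTU discovery is reacting to the ICMP message.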
Re: AWS contact?
On Fri, Feb 19, 2021 at 4:18 PM Andras Toth wrote:
> Given the fact that the TCP 3-way handshake is established, sounds like some
> Path MTU blackholing happening. Due to it happening during TLS handshake it's
> likely from the server towards you.

Could also be another case of botched anycast TCP where packet #2 arrived at a different server than packet #1.

-Bill

--
William Herrin
b...@herrin.us
https://bill.herrin.us/
famous operation issues
Did anyone have the fun experience of ever going into the San Jose/Santa Clara Global Crossing Datacenter in the mid '90s? I recall going in there to visit a new client's gear: no real security once on the floor, open racks, fully exposed systems and wiring. I was working on my client's systems, which were next to a 19" telco rack with some shelves on it holding a router, switches, and servers. I restrained myself from just tapping in!
Re: Famous operational issues
Oh, I actually wanted to keep this for my memoirs, but if we can name dangerous datacenter operational issues ….

Sometime in the 2000s: somebody ran their own datacenter, which
- once had an active ant colony living under the raised floor and in the climate system,
- for a while had several electrical grounding defects, leading to the work instruction “don’t touch any metallic or conducting materials”,
- for a minute had a “look what we have bought on eBay” UPS system, until it started to roast after being turned on,
- from time to time had climate issues, with room temperatures peaking around 68 degrees centigrade, and yes, some equipment survived and even continued to work.

I decided not to go back there after “look what we have bought on eBay, an argon fire extinguisher, we just need to mount it”.

On 20 Feb 2021, at 10:15, Eric Kuhnke wrote:

From a datacenter ROI and economics, cooling, HVAC perspective that might just be the best colo customer ever. As long as they're paying full price for the cabinet and nothing is *dangerous* about how they've hung the 2U server vertically, using up all that space for just one thing has to be a lot better than a customer that makes full and efficient use of space and all the amperage allotted to them.
Re: RPKI invalid logs?
Dear Hank,

On Sat, Feb 20, 2021 at 07:37:08PM +0200, Hank Nussbacher wrote:
> Is there a place where one can examine RPKI invalid logs for a specific date
> & time

I have set up a publicly accessible archiver instance in Dallas, and one in Amsterdam, which capture and archive data every 20 minutes. Please visit for access to downloadable archives: http://www.rpkiviews.org/

> or even better logs showing those that dropped RPKI invalid
> announcements?

You can extract the rpki-client.json file from the archive from the timestamp you are interested in, and pass it as cache file to https://github.com/job/rpki-ov-checker, and via STDIN feed it a list of Prefix + OriginAS combos (sourced from MRT data or your internal administration / expectations).

If you like this service, please consider making a server in Israel available to rpkiviews.org. All that is required is a POSIX.1-ish-compliant server (BSD, Linux, or UNIX), and about 6 terabytes of storage (should be good for next 3 years), and a globally unique publicly reachable IP address. You pick the hostname.

Kind regards,

Job
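For readers who want to see what checking Prefix + OriginAS combos against an archived rpki-client.json amounts to, here is a minimal sketch of route origin validation in the style of RFC 6811. It is not the rpki-ov-checker code. It assumes the JSON has a top-level "roas" list whose entries carry "asn", "prefix" and "maxLength" fields, and the function names are hypothetical.

```python
import ipaddress
import json


def load_vrps(json_text):
    """Parse validated ROA payloads from rpki-client style JSON.
    Assumption: {"roas": [{"asn": ..., "prefix": ..., "maxLength": ...}]}."""
    vrps = []
    for roa in json.loads(json_text)["roas"]:
        asn = roa["asn"]
        if isinstance(asn, str):  # tolerate "AS64496" as well as 64496
            asn = int(asn[2:] if asn.upper().startswith("AS") else asn)
        vrps.append((ipaddress.ip_network(roa["prefix"]),
                     int(roa["maxLength"]), asn))
    return vrps


def validate(prefix, origin_asn, vrps):
    """Return 'valid', 'invalid' or 'not-found' for one announcement."""
    route = ipaddress.ip_network(prefix)
    covered = False
    for vrp_prefix, max_length, vrp_asn in vrps:
        if vrp_prefix.version != route.version:
            continue  # never compare IPv4 against IPv6
        if not route.subnet_of(vrp_prefix):
            continue  # this VRP does not cover the route
        covered = True
        if (vrp_asn != 0 and vrp_asn == origin_asn
                and route.prefixlen <= max_length):
            return "valid"
    return "invalid" if covered else "not-found"


sample = ('{"roas": [{"asn": "AS64496", "prefix": "192.0.2.0/24", '
          '"maxLength": 24, "ta": "example"}]}')
vrps = load_vrps(sample)
print(validate("192.0.2.0/24", 64496, vrps))
print(validate("192.0.2.0/25", 64496, vrps))
```

A linear scan like this is fine for a handful of lookups. For full MRT dumps a prefix trie would be the more usual choice.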
Re: public open resolver list?
- Original Message -
> From: "Bill Woodcock"

> Are all y’all allergic to Wikipedia or something?

Lots of people seem to be... :-}

> https://en.wikipedia.org/wiki/Public_recursive_name_server

I find it interesting that that article mentions alt-roots, but doesn't have a column for that, nor any actual mention of such resolvers...

Cheers,
-- jra

--
Jay R. Ashworth                  Baylink                       j...@baylink.com
Designer                     The Things I Think                       RFC 2100
Ashworth & Associates       http://www.bcp38.info          2000 Land Rover DII
St Petersburg FL USA      BCP38: Ask For It By Name!           +1 727 647 1274
Re: Support for End User Services
On 2/20/21 19:18, Mike Hammett wrote:

Leave aside any conversation about whether the business has the ability (or approval) to pay for it or not. Is it appropriate for organizations that provide services to end-users to require that you are a paying customer to contact their support? Is it appropriate to pretend to be your complaining customer to get support on network-level issues (IP Geolocation, false VPN notices, buffering, despite a clean path to their CDN, etc.)?

I'd argue, no... but then, the company does expend resources to deal with support queries. It won't scale well to use those resources on queries that do not contribute to that cost.

That said, the major content providers have found a way to provide support without actually speaking to warm bodies. While it doesn't work so well, it is better than nothing for queries that do not directly contribute to that cost.

Am I convinced there should be a better way? Sure!

Do I know what that is? Not right now!

Mark.
RPKI invalid logs?
Is there a place where one can examine RPKI invalid logs for a specific date & time or even better logs showing those that dropped RPKI invalid announcements? Thanks, Hank
Support for End User Services
Leave aside any conversation about whether the business has the ability (or approval) to pay for it or not. Is it appropriate for organizations that provide services to end-users to require that you are a paying customer to contact their support? Is it appropriate to pretend to be your complaining customer to get support on network-level issues (IP Geolocation, false VPN notices, buffering, despite a clean path to their CDN, etc.)? - Mike Hammett Intelligent Computing Solutions Midwest Internet Exchange The Brothers WISP
Re: Famous operational issues
Not a famous operational issue, but in 2000, we had a major outage of our dialup modem pool.

The owner of the building was re-skinning the outside using Styrofoam and stucco. A bunch of the Styrofoam had blocked the roof drains on the podium section of the building, immediately above our equipment room. A flash rainstorm filled the entire flat roof, and water came back in over the flashings and poured directly into our dialup modem pool through the hole in the concrete roof deck where the drain pipe protruded through.

In retrospect, it was a monumentally stupid place to put our main modem pool, but we didn't realize what was above the drop ceiling - and that it was roof, not the other 11 floors of the building.

1 bay of 6 shelves of USR TC 1000 HiperDSPs was now very wet and blinking funny patterns on its LEDs.

Fortunately, our vendor in Toronto (4 hour drive away) had stock of equipment that another customer kept delaying shipment on. They got their staff in and started unboxing and slotting cards. We spent a few hours tearing out the old gear and getting ready for replacements.

We left Windsor, Ontario at around 12:00am - same time they left Toronto, heading towards us. We coordinated a meet at one of the rural exits along Highway 401 at a closed gas station at around 2am.

Everything was going so well until a cop pulled up and asked us what we were doing, as we were slinging modem chassis between the back of the vendor's SUV and our van...

We calmly explained what happened. He looked between us a couple of times, shook his head and said "well, good luck with that", got back in his car and drove away.

We had everything back online within 14 hours of the initial outage.

At 02:37 PM 16/02/2021, John Kristoff wrote:

Friends,

I'd like to start a thread about the most famous and widespread Internet operational issues, outages or implementation incompatibilities you have seen. Which examples would make up your top three?

To get things started, I'd suggest the AS 7007 event is perhaps the most notorious and likely to top many lists including mine. So if that is one for you I'm asking for just two more.

I'm particularly interested in this as the first step in developing a future NANOG session. I'd be particularly interested in any issues that also identify key individuals that might still be around and interested in participating in a retrospective. I already have someone that is willing to talk about AS 7007, which shouldn't be hard to guess who.

Thanks in advance for your suggestions,

John

--
Clayton Zekelman
Managed Network Systems Inc. (MNSi)
3363 Tecumseh Rd. E
Windsor, Ontario N8W 1H4
tel. 519-985-8410
fax. 519-985-8409
Re: Famous operational issues
From a datacenter ROI and economics, cooling, HVAC perspective that might just be the best colo customer ever. As long as they're paying full price for the cabinet and nothing is *dangerous* about how they've hung the 2U server vertically, using up all that space for just one thing has to be a lot better than a customer that makes full and efficient use of space and all the amperage allotted to them.

On Thu, Feb 18, 2021 at 11:38 AM t...@pelican.org wrote:

> On Thursday, 18 February, 2021 16:23, "Seth Mattinen" said:
>
> > I had a customer that tried to stack their servers - no rails except the
> > bottom most one - using 2x4's between each server. Up until then I
> > hadn't imagined anyone would want to fill their cabinet with wood, so I
> > made a rule to ban wood and anything tangentially related (cardboard,
> > paper, plastic, etc.). Easier to just ban all things. Fire reasons too
> > but mainly I thought a cabinet full of wood was too stupid to allow.
>
> On the "stupid racking" front, I give you most of a rack dedicated to a
> single server. Not all that high a server, maybe 2U or so, but *way* too
> deep for the rack, so it had been installed vertically. By looping some
> fairly hefty chain through the handles on either side of the front of the
> chassis, and then bolting the four chain ends to the four rack posts. I
> wish I'd kept pictures of that one. Not flammable, but a serious WTF
> moment.
>
> Cheers,
> Tim.
Re: Famous operational issues
On Thu, Feb 18, 2021 at 07:34:39AM -0500, Patrick W. Gilmore wrote:
> In 1994, there was a major earthquake near the city of Los Angeles. City hall
> had to be evacuated and it would take over a year to reinforce the building
> to make it habitable again. My company moved all the systems in the basement
> of city hall to a new datacenter a mile or so away. After the install, we
> spent more than a week coaxing their ancient (even for 1994) machines back
> online, such as a Prime Computer and an AS400 with tons of DASD. Well, tons
> of cabinets, certainly less storage than my watch has now.
>
> I was in the DC going over something with the lady in charge when someone
> walked in to ask her something. She said “just a second”. That person
> took one step to the side of the door and leaned against the wall - right on
> an EPO which had no cover.
>
> Have you ever heard an entire row of DASD spin down instantly? Or taken 40
> minutes to IPL an AS400? In the middle of the business day? For the second
> most populous city in the country?
>
> Me: Maybe you should get a cover for that?
> Her: Good idea.
>
> Couple weeks later, in the same DC, going over final checklist. A fedex guy
> walks in. (To this day, no idea how he got in a supposedly locked DC.) She
> says “just a second”, and I get a very strong deja vu feeling. He takes
> one step to the side and leans against the wall.
>
> Me: Did you order that EPO cover?
> Her: Nope.

some of the ibm 4300 series mini-mainframes came with a console terminal that had a very large, raised (completely not flush), alternate power button on the upper panel of the keyboard, facing the operator. in later versions, the button was inset in a little open box with high sides. in earlier versions, there was just a pair of raised ribs on either side of the button. in the earliest version, if that panel needed to be replaced, the replacement part didn't even have those protective ribs, this huge button was just sitting there.
on our 4341, someone had dropped the keyboard during installation and the damaged panel was replaced with the no-protection-whatsoever part. i had an operator who, working a double shift into the overnight run, fell asleep and managed to bang his head square on the button. the overnight jobs running were left in various states of ruin. third party manufacturers had an easy sell for lucite power/EPO button covers. -- Henry Yen Aegis Information Systems, Inc. Senior Systems Programmer Hicksville, New York