Re: AWS contact?

2021-02-20 Thread Andras Toth
Could have been, but that's why I tested with a lower MTU, sending back ICMP 
"packet too big" messages, to prove that the packet sizes from the server 
decrease as a result.
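
For readers following along, the arithmetic behind that kind of test can be sketched roughly as below. This is an illustration only, not what was actually run in the thread: the function names and the sample packet sizes are made up, and the header sizes assume plain IPv4/IPv6 and TCP headers with no extension headers.

```python
# Minimal sketch of the path-MTU arithmetic, under the assumptions stated above.
# Header sizes are the fixed minimums: 20 bytes IPv4, 40 bytes IPv6, 20 bytes TCP.
IPV4_HEADER = 20
IPV6_HEADER = 40
TCP_HEADER = 20


def max_tcp_payload(path_mtu, ipv6=False, tcp_options=0):
    """Largest TCP payload that fits in one packet for a given path MTU.

    tcp_options is the number of TCP option bytes carried per segment
    (for example 12 when timestamps are in use).
    """
    ip_header = IPV6_HEADER if ipv6 else IPV4_HEADER
    payload = path_mtu - ip_header - TCP_HEADER - tcp_options
    if payload <= 0:
        raise ValueError("MTU too small to carry any TCP payload")
    return payload


def packets_fit_mtu(observed_ip_packet_sizes, lowered_mtu):
    """True if every observed IP packet size is within the lowered MTU.

    If the server reacts to ICMP "packet too big", packets captured after
    the ICMP message should all satisfy this check.
    """
    return all(size <= lowered_mtu for size in observed_ip_packet_sizes)


# Hypothetical capture: IP packet sizes seen from the server after the
# ICMP message advertised an MTU of 1400.
after_icmp = [1400, 1400, 812]
print(max_tcp_payload(1400), packets_fit_mtu(after_icmp, 1400))
```

If the packet sizes do not shrink after the ICMP message, that points at the ICMP being dropped or ignored somewhere on the path towards the server.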

Andras

> On 21 Feb 2021, at 08:27, William Herrin  wrote:
> 
> On Fri, Feb 19, 2021 at 4:18 PM Andras Toth  wrote:
>> Given the fact that the TCP 3-way handshake is established, sounds like some 
>> Path MTU blackholing happening. Due to it happening during TLS handshake 
>> it's likely from the server towards you.
> 
> 
> Could also be another case of botched anycast TCP where packet #2
> arrived at a different server than packet #1.
> 
> -Bill
> 
> 
> 
> -- 
> William Herrin
> b...@herrin.us
> https://bill.herrin.us/


Re: AWS contact?

2021-02-20 Thread William Herrin
On Fri, Feb 19, 2021 at 4:18 PM Andras Toth  wrote:
> Given the fact that the TCP 3-way handshake is established, sounds like some 
> Path MTU blackholing happening. Due to it happening during TLS handshake it's 
> likely from the server towards you.


Could also be another case of botched anycast TCP where packet #2
arrived at a different server than packet #1.

-Bill



-- 
William Herrin
b...@herrin.us
https://bill.herrin.us/


famous operation issues

2021-02-20 Thread Chris Cariffe
Did anyone have the fun experience of ever going into the San Jose/Santa
Clara Global Crossing Datacenter in the mid '90s?  I recall going in there
to visit a new client's gear, no real security once on the floor.  Open
racks, fully exposed systems and wiring.  I was working on my client's
systems that were next to a 19" telco rack with some shelves on it holding a
router, switches, and servers.  I refrained from just tapping in!


Re: Famous operational issues

2021-02-20 Thread Jörg Kost

Oh,

I actually wanted to keep this for my memoirs, but if we can name dangerous 
datacenter operational issues ... sometime in the 2000s:


Somebody ran their own datacenter, which
- once had an active ant colony living under the raised floor and in the 
climate system,
- for a while had several electrical grounding defects, leading to the 
work instruction of “don’t touch any metallic or conducting materials”,
- for a minute, had a “look what we have bought on eBay” UPS system, 
until it started to roast after being turned on,
- from time to time had climate issues, leading to peaks of around 
68 degrees centigrade room temperature, and yes, some equipment 
survived and even continued to work.


I decided not to go back there after “look what we have bought on eBay, 
an argon fire extinguisher, we just need to mount it”.


On 20 Feb 2021, at 10:15, Eric Kuhnke wrote:

From a datacenter ROI and economics, cooling, HVAC perspective that might
just be the best colo customer ever. As long as they're paying full price
for the cabinet and nothing is *dangerous* about how they've hung the 2U
server vertically, using up all that space for just one thing has to be a
lot better than a customer that makes full and efficient use of space and
all the amperage allotted to them.




Re: RPKI invalid logs?

2021-02-20 Thread Job Snijders via NANOG
Dear Hank,

On Sat, Feb 20, 2021 at 07:37:08PM +0200, Hank Nussbacher wrote:
> Is there a place where one can examine RPKI invalid logs for a specific date
> & time 

I have set up a publicly accessible archiver instance in Dallas, and one
in Amsterdam which capture and archive data every 20 minutes.

Please visit http://www.rpkiviews.org/ for access to downloadable archives.

> or even better logs showing those that dropped RPKI invalid
> announcements?

You can extract the rpki-client.json file from the archive for the
timestamp you are interested in, pass it as the cache file to
https://github.com/job/rpki-ov-checker, and via STDIN feed it a list of
Prefix + OriginAS combos (sourced from MRT data or your internal
administration / expectations).
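
To illustrate the kind of check involved, here is a minimal sketch of route origin validation against an extracted JSON file. It is not the rpki-ov-checker code and does not reproduce its interface. It assumes the JSON has a top-level "roas" list whose entries carry "asn", "prefix" and "maxLength" fields, and that "asn" may be either an integer or a string such as "AS64496". All function names are hypothetical, and the sample data uses documentation prefixes and ASNs.

```python
import ipaddress
import json


def parse_asn(value):
    """Accept 64496, "64496" or "AS64496" and return the integer ASN."""
    text = str(value).upper()
    if text.startswith("AS"):
        text = text[2:]
    return int(text)


def load_vrps(doc):
    """Turn the assumed {"roas": [...]} layout into (network, maxlen, asn) tuples."""
    vrps = []
    for roa in doc.get("roas", []):
        network = ipaddress.ip_network(roa["prefix"])
        vrps.append((network, int(roa["maxLength"]), parse_asn(roa["asn"])))
    return vrps


def load_vrps_from_file(path):
    """Read an extracted JSON file from disk (path supplied by the caller)."""
    with open(path) as handle:
        return load_vrps(json.load(handle))


def validate(prefix, origin_asn, vrps):
    """Classify a Prefix + OriginAS combo as valid, invalid or not-found.

    valid:     some covering entry matches the origin AS and the prefix
               length does not exceed its maxLength
    invalid:   at least one entry covers the prefix, but none match
    not-found: no entry covers the prefix at all
    """
    route = ipaddress.ip_network(prefix)
    covered = False
    for network, max_length, asn in vrps:
        if network.version != route.version:
            continue  # never compare IPv4 against IPv6
        if not route.subnet_of(network):
            continue
        covered = True
        if route.prefixlen <= max_length and asn == origin_asn:
            return "valid"
    return "invalid" if covered else "not-found"


# Made-up sample data in the assumed layout.
sample = {"roas": [{"asn": "AS64496", "prefix": "192.0.2.0/24", "maxLength": 24}]}
vrps = load_vrps(sample)
for prefix, asn in [("192.0.2.0/24", 64496), ("192.0.2.0/25", 64496),
                    ("192.0.2.0/24", 64497), ("198.51.100.0/24", 64496)]:
    print(prefix, asn, validate(prefix, asn, vrps))
```

Running the same Prefix + OriginAS list against archives from two different timestamps shows when a given announcement became invalid.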

If you like this service, please consider making a server in Israel
available to rpkiviews.org. All that is required is a
POSIX.1-ish-compliant server (BSD, Linux, or UNIX), and about 6
terabytes of storage (should be good for the next 3 years), and a globally
unique publicly reachable IP address. You pick the hostname.

Kind regards,

Job


Re: public open resolver list?

2021-02-20 Thread Jay R. Ashworth
- Original Message -
> From: "Bill Woodcock" 

> Are all y’all allergic to Wikipedia or something?

Lots of people seem to be... :-}

> https://en.wikipedia.org/wiki/Public_recursive_name_server

I find it interesting that that article mentions alt-roots, but doesn't
have a column for that, nor any actual mention of such resolvers...

Cheers,
-- jra
-- 
Jay R. Ashworth  Baylink   j...@baylink.com
Designer The Things I Think   RFC 2100
Ashworth & Associates   http://www.bcp38.info  2000 Land Rover DII
St Petersburg FL USA  BCP38: Ask For It By Name!   +1 727 647 1274


Re: Support for End User Services

2021-02-20 Thread Mark Tinka



On 2/20/21 19:18, Mike Hammett wrote:

Leave aside any conversation about whether the business has the 
ability (or approval) to pay for it or not.



Is it appropriate for organizations that provide services to end-users 
to require that you are a paying customer to contact their support?


Is it appropriate to pretend to be your complaining customer to get 
support on network-level issues (IP Geolocation, false VPN notices, 
buffering, despite a clean path to their CDN, etc.)?


I'd argue, no... but then, the company does expend resources to deal 
with support queries. It won't scale well to use those resources on 
queries that do not contribute to that cost.


That said, the major content providers have found a way to provide 
support without actually speaking to warm bodies. While it doesn't work 
so well, it is better than nothing for queries that do not directly 
contribute to that cost.


Am I convinced there should be a better way? Sure!

Do I know what that is? Not right now!

Mark.


RPKI invalid logs?

2021-02-20 Thread Hank Nussbacher
Is there a place where one can examine RPKI invalid logs for a specific 
date & time or even better logs showing those that dropped RPKI invalid 
announcements?



Thanks,

Hank



Support for End User Services

2021-02-20 Thread Mike Hammett
Leave aside any conversation about whether the business has the ability (or 
approval) to pay for it or not. 




Is it appropriate for organizations that provide services to end-users to 
require that you are a paying customer to contact their support? 


Is it appropriate to pretend to be your complaining customer to get support on 
network-level issues (IP Geolocation, false VPN notices, buffering, despite a 
clean path to their CDN, etc.)? 




- 
Mike Hammett 
Intelligent Computing Solutions 

Midwest Internet Exchange 

The Brothers WISP 



Re: Famous operational issues

2021-02-20 Thread Clayton Zekelman



Not a famous operational issue, but in 2000, we had a major outage of our
dialup modem pool.

The owner of the building was re-skinning the outside using Styrofoam and
stucco.  A bunch of the Styrofoam had blocked the roof drains on the podium
section of the building, immediately above our equipment room.

A flash rainstorm filled the entire flat roof, and water came back in over
the flashings, and poured directly into our dialup modem pool through the
hole in the concrete roof deck where the drain pipe protruded through.

In retrospect, it was a monumentally stupid place to put our main modem
pool, but we didn't realize what was above the drop ceiling - and that it
was the roof, not the other 11 floors of the building.

1 bay of 6 shelves of USR TC 1000 HiperDSPs were now very wet and blinking
funny patterns on their LEDs.

Fortunately, our vendor in Toronto (4 hour drive away) had stock of
equipment that another customer kept delaying shipment on.  They got their
staff in and started unboxing and slotting cards.  We spent a few hours
tearing out the old gear and getting ready for replacements.

We left Windsor, Ontario at around 12:00am - same time they left Toronto,
heading towards us.  We coordinated a meet at one of the rural exits along
Highway 401 at a closed gas station at around 2am.

Everything was going so well until a cop pulled up, and asked us what we
were doing, as we were slinging modem chassis between the back of the
vendor's SUV and our van... We calmly explained what happened.  He looked
between us a couple of times, shook his head and said "well, good luck with
that", got back in his car and drove away.

We had everything back online within 14 hours of the initial outage.

At 02:37 PM 16/02/2021, John Kristoff wrote:

Friends,

I'd like to start a thread about the most famous and widespread Internet
operational issues, outages or implementation incompatibilities you
have seen.

Which examples would make up your top three?

To get things started, I'd suggest the AS 7007 event is perhaps the
most notorious and likely to top many lists including mine.  So if
that is one for you I'm asking for just two more.

I'm particularly interested in this as the first step in developing a
future NANOG session.  I'd be particularly interested in any issues
that also identify key individuals that might still be around and
interested in participating in a retrospective.  I already have someone
that is willing to talk about AS 7007, which shouldn't be hard to guess
who.

Thanks in advance for your suggestions,

John


--

Clayton Zekelman
Managed Network Systems Inc. (MNSi)
3363 Tecumseh Rd. E
Windsor, Ontario
N8W 1H4

tel. 519-985-8410
fax. 519-985-8409



Re: Famous operational issues

2021-02-20 Thread Eric Kuhnke
From a datacenter ROI and economics, cooling, HVAC perspective that might
just be the best colo customer ever. As long as they're paying full price
for the cabinet and nothing is *dangerous* about how they've hung the 2U
server vertically, using up all that space for just one thing has to be a
lot better than a customer that makes full and efficient use of space and
all the amperage allotted to them.


On Thu, Feb 18, 2021 at 11:38 AM t...@pelican.org  wrote:

> On Thursday, 18 February, 2021 16:23, "Seth Mattinen" 
> said:
>
> > I had a customer that tried to stack their servers - no rails except the
> > bottom most one - using 2x4's between each server. Up until then I
> > hadn't imagined anyone would want to fill their cabinet with wood, so I
> > made a rule to ban wood and anything tangentially related (cardboard,
> > paper, plastic, etc.). Easier to just ban all things. Fire reasons too
> > but mainly I thought a cabinet full of wood was too stupid to allow.
>
> On the "stupid racking" front, I give you most of a rack dedicated to a
> single server.  Not all that high a server, maybe 2U or so, but *way* too
> deep for the rack, so it had been installed vertically.  By looping some
> fairly hefty chain through the handles on either side of the front of the
> chassis, and then bolting the four chain ends to the four rack posts.  I
> wish I'd kept pictures of that one.  Not flammable, but a serious WTF
> moment.
>
> Cheers,
> Tim.
>
>
>


Re: Famous operational issues

2021-02-20 Thread Henry Yen
On Thu, Feb 18, 2021 at 07:34:39AM -0500, Patrick W. Gilmore wrote:
> In 1994, there was a major earthquake near the city of Los Angeles. City hall 
> had to be evacuated and it would take over a year to reinforce the building 
> to make it habitable again. My company moved all the systems in the basement 
> of city hall to a new datacenter a mile or so away. After the install, we 
> spent more than a week coaxing their ancient (even for 1994) machines back 
> online, such as a Prime Computer and an AS400 with tons of DASD. Well, tons 
> of cabinets, certainly less storage than my watch has now.
> 
> I was in the DC going over something with the lady in charge when someone 
> walked in to ask her something. She said “just a second”. That person 
> took one step to the side of the door and leaned against the wall - right on 
> an EPO which had no cover.
> 
> Have you ever heard an entire row of DASD spin down instantly? Or taken 40 
> minutes to IPL an AS400? In the middle of the business day? For the second 
> most populous city in the country?
> 
>   Me: Maybe you should get a cover for that?
>   Her: Good idea.
> 
> Couple weeks later, in the same DC, going over final checklist. A fedex guy 
> walks in. (To this day, no idea how he got in a supposedly locked DC.) She 
> says “just a second”, and I get a very strong deja vu feeling. He takes 
> one step to the side and leans against the wall.
> 
>   Me: Did you order that EPO cover?
>   Her: Nope.

some of the ibm 4300 series mini-mainframes came with a console terminal
that had a very large, raised (completely not flush), alternate power
button on the upper panel of the keyboard, facing the operator. in later
versions, the button was inset in a little open box with high sides. in
earlier versions, there was just a pair of raised ribs on either side of the
button. in the earliest version, if that panel needed to be replaced, the
replacement part didn't even have those protective ribs, this huge button
was just sitting there. on our 4341, someone had dropped the keyboard during
installation and the damaged panel was replaced with the
no-protection-whatsoever part.

i had an operator who, working a double shift into the overnight run,
fell asleep and managed to bang his head square on the button.
the overnight jobs running were left in various states of ruin.

third party manufacturers had an easy sell for lucite power/EPO button covers.

--
Henry Yen   Aegis Information Systems, Inc.
Senior Systems Programmer   Hicksville, New York