Re: CloudFlare issues?

2019-06-24 Thread Christopher Morrow
On Tue, Jun 25, 2019 at 12:49 AM Hank Nussbacher  wrote:
>
> On 25/06/2019 03:03, Tom Beecher wrote:
> > Disclaimer : I am a Verizon employee via the Yahoo acquisition. I do
> > not work on 701.  My comments are my own opinions only.
> >
> > Respectfully, I believe Cloudflare’s public comments today have been a
> > real disservice. This blog post, and your CEO on Twitter today, took
> > every opportunity to say “DAMN THOSE MORONS AT 701!”. They’re not.
> >
> >
> Perhaps suggest to VZ management to use their blog:
> https://www.verizondigitalmedia.com/blog/

#coughwrongvz

I think anyway - you probably mean:
https://enterprise.verizon.com/

Good luck! I think it's 3 clicks to: "www22.verizon.com" which gets
even moar fun!
The NOC used to answer if you called: +1-800-900-0241
which is in their whois records...

> to contradict what CF blogged about?
>
> -Hank
>


Re: CloudFlare issues?

2019-06-24 Thread Hank Nussbacher

On 25/06/2019 03:03, Tom Beecher wrote:
Disclaimer : I am a Verizon employee via the Yahoo acquisition. I do 
not work on 701.  My comments are my own opinions only.


Respectfully, I believe Cloudflare’s public comments today have been a 
real disservice. This blog post, and your CEO on Twitter today, took 
every opportunity to say “DAMN THOSE MORONS AT 701!”. They’re not.




Perhaps suggest to VZ management to use their blog:
https://www.verizondigitalmedia.com/blog/
to contradict what CF blogged about?

-Hank



Re: Cost effective time servers

2019-06-24 Thread Forrest Christian (List Account)
It's about minimizing the impact of the attack vector. And you
shouldn't implicitly trust the second alignment either.

In a potential spoofing attack, if you trust the GPS for all of the
data exclusively, then someone who can spoof your GPS (not as
hard/expensive as one would think) can fully control what time you
think it is.  This is obviously bad.

If instead you take the time data from another source, and only take
the second from the GPS, at most you're going to be off a second.
This is less bad but still bad in some cases.

Fortunately, we can easily do better than this.

NTP itself provides the solution.

Ideally you'd get your time from multiple sources and use some sort of
algorithm to determine what the most likely correct time is.   NTP has
this functionality built in.   If you take a stratum 2 or 3 server,
and add multiple, geographically diverse stratum 1 and 2 servers to
it, the stratum 2 or 3 server will look at all of the views of time
including second alignment that it is receiving, and will determine
which servers can be trusted and which can't.   If a stratum 1 server
is being spoofed, the stratum 2 or 3 server will notice that it is out
of alignment and ignore it.   In this way, you don't trust what is
coming down the GPS of one or two stratum 1 servers.

For most people just running a stratum 2 or 3 server with a
well-curated set of stratum 1 or 2 servers scattered around the
internet will be accurate enough, and will provide robust, not easily
spoofed time.   The limitation here is that accuracy is bounded by RTT/2
in the worst case, so if you're a long way from your closest
stratum 1 server, your clock may be offset by up to RTT/2 plus
whatever systemic errors are inherent in the stratum 1 server (cable
delays, etc).   If you need better alignment, a local stratum 1 server
can be used, but it should just be added to your local stratum 2 or 3
server to improve the alignment of the second.   Once this is added,
the stratum 2 or 3 server will typically notice that it's really close
and will start to follow its second alignment, but only if it is
within the window that it has determined is likely to be valid by the
'voting' of all of the other stratum 1 and 2 servers which are
scattered around.

One other note:  There are some stratum 1 servers out there which do
not generally rely on GPS for time transfer from their stratum 1
clocks.   For instance, the NIST and USNO ntp servers, along with
others around the world in various standards organizations.  It might
pay to include some of these in your mix as well.
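
To make the 'voting' idea concrete, here is a minimal sketch (Python 3,
standard library only) that queries a handful of servers over SNTP and flags
any answer that disagrees with the median.  The server names and the 0.5 s
outlier threshold are placeholders; real ntpd/chrony do this far more
carefully (clock filtering, the intersection algorithm, and so on).

    #!/usr/bin/env python3
    # Minimal SNTP "sanity vote": query several servers, compute each one's
    # apparent offset, and flag anything far from the median.  Illustrative
    # only -- real NTP daemons do proper clock filtering and selection.
    import socket, struct, time, statistics

    NTP_EPOCH_DELTA = 2208988800        # seconds between 1900 and 1970 epochs
    SERVERS = ["0.pool.ntp.org", "1.pool.ntp.org",
               "2.pool.ntp.org", "time.nist.gov"]   # example set, pick your own

    def ntp_to_unix(sec, frac):
        return sec - NTP_EPOCH_DELTA + frac / 2**32

    def query_offset(host, timeout=2.0):
        pkt = b'\x23' + 47 * b'\x00'    # LI=0, VN=4, Mode=3 (client request)
        with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
            s.settimeout(timeout)
            t0 = time.time()            # client transmit time
            s.sendto(pkt, (host, 123))
            data, _ = s.recvfrom(512)
            t3 = time.time()            # client receive time
        rx_s, rx_f, tx_s, tx_f = struct.unpack("!4I", data[32:48])
        t1 = ntp_to_unix(rx_s, rx_f)    # server receive time
        t2 = ntp_to_unix(tx_s, tx_f)    # server transmit time
        return ((t1 - t0) + (t2 - t3)) / 2

    offsets = {}
    for host in SERVERS:
        try:
            offsets[host] = query_offset(host)
        except OSError as err:
            print(f"{host}: no answer ({err})")

    if offsets:
        median = statistics.median(offsets.values())
        for host, off in offsets.items():
            note = "  <-- disagrees with the others?" if abs(off - median) > 0.5 else ""
            print(f"{host:20s} offset {off:+.4f}s{note}")

The point is the same one made above: with several independent sources, a
single spoofed GPS-derived server stands out instead of silently steering
your clock.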

On Mon, Jun 24, 2019 at 8:36 PM Chris Adams  wrote:
>
> Once upon a time, Forrest Christian (List Account)  
> said:
> > I would submit that the proper use of a GPS receiver is for alignment
> > of the start of the second to a more precise value than can be
> > distributed across an asymmetric network like the Internet.  The
> > actual 'time label' for that second doesn't necessarily need to come
> > from GPS at all.  For security reasons, it's probably a good thing to
> > make sure you validate the data received from GPS in any case.
>
> If you don't trust the GPS receiver's idea of the time, why do you trust
> its start of the second?  It seems really odd to trust one and not the
> other.
> --
> Chris Adams 



-- 
- Forrest


Re: Cost effective time servers

2019-06-24 Thread Chris Adams
Once upon a time, Forrest Christian (List Account)  said:
> I would submit that the proper use of a GPS receiver is for alignment
> of the start of the second to a more precise value than can be
> distributed across an asymmetric network like the Internet.  The
> actual 'time label' for that second doesn't necessarily need to come
> from GPS at all.  For security reasons, it's probably a good thing to
> make sure you validate the data received from GPS in any case.

If you don't trust the GPS receiver's idea of the time, why do you trust
its start of the second?  It seems really odd to trust one and not the
other.
-- 
Chris Adams 


Re: Cost effective time servers

2019-06-24 Thread Forrest Christian (List Account)
I would submit that the proper use of a GPS receiver is for alignment
of the start of the second to a more precise value than can be
distributed across an asymmetric network like the Internet.  The
actual 'time label' for that second doesn't necessarily need to come
from GPS at all.  For security reasons, it's probably a good thing to
make sure you validate the data received from GPS in any case.


On Mon, Jun 24, 2019 at 8:23 PM Chris Adams  wrote:
>
> Once upon a time, Jay Hennigan  said:
> > The data from GPS includes the offset value from UTC for leap-second
> > correction. This should be easily included in your time calculation.
>
> Not only that, but at least some GPS receivers/protocols notify of
> pending leap seconds, so software can properly distribute the
> notification in advance.
>
> --
> Chris Adams 



-- 
- Forrest


Re: Cost effective time servers

2019-06-24 Thread Chris Adams
Once upon a time, Jay Hennigan  said:
> The data from GPS includes the offset value from UTC for leap-second
> correction. This should be easily included in your time calculation.

Not only that, but at least some GPS receivers/protocols notify of
pending leap seconds, so software can properly distribute the
notification in advance.

-- 
Chris Adams 


Re: Cost effective time servers

2019-06-24 Thread Jay Hennigan

On 6/21/19 07:57, Quan Zhou wrote:
Yep, went through the same route until I figured out that GPS time is a 
bit ahead of UTC.


The data from GPS includes the offset value from UTC for leap-second 
correction. This should be easily included in your time calculation. 
It's presently 18 seconds.
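
For anyone wiring this up by hand, the calculation is just a subtraction.  A
tiny sketch in Python, where the week/seconds values and the hard-coded 18 s
offset stand in for whatever your receiver actually reports:

    #!/usr/bin/env python3
    # GPS time -> UTC using the broadcast GPS-UTC offset (18 s as of 2019).
    # gps_week / tow_seconds / gps_utc_offset would come from the receiver;
    # the values below are only illustrative.
    from datetime import datetime, timedelta, timezone

    GPS_EPOCH = datetime(1980, 1, 6, tzinfo=timezone.utc)   # start of GPS time

    def gps_to_utc(gps_week, tow_seconds, gps_utc_offset=18):
        gps_time = GPS_EPOCH + timedelta(weeks=gps_week, seconds=tow_seconds)
        return gps_time - timedelta(seconds=gps_utc_offset)

    print(gps_to_utc(2059, 120.0).isoformat())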


--
Jay Hennigan - j...@west.net
Network Engineering - CCIE #7880
503 897-8550 - WB6RDV


Re: CloudFlare issues?

2019-06-24 Thread Jared Mauch



> On Jun 24, 2019, at 9:39 PM, Ross Tajvar  wrote:
> 
> 
> On Mon, Jun 24, 2019 at 9:01 PM Jared Mauch  wrote:
> >
> > > On Jun 24, 2019, at 8:50 PM, Ross Tajvar  wrote:
> > >
> > > Maybe I'm in the minority here, but I have higher standards for a T1 than 
> > > any of the other players involved. Clearly several entities failed to do 
> > > what they should have done, but Verizon is not a small or inexperienced 
> > > operation. Taking 8+ hours to respond to a critical operational problem 
> > > is what stood out to me as unacceptable.
> >
> > Are you talking about a press response or a technical one?  The impacts I 
> > saw were for around 2h or so based on monitoring I’ve had up since 2007.  
> > Not great but far from the worst as Tom mentioned.  I’ve seen people cease 
> > to announce IP space we reclaimed from them for months (or years) because 
> > of stale config.  I’ve also seen routes come back from the dead because 
> > they were pinned to an interface that was down for 2 years but never fully 
> > cleaned up.  (Then the telco looped the circuit, interface came up, route 
> > in table, announced globally — bad day all around).
> >
> 
> A technical one - see below from CF's blog post:
> "It is unfortunate that while we tried both e-mail and phone calls to reach 
> out to Verizon, at the time of writing this article (over 8 hours after the 
> incident), we have not heard back from them, nor are we aware of them taking 
> action to resolve the issue.”

I don’t know if CF is a customer (or not) of VZ; it’s likely easy enough to 
find out with a looking glass somewhere. They were perhaps a few of the 20k 
prefixes impacted (as reported by others).

We have heard from them and not a lot of the other people, but most of them 
likely don’t do business with VZ directly.  I’m not sure VZ is going to contact 
them all or has the capability to respond to them all (or respond to 
non-customers except via a press release).

> > > And really - does it matter if the protection *was* there but something 
> > > broke it? I don't think it does. Ultimately, Verizon failed to implement 
> > > correct protections on their network. And then failed to respond when it 
> > > became a problem.
> >
> > I think it does matter.  As I said in my other reply, people do things like 
> > drop ACLs to debug.  Perhaps that’s unsafe, but it is something you do to 
> > debug.  Not knowing what happened, I dunno.  It is also 2019 so I hold 
> > networks to a higher standard than I did in 2009 or 1999.
> >
> 
> Dropping an ACL is fine, but then you have to clean it up when you're done. 
> Your customers don't care that you almost didn't have an outage because you 
> almost did your job right. Yeah, there's a difference between not following 
> policy and not having a policy, but neither one is acceptable behavior from a 
> T1 imo. If it's that easy to cause an outage by not following policy, then I 
> argue that the policy should be better, or something should be better - 
> monitoring, automation, sanity checks, etc. There are lots of ways to solve 
> that problem. And in 2019 I really think there's no excuse for a T1 not to be 
> doing that kind of thing.

I don’t know about the outage (other than what I observed).  I offered some 
suggestions for people to help prevent it from happening, so I’ll leave it 
there.  We all make mistakes, I’ve been part of many and I’m sure that list 
isn’t yet complete.

- Jared

Re: CloudFlare issues?

2019-06-24 Thread Ross Tajvar
On Mon, Jun 24, 2019 at 9:01 PM Jared Mauch  wrote:
>
> > On Jun 24, 2019, at 8:50 PM, Ross Tajvar  wrote:
> >
> > Maybe I'm in the minority here, but I have higher standards for a T1
than any of the other players involved. Clearly several entities failed to
do what they should have done, but Verizon is not a small or inexperienced
operation. Taking 8+ hours to respond to a critical operational problem is
what stood out to me as unacceptable.
>
> Are you talking about a press response or a technical one?  The impacts I
saw were for around 2h or so based on monitoring I’ve had up since 2007.
Not great but far from the worst as Tom mentioned.  I’ve seen people cease
to announce IP space we reclaimed from them for months (or years) because
of stale config.  I’ve also seen routes come back from the dead because
they were pinned to an interface that was down for 2 years but never fully
cleaned up.  (Then the telco looped the circuit, interface came up, route
in table, announced globally — bad day all around).
>

A technical one - see below from CF's blog post:
"It is unfortunate that while we tried both e-mail and phone calls to reach
out to Verizon, at the time of writing this article (over 8 hours after the
incident), we have not heard back from them, nor are we aware of them
taking action to resolve the issue."

> > And really - does it matter if the protection *was* there but something
broke it? I don't think it does. Ultimately, Verizon failed to implement
correct protections on their network. And then failed to respond when it
became a problem.
>
> I think it does matter.  As I said in my other reply, people do things
like drop ACLs to debug.  Perhaps that’s unsafe, but it is something you do
to debug.  Not knowing what happened, I dunno.  It is also 2019 so I hold
networks to a higher standard than I did in 2009 or 1999.
>

Dropping an ACL is fine, but then you have to clean it up when you're done.
Your customers don't care that you *almost* didn't have an outage because
you *almost* did your job right. Yeah, there's a difference between not
following policy and not having a policy, but neither one is acceptable
behavior from a T1 imo. If it's that easy to cause an outage by not
following policy, then I argue that the policy should be better, or *something
*should be better - monitoring, automation, sanity checks. etc. There are
lots of ways to solve that problem. And in 2019 I really think there's no
excuse for a T1 not to be doing that kind of thing.

> - Jared


Re: CloudFlare issues?

2019-06-24 Thread Jared Mauch



> On Jun 24, 2019, at 8:50 PM, Ross Tajvar  wrote:
> 
> Maybe I'm in the minority here, but I have higher standards for a T1 than any 
> of the other players involved. Clearly several entities failed to do what 
> they should have done, but Verizon is not a small or inexperienced operation. 
> Taking 8+ hours to respond to a critical operational problem is what stood 
> out to me as unacceptable.

Are you talking about a press response or a technical one?  The impacts I saw 
were for around 2h or so based on monitoring I’ve had up since 2007.  Not great 
but far from the worst as Tom mentioned.  I’ve seen people cease to announce IP 
space we reclaimed from them for months (or years) because of stale config.  
I’ve also seen routes come back from the dead because they were pinned to an 
interface that was down for 2 years but never fully cleaned up.  (Then the 
telco looped the circuit, interface came up, route in table, announced globally 
— bad day all around).

> And really - does it matter if the protection *was* there but something broke 
> > > it? I don't think it does. Ultimately, Verizon failed to implement correct 
> protections on their network. And then failed to respond when it became a 
> problem.

I think it does matter.  As I said in my other reply, people do things like 
drop ACLs to debug.  Perhaps that’s unsafe, but it is something you do to 
debug.  Not knowing what happened, I dunno.  It is also 2019 so I hold networks 
to a higher standard than I did in 2009 or 1999.

- Jared

Re: CloudFlare issues?

2019-06-24 Thread Jared Mauch



> On Jun 24, 2019, at 8:03 PM, Tom Beecher  wrote:
> 
> Disclaimer : I am a Verizon employee via the Yahoo acquisition. I do not work 
> on 701.  My comments are my own opinions only. 
> 
> Respectfully, I believe Cloudflare’s public comments today have been a real 
> disservice. This blog post, and your CEO on Twitter today, took every 
> opportunity to say “DAMN THOSE MORONS AT 701!”. They’re not. 

I presume that seeing a CF blog post isn’t regular for you. :-). — please read 
on

> You are 100% right that 701 should have had some sort of protection mechanism 
> in place to prevent this. But do we know they didn’t? Do we know it was there 
> and just setup wrong? Did another change at another time break what was 
> there? I used 701 many  jobs ago and they absolutely had filtering in place; 
> it saved my bacon when I screwed up once and started readvertising a full 
> table from a 2nd provider. They smacked my session down and I got a nice call 
> about it. 
> 
> You guys have repeatedly accused them of being dumb without even speaking to 
> anyone yet from the sounds of it. Shouldn’t we be working on facts? 
> 
> Should they have been easier to reach once an issue was detected? Probably. 
> They’re certainly not the first vendor to have a slow response time though. 
> Seems like when an APAC carrier takes 18 hours to get back to us, we write it 
> off as the cost of doing business. 
> 
> It also would have been nice, in my opinion, to take a harder stance on the 
> BGP optimizer that generated the bogus routes, and the steel company that 
> failed BGP 101 and just gladly reannounced one upstream to another. 701 is 
> culpable for their mistakes, but there doesn’t seem like there is much 
> appetite to shame the other contributors. 
> 
> You’re right to use this as a lever to push for proper filtering , RPKI, best 
> practices. I’m 100% behind that. We can all be a hell of a lot better at what 
> we do. This stuff happens more than it should, but less than it could. 
> 
> But this industry is one big ass glass house. What’s that thing about stones 
> again? 

I’m careful to not talk about the people impacted.  There were a lot of people 
impacted, roughly 3-4% of the IP space was impacted today and I personally 
heard from more providers than can be counted on a single hand about their 
impact.

Not everyone is going to write about their business impact in public.  I’m not 
authorized to speak for my employer about any impacts that we may have had (for 
example) but if there was impact to 3-4% of IP space, statistically speaking 
there’s always a chance someone was impacted.

I do agree about the glass house thing.  There’s a lot of blame to go around, 
and today I’ve been quoting “go read _normal accidents_” to people.  It’s 
because sufficiently complex systems tend to have complex failures where 
numerous safety systems or controls were bypassed.  Those of us with more than 
a few days of experience likely know what some of them are, we also don’t know 
if those safety systems were disabled as part of debugging by one or more 
parties.  Who hasn’t dropped an ACL to debug why it isn’t working, or if that 
fixed the problem?

I don’t know what happened, but I sure know the symptoms and sets of fixes that 
the industry should apply and enforce.  I have been communicating some of them 
in public and many of them in private today, including offering help to other 
operators with how to implement some of the fixes.

It’s a bad day when someone changes your /16 to two /17’s and sends them out 
regardless of whether the packets flow through or not.  These things aren’t new, nor 
do I expect things to be significantly better tomorrow either.  I know people 
at VZ and suspect once they woke up they did something about it.  I also know 
how hard it is to contact someone you don’t have a business relationship with.  
A number of the larger providers have no way for a non-customer to phone, 
message or open a ticket online about problems they may have.  Who knows, their 
ticket system may be in the cloud and was also impacted.

What I do know is that if 3-4% of the homes/structures were flooded or 
temporarily unusable because of some form of disaster or evacuation, people 
would be proposing better engineering methods or inspection techniques for 
these structures.

If you are a small network and just point default, there is nothing for you to 
see here and nothing that you can do.  If you speak BGP with your upstream, you 
can filter out some of the bad routes.  You perhaps know that 1239, 3356 and 
others should only be seen directly from a network like 701 and can apply 
filters of this sort to prevent from accepting those more specifics.  I don’t 
believe it’s just 174 that the routes went to, but they were one of the 
networks aside from 701 where I saw paths from today.

(Now the part where you as a 3rd party to this event can help!)

If you peer, build some pre-flight and post-flight scripts to check how many
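
One way to read that suggestion: snapshot per-peer prefix counts before a
change, snapshot them again afterwards, and alarm on large swings.  A minimal
sketch, assuming you have already dumped the counts (via SNMP, NETCONF, CLI
scraping, whatever) into simple JSON files; the file format and the 10%
threshold are made up for illustration:

    #!/usr/bin/env python3
    # Compare two snapshots of per-peer prefix counts and alert on big swings.
    # Expected input: JSON like {"peer-a": {"received": 812000, "advertised": 350}}
    import json, sys

    THRESHOLD = 0.10    # alert on >10% change in either direction

    def load(path):
        with open(path) as fh:
            return json.load(fh)

    def compare(before, after):
        for peer in sorted(set(before) | set(after)):
            b, a = before.get(peer, {}), after.get(peer, {})
            for key in ("received", "advertised"):
                old, new = b.get(key, 0), a.get(key, 0)
                if old == 0 and new == 0:
                    continue
                if abs(new - old) / max(old, 1) > THRESHOLD:
                    print(f"ALERT {peer}: {key} prefixes went {old} -> {new}")

    if __name__ == "__main__":
        compare(load(sys.argv[1]), load(sys.argv[2]))   # pre.json post.json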

Re: CloudFlare issues?

2019-06-24 Thread Ross Tajvar
Maybe I'm in the minority here, but I have higher standards for a T1 than
any of the other players involved. Clearly several entities failed to do
what they should have done, but Verizon is not a small or inexperienced
operation. Taking 8+ hours to respond to a critical operational problem is
what stood out to me as unacceptable.

And really - does it matter if the protection *was* there but something
broke it? I don't think it does. Ultimately, Verizon failed to implement
correct protections on their network. And then failed to respond when it
became a problem.

On Mon, Jun 24, 2019, 8:06 PM Tom Beecher  wrote:

> Disclaimer : I am a Verizon employee via the Yahoo acquisition. I do not
> work on 701.  My comments are my own opinions only.
>
> Respectfully, I believe Cloudflare’s public comments today have been a
> real disservice. This blog post, and your CEO on Twitter today, took every
> opportunity to say “DAMN THOSE MORONS AT 701!”. They’re not.
>
> You are 100% right that 701 should have had some sort of protection
> mechanism in place to prevent this. But do we know they didn’t? Do we know
> it was there and just setup wrong? Did another change at another time break
> what was there? I used 701 many  jobs ago and they absolutely had filtering
> in place; it saved my bacon when I screwed up once and started
> readvertising a full table from a 2nd provider. They smacked my session
> down and I got a nice call about it.
>
> You guys have repeatedly accused them of being dumb without even speaking
> to anyone yet from the sounds of it. Shouldn’t we be working on facts?
>
> Should they have been easier to reach once an issue was detected?
> Probably. They’re certainly not the first vendor to have a slow response
> time though. Seems like when an APAC carrier takes 18 hours to get back to
> us, we write it off as the cost of doing business.
>
> It also would have been nice, in my opinion, to take a harder stance on
> the BGP optimizer that generated the bogus routes, and the steel company
> that failed BGP 101 and just gladly reannounced one upstream to another.
> 701 is culpable for their mistakes, but there doesn’t seem like there is
> much appetite to shame the other contributors.
>
> You’re right to use this as a lever to push for proper filtering , RPKI,
> best practices. I’m 100% behind that. We can all be a hell of a lot better
> at what we do. This stuff happens more than it should, but less than it
> could.
>
> But this industry is one big ass glass house. What’s that thing about
> stones again?
>
> On Mon, Jun 24, 2019 at 18:06 Justin Paine via NANOG 
> wrote:
>
>> FYI for the group -- we just published this:
>> https://blog.cloudflare.com/how-verizon-and-a-bgp-optimizer-knocked-large-parts-of-the-internet-offline-today/
>>
>>
>> _
>> *Justin Paine*
>> Director of Trust & Safety
>> PGP: BBAA 6BCE 3305 7FD6 6452 7115 57B6 0114 DE0B 314D
>> 101 Townsend St., San Francisco, CA 94107
>> 
>>
>>
>>
>> On Mon, Jun 24, 2019 at 2:25 PM Mark Tinka  wrote:
>>
>>>
>>>
>>> On 24/Jun/19 18:09, Pavel Lunin wrote:
>>>
>>> >
>>> > Hehe, I haven't seen this text before. Can't agree more.
>>> >
>>> > Get your tie back on Job, nobody listened again.
>>> >
>>> > More seriously, I see no difference between prefix hijacking and the
>>> > so called bgp optimisation based on completely fake announces on
>>> > behalf of other people.
>>> >
>>> > If ever your upstream or any other party who your company pays money
>>> > to does this dirty thing, now it's just the right moment to go explain
>>> > them that you consider this dangerous for your business and are
>>> > looking for better partners among those who know how to run internet
>>> > without breaking it.
>>>
>>> We struggled with a number of networks using these over eBGP sessions
>>> they had with networks that shared their routing data with BGPmon. It
>>> sent off all sorts of alarms, and troubleshooting it was hard when a
>>> network thinks you are de-aggregating massively, and yet you know you
>>> aren't.
>>>
>>> Each case took nearly 3 weeks to figure out.
>>>
>>> BGP optimizers are the bane of my existence.
>>>
>>> Mark.
>>>
>>>


Re: CloudFlare issues?

2019-06-24 Thread James Jun
On Mon, Jun 24, 2019 at 08:03:26PM -0400, Tom Beecher wrote:
> 
> You are 100% right that 701 should have had some sort of protection
> mechanism in place to prevent this. But do we know they didn't? Do we know
> it was there and just setup wrong? Did another change at another time break
> what was there? I used 701 many  jobs ago and they absolutely had filtering
> in place; it saved my bacon when I screwed up once and started
> readvertising a full table from a 2nd provider. They smacked my session
> down and I got a nice call about it.

In my past (and current) dealings with AS701, I do agree that they have
generally been good about filtering customer sessions and running a tight
ship.  But, manual config changes being what they are, I suppose an honest
mistake or oversight occurred at 701 today that made them contribute
significantly to today's outage.


> 
> It also would have been nice, in my opinion, to take a harder stance on the
> BGP optimizer that generated the bogus routes, and the steel company that
> failed BGP 101 and just gladly reannounced one upstream to another. 701 is
> culpable for their mistakes, but there doesn't seem like there is much
> appetite to shame the other contributors.

I think the biggest question to be asked here -- why the hell is a BGP
optimizer (Noction in this case) injecting fake more specifics to steer
traffic?  And why did a regional provider selling IP transit (DQE) use such a
dangerous accident-waiting-to-happen tool in their network, especially when
they have other ASNs taking transit feeds from them, with all these fake
man-in-the-middle routes being injected?

I get that BGP optimizers can have some use cases, but IMO, in most
situations (especially if you are a network provider selling transit and
taking peering yourself), a well-crafted routing policy and interconnection
strategy eliminates the need for implementing flawed route-selection
optimizers in your network.

The notion of a BGP optimizer generating fake more specifics is absurd; it is
definitely not a tool designed to "fail -> safe".  Instead of failing safe, it
failed epically and catastrophically today.  I remember, a long time ago when
Internap used to sell their FCP product, Internap SEs advised the customer to
make appropriate adjustments to local-preference to prefer the FCP-generated
routes to ensure optimal selection.  That is a much saner design choice than
injecting man-in-the-middle attacks and relying on customers to prevent a
disaster.

Any time I sit down with an engineer who "outsources" responsibility for
maintaining the robustness principle onto their customers, it makes me want to
puke.

James


Re: CloudFlare issues?

2019-06-24 Thread Scott Weeks


--- beec...@beecher.cc wrote:
From: Tom Beecher 

:: Shouldn’t we be working on facts?

Nah, this is NANOG...  >;-)



:: But this industry is one big ass glass house. What’s that 
:: thing about stones again?

We all have broken windows?


:)
scott

Re: CloudFlare issues?

2019-06-24 Thread Tom Beecher
Disclaimer : I am a Verizon employee via the Yahoo acquisition. I do not
work on 701.  My comments are my own opinions only.

Respectfully, I believe Cloudflare’s public comments today have been a real
disservice. This blog post, and your CEO on Twitter today, took every
opportunity to say “DAMN THOSE MORONS AT 701!”. They’re not.

You are 100% right that 701 should have had some sort of protection
mechanism in place to prevent this. But do we know they didn’t? Do we know
it was there and just setup wrong? Did another change at another time break
what was there? I used 701 many  jobs ago and they absolutely had filtering
in place; it saved my bacon when I screwed up once and started
readvertising a full table from a 2nd provider. They smacked my session
down and I got a nice call about it.

You guys have repeatedly accused them of being dumb without even speaking
to anyone yet from the sounds of it. Shouldn’t we be working on facts?

Should they have been easier to reach once an issue was detected? Probably.
They’re certainly not the first vendor to have a slow response time though.
Seems like when an APAC carrier takes 18 hours to get back to us, we write
it off as the cost of doing business.

It also would have been nice, in my opinion, to take a harder stance on the
BGP optimizer that generated the bogus routes, and the steel company that
failed BGP 101 and just gladly reannounced one upstream to another. 701 is
culpable for their mistakes, but there doesn’t seem like there is much
appetite to shame the other contributors.

You’re right to use this as a lever to push for proper filtering, RPKI,
best practices. I’m 100% behind that. We can all be a hell of a lot better
at what we do. This stuff happens more than it should, but less than it
could.

But this industry is one big ass glass house. What’s that thing about
stones again?

On Mon, Jun 24, 2019 at 18:06 Justin Paine via NANOG 
wrote:

> FYI for the group -- we just published this:
> https://blog.cloudflare.com/how-verizon-and-a-bgp-optimizer-knocked-large-parts-of-the-internet-offline-today/
>
>
> _
> *Justin Paine*
> Director of Trust & Safety
> PGP: BBAA 6BCE 3305 7FD6 6452 7115 57B6 0114 DE0B 314D
> 101 Townsend St., San Francisco, CA 94107
> 
>
>
>
> On Mon, Jun 24, 2019 at 2:25 PM Mark Tinka  wrote:
>
>>
>>
>> On 24/Jun/19 18:09, Pavel Lunin wrote:
>>
>> >
>> > Hehe, I haven't seen this text before. Can't agree more.
>> >
>> > Get your tie back on Job, nobody listened again.
>> >
>> > More seriously, I see no difference between prefix hijacking and the
>> > so called bgp optimisation based on completely fake announces on
>> > behalf of other people.
>> >
>> > If ever your upstream or any other party who your company pays money
>> > to does this dirty thing, now it's just the right moment to go explain
>> > them that you consider this dangerous for your business and are
>> > looking for better partners among those who know how to run internet
>> > without breaking it.
>>
>> We struggled with a number of networks using these over eBGP sessions
>> they had with networks that shared their routing data with BGPmon. It
>> sent off all sorts of alarms, and troubleshooting it was hard when a
>> network thinks you are de-aggregating massively, and yet you know you
>> aren't.
>>
>> Each case took nearly 3 weeks to figure out.
>>
>> BGP optimizers are the bane of my existence.
>>
>> Mark.
>>
>>


Re: CloudFlare issues?

2019-06-24 Thread Justin Paine via NANOG
FYI for the group -- we just published this:
https://blog.cloudflare.com/how-verizon-and-a-bgp-optimizer-knocked-large-parts-of-the-internet-offline-today/


_
*Justin Paine*
Director of Trust & Safety
PGP: BBAA 6BCE 3305 7FD6 6452 7115 57B6 0114 DE0B 314D
101 Townsend St., San Francisco, CA 94107



On Mon, Jun 24, 2019 at 2:25 PM Mark Tinka  wrote:

>
>
> On 24/Jun/19 18:09, Pavel Lunin wrote:
>
> >
> > Hehe, I haven't seen this text before. Can't agree more.
> >
> > Get your tie back on Job, nobody listened again.
> >
> > More seriously, I see no difference between prefix hijacking and the
> > so called bgp optimisation based on completely fake announces on
> > behalf of other people.
> >
> > If ever your upstream or any other party who your company pays money
> > to does this dirty thing, now it's just the right moment to go explain
> > them that you consider this dangerous for your business and are
> > looking for better partners among those who know how to run internet
> > without breaking it.
>
> We struggled with a number of networks using these over eBGP sessions
> they had with networks that shared their routing data with BGPmon. It
> sent off all sorts of alarms, and troubleshooting it was hard when a
> network thinks you are de-aggregating massively, and yet you know you
> aren't.
>
> Each case took nearly 3 weeks to figure out.
>
> BGP optimizers are the bane of my existence.
>
> Mark.
>
>


Re: CloudFlare issues?

2019-06-24 Thread Mark Tinka



On 24/Jun/19 18:09, Pavel Lunin wrote:

>
> Hehe, I haven't seen this text before. Can't agree more.
>
> Get your tie back on Job, nobody listened again.
>
> More seriously, I see no difference between prefix hijacking and the
> so called bgp optimisation based on completely fake announces on
> behalf of other people.
>
> If ever your upstream or any other party who your company pays money
> to does this dirty thing, now it's just the right moment to go explain
> them that you consider this dangerous for your business and are
> looking for better partners among those who know how to run internet
> without breaking it.

We struggled with a number of networks using these over eBGP sessions
they had with networks that shared their routing data with BGPmon. It
sent off all sorts of alarms, and troubleshooting it was hard when a
network thinks you are de-aggregating massively, and yet you know you
aren't.

Each case took nearly 3 weeks to figure out.

BGP optimizers are the bane of my existence.

Mark.



Re: CloudFlare issues?

2019-06-24 Thread Fredrik Korsbäck
On 2019-06-24 20:16, Mark Tinka wrote:
> 
> 
> On 24/Jun/19 16:11, Job Snijders wrote:
> 
>>
>> - deploy RPKI based BGP Origin validation (with invalid == reject)
>> - apply maximum prefix limits on all EBGP sessions
>> - ask your router vendor to comply with RFC 8212 ('default deny')
>> - turn off your 'BGP optimizers'
> 
> I cannot over-emphasize the above, especially the BGP optimizers.
> 
> Mark.
> 

+1

https://honestnetworker.net/2019/06/24/leaking-your-optimized-routes-to-stub-networks-that-then-leak-it-to-a-tier1-transit-that-doesnt-filter/



-- 
hugge



Re: Cost effective time servers

2019-06-24 Thread Eric S. Raymond
Patrick :
> On 2019-06-20 20:18, Jay Hennigan wrote:
> > If you want to go really cheap and don't value your time, but do value
> > knowing the correct time, a GPS receiver with a USB interface and a
> > Raspberry Pi would do the trick.
> 
> https://www.ntpsec.org/white-papers/stratum-1-microserver-howto/
> 
> RPi + GPS Hat because time across USB has much jitter.

I wrote that white paper, and a good big chunk of the software in the recipe
is mine.  The rest is about 25 percent of Dave Mills's reference
implementation of NTP.

USB jitter isn't too bad, actually.  Unacceptable if you're doing physics
experiments, but an order of magnitude below the expected accuracy of WAN
time synchronization.

That said, my recipe *is* better.  And a fun, simple, dirt-cheap build.
-- 
Eric S. Raymond <http://www.catb.org/~esr/>




Re: CloudFlare issues?

2019-06-24 Thread Mark Tinka



On 24/Jun/19 16:11, Job Snijders wrote:

>
> - deploy RPKI based BGP Origin validation (with invalid == reject)
> - apply maximum prefix limits on all EBGP sessions
> - ask your router vendor to comply with RFC 8212 ('default deny')
> - turn off your 'BGP optimizers'

I cannot over-emphasize the above, especially the BGP optimizers.

Mark.


Re: Cost effective time servers

2019-06-24 Thread Joe Abley
On 21 Jun 2019, at 10:57, Quan Zhou  wrote:

> Yep, went through the same route until I figured out that GPS time is a bit 
> ahead of UTC.

The clocks on the GPS satellites are set to GPST which I think (I'm not a time 
geek so this is going to make someone cringe) is UTC without leap seconds or 
other corrections relating to rotation of the earth.

However, the messages sent to GPS receivers include the offset between GPST and 
UTC as well as the GPST timestamp. The receivers can use both together to 
obtain a measure of UTC accurate to about 100 nanoseconds.

Seems to me (again, not time geek, stop throwing things) that the use of GPST 
is an internal implementation detail chosen because it's easier to adjust an 
offset that rarely changes than it is to adjust atomic clocks floating in 
space. The system (including the system-internal adjustment of GPST with the 
offset) still produces a reasonably accurate measure of UTC. Also I imagine 
occasional leap seconds causing GPS navigators to jump spontaneously to the 
left which is probably more amusing in my imagination than in real life.


Joe




Re: Cellular backup connections

2019-06-24 Thread Alex Buie
We deploy routers with Verizon LTE failover - for full functionality, make
sure your MTU is 1428 or less, per their specifications.

Here's an example doc from Spirent that talks about it.

https://support.spirent.com/SC_KnowledgeView?Id=FAQ14556
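
If you want to verify that 1428 number yourself, the arithmetic for what
should and shouldn't fit is short.  A sketch (the 1428 value is taken from the
note above; the header sizes are the usual IPv4/TCP/ICMP ones without
options):

    #!/usr/bin/env python3
    # Largest segment/payload sizes that fit a 1428-byte path MTU with DF set.
    MTU = 1428
    IPV4_HDR, TCP_HDR, ICMP_HDR = 20, 20, 8     # no IP/TCP options assumed

    print("TCP MSS to clamp to:", MTU - IPV4_HDR - TCP_HDR)    # 1388
    print("ICMP echo payload  :", MTU - IPV4_HDR - ICMP_HDR)   # 1400 (e.g. Linux: ping -M do -s 1400)

This lines up with the symptom Dovid describes below: small responses work,
anything bigger than the real path MTU silently dies when PMTUD is broken, so
clamping the MSS (or lowering the interface MTU) tends to fix it.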


Alex

On Mon, Jun 24, 2019, 7:51 AM Dovid Bender  wrote:

> I am getting the same for SSH and https traffic. It's strange. Where the
> response is something small like:
> 
> Moved to this <a href="https://63.XX.XX.XX:443/auth.asp">location</a>.
> 
> It works. But when I try to load pages that are any bigger it fails. Like I
> said before, I assume it's either an issue with the MTU or window size. I
> was just wondering if anyone encountered such an issue before. It's not
> easy getting to someone that knows something. When you have some sort of
> concrete info the level1 techs tend to pass you along faster.
>
>
>
>
>
> On Mon, Jun 24, 2019 at 7:41 AM J. Hellenthal 
> wrote:
>
>> Could be wrong on this, but direct SSH on the LTE side may possibly not be
>> allowed (filtered) and might just be something you could discuss in a ticket
>> with Verizon.
>>
>> --
>>  J. Hellenthal
>>
>> The fact that there's a highway to Hell but only a stairway to Heaven
>> says a lot about anticipated traffic volume.
>>
>> On Jun 24, 2019, at 04:50, Dovid Bender  wrote:
>>
>> All,
>>
>> I finally got around to putting in a Verizon LTE connection and the ping
>> times are pretty good. There is the occasional issue however for the most
>> part ping times are < 50 ms. I have another strange issue though. When I
>> try to ssh or connect via the endpoint's web interface it fails. If I first
>> connect via PPTP or SSL VPN then it works. I ruled out it being my IP since
>> if I connect direct from the PPTP or SSL VPN box then it fails as well. It
>> seems the tunnel does something (perhaps lowering the MTU or fragmenting
>> packets) that allows it to work. Any thoughts?
>>
>> TIA.
>>
>>
>>
>>
>> On Mon, Feb 4, 2019 at 8:18 AM Dovid Bender  wrote:
>>
>>> Anyone know if Verizon static IPs over LTE have the same issue where they
>>> bounce the traffic around before it gets back to the NY metro area?
>>>
>>>
>>>
>>> On Thu, Jan 3, 2019 at 6:46 PM Dovid Bender  wrote:
>>>
 All,

 Thanks for all of the feedback. I was on site today and noticed two
 things.
 1) As someone mentioned, it could be that for static IPs they have the
 traffic going to a specific location. The POP is in NJ; there was a min.
 latency of 120ms which prob had to do with this.
 2) I was watching the ping times and it looked something like this:
 400ms
 360ms
 330ms
 300ms
 260ms
 210ms
 170ms
 140ms
 120ms
 400ms
 375ms

 It seems to have been coming in "waves". I assume this has to do with
 "how cellular work" and the signal. I tried moving it around by putting it
 down low on the floor, moving it locations etc. and saw the same thing
 every time. I am going to try Verizon next and see how it goes.



 On Sat, Dec 29, 2018 at 12:13 PM Mark Milhollan 
 wrote:

> On Fri, 28 Dec 2018, Dovid Bender wrote:
>
> >I finally got around to setting up a cellular backup device in our
> new POP.
>
> >When SSH'ing in remotely the connection seems rather slow.
>
> Perhaps using MOSH can help make the interactive CLI session less
> annoying.
>
> >Verizon they charge $500.00 just to get a public IP and I want to
> avoid
> >that if possible.
>
> You might look into have it call out / maintain a connection back to
> your infrastructure.
>
>
> /mark
>



Re: CloudFlare issues?

2019-06-24 Thread Jaden Roberts
From https://www.cloudflarestatus.com/:

Identified - We have identified a possible route leak impacting some Cloudflare 
IP ranges and are working with the network involved to resolve this.
Jun 24, 11:36 UTC

Seeing issues in Australia too for some sites that are routing through 
Cloudflare.


Jaden Roberts
Senior Network Engineer
4 Amy Close, Wyong, NSW 2259
Need assistance? We are here 24/7 +61 2 8115 
On June 24, 2019, 9:06 PM GMT+10 
daknob@gmail.com wrote:

Yes, traffic from Greek networks is routed through NYC 
(alter.net), and previously it had a 60% packet loss. Now 
it’s still via NYC, but no packet loss. This happens in GR-IX Athens, not GR-IX 
Thessaloniki, but the problem definitely exists.

Antonis

On 24 Jun 2019, at 13:55, Dmitry Sherman <dmi...@interhost.net> wrote:

Hello are there any issues with CloudFlare services now?

Dmitry Sherman
dmi...@interhost.net
Interhost Networks Ltd
Web: http://www.interhost.co.il
fb: https://www.facebook.com/InterhostIL
Office: (+972)-(0)74-7029881 Fax: (+972)-(0)53-7976157




Re: Cost effective time servers

2019-06-24 Thread Michael Rathbun
On Thu, 20 Jun 2019 10:39:41 -0400, David Bass 
wrote:

>What are folks using these days for smaller organizations, that need to
>dole out time from an internal source?

If "internal" means a local NTP server independent of external network
resources, the other responses are apposite.

If "internal" means a stratum 2 NTP server dependent upon other servers on
the network, and if you don't need accuracy in single-digit millisecond or
better range, we've done well with an older ex-Windows machine that now
runs FreeBSD and vanilla NTPD, and is now a pool server at ntp.org.  ± six
milliseconds, cost approaches the cube root of zero.

mdr
-- 
   Sometimes half-ass is exactly the right amount of ass.
   -- Wonderella



Re: CloudFlare issues?

2019-06-24 Thread Pavel Lunin
> I'd like to point everyone to an op-ed I wrote on the topic of "BGP
> optimizers": https://seclists.org/nanog/2017/Aug/318

Hehe, I haven't seen this text before. Can't agree more.

Get your tie back on Job, nobody listened again.

More seriously, I see no difference between prefix hijacking and the so called 
bgp optimisation based on completely fake announces on behalf of other people.

If ever your upstream or any other party who your company pays money to does 
this dirty thing, now it's just the right moment to go explain them that you 
consider this dangerous for your business and are looking for better partners 
among those who know how to run internet without breaking it.

Re: Cost effective time servers

2019-06-24 Thread Quan Zhou
Yep, went through the same route until I figured out that GPS time is a 
bit ahead of UTC. We simply use a Windows NTP server for internal use at 
work, and I won't recommend doing so, because it went off the rails for a 
while once, despite having several upstream servers configured.


Also, there's a community out there dedicated to atomic clocks for civil 
use; it'd be fun to have one, set it up, and watch it ticking.


On 6/21/2019 11:44 AM, Andy Ringsmuth wrote:

On Jun 20, 2019, at 10:18 PM, Jay Hennigan  wrote:

On 6/20/19 07:39, David Bass wrote:

What are folks using these days for smaller organizations, that need to dole 
out time from an internal source?

If you want to go really cheap and don't value your time, but do value knowing 
the correct time, a GPS receiver with a USB interface and a Raspberry Pi would 
do the trick.

Not sure how accurate you need, but I just use a Raspberry Pi as a pool.ntp.org 
node. I thought about going the GPS route with it but didn’t want to mess with 
it.


-Andy


Re: Cost effective time servers

2019-06-24 Thread Patrick
On 2019-06-20 20:18, Jay Hennigan wrote:
> If you want to go really cheap and don't value your time, but do value
> knowing the correct time, a GPS receiver with a USB interface and a
> Raspberry Pi would do the trick.

https://www.ntpsec.org/white-papers/stratum-1-microserver-howto/

RPi + GPS Hat because time across USB has much jitter.


Patrick


AWS 40/100G wholesale Express-Route ?

2019-06-24 Thread Jérôme Nicolle
Hello everyone,

I was wondering, is there any way to get more than a 10G port for a PNI
with AWS customers?

Right now I'm looking at 4 ridiculously expensive X-Cos (on two
locations, so that makes 8) to establish a redundant 40Gbps backhaul,
where I have 40/100G ports available.

How could we deal with that? Is there an "off-market" offering for
higher speed interconnects ?

Best regards,

-- 
Jérôme Nicolle
+33 6 19 31 27 14



Re: Russian Anal Probing + Malware

2019-06-24 Thread Tom Beecher
I chuckle the most at the original twitter post from Greynoise :

"We have revoked the benign tag for OpenPortStats[.]com"

Did anyone actually think such a thing would be legitimate to start with?
:)

On Mon, Jun 24, 2019 at 12:26 AM Hank Nussbacher 
wrote:

> On 24/06/2019 00:23, Randy Bush wrote:
> > e.g. i am aware of researchers scanning to see patching spread and
> > trying to make a conext paper dreadline this week or infocom next month.
> >
> > hard to tell the sheep from the goats and the wolf from the sheep.  i
> > get the appended.  sheep or wholf?  i sure do not claim to be smart
> > enough to know.  but i sure am glad others are .
> Greynoise can be your friend:
> https://greynoise.io/about
> https://viz.greynoise.io/table
>
> -Hank
>
> >
> > randy
> >
> > ---
>


Re: CloudFlare issues?

2019-06-24 Thread Max Tulyev

24.06.19 19:04, Matthew Walster wrote:
> On Mon, 24 Jun 2019, 16:28 Max Tulyev wrote:
>
>> 1. Why Cloudflare did not immediately announce all their address space
>> by /24s? This can put the service up instantly for almost all places.
>
> Probably RPKI, and that being a really bad idea that takes a long time to
> configure across every device, especially when you're dealing with an
> anycast network.

A good idea is to prepare it, and the provisioning tools, beforehand ;)

>> 2. Why almost all carriers did not filter the leak on their side, but
>> waited for "a better weather on Mars" for several hours?
>
> Probably most did not notice immediately, or trusted their fellow large
> carrier peers to fix the matter faster than their own change control
> process would accept such a drastic change that had not been fully
> analysed and identified. The duration was actually quite low, on a human
> scale...

Did nobody notice a lot of "I can't access ..." calls? Really?
OK, then another question: how much time from when those calls start until
"people who know BGP know about it" is acceptable?


Re: Verizon Routing issue

2019-06-24 Thread Jared Mauch



> On Jun 24, 2019, at 11:12 AM, Max Tulyev  wrote:
> 
> 24.06.19 17:44, Jared Mauch wrote:
>>> 1. Why Cloudflare did not immediately announce all their address space by 
>>> /24s? This can put the service up instantly for almost all places.
>> They may not want to pollute the global routing table with these entries.  
>> It has a cost for everyone.  If we all did this, the table would be a mess.
> 
> yes, it is. But it is a working, quick and temporary fix of the problem.

Like many things (e.g., AT&T had similar issues with 12.0.0.0/8) now there’s a 
bunch of /9’s in the table that will likely never go away.

>>> 2. Why almost all carriers did not filter the leak on their side, but 
>>> waited for "a better weather on Mars" for several hours?
>> There’s several major issues here
>> - Verizon accepted garbage from their customer
>> - Other networks accepted the garbage from Verizon (eg: Cogent)
>> - known best practices from over a decade ago are not applied
> 
> That's it.
> 
> We have several IXes connected, and all of them had a correct aggregated
> route to CF. And there was one upstream that distributed the leaked more
> specifics.
> 
> I think 30 minutes maximum is enough to find the problem and filter out its
> source on their side. Almost nobody did it. Why?

I have heard people say “we don’t look for problems”.  This is often the case, 
there is a lack of monitoring/awareness.  I had several systems detect the 
problem, plus things like bgpmon also saw it.

My guess is the people that passed this on weren’t monitoring either.  It’s often 
manual procedures vs automated scripts watching things.  Instrumentation of 
your network elements tends to be done by a small set of people who invest in it.  
You tend to need some scale for it to make sense, and it also requires people who 
understand the underlying data well enough to know what is “odd”.

This is why I’ve had my monitoring system up for the past 12+ years.  It’s 
super simple (dumb) and catches a lot of issues.  I implemented it again for 
the RIPE RIS Live service, but haven’t cut it over to be the primary (realtime) 
monitoring method vs watching route-views.

I think it’s time to do that.

- Jared



Re: CloudFlare issues?

2019-06-24 Thread Christopher Morrow
On Mon, Jun 24, 2019 at 10:41 AM Filip Hruska  wrote:
>
> Verizon is the one who should've noticed something was amiss and dropped
> their customer's BGP session.
> They also should have had filters and prefix count limits in place,
> which would have prevented this whole disaster.
>

oddly VZ used to be quite good about filtering customer sessions :(
there ARE cases where: "customer says they may announce X" and that
doesn't happen along a path expected :( For instance they end up
announcing a path through their other transit to a prefix in the
permitted list on the VZ side :(  it doesn't seem plausible that that
is what was happening here though, I don't expect the duquesne folk to
have customer paths to (for instance) savi moebel in germany...

there are some pretty fun as-paths in the set of ~25k prefixes leaked
(that routeviews saw).


Re: Verizon Routing issue

2019-06-24 Thread Max Tulyev

24.06.19 17:44, Jared Mauch wrote:

>> 1. Why Cloudflare did not immediately announce all their address space by
>> /24s? This can put the service up instantly for almost all places.
>
> They may not want to pollute the global routing table with these entries.  It
> has a cost for everyone.  If we all did this, the table would be a mess.

yes, it is. But it is a working, quick and temporary fix of the problem.

>> 2. Why almost all carriers did not filter the leak on their side, but waited
>> for "a better weather on Mars" for several hours?
>
> There’s several major issues here
>
> - Verizon accepted garbage from their customer
> - Other networks accepted the garbage from Verizon (eg: Cogent)
> - known best practices from over a decade ago are not applied

That's it.

We have several IXes connected, and all of them had a correct aggregated
route to CF. And there was one upstream that distributed the leaked more
specifics.

I think 30 minutes maximum is enough to find the problem and filter out its
source on their side. Almost nobody did it. Why?


Re: Verizon Routing issue

2019-06-24 Thread Jared Mauch



> On Jun 24, 2019, at 11:00 AM, ML  wrote:
> 
> 
> On 6/24/2019 10:44 AM, Jared Mauch wrote:
>> It was impacting to many networks.  You should filter your transits to 
>> prevent impact from these more specifics.
>> 
>> - Jared
>> 
>> https://twitter.com/jaredmauch/status/1143163212822720513
>> https://twitter.com/JobSnijders/status/1143163271693963266
>> https://puck.nether.net/~jared/blog/?p=208
> 
> 
> $MAJORNET filters between peers make sense but what can a transit customer do 
> to prevent being affected by leaks like this one?

Block routes from 3356 (for example) that don’t go 701_3356_ 701_2914_ 
701_1239_ 

etc (if 701 is your transit and you are multi homed)

Then you won’t accept the more specifics.

If you point default it may not be any help.
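
To make the shape of that filter concrete, a small sketch of the check in
Python; the ASN list is just the examples above, and a real deployment would
express this as an as-path filter in router policy rather than in a script:

    #!/usr/bin/env python3
    # Sanity check for paths learned from transit AS 701: very large networks
    # (3356, 2914, 1239, ...) should appear directly behind 701, never further
    # down the path behind some other customer ASN.
    BIG_NETWORKS = {3356, 2914, 1239}     # extend to taste
    TRANSIT = 701

    def acceptable(as_path):
        """as_path is a list of ASNs as received, e.g. [701, 3356, 64500]."""
        if not as_path or as_path[0] != TRANSIT:
            return True                   # not learned from this transit session
        for hop, asn in enumerate(as_path[1:], start=1):
            if asn in BIG_NETWORKS and hop != 1:
                return False              # big network hiding behind a 701 customer
        return True

    print(acceptable([701, 3356, 64500]))          # True  -- the normal case
    print(acceptable([701, 396531, 33154, 3356]))  # False -- the leak pattern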

- Jared



Re: Verizon Routing issue

2019-06-24 Thread ML



On 6/24/2019 10:44 AM, Jared Mauch wrote:

It was impacting to many networks.  You should filter your transits to prevent 
impact from these more specifics.

- Jared

https://twitter.com/jaredmauch/status/1143163212822720513
https://twitter.com/JobSnijders/status/1143163271693963266
https://puck.nether.net/~jared/blog/?p=208



$MAJORNET filters between peers make sense but what can a transit 
customer do to prevent being affected by leaks like this one?




Verizon Routing issue

2019-06-24 Thread Jared Mauch
(Updating subject line to be accurate)

> On Jun 24, 2019, at 10:28 AM, Max Tulyev  wrote:
> 
> Hi All,
> 
> here in Ukraine we got an impact as well!
> 
> Have two questions:
> 
> 1. Why Cloudflare did not immediately announce all their address space by 
> /24s? This can put the service up instantly for almost all places.

They may not want to pollute the global routing table with these entries.  It 
has a cost for everyone.  If we all did this, the table would be a mess.
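
For a sense of scale, splitting even one large aggregate into /24s multiplies
the number of routes everyone else has to carry.  A quick sketch with
Python's stdlib ipaddress module (the /12 is just an arbitrary large block
for illustration):

    #!/usr/bin/env python3
    # How many routes "announce everything as /24s" would add for one aggregate.
    import ipaddress

    aggregate = ipaddress.ip_network("104.16.0.0/12")
    more_specifics = list(aggregate.subnets(new_prefix=24))
    print(f"{aggregate} -> {len(more_specifics)} /24 routes")   # 4096 instead of 1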

> 2. Why almost all carriers did not filter the leak on their side, but waited 
> for "a better weather on Mars" for several hours?

There’s several major issues here

- Verizon accepted garbage from their customer
- Other networks accepted the garbage from Verizon (eg: Cogent)
- known best practices from over a decade ago are not applied

I’m sure reporters will be reaching out to Verizon about this and their 
response time should be noted.

It was impacting to many networks.  You should filter your transits to prevent 
impact from these more specifics.

- Jared

https://twitter.com/jaredmauch/status/1143163212822720513
https://twitter.com/JobSnijders/status/1143163271693963266
https://puck.nether.net/~jared/blog/?p=208



Re: CloudFlare issues?

2019-06-24 Thread Filip Hruska
Verizon is the one who should've noticed something was amiss and dropped 
their customer's BGP session.
They also should have had filters and prefix count limits in place, 
which would have prevented this whole disaster.


As to why any of that didn't happen, who actually knows.

Regards,
Filip

On 6/24/19 4:28 PM, Max Tulyev wrote:
Why almost all carriers did not filter the leak on their side, but 
waited for "a better weather on Mars" for several hours? 


--
Filip Hruska
Linux System Administrator



Re: CloudFlare issues?

2019-06-24 Thread Max Tulyev

Hi All,

here in Ukraine we got an impact as well!

Have two questions:

1. Why Cloudflare did not immediately announce all their address space 
by /24s? This can put the service up instantly for almost all places.


2. Why almost all carriers did not filter the leak on their side, but 
waited for "a better weather on Mars" for several hours?


24.06.19 13:55, Dmitry Sherman wrote:

Hello are there any issues with CloudFlare services now?

Dmitry Sherman
dmi...@interhost.net
Interhost Networks Ltd
Web: http://www.interhost.co.il
fb: https://www.facebook.com/InterhostIL
Office: (+972)-(0)74-7029881 Fax: (+972)-(0)53-7976157




Re: CloudFlare issues?

2019-06-24 Thread Andree Toonk
This is what appears to have happened:

There was a large-scale BGP 'leak' incident causing about 20k prefixes
for 2400 networks (ASNs) to be rerouted through AS396531 (a steel plant)
and then on to its transit provider: Verizon (AS701).
Start time: 10:34:21 (UTC), end time: 12:37 (UTC)
All ASpaths had the following in common:
701 396531 33154


33154 (DQECOM) is an ISP providing transit to 396531.
396531 is, by the looks of it, a steel plant, dual-homed to 701 and 33154.
701 is Verizon and, by the looks of it, accepted all BGP announcements
from 396531.

What appears to have happened is that those routes from 33154 were
propagated to 396531, which then sent them to Verizon and voila... there
is the full leak at work.
(DQECOM runs a BGP optimizer: https://www.noction.com/clients/dqe --
thanks Job for pointing that out, more below.)

As a result, traffic for 20k prefixes or so was now rerouted through
Verizon and 396531 (the steel plant).

We've seen numerous incidents like this in the past.
Lessons learned:
1) if you do use a BGP optimizer, please FILTER!
2) Verizon... filter your customers, please!


Since the BGP optimizer introduces new more specific routes, a lot of
traffic for high traffic destinations would have been rerouted through
that path, which would have been congested, causing the outages.
There were many cloudflare prefixes affected, but also folks like
Amazon, Akamai, Facebook, Apple, Linode etc.

Here's one example, for Amazon CloudFront: 52.84.32.0/22. Normally
announced as 52.84.32.0/21, but during the incident as a /22 (remember,
more specifics always win):
https://stat.ripe.net/52.84.32.0%2F22#tabId=routing&routing_bgplay.ignoreReannouncements=false&routing_bgplay.resource=52.84.32.0/22&routing_bgplay.starttime=1561337999&routing_bgplay.endtime=1561377599&routing_bgplay.rrcs=0,1,2,5,6,7,10,11,13,14,15,16,18,20&routing_bgplay.instant=null&routing_bgplay.type=bgp

RPKI would have worked here (assuming you're strict with the max length)!
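
A minimal sketch of that RFC 6811 origin-validation decision, using the
CloudFront example above (the ROA contents are assumed here purely for
illustration, with maxLength set to the /21 the block is normally announced
at, and AS16509 as the authorized origin):

  import ipaddress

  # Hypothetical ROA set: (authorized prefix, maxLength, authorized origin ASN).
  ROAS = [(ipaddress.ip_network("52.84.32.0/21"), 21, 16509)]

  def validate(prefix, origin_as):
      """Simplified RFC 6811 route-origin validation."""
      net = ipaddress.ip_network(prefix)
      covering = [r for r in ROAS if net.subnet_of(r[0])]
      if not covering:
          return "not-found"
      if any(asn == origin_as and net.prefixlen <= maxlen for _, maxlen, asn in covering):
          return "valid"
      return "invalid"

  print(validate("52.84.32.0/21", 16509))  # valid
  print(validate("52.84.32.0/22", 16509))  # invalid: /22 exceeds maxLength 21

With invalid == reject, the leaked /22 more-specific would have been dropped
at any validating edge, which is exactly the "strict with the max length"
point.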


Cheers
 Andree


My secret spy satellite informs me that Dmitry Sherman wrote On
2019-06-24, 3:55 AM:
> Hello are there any issues with CloudFlare services now?
>
> Dmitry Sherman
> dmi...@interhost.net
> Interhost Networks Ltd
> Web: http://www.interhost.co.il
> fb: https://www.facebook.com/InterhostIL
> Office: (+972)-(0)74-7029881 Fax: (+972)-(0)53-7976157
>



Re: CloudFlare issues?

2019-06-24 Thread Job Snijders
On Mon, Jun 24, 2019 at 08:18:27AM -0400, Tom Paseka via NANOG wrote:
> a Verizon downstream BGP customer is leaking the full table, and some more
> specific from us and many other providers.

It appears that one of the implicated ASNs, AS 33154 "DQE Communications
LLC", is listed as a customer on Noction's website:
https://www.noction.com/clients/dqe

I suspect AS 33154's customer AS 396531 turned up a new circuit with
Verizon, but didn't have routing policies to prevent sending routes from
33154 to 701 and vice versa, or their router didn't have support for RFC
8212.
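
For those who have not read RFC 8212: it requires that an EBGP session with no
explicitly configured import and export policy advertise and accept nothing,
which by itself would have stopped this propagation at 396531. A rough sketch
of that behaviour, together with the customary "only own and customer routes go
upstream" rule (illustrative pseudologic in Python, not any vendor's
configuration language):

  # Routes are tagged by where they were learned: "own", "customer", "peer", "transit".

  def export_to_upstream(session_has_policy, routes):
      """RFC 8212 default: no explicit policy on an EBGP session, nothing is exported."""
      if not session_has_policy:
          return []
      # Customary policy toward an upstream: own and customer routes only;
      # re-exporting routes learned from another transit or a peer is a leak.
      return [r for r in routes if r["learned_from"] in ("own", "customer")]

  routes = [
      {"prefix": "198.51.100.0/24", "learned_from": "own"},
      {"prefix": "203.0.113.0/24",  "learned_from": "customer"},
      {"prefix": "52.84.32.0/22",   "learned_from": "transit"},  # optimizer more-specific
  ]
  print(export_to_upstream(False, routes))  # [] -> nothing leaks by default
  print(export_to_upstream(True, routes))   # only own + customer routes go upstream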

I'd like to point everyone to an op-ed I wrote on the topic of "BGP
optimizers": https://seclists.org/nanog/2017/Aug/318

So in summary, I believe the following happened:

- 33154 generated fake more-specifics, which are not visible in the DFZ
- 33154 announced those fake more-specifics to at least one customer (396531)
- this customer (396531) propagated them on to another upstream provider (701)
- it appears that 701 did not apply sufficient prefix filtering, or a 
maximum-prefix limit

While it is easy to point at the alleged BGP optimizer as the root
cause, I do think we have now observed a cascading catastrophic failure
in both process and technology. Here are some recommendations that all
of us can apply, which may have helped dampen the negative effects:

- deploy RPKI based BGP Origin validation (with invalid == reject)
- apply maximum prefix limits on all EBGP sessions
- ask your router vendor to comply with RFC 8212 ('default deny')
- turn off your 'BGP optimizers'

I suspect we, collectively, suffered significant financial damage in
this incident.

Kind regards,

Job


Re: Cellular backup connections

2019-06-24 Thread Mel Beckman
I ran into this problem and Verizon told me that they filter ports 22 and 23 to 
help stem the tide of IoT attacks on their networks by cellular-connected phone 
and alarm systems. They said their operational model assumes that all traffic 
will be encrypted via either SSLVPN or IPSec. I’m using IPSec tuned for low 
traffic volume (i.e., keepalive disabled), and it’s working well for OBM.

 -mel

On Jun 24, 2019, at 4:50 AM, Dovid Bender <do...@telecurve.com> wrote:

I am getting the same for SSH and https traffic. It's strange. Where the 
response is something small like:

Moved to this <a href="https://63.XX.XX.XX:443/auth.asp">location</a>.

It works. But when I try to load pages that are any bigger, it fails. Like I said 
before, I assume it's either an issue with the MTU or the window size. I was just 
wondering if anyone has encountered such an issue before. It's not easy getting to 
someone that knows something. When you have some sort of concrete info, the 
level 1 techs tend to pass you along faster.





On Mon, Jun 24, 2019 at 7:41 AM J. Hellenthal <jhellent...@dataix.net> wrote:
Could be wrong on this but direct SSH on the LTE side may possibly be not 
allowed(filtered) and might just be something you could discuss in a ticket 
with Verizon.

--
 J. Hellenthal

The fact that there's a highway to Hell but only a stairway to Heaven says a 
lot about anticipated traffic volume.

On Jun 24, 2019, at 04:50, Dovid Bender <do...@telecurve.com> wrote:

All,

I finally got around to putting in a Verizon LTE connection and the ping times 
are pretty good. There is the occasional issue; however, for the most part ping 
times are < 50 ms. I have another strange issue though. When I try to ssh or 
connect via the endpoint's web interface it fails. If I first connect via PPTP 
or SSL VPN then it works. I ruled out it being my IP since if I connect directly 
from the PPTP or SSL VPN box then it fails as well. It seems the tunnel does 
something (perhaps lowering the MTU or fragmenting packets) that allows it to 
work. Any thoughts?

TIA.




On Mon, Feb 4, 2019 at 8:18 AM Dovid Bender <do...@telecurve.com> wrote:
Anyone know if Verizon static IPs over LTE have the same issue where they bounce 
the traffic around before it gets back to the NY metro area?



On Thu, Jan 3, 2019 at 6:46 PM Dovid Bender <do...@telecurve.com> wrote:
All,

Thanks for all of the feedback. I was on site today and noticed two things.
1) As someone mentioned, it could be that for static IPs they have the traffic 
going to a specific location. The POP is in NJ; there was a min. latency of 
120 ms, which probably had to do with this.
2) I was watching the ping times and it looked something like this:
400ms
360ms
330ms
300ms
260ms
210ms
170ms
140ms
120ms
400ms
375ms

It seems to have been coming in "waves". I assume this has to do with "how 
cellular works" and the signal. I tried moving it around by putting it down low 
on the floor, moving its location, etc., and saw the same thing every time. I am 
going to try Verizon next and see how it goes.



On Sat, Dec 29, 2018 at 12:13 PM Mark Milhollan <m...@pixelgate.net> wrote:
On Fri, 28 Dec 2018, Dovid Bender wrote:

>I finally got around to setting up a cellular backup device in our new POP.

>When SSH'ing in remotely the connection seems rather slow.

Perhaps using MOSH can help make the interactive CLI session less
annoying.

>Verizon they charge $500.00 just to get a public IP and I want to avoid
>that if possible.

You might look into having it call out / maintain a connection back to 
your infrastructure.


/mark


Re: CloudFlare issues?

2019-06-24 Thread Robbie Trencheny
This is my final update, I’m going back to bed, wake me up when the
internet is working again.

https://news.ycombinator.com/item?id=20262316

——

1230 UTC update We are working with networks around the world and are
observing network routes for Google and AWS being leaked as well.

On Mon, Jun 24, 2019 at 05:20 Robbie Trencheny  wrote:

> *1204 UTC update* This leak is wider spread that just Cloudflare.
>
> *1208 UTC update* Amazon Web Services now reporting external networking
> problem
>
> On Mon, Jun 24, 2019 at 05:18 Tom Paseka  wrote:
>
>> a Verizon downstream BGP customer is leaking the full table, and some
>> more specific from us and many other providers.
>>
>> On Mon, Jun 24, 2019 at 7:56 AM Robbie Trencheny  wrote:
>>
>>> *1147 UTC update* Staring at internal graphs looks like global traffic
>>> is now at 97% of expected so impact lessening.
>>>
>>> On Mon, Jun 24, 2019 at 04:51 Dovid Bender  wrote:
>>>
 We are seeing issues as well getting to HE. The traffic is going via
 Alter.



 On Mon, Jun 24, 2019 at 7:48 AM Robbie Trencheny  wrote:

> From John Graham-Cumming, CTO of Cloudflare, on Hacker News right now:
>
> This appears to be a routing problem with Level3. All our systems are
> running normally but traffic isn't getting to us for a portion of our
> domains.
>
> 1128 UTC update Looks like we're dealing with a route leak and we're
> talking directly with the leaker and Level3 at the moment.
>
> 1131 UTC update Just to be clear this isn't affecting all our traffic
> or all our domains or all countries. A portion of traffic isn't hitting
> Cloudflare. Looks to be about an aggregate 10% drop in traffic to us.
>
> 1134 UTC update We are now certain we are dealing with a route leak.
>
> On Mon, Jun 24, 2019 at 04:04 Antonios Chariton 
> wrote:
>
>> Yes, traffic from Greek networks is routed through NYC (alter.net),
>> and previously it had a 60% packet loss. Now it’s still via NYC, but no
>> packet loss. This happens in GR-IX Athens, not GR-IX Thessaloniki, but 
>> the
>> problem definitely exists.
>>
>> Antonis
>>
>>
>> On 24 Jun 2019, at 13:55, Dmitry Sherman 
>> wrote:
>>
>> Hello are there any issues with CloudFlare services now?
>>
>> Dmitry Sherman
>> dmi...@interhost.net
>> Interhost Networks Ltd
>> Web: http://www.interhost.co.il
>> fb: https://www.facebook.com/InterhostIL
>> Office: (+972)-(0)74-7029881 Fax: (+972)-(0)53-7976157
>>
>>
>> --
> --
> Robbie Trencheny (@robbie )
> 925-884-3728
> robbie.io
>
 --
>>> --
>>> Robbie Trencheny (@robbie )
>>> 925-884-3728
>>> robbie.io
>>>
>> --
> --
> Robbie Trencheny (@robbie )
> 925-884-3728
> robbie.io
>
-- 
--
Robbie Trencheny (@robbie )
925-884-3728
robbie.io


Re: CloudFlare issues?

2019-06-24 Thread Robbie Trencheny
*1204 UTC update* This leak is wider spread than just Cloudflare.

*1208 UTC update* Amazon Web Services now reporting external networking
problem

On Mon, Jun 24, 2019 at 05:18 Tom Paseka  wrote:

> a Verizon downstream BGP customer is leaking the full table, and some more
> specific from us and many other providers.
>
> On Mon, Jun 24, 2019 at 7:56 AM Robbie Trencheny  wrote:
>
>> *1147 UTC update* Staring at internal graphs looks like global traffic
>> is now at 97% of expected so impact lessening.
>>
>> On Mon, Jun 24, 2019 at 04:51 Dovid Bender  wrote:
>>
>>> We are seeing issues as well getting to HE. The traffic is going via
>>> Alter.
>>>
>>>
>>>
>>> On Mon, Jun 24, 2019 at 7:48 AM Robbie Trencheny  wrote:
>>>
 From John Graham-Cumming, CTO of Cloudflare, on Hacker News right now:

 This appears to be a routing problem with Level3. All our systems are
 running normally but traffic isn't getting to us for a portion of our
 domains.

 1128 UTC update Looks like we're dealing with a route leak and we're
 talking directly with the leaker and Level3 at the moment.

 1131 UTC update Just to be clear this isn't affecting all our traffic
 or all our domains or all countries. A portion of traffic isn't hitting
 Cloudflare. Looks to be about an aggregate 10% drop in traffic to us.

 1134 UTC update We are now certain we are dealing with a route leak.

 On Mon, Jun 24, 2019 at 04:04 Antonios Chariton 
 wrote:

> Yes, traffic from Greek networks is routed through NYC (alter.net),
> and previously it had a 60% packet loss. Now it’s still via NYC, but no
> packet loss. This happens in GR-IX Athens, not GR-IX Thessaloniki, but the
> problem definitely exists.
>
> Antonis
>
>
> On 24 Jun 2019, at 13:55, Dmitry Sherman  wrote:
>
> Hello are there any issues with CloudFlare services now?
>
> Dmitry Sherman
> dmi...@interhost.net
> Interhost Networks Ltd
> Web: http://www.interhost.co.il
> fb: https://www.facebook.com/InterhostIL
> Office: (+972)-(0)74-7029881 Fax: (+972)-(0)53-7976157
>
>
> --
 --
 Robbie Trencheny (@robbie )
 925-884-3728
 robbie.io

>>> --
>> --
>> Robbie Trencheny (@robbie )
>> 925-884-3728
>> robbie.io
>>
> --
--
Robbie Trencheny (@robbie )
925-884-3728
robbie.io


Re: CloudFlare issues?

2019-06-24 Thread Tom Paseka via NANOG
a Verizon downstream BGP customer is leaking the full table, and some more
specifics from us and many other providers.

On Mon, Jun 24, 2019 at 7:56 AM Robbie Trencheny  wrote:

> *1147 UTC update* Staring at internal graphs looks like global traffic is
> now at 97% of expected so impact lessening.
>
> On Mon, Jun 24, 2019 at 04:51 Dovid Bender  wrote:
>
>> We are seeing issues as well getting to HE. The traffic is going via
>> Alter.
>>
>>
>>
>> On Mon, Jun 24, 2019 at 7:48 AM Robbie Trencheny  wrote:
>>
>>> From John Graham-Cumming, CTO of Cloudflare, on Hacker News right now:
>>>
>>> This appears to be a routing problem with Level3. All our systems are
>>> running normally but traffic isn't getting to us for a portion of our
>>> domains.
>>>
>>> 1128 UTC update Looks like we're dealing with a route leak and we're
>>> talking directly with the leaker and Level3 at the moment.
>>>
>>> 1131 UTC update Just to be clear this isn't affecting all our traffic or
>>> all our domains or all countries. A portion of traffic isn't hitting
>>> Cloudflare. Looks to be about an aggregate 10% drop in traffic to us.
>>>
>>> 1134 UTC update We are now certain we are dealing with a route leak.
>>>
>>> On Mon, Jun 24, 2019 at 04:04 Antonios Chariton 
>>> wrote:
>>>
 Yes, traffic from Greek networks is routed through NYC (alter.net),
 and previously it had a 60% packet loss. Now it’s still via NYC, but no
 packet loss. This happens in GR-IX Athens, not GR-IX Thessaloniki, but the
 problem definitely exists.

 Antonis


 On 24 Jun 2019, at 13:55, Dmitry Sherman  wrote:

 Hello are there any issues with CloudFlare services now?

 Dmitry Sherman
 dmi...@interhost.net
 Interhost Networks Ltd
 Web: http://www.interhost.co.il
 fb: https://www.facebook.com/InterhostIL
 Office: (+972)-(0)74-7029881 Fax: (+972)-(0)53-7976157


 --
>>> --
>>> Robbie Trencheny (@robbie )
>>> 925-884-3728
>>> robbie.io
>>>
>> --
> --
> Robbie Trencheny (@robbie )
> 925-884-3728
> robbie.io
>


Re: CloudFlare issues?

2019-06-24 Thread Robbie Trencheny
*1147 UTC update* Staring at internal graphs, it looks like global traffic is
now at 97% of expected, so the impact is lessening.

On Mon, Jun 24, 2019 at 04:51 Dovid Bender  wrote:

> We are seeing issues as well getting to HE. The traffic is going via Alter.
>
>
>
> On Mon, Jun 24, 2019 at 7:48 AM Robbie Trencheny  wrote:
>
>> From John Graham-Cumming, CTO of Cloudflare, on Hacker News right now:
>>
>> This appears to be a routing problem with Level3. All our systems are
>> running normally but traffic isn't getting to us for a portion of our
>> domains.
>>
>> 1128 UTC update Looks like we're dealing with a route leak and we're
>> talking directly with the leaker and Level3 at the moment.
>>
>> 1131 UTC update Just to be clear this isn't affecting all our traffic or
>> all our domains or all countries. A portion of traffic isn't hitting
>> Cloudflare. Looks to be about an aggregate 10% drop in traffic to us.
>>
>> 1134 UTC update We are now certain we are dealing with a route leak.
>>
>> On Mon, Jun 24, 2019 at 04:04 Antonios Chariton 
>> wrote:
>>
>>> Yes, traffic from Greek networks is routed through NYC (alter.net), and
>>> previously it had a 60% packet loss. Now it’s still via NYC, but no packet
>>> loss. This happens in GR-IX Athens, not GR-IX Thessaloniki, but the problem
>>> definitely exists.
>>>
>>> Antonis
>>>
>>>
>>> On 24 Jun 2019, at 13:55, Dmitry Sherman  wrote:
>>>
>>> Hello are there any issues with CloudFlare services now?
>>>
>>> Dmitry Sherman
>>> dmi...@interhost.net
>>> Interhost Networks Ltd
>>> Web: http://www.interhost.co.il
>>> fb: https://www.facebook.com/InterhostIL
>>> Office: (+972)-(0)74-7029881 Fax: (+972)-(0)53-7976157
>>>
>>>
>>> --
>> --
>> Robbie Trencheny (@robbie )
>> 925-884-3728
>> robbie.io
>>
> --
--
Robbie Trencheny (@robbie )
925-884-3728
robbie.io


Re: CloudFlare issues?

2019-06-24 Thread Dovid Bender
We are seeing issues as well getting to HE. The traffic is going via Alter.



On Mon, Jun 24, 2019 at 7:48 AM Robbie Trencheny  wrote:

> From John Graham-Cumming, CTO of Cloudflare, on Hacker News right now:
>
> This appears to be a routing problem with Level3. All our systems are
> running normally but traffic isn't getting to us for a portion of our
> domains.
>
> 1128 UTC update Looks like we're dealing with a route leak and we're
> talking directly with the leaker and Level3 at the moment.
>
> 1131 UTC update Just to be clear this isn't affecting all our traffic or
> all our domains or all countries. A portion of traffic isn't hitting
> Cloudflare. Looks to be about an aggregate 10% drop in traffic to us.
>
> 1134 UTC update We are now certain we are dealing with a route leak.
>
> On Mon, Jun 24, 2019 at 04:04 Antonios Chariton 
> wrote:
>
>> Yes, traffic from Greek networks is routed through NYC (alter.net), and
>> previously it had a 60% packet loss. Now it’s still via NYC, but no packet
>> loss. This happens in GR-IX Athens, not GR-IX Thessaloniki, but the problem
>> definitely exists.
>>
>> Antonis
>>
>>
>> On 24 Jun 2019, at 13:55, Dmitry Sherman  wrote:
>>
>> Hello are there any issues with CloudFlare services now?
>>
>> Dmitry Sherman
>> dmi...@interhost.net
>> Interhost Networks Ltd
>> Web: http://www.interhost.co.il
>> fb: https://www.facebook.com/InterhostIL
>> Office: (+972)-(0)74-7029881 Fax: (+972)-(0)53-7976157
>>
>>
>> --
> --
> Robbie Trencheny (@robbie )
> 925-884-3728
> robbie.io
>


Re: Cellular backup connections

2019-06-24 Thread Dovid Bender
I am getting the same for SSH and https traffic. It's strange. Where the
response is something small like:

Moved to this <a href="https://63.XX.XX.XX:443/auth.asp">location</a>.

It works. But when I try to load pages that are any bigger, it fails. Like I
said before, I assume it's either an issue with the MTU or the window size. I
was just wondering if anyone has encountered such an issue before. It's not
easy getting to someone that knows something. When you have some sort of
concrete info, the level 1 techs tend to pass you along faster.
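
If anyone wants to confirm or rule out the MTU theory before burning time with
level 1, one quick check is to ping across the LTE path with the don't-fragment
bit set and find the largest payload that still gets answered. The sketch below
shells out to ping; the -M do / -s / -W flags are Linux iputils options, so
adjust for other platforms, and the target address is a placeholder:

  import subprocess

  def max_unfragmented_payload(host, low=0, high=1472):
      """Binary-search the largest ICMP payload that passes with DF set.
      1472 bytes of payload + 28 bytes of ICMP/IP headers = a 1500-byte packet."""
      best = None
      while low <= high:
          size = (low + high) // 2
          ok = subprocess.run(
              ["ping", "-c", "1", "-W", "2", "-M", "do", "-s", str(size), host],
              stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL,
          ).returncode == 0
          if ok:
              best, low = size, size + 1
          else:
              high = size - 1
      return best

  print(max_unfragmented_payload("192.0.2.1"))  # placeholder target

If the number that comes back is well under 1472, a lowered path MTU without
working PMTU discovery (which the VPN's smaller MTU happens to paper over)
would be the usual explanation for why small responses work and larger pages
stall.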





On Mon, Jun 24, 2019 at 7:41 AM J. Hellenthal 
wrote:

> Could be wrong on this but direct SSH on the LTE side may possibly be not
> allowed(filtered) and might just be something you could discuss in a ticket
> with Verizon.
>
> --
>  J. Hellenthal
>
> The fact that there's a highway to Hell but only a stairway to Heaven says
> a lot about anticipated traffic volume.
>
> On Jun 24, 2019, at 04:50, Dovid Bender  wrote:
>
> All,
>
> I finally got around to putting in a Verizon LTE connection and the ping
> times are pretty good. There is the occasional issue however for the most
> part ping times are < 50 ms. I have another strange issue though. When I
> try to ssh or connect via the endpoints web interface it fails. If I first
> connect via PPTP or SSL VPN then it works. I ruled out it being my IP since
> if I connect direct from the PPTP or SSL VPN box then it fails as well. It
> seems the tunnel does something (perhaps lowering the MTU or fragmenting
> packets) that allows it to work. Any thoughts?
>
> TIA.
>
>
>
>
> On Mon, Feb 4, 2019 at 8:18 AM Dovid Bender  wrote:
>
>> Anyone know if Verizon static IP's over LTE have same issue where they
>> bounce the traffic around before it gets back to the NY metro area?
>>
>>
>>
>> On Thu, Jan 3, 2019 at 6:46 PM Dovid Bender  wrote:
>>
>>> All,
>>>
>>> Thanks for all of the feedback. I was on site today and noticed two
>>> things.
>>> 1) As someone mentioned it could be for static IP's they have the
>>> traffic going to a specific location. The POP is in NJ there was a min.
>>> latency of 120ms which prob had to do with this.
>>> 2) I was watching the ping times and it looked something like this:
>>> 400ms
>>> 360ms
>>> 330ms
>>> 300ms
>>> 260ms
>>> 210ms
>>> 170ms
>>> 140ms
>>> 120ms
>>> 400ms
>>> 375ms
>>>
>>> It seems to have been coming in "waves". I assume this has to do with
>>> "how cellular work" and the signal. I tried moving it around by putting it
>>> down low on the floor, moving it locations etc. and saw the same thing
>>> every time. I am going to try Verizon next and see how it goes.
>>>
>>>
>>>
>>> On Sat, Dec 29, 2018 at 12:13 PM Mark Milhollan 
>>> wrote:
>>>
 On Fri, 28 Dec 2018, Dovid Bender wrote:

 >I finally got around to setting up a cellular backup device in our new
 POP.

 >When SSH'ing in remotely the connection seems rather slow.

 Perhaps using MOSH can help make the interactive CLI session less
 annoying.

 >Verizon they charge $500.00 just to get a public IP and I want to
 avoid
 >that if possible.

 You might look into have it call out / maintain a connection back to
 your infrastructure.


 /mark

>>>


Re: CloudFlare issues?

2019-06-24 Thread Robbie Trencheny
From John Graham-Cumming, CTO of Cloudflare, on Hacker News right now:

This appears to be a routing problem with Level3. All our systems are
running normally but traffic isn't getting to us for a portion of our
domains.

1128 UTC update Looks like we're dealing with a route leak and we're
talking directly with the leaker and Level3 at the moment.

1131 UTC update Just to be clear this isn't affecting all our traffic or
all our domains or all countries. A portion of traffic isn't hitting
Cloudflare. Looks to be about an aggregate 10% drop in traffic to us.

1134 UTC update We are now certain we are dealing with a route leak.

On Mon, Jun 24, 2019 at 04:04 Antonios Chariton 
wrote:

> Yes, traffic from Greek networks is routed through NYC (alter.net), and
> previously it had a 60% packet loss. Now it’s still via NYC, but no packet
> loss. This happens in GR-IX Athens, not GR-IX Thessaloniki, but the problem
> definitely exists.
>
> Antonis
>
>
> On 24 Jun 2019, at 13:55, Dmitry Sherman  wrote:
>
> Hello are there any issues with CloudFlare services now?
>
> Dmitry Sherman
> dmi...@interhost.net
> Interhost Networks Ltd
> Web: http://www.interhost.co.il
> fb: https://www.facebook.com/InterhostIL
> Office: (+972)-(0)74-7029881 Fax: (+972)-(0)53-7976157
>
>
> --
--
Robbie Trencheny (@robbie )
925-884-3728
robbie.io


Re: CloudFlare issues?

2019-06-24 Thread James Jun
On Mon, Jun 24, 2019 at 02:03:47PM +0300, Antonios Chariton wrote:
> Yes, traffic from Greek networks is routed through NYC (alter.net), and 
> previously it had a 60% packet loss. Now it's still via NYC, but no packet 
> loss. This happens in GR-IX Athens, not GR-IX Thessaloniki, but the problem 
> definitely exists.
>

It seems Verizon has stopped filtering a downstream customer, or filtering 
broke.

Time to implement peer-locking path filters for those using VZ as a paid peer.

     Network          Next Hop            Metric LocPrf Weight Path
 *   2.18.64.0/24     137.39.3.55                             0 701 396531 33154 174 6057 i
 *   2.19.251.0/24    137.39.3.55                             0 701 396531 33154 174 6057 i
 *   2.22.24.0/23     137.39.3.55                             0 701 396531 33154 174 6057 i
 *   2.22.26.0/23     137.39.3.55                             0 701 396531 33154 174 6057 i
 *   2.22.28.0/24     137.39.3.55                             0 701 396531 33154 174 6057 i
 *   2.24.0.0/16      137.39.3.55                             0 701 396531 33154 3356 12576 i
 *                    202.232.0.2                             0 2497 701 396531 33154 3356 12576 i
 *   2.24.0.0/13      202.232.0.2                             0 2497 701 396531 33154 3356 12576 i
 *   2.25.0.0/16      137.39.3.55                             0 701 396531 33154 3356 12576 i
 *                    202.232.0.2                             0 2497 701 396531 33154 3356 12576 i
 *   2.26.0.0/16      137.39.3.55                             0 701 396531 33154 3356 12576 i
 *                    202.232.0.2                             0 2497 701 396531 33154 3356 12576 i
 *   2.27.0.0/16      137.39.3.55                             0 701 396531 33154 3356 12576 i
 *                    202.232.0.2                             0 2497 701 396531 33154 3356 12576 i
 *   2.28.0.0/16      137.39.3.55                             0 701 396531 33154 3356 12576 i
 *                    202.232.0.2                             0 2497 701 396531 33154 3356 12576 i
 *   2.29.0.0/16      137.39.3.55                             0 701 396531 33154 3356 12576 i
 *                    202.232.0.2                             0 2497 701 396531 33154 3356 12576 i
 *   2.30.0.0/16      137.39.3.55                             0 701 396531 33154 3356 12576 i
 *                    202.232.0.2                             0 2497 701 396531 33154 3356 12576 i
 *   2.31.0.0/16      137.39.3.55                             0 701 396531 33154 3356 12576 i
 *                    202.232.0.2                             0 2497 701 396531 33154 3356 12576 i
 *   2.56.16.0/22     137.39.3.55                             0 701 396531 33154 1239 9009 i
 *   2.56.150.0/24    137.39.3.55                             0 701 396531 33154 1239 9009 i
 *   2.57.48.0/22     137.39.3.55                             0 701 396531 33154 174 50782 i
 *   2.58.47.0/24     137.39.3.55                             0 701 396531 33154 1239 9009 i
 *   2.59.0.0/23      137.39.3.55                             0 701 396531 33154 1239 9009 i
 *   2.59.244.0/22    137.39.3.55                             0 701 396531 33154 3356 29119 i
 *   2.148.0.0/14     137.39.3.55                             0 701 396531 33154 3356 2119 i
 *   3.5.128.0/24     137.39.3.55                             0 701 396531 33154 3356 16509 i
 *   3.5.128.0/22     137.39.3.55                             0 701 396531 33154 3356 16509 i
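
A rough illustration of the peer-lock idea (in Python rather than router
policy, and the list of locked ASNs is illustrative only): routes learned from
the 701 session are rejected whenever another large carrier shows up *behind*
701 in the AS path, because those networks should never be reached through a
Verizon customer cone.

  import re

  # Large-carrier ASNs we never expect to see behind 701 (illustrative list).
  NEVER_BEHIND_701 = {174, 1239, 2914, 3257, 3356, 6762}

  def peer_lock_accept(as_path):
      """Reject a path such as '701 396531 33154 3356 12576' on the 701 session."""
      asns = [int(a) for a in re.findall(r"\d+", as_path)]
      if 701 not in asns:
          return True
      behind = asns[asns.index(701) + 1:]
      return not any(a in NEVER_BEHIND_701 for a in behind)

  print(peer_lock_accept("701 396531 33154 3356 12576"))  # False -> leaked path dropped
  print(peer_lock_accept("701 16509"))                    # True  -> ordinary path kept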


Re: Cellular backup connections

2019-06-24 Thread J. Hellenthal via NANOG
Could be wrong on this but direct SSH on the LTE side may possibly not be 
allowed (filtered) and might just be something you could discuss in a ticket 
with Verizon.

-- 
 J. Hellenthal

The fact that there's a highway to Hell but only a stairway to Heaven says a 
lot about anticipated traffic volume.

> On Jun 24, 2019, at 04:50, Dovid Bender  wrote:
> 
> All,
> 
> I finally got around to putting in a Verizon LTE connection and the ping 
> times are pretty good. There is the occasional issue however for the most 
> part ping times are < 50 ms. I have another strange issue though. When I try 
> to ssh or connect via the endpoints web interface it fails. If I first 
> connect via PPTP or SSL VPN then it works. I ruled out it being my IP since 
> if I connect direct from the PPTP or SSL VPN box then it fails as well. It 
> seems the tunnel does something (perhaps lowering the MTU or fragmenting 
> packets) that allows it to work. Any thoughts?
> 
> TIA.
> 
> 
> 
> 
>> On Mon, Feb 4, 2019 at 8:18 AM Dovid Bender  wrote:
>> Anyone know if Verizon static IP's over LTE have same issue where they 
>> bounce the traffic around before it gets back to the NY metro area?
>> 
>> 
>> 
>>> On Thu, Jan 3, 2019 at 6:46 PM Dovid Bender  wrote:
>>> All,
>>> 
>>> Thanks for all of the feedback. I was on site today and noticed two things.
>>> 1) As someone mentioned it could be for static IP's they have the traffic 
>>> going to a specific location. The POP is in NJ there was a min. latency of 
>>> 120ms which prob had to do with this.
>>> 2) I was watching the ping times and it looked something like this:
>>> 400ms
>>> 360ms
>>> 330ms
>>> 300ms
>>> 260ms
>>> 210ms
>>> 170ms
>>> 140ms
>>> 120ms
>>> 400ms
>>> 375ms
>>> 
>>> It seems to have been coming in "waves". I assume this has to do with "how 
>>> cellular work" and the signal. I tried moving it around by putting it down 
>>> low on the floor, moving it locations etc. and saw the same thing every 
>>> time. I am going to try Verizon next and see how it goes.
>>> 
>>> 
>>> 
 On Sat, Dec 29, 2018 at 12:13 PM Mark Milhollan  wrote:
 On Fri, 28 Dec 2018, Dovid Bender wrote:
 
 >I finally got around to setting up a cellular backup device in our new 
 >POP.
 
 >When SSH'ing in remotely the connection seems rather slow.
 
 Perhaps using MOSH can help make the interactive CLI session less 
 annoying.
 
 >Verizon they charge $500.00 just to get a public IP and I want to avoid 
 >that if possible.
 
 You might look into have it call out / maintain a connection back to 
 your infrastructure.
 
 
 /mark




Re: CloudFlare issues?

2019-06-24 Thread Antonios Chariton
Yes, traffic from Greek networks is routed through NYC (alter.net), and 
previously it had a 60% packet loss. Now it’s still via NYC, but no packet 
loss. This happens in GR-IX Athens, not GR-IX Thessaloniki, but the problem 
definitely exists.

Antonis 

> On 24 Jun 2019, at 13:55, Dmitry Sherman wrote:
> 
> Hello are there any issues with CloudFlare services now?
> 
> Dmitry Sherman
> dmi...@interhost.net 
> Interhost Networks Ltd
> Web: http://www.interhost.co.il
> fb: https://www.facebook.com/InterhostIL
> Office: (+972)-(0)74-7029881 Fax: (+972)-(0)53-7976157
> 



CloudFlare issues?

2019-06-24 Thread Dmitry Sherman
Hello are there any issues with CloudFlare services now?

Dmitry Sherman
dmi...@interhost.net
Interhost Networks Ltd
Web: http://www.interhost.co.il
fb: https://www.facebook.com/InterhostIL
Office: (+972)-(0)74-7029881 Fax: (+972)-(0)53-7976157



Re: Cellular backup connections

2019-06-24 Thread Dovid Bender
All,

I finally got around to putting in a Verizon LTE connection and the ping
times are pretty good. There is the occasional issue; however, for the most
part ping times are < 50 ms. I have another strange issue though. When I
try to ssh or connect via the endpoint's web interface it fails. If I first
connect via PPTP or SSL VPN then it works. I ruled out it being my IP since
if I connect directly from the PPTP or SSL VPN box then it fails as well. It
seems the tunnel does something (perhaps lowering the MTU or fragmenting
packets) that allows it to work. Any thoughts?

TIA.




On Mon, Feb 4, 2019 at 8:18 AM Dovid Bender  wrote:

> Anyone know if Verizon static IP's over LTE have same issue where they
> bounce the traffic around before it gets back to the NY metro area?
>
>
>
> On Thu, Jan 3, 2019 at 6:46 PM Dovid Bender  wrote:
>
>> All,
>>
>> Thanks for all of the feedback. I was on site today and noticed two
>> things.
>> 1) As someone mentioned it could be for static IP's they have the traffic
>> going to a specific location. The POP is in NJ there was a min. latency of
>> 120ms which prob had to do with this.
>> 2) I was watching the ping times and it looked something like this:
>> 400ms
>> 360ms
>> 330ms
>> 300ms
>> 260ms
>> 210ms
>> 170ms
>> 140ms
>> 120ms
>> 400ms
>> 375ms
>>
>> It seems to have been coming in "waves". I assume this has to do with
>> "how cellular work" and the signal. I tried moving it around by putting it
>> down low on the floor, moving it locations etc. and saw the same thing
>> every time. I am going to try Verizon next and see how it goes.
>>
>>
>>
>> On Sat, Dec 29, 2018 at 12:13 PM Mark Milhollan 
>> wrote:
>>
>>> On Fri, 28 Dec 2018, Dovid Bender wrote:
>>>
>>> >I finally got around to setting up a cellular backup device in our new
>>> POP.
>>>
>>> >When SSH'ing in remotely the connection seems rather slow.
>>>
>>> Perhaps using MOSH can help make the interactive CLI session less
>>> annoying.
>>>
>>> >Verizon they charge $500.00 just to get a public IP and I want to avoid
>>> >that if possible.
>>>
>>> You might look into have it call out / maintain a connection back to
>>> your infrastructure.
>>>
>>>
>>> /mark
>>>
>>