Re: CISCO 0-day exploits

2020-02-10 Thread Ahmed Borno
Disclaimer, I do not work for any vendor right now, and I don't sell any
product that might benefit from scaring anyone, so this is just some
whining for a real issue that someone needs to do something about.

I've worked for the CDP vendor for a long time, and I concur with what
Saku is saying...the L3 packet of death [threat] is very real, and I'd like
to take this opportunity (the buzz around CDPwn) to say a thing or two
about these 'soft, mushy and vulnerable' code stacks we have running all
over the world, under our feet, waiting for someone with the right
incentive to take advantage of.

IMHO, "Skilled" software developers, and in parallel...'software
exploiters/reverse engineers' haven't been paying attention to these
'infrastructure' boxes (for now), maybe it is because they always had other
pieces of the technology stack to work with, and these other tech. stacks
were much more rewarding to spend time on (I'm quite sure Node.js /
Kubernetes for example...will have a lot more vulnerability researchers
looking at them than CDP/LLDP/SNMPetc code). < and That is a serious
sustainability issue on our hands, the risk here is very high; when it
comes to infrastructure security of nations, especially in a world where
miscreants are no longer script kiddies but actual nation sponsored
soldiers...Even MBS is doing it in person.

Because the moment some miscreants from some oppressive regime decide to
do damage, and not necessarily via remote code execution as many might
think, but more of the 'L3 packet of death' kind of situation that Saku
mentioned earlier, these miscreants have a lot to play with; the attack
surface is huge, it is green, and it is ripe for the picking.

In my lifetime, I've looked at so many 'DDTS' descriptions, and I saw
nothing but an unwritten disclaimer of: 'I can easily be used for
DDoS'...and that is the case even if *SIRT did their brief analysis of
these bugs. So again, if some miscreants found it in themselves to look at
bugs with the right 'optics', we are going to be in an interesting
situation.

Luckily, we haven't seen a CDPwn/STPwn/BGPwn/NTPwn/*.*Pwn...etc.
worm/ransomware yet, but we also have no reason to think it's not
possible. To make matters worse, the code these babies are running is
ancient (in every possible way), many of the libraries used to develop that
code are glibc-ish in nature, and to make matters a bit more
interesting, patching those babies is not easy, and the nature of their
software architecture makes them even more fragile than any piece of
cheap IP camera out there on the internet or on enterprise networks.

So yeah, iACLs, CoPP and all sorts of basic precautions are needed, but I'm
thinking something more needs to be done, especially if these ancient code
stacks are being imported into new-age 'IoT' devices, multiplying the
attack surface by a factor of too many.
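
For what it's worth, the iACL idea boils down to something like the toy
Python sketch below (prefixes are invented for illustration): traffic
destined to infrastructure address space is dropped unless it comes from
trusted space, while ordinary transit traffic is left alone. A real iACL
is of course written in the platform's own ACL syntax, not Python.

import ipaddress

# Toy model of an infrastructure ACL (iACL). All prefixes are made up.
INFRA = [ipaddress.ip_network("192.0.2.0/24")]       # hypothetical router loopbacks/links
TRUSTED = [ipaddress.ip_network("198.51.100.0/24")]  # hypothetical internal mgmt space

def iacl_permits(src: str, dst: str) -> bool:
    s, d = ipaddress.ip_address(src), ipaddress.ip_address(dst)
    if not any(d in net for net in INFRA):
        return True                               # transit traffic: not the iACL's business
    return any(s in net for net in TRUSTED)       # only trusted sources may reach infra

print(iacl_permits("203.0.113.7", "192.0.2.1"))   # False: random host -> router
print(iacl_permits("198.51.100.9", "192.0.2.1"))  # True: management host -> router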

~A

On Mon, Feb 10, 2020 at 5:42 AM Saku Ytti  wrote:

> On Mon, 10 Feb 2020 at 13:52, Jean | ddostest.me via NANOG
>  wrote:
>
> > I really thought that more Cisco devices were deployed among NANOG.
> >
> > I guess that these devices are not used anymore or maybe that I
> > understood wrong the severity of this CVE.
>
> Network devices are incredibly fragile and mostly work because no one
> is motivated to bring the infrastructure down. Getting any arbitrary
> vendor down if you have access to it on L2 is usually so easy you
> accidentally find ways to do it.
> There are various L3 packet of deaths where existing infra can be
> crashed with single packet, almost everyone has no or ridiculously
> broken iACL and control-plane protection, yet business does not seem
> to suffer from it.
>
> Probably lower availability if you do upgrade your devices just
> because there is a known issue, due to new production affecting
> issues.
>
> --
>   ++ytti
>


Re: SD-NAP (San Diego) Internet Exchange?

2020-02-10 Thread Bill Woodcock
Last I knew it had pretty much devolved into intra-campus and local A/R 
interconnection, but our contacts here have retired as well. 


-Bill


> On Feb 10, 2020, at 21:15, Matt Peterson  wrote:
> 
> 
> Wondering if SD-NAP is still functional? PeeringDB entry looks pretty stale, 
> haven't been able to reach any contact aware of the current status. 
> Appreciate any help or direction on the status, thanks.
> 
> --Matt


SD-NAP (San Diego) Internet Exchange?

2020-02-10 Thread Matt Peterson
Wondering if SD-NAP is still functional? The PeeringDB entry looks pretty
stale; I haven't been able to reach any contact aware of the current status.
Appreciate any help or direction on the status, thanks.

--Matt


Re: Peering/Transit eBGP sessions -pet or cattle?

2020-02-10 Thread Lukas Tribus
Hello Baldur,


On Mon, 10 Feb 2020 at 19:57, Baldur Norddahl  wrote:
> Many dual homed companies may start out with two routers and two
> transits but without dual links to each transit, as you describe
> above. That will cause significant disruption if one link goes
> down. It is not just about convergence between T1 and T2 but for
> a major part of the internet. Been there, done that, yes you can
> be down for up to several minuttes before everything is normal
> again. Assume tier 1 transits and that contact to T1 was lost.
> This means T1 will have a peering session with T2 somewhere,
> but T1 will not allow peer to peer traffic to go via that link.
> All those peers will need to search for a different way to reach
> you, a way that does not transit T1 (unless they have a contract
> with T1).
>
> Therefore, if being down for several minutes is not ok, you
> should invest in dual links to your transits. And connect those
> to two different routers. If possible with a guarantee the
> transits use two routers at their end and that divergent fiber
> paths are used etc.

That is not my experience *at all*. I have always seen my prefixes
converge in a couple of seconds upstream (vs 2 different Tier1's).
That is with a double-digit number of announcements. Maybe if you
announce tens of thousands of prefixes as a large Tier 2, things are
more problematic, that I can't tell. Or maybe you hit some old-school
route dampening somewhere down the path. Maybe there is another reason
for this. But even if 3 AS hops are involved I don't really understand
how they would spend *minutes* to converge after receiving your BGP
withdraw message.

When I saw *minutes* of brownouts in connectivity it was always
because of ingress prefix convergence (or the lack thereof, due to
slow FIB programming, then temporary internal routing loops, nasty
things like that, but never external).

I agree there are a number of reasons (including best convergence) to
have completely diversified connections to a single transit AS.
Another reason is that when you manually reroute traffic for a certain
AS path (say transit 2 has an always congested PNI towards a third
party ASN), you may not have an alternative to the congested path when
your other transit provider goes away. But I never saw minutes of
brownout because of upstream -> downstream -> downstream convergence
(or whatever the scenario looks like).


lukas


AS7843 at NANOG78

2020-02-10 Thread aaron
Hi,

Would an operator from AS7843 at NANOG78 reach out to me off-list?

Thanks,
Aaron


Re: Charter contact

2020-02-10 Thread Seth Mattinen

On 2/7/20 6:36 PM, Mehmet Akcin wrote:

Hey there

I am looking for a contact in Charter for a 10G wave. Reno > SF or
Reno > LA.


Please let me know if you know people who may help.



If you can get them to actually sell you a 10G. Last time I dealt with 
Charter they maxed out at offering 5G in Reno. I use Verizon and AT&T 
now; both are also cheaper than Charter was.


Re: Peering/Transit eBGP sessions -pet or cattle?

2020-02-10 Thread Baldur Norddahl
On Mon, Feb 10, 2020 at 5:42 PM  wrote:

>
> > To be explicit: Router R1 has connections to transits T1 and T2.
> > Router R2 also has connections to the same transits T1 and T2. When
> > router R1 goes down, only small internal changes at T1 and T2 happens.
> > Nobody notices and the recovery is sub second.
> >
> Good point again,
> Though if I had only T1 on R1 and only T2 on R2 then convergence won't
> happen inside each Transit but instead between T1 and T2 which will add to
> the convergence time.
> So thinking about it, it seems the optimal design pattern in a distributed
> (horizontally scaled out) edge would be to try and pair up -i.e. at least
> two edge nodes per Transit (or Peer for that matter), in order to allow for
> potentially faster intra-Transit convergence rather than arguably slower
> inter-transit convergence.
>
>
 I am assuming R1 and R2 are connected and announcing the same routes. Each
transit is therefore receiving the same routes from two independent routers
(R1 and R2). When R1 goes down, something internally at the transit will
change to reflect that. But peers, other customers at that transit and
higher tier transits will see no difference at all. Assuming R1 and R2 both
announce a default route internally in your network, your internal
convergence will be as fast as your detection of the dead router.
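
To put a rough number on "as fast as your detection of the dead router",
here is a toy calculation assuming BFD-style liveness detection (BFD is
mentioned elsewhere in this thread); the timer values are examples, not
recommendations.

# Back-of-the-envelope: failure detection with BFD-style keepalives vs.
# plain BGP hold timers. Example numbers only.
def bfd_detection_ms(tx_interval_ms: int, multiplier: int) -> int:
    # BFD declares the neighbor down after `multiplier` consecutive missed
    # intervals, so worst-case detection is roughly interval * multiplier.
    return tx_interval_ms * multiplier

print(bfd_detection_ms(300, 3))   # 900 ms with fairly relaxed 300 ms x3 timers
print(bfd_detection_ms(50, 3))    # 150 ms with aggressive timers
print(90 * 1000)                  # vs. BGP hold timers commonly in the 90-180 s range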

This scheme also protects against link failure or failure at the provider
end (if you make sure the transit is also using two routers).

Therefore even if R1 and R2 are in the same physical location, maybe the
same rack mounted on top of each other, that is a better solution than one
big hunky router with redundant hardware. Having them at different
locations is better of course but not always feasible.

Many dual homed companies may start out with two routers and two transits
but without dual links to each transit, as you describe above. That will
cause significant disruption if one link goes down. It is not just about
convergence between T1 and T2 but for a major part of the internet. Been
there, done that, yes you can be down for up to several minutes before
everything is normal again. Assume tier 1 transits and that contact to T1
was lost. This means T1 will have a peering session with T2 somewhere, but
T1 will not allow peer to peer traffic to go via that link. All those peers
will need to search for a different way to reach you, a way that does not
transit T1 (unless they have a contract with T1).

Therefore, if being down for several minutes is not ok, you should invest
in dual links to your transits. And connect those to two different routers.
If possible with a guarantee the transits use two routers at their end and
that divergent fiber paths are used etc.

Regards,

Baldur


Re: CISCO 0-day exploits

2020-02-10 Thread Tom Hill
On 10/02/2020 18:13, Scott Weeks wrote:
> Just because you use cisco devices doesn't mean you have to use 
> their proprietary protocols, such as EIGRP or CDP.  OSPF or LLDP
> work just fine and interoperate with other vendors... :)


The CDPwn vulnerability covers similar vulnerabilities in LLDP, and does
indeed demonstrate that network segmentation (i.e. "dude it's just L2")
is not the last word in mitigating against said vulnerabilities.

You ought to all be far more concerned, IMO.

-- 
Tom


Re: CISCO 0-day exploits

2020-02-10 Thread Justin Wilson



> 
> I really thought that more Cisco devices were deployed among NANOG.
> 
> I guess that these devices are not used anymore or maybe that I 
> understood wrong the severity of this CVE.

A proper network design helps to mitigate flaws like this. If you have CDP off, 
which many people do, then this exploit is not that big of a deal to you.  If 
your devices are on a management network then it’s not that big of a deal.  
Just because a certain vendor has vulnerabilities exposed doesn't mean it's an 
all-hands-on-deck scenario.  Many of the folks on NANOG have a good grasp of network 
design.  Sure, some don't.  But for the most part they do. 

Justin Wilson
li...@mtin.net

—
https://j2sw.com - All things jsw (AS209109)
https://blog.j2sw.com - Podcast and Blog



Re: Peering/Transit eBGP sessions -pet or cattle?

2020-02-10 Thread Lukas Tribus
Hello Adam,


On Mon, 10 Feb 2020 at 13:37,  wrote:
> Would like to take a poll on whether you folks tend to treat your 
> transit/peering connections (BGP sessions in particular) as pets or rather as 
> cattle.

Cattle every day of the week.

I don't trust control-plane resiliency and things like ISSU any
farther than I can throw the big boxes they run on.

The entire network is engineered so that my customers *do not* feel
the loss of one node (*). That is the design principle here, and while
traffic grows and we keep adding more capacity this is something we
always consider.

How difficult it is to achieve that depends on the particular
situation, and it may be quite difficult in some situations, but not
here.


That is why I can upgrade releases on those nodes (without customers,
just transit and peers) quite frequently. I can achieve that with
mostly zero packet loss because of the design and all-around traffic
draining using graceful shutdown and friends. We had quite some issues
draining traffic from nodes in the past (brownouts due to FIB mismatch
between routers, caused by IP lookup on both the ingress and egress node
with per-VRF label allocation), but since we switched to "per-ce" - meaning
per-nexthop - label allocation, things work great.

On the other side, transit with support for graceful-shutdown is of
course great, but even if there is no support for it, for maintenance
on your or your transit's box you still know about the maintenance
beforehand, so you can manually drain your egress traffic (your peer
doesn't have to support RFC 8326 for you to drop YOUR loc-pref to
zero), and many transit providers have some kind of "set loc-pref below
peer" community, which allows you to do basically the same thing
manually without actual RFC 8326 support on the other side. That said,
for ingress traffic, unless you are announcing *A LOT* of routes,
convergence is usually *very* fast anyway.
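
As a concrete illustration of that manual egress drain, here is a minimal
Python sketch of local-pref-based best-path selection; the session names,
prefixes and values are invented for the example, and real BGP best-path
selection has many more steps than this.

from dataclasses import dataclass

# Sketch: before maintenance on transit T1, drop local-pref of everything
# learned from T1 to 0, so egress traffic moves to T2 while T1 is still up.
@dataclass
class Route:
    prefix: str
    nexthop_session: str
    local_pref: int

rib = [
    Route("203.0.113.0/24", "T1", 100),
    Route("203.0.113.0/24", "T2", 100),
]

def drain(session: str) -> None:
    for r in rib:
        if r.nexthop_session == session:
            r.local_pref = 0          # "I'm going away" applied locally, RFC 8326 style

def best(prefix: str) -> Route:
    candidates = [r for r in rib if r.prefix == prefix]
    return max(candidates, key=lambda r: r.local_pref)   # higher local-pref wins

print(best("203.0.113.0/24").nexthop_session)  # T1 (first of the tie)
drain("T1")
print(best("203.0.113.0/24").nexthop_session)  # T2: egress moved before the session drops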

I can see the benefit of having internal HW redundancy for nodes where
customers are connected (shorter maintenance windows, fewer outages in
some single-HW-failure scenarios, theoretically better overall service
uptime), but it never covers everything, and it may just introduce
unnecessary complexity that actually ends up root-causing outages.

Maybe I'm just a lucky fellow, but the hardware has been so reliable
here that I'm pretty sure the complexity of Dual-RSP, ISSU and friends
would have caused more issues over time than what I'm seeing with some
good old and honest HW failures.

Regarding HW redundancy itself: Dual RSP doesn't have any benefit when
the guy in the MMR pulls the wrong fiber, bringing down my transit. It
will still be BGP that has to converge. We don't have PIC today, maybe
this is something to look into in the future, but it isn't something
that internal HW redundancy fixes.

A straightforward and KISS design, where the engineers actually know
"what happens when", and how to do things properly (like draining
traffic), and also quite frankly accepting some brownouts for uncommon
events is the strategy that worked best for us.


(*) sure, if the node with 700k best-paths towards a transit dies
non-gracefully (hw or power failure), there will be a brownout of the
affected prefixes for some minutes. But after convergence my network
will be fine and my customers will stop feeling it. They will ask what
happened and I will be able to explain.


cheers,
lukas


Re: CISCO 0-day exploits

2020-02-10 Thread Scott Weeks



--- nanog@nanog.org wrote:
From: "Jean | ddostest.me via NANOG" 

> https://www.armis.com/cdpwn/
>
> What's the impact on your network? Everything is under control?
---

I really thought that more Cisco devices were deployed among NANOG.

I guess that these devices are not used anymore or maybe that I 
understood wrong the severity of this CVE.
---


Just because you use cisco devices doesn't mean you have to use 
their proprietary protocols, such as EIGRP or CDP.  OSPF or LLDP
work just fine and interoperate with other vendors... :)

scott


SLAAC renumbering problems (Fwd: [v6ops] draft-gont-v6ops-slaac-renum **Call for adoption**)

2020-02-10 Thread Fernando Gont
Folks,

A while ago some of us started working on an IETF draft to document and
mitigate some issues experienced by SLAAC in the face of some
renumbering events. Such work has resulted in three small documents.

* draft-gont-v6ops-slaac-renum (problem statement)
* draft-gont-v6ops-cpe-slaac-renum (CPE recommendations)
* draft-gont-6man-slaac-renum (proposed protocol updates)

Two of these documents are being discussed at the v6ops wg of the IETF,
where there are ongoing calls for adoption.


* The "problem statement" document
(https://tools.ietf.org/html/draft-gont-v6ops-slaac-renum) is being
discussed at the v6ops wg of the IETF in this thread:
https://mailarchive.ietf.org/arch/msg/v6ops/HmcZYGY4Lu2h7NUND3o2UiOsKXA

* The "CPE recommendations" document
(https://tools.ietf.org/html/draft-gont-v6ops-cpe-slaac-renum) is being
discussed in the same group/list, in this thread:
https://mailarchive.ietf.org/arch/msg/v6ops/yW_YdRogwMNCRvK1PKpXWgcBkmQ


Feedback will be highly appreciated, particularly if on the v6ops wg
mailing-list.

Thanks!

Cheers,
Fernando




 Forwarded Message 
Subject:[v6ops] draft-gont-v6ops-slaac-renum **Call for adoption**
Date:   Sat, 4 Jan 2020 22:56:44 +
From:   Ron Bonica 
To: v6...@ietf.org 



Folks,

 

Each week between now and IETF 107, we will review and discuss one draft
with an eye towards progressing it.

 

This week, please review and comment on draft-gont-v6ops-slaac-renum.

 

When reviewing this draft, please indicate whether you think that V6OPS
should adopt it as a work item.

 

  Fred and Ron

 


Juniper Business Use Only



RE: Peering/Transit eBGP sessions -pet or cattle?

2020-02-10 Thread adamv0025


> Baldur Norddahl
> Sent: Monday, February 10, 2020 3:06 PM
> 
> No matter how much money you put into your peering router, the session 
> will be no more stable than whatever the peer did to their end.
>
Agreed, that's a fair point,  

> Plus at some
> point you will need to reboot due to software upgrade or other reasons. 
>
There are ways of draining traffic for planned maintenance.

> If
> you care at all, you should be doing redundancy by having multiple 
> locations, multiple routers. You can then save the money spent on each 
> router, because a router failure will not cause any change on what the 
> internet sees through BGP.
> 
I think router failure will cause a change in what the Internet sees, as you 
rightly outlined below:

> Also transits are way more important than peers. Losing a transit 
> will cause massive route changes around the globe and it will take a 
> few minutes to stabilize. Losing a peer usually just means the peer 
> switches to the transit route that they already had available.
> 
Agreed, and I suppose the question is whether folks tend to try minimizing 
these impacts by all means possible or just take it as a necessary evil that will 
eventually happen.

> Peers are not equal. You may want to ensure redundancy to your biggest 
> peers, while the small fish will be fine without.
> 
> To be explicit: Router R1 has connections to transits T1 and T2. 
> Router R2 also has connections to the same transits T1 and T2. When 
> router R1 goes down, only small internal changes at T1 and T2 happens. 
> Nobody notices and the recovery is sub second.
> 
Good point again,
Though if I had only T1 on R1 and only T2 on R2 then convergence won't happen 
inside each Transit but instead between T1 and T2 which will add to the 
convergence time. 
So thinking about it, it seems the optimal design pattern in a distributed 
(horizontally scaled out) edge would be to try and pair up -i.e. at least two 
edge nodes per Transit (or Peer for that matter), in order to allow for 
potentially faster intra-Transit convergence rather than arguably slower 
inter-transit convergence.  
 
adam





Re: CISCO 0-day exploits

2020-02-10 Thread Tom Hill
On 10/02/2020 13:40, Saku Ytti wrote:
> There are various L3 packet of deaths where existing infra can be
> crashed with single packet, almost everyone has no or ridiculously
> broken iACL and control-plane protection, yet business does not seem
> to suffer from it.


The cynic in me would suggest that we haven't had a World War in a
while; business is far too good.

-- 
Tom


Re: Peering/Transit eBGP sessions -pet or cattle?

2020-02-10 Thread Baldur Norddahl
No matter how much money you put into your peering router, the session will
be no more stable than whatever the peer did to their end. Plus at some
point you will need to reboot due to software upgrade or other reasons. If
you care at all, you should be doing redundancy by having multiple
locations, multiple routers. You can then save the money spent on each
router, because a router failure will not cause any change on what the
internet sees through BGP.

Also transits are way more important than peers. Losing a transit will
cause massive route changes around the globe and it will take a few
minutes to stabilize. Losing a peer usually just means the peer switches
to the transit route that they already had available.

Peers are not equal. You may want to ensure redundancy to your biggest
peers, while the small fish will be fine without.

To be explicit: Router R1 has connections to transits T1 and T2. Router R2
also has connections to the same transits T1 and T2. When router R1 goes
down, only small internal changes at T1 and T2 happen. Nobody notices and
the recovery is sub-second.

Peers are less important: R1 has a connection to internet exchange IE1 and R2
to a different internet exchange IE2. When R1 goes down the small peers at
IE1 are lost but will quickly reroute through transit. Large peers may be
present at both internet exchanges and so will instantly switch the traffic
to IE2.

Regards,

Baldur



On Mon, Feb 10, 2020 at 1:38 PM  wrote:

> Hi,
>
>
>
> Would like to take a poll on whether you folks tend to treat your
> transit/peering connections (BGP sessions in particular) as pets or rather
> as cattle.
>
> And I appreciate the answer could differ for transit vs peering
> connections.
>
> However, I’d like to ask this question through a lens of redundant vs
> non-redundant Internet edge devices.
>
> To explain,
>
>    a. The “pet” case:
>
> Would you rather try improving the failure rate of your transit/peering
> connections by using resilient Control-Plane (REs/RSPs/RPs) or even
> designing these as link bundles over separate cards and optical modules?
>
> Is this on the basis that no matter how hard you try on your end (i.e.
> distribute your traffic to a multitude of transit and peering
> connections or use BFD or even BGP-PIC Edge to shuffle things around fast),
> any disruption to the eBGP session itself will still hurt you in some way
> (i.e. at least some partial outage for some proportion of the traffic for
> a not insignificant period of time) until things converge in the direction
> from The Internet back to you?
>
>
>
>    b. The “cattle” case:
>
> Or would you instead rely on small-ish non-redundant HW at your internet
> edge rather than trying to enhance MTBF with big chassis full of redundant
> HW?
>
> Is this because eventually the MTBF figure for a particular transit/peering
> eBGP session boils down to the MTBF of the single card or even the single
> optical module hosting the link (and with bundles created over separate cards,
> well, you can never be quite sure how the setup looks on the other end
> of that connection)?
>
> Or is it because the effects of a smaller/non-resilient border edge device
> failure is not that bad in your particular (maybe horizontally scaled)
> setup?
>
>
>
> Would appreciate any pointers, thank you.
>
> Thank you
>
>
>
> adam
>
>
>


Re: Flow based architecture in data centers(more specifically Telco Clouds)

2020-02-10 Thread Warren Kumari
On Sun, Feb 9, 2020 at 4:15 PM Christopher Morrow
 wrote:
>
>
>
> On Sun, Feb 9, 2020 at 1:06 PM Rod Beck  
> wrote:
>>
>> They don't have to be related.
>>
>
> makes a cogent conversation harder :)

Srsly?! Any conversation including Cogent is harder

W
(Sorry, couldn't resist. I tried, but failed...)


>
>>
>> I am curious about the distinction about the flow versus non-flow 
>> architecture for data centers and I am also fascinated by the separate issue 
>> of WAN architecture for these clouds.
>>
>
> WAN is probably: "least expensive option from A to B" plus some effort to 
> standardize across your deployment. Right?
>
> Akamai is probably a good example, from what I can tell they were 
> 'transit/peering only' until they realized their product was sending 'more 
> bits' between deployments than to customers (in some cases). So, pushing the 
> 'between our deployments' bits over dedicated links (be that dark, waves, 
> other L3 transport) made sense budget-wise.
>
> (again.. just a chemical engineer and not a peering engineer, but...)
>
>>
>> Regards,
>>
>> Roderick.
>>
>> 
>> From: Christopher Morrow 
>> Sent: Sunday, February 9, 2020 9:24 PM
>> To: Rod Beck 
>> Cc: Glen Kent ; nanog@nanog.org 
>> Subject: Re: Flow based architecture in data centers(more specifically Telco 
>> Clouds)
>>
>> (caution, I'm just a chemical engineer, but)
>>
>> You appear to ask one question: "What is the difference between flow
>> and non-flow architectures?"
>> then sideline in some discussion about fiber/waves vs
>> layer-3/transit/peering/x-connect
>>
>> I don't think the second part really relates to the first part of your 
>> message.
>> (I didn't put this content in-line because .. it's mostly trying to
>> clarify what you are asking Rod"
>>
>> On Sun, Feb 9, 2020 at 3:19 AM Rod Beck  
>> wrote:
>> >
>> > Please explain for us dumb sales guys the distinction between flow and 
>> > non-flow. My question is the fundamental architecture of these clouds. We 
>> > all know that Amazon is buying dark fiber and building a network based on 
>> > lighting 100 and 10 gig waves on IRU and titled fiber. Same for Microsoft 
>> > (I sold them in a past life some waves) and other large players.
>> >
>> > But there appear to be quite a few cloud players that rely heavily on 
>> > Layer 3 purchased from Level3 (CenturyLink) and other members of the 
>> > august Tier 1 club. And many CDN players are really transit + real estate 
>> > operations as was Akamai until recently.
>> >
>> > It seems the threshold for moving from purchased transit plus peering to a 
>> > Layer 1 and 2 network has risen over time. Many former Tier 2 ISPs pretty 
>> > much gutted their private line networks as transit prices continued 
>> > inexorable declines.
>> >
>> > Best,
>> >
>> > Roderick.
>> >
>> > 
>> > From: NANOG  on behalf of Glen Kent 
>> > 
>> > Sent: Sunday, February 9, 2020 11:02 AM
>> > To: nanog@nanog.org 
>> > Subject: Flow based architecture in data centers(more specifically Telco 
>> > Clouds)
>> >
>> > Hi,
>> >
>> > Are most of the Telco Cloud deployments envisioned to be modeled on a flow 
>> > based or a non flow based architecture? I am presuming that for deeper 
>> > insights into the traffic one would need a flow based architecture, but 
>> > that can have scale issues (# of flows, flow setup rates, etc) and was 
>> > hence checking.
>> >
>> > Thanks, Glen



-- 
I don't think the execution is relevant when it was obviously a bad
idea in the first place.
This is like putting rabid weasels in your pants, and later expressing
regret at having chosen those particular rabid weasels and that pair
of pants.
   ---maf


Re: CISCO 0-day exploits

2020-02-10 Thread Jean | ddostest.me via NANOG
I remember a Cisco device with an ACL that was leaking. It was a 20-line 
ACL with a few lines to drop some packets based on UDP ports.


When under heavy stress, nearly line rate, we would see some of these 
packets going through the ACL.


I said to my peers that the ACL was leaking. They didn't believe me so I 
showed them the netflows.


We were very surprised to see that. We thought that drop means drop.

On 2020-02-10 08:40, Saku Ytti wrote:

On Mon, 10 Feb 2020 at 13:52, Jean | ddostest.me via NANOG
 wrote:


I really thought that more Cisco devices were deployed among NANOG.

I guess that these devices are not used anymore or maybe that I
understood wrong the severity of this CVE.

Network devices are incredibly fragile and mostly work because no one
is motivated to bring the infrastructure down. Getting any arbitrary
vendor down if you have access to it on L2 is usually so easy you
accidentally find ways to do it.
There are various L3 packet of deaths where existing infra can be
crashed with single packet, almost everyone has no or ridiculously
broken iACL and control-plane protection, yet business does not seem
to suffer from it.

Probably lower availability if you do upgrade your devices just
because there is a known issue, due to new production affecting
issues.



Re: CISCO 0-day exploits

2020-02-10 Thread Saku Ytti
On Mon, 10 Feb 2020 at 13:52, Jean | ddostest.me via NANOG
 wrote:

> I really thought that more Cisco devices were deployed among NANOG.
>
> I guess that these devices are not used anymore or maybe that I
> understood wrong the severity of this CVE.

Network devices are incredibly fragile and mostly work because no one
is motivated to bring the infrastructure down. Getting any arbitrary
vendor down if you have access to it on L2 is usually so easy you
accidentally find ways to do it.
There are various L3 packet of deaths where existing infra can be
crashed with single packet, almost everyone has no or ridiculously
broken iACL and control-plane protection, yet business does not seem
to suffer from it.

Probably lower availability if you do upgrade your devices just
because there is a known issue, due to new production affecting
issues.

-- 
  ++ytti


Re: CISCO 0-day exploits

2020-02-10 Thread t...@pelican.org
On Monday, 10 February, 2020 11:50, "Jean | ddostest.me via NANOG" 
 said:

> I really thought that more Cisco devices were deployed among NANOG.
> 
> I guess that these devices are not used anymore or maybe that I
> understood wrong the severity of this CVE.

The phones / cameras side of it seems very much like an Enterprise problem.  
I'm not sure what the split is here of people operating Enterprise networks vs 
Service Provider, but I'd expect a skew towards the latter.

There is some SP kit on the vulnerable list too, but in my experience, CDP 
there is used to validate L2 topologies amongst SP kit only, and disabled on 
customer-facing ports.  So maybe a "we *do* have CDP turned off everywhere we 
don't need it, right?" sanity-check, but not necessarily a rush to patch.
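
In that spirit, a rough sketch of such a sanity check, assuming IOS-style
configuration text ("no cdp run" globally, "no cdp enable" per interface)
and deliberately naive parsing; a real audit would have to handle each
platform's actual syntax.

# Sketch: list interfaces where CDP is still on, given one device config as text.
def cdp_exposed_interfaces(config_text: str) -> list[str]:
    lines = config_text.splitlines()
    if any(l.strip() == "no cdp run" for l in lines):
        return []                              # globally off: nothing exposed
    exposed, current, disabled = [], None, False
    for line in lines:
        if line.startswith("interface "):
            if current and not disabled:
                exposed.append(current)        # previous interface never disabled CDP
            current, disabled = line.split(None, 1)[1], False
        elif current and line.strip() == "no cdp enable":
            disabled = True
    if current and not disabled:
        exposed.append(current)
    return exposed

cfg = """interface GigabitEthernet0/0
 description customer-facing
interface GigabitEthernet0/1
 no cdp enable
"""
print(cdp_exposed_interfaces(cfg))   # ['GigabitEthernet0/0']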

I'd have expected greater consternation had this hit vanilla-IOS/XE boxes that 
are likely to be in managed CPE roles, such as ISR and ASR1K.  There I can see 
the potential for CDP to be enabled customer-facing, either for diagnostics 
with the customer, or for the voice / data VLAN stuff outlined in the article.

Regards,
Tim.




Peering/Transit eBGP sessions -pet or cattle?

2020-02-10 Thread adamv0025
Hi,

 

Would like to take a poll on whether you folks tend to treat your
transit/peering connections (BGP sessions in particular) as pets or rather
as cattle.

And I appreciate the answer could differ for transit vs peering connections.

However, I'd like to ask this question through a lens of redundant vs
non-redundant Internet edge devices.

To explain, 

a.  The "pet" case:

Would you rather try improving the failure rate of your transit/peering
connections by using resilient Control-Plane (REs/RSPs/RPs) or even
designing these as link bundles over separate cards and optical modules? 

Is this on the basis that no matter how hard you try on your end (i.e.
distribute your traffic to a multitude of transit and peering connections or
use BFD or even BGP-PIC Edge to shuffle things around fast), any disruption to
the eBGP session itself will still hurt you in some way (i.e. at least some
partial outage for some proportion of the traffic for a not insignificant
period of time) until things converge in the direction from The Internet back
to you? 

 

b.  The "cattle" case:

Or would you instead rely on small-ish non-redundant HW at your internet
edge rather than trying to enhance MTBF with big chassis full of redundant
HW? 

Is this because eventually the MTBF figure for a particular transit/peering
eBGP session boils down to the MTBF of the single card or even the single
optical module hosting the link (and with bundles created over separate cards,
well, you can never be quite sure how the setup looks on the other end
of that connection)?

Or is it because the effects of a smaller/non-resilient border edge device
failure are not that bad in your particular (maybe horizontally scaled)
setup?

 

Would appreciate any pointers, thank you.

Thank you

 

adam

 



Re: CISCO 0-day exploits

2020-02-10 Thread Jean | ddostest.me via NANOG

I really thought that more Cisco devices were deployed among NANOG.

I guess that these devices are not used anymore or maybe that I 
understood wrong the severity of this CVE.


Happy NANOG #78

Cheers

Jean

On 2020-02-07 09:21, Jean | ddostest.me via NANOG wrote:

CDPwn: 5 new zero-day Cisco exploits

https://www.armis.com/cdpwn/

What's the impact on your network? Everything is under control?

Jean



Re: Flow based architecture in data centers(more specifically Telco Clouds)

2020-02-10 Thread Saku Ytti
On Sun, 9 Feb 2020 at 23:09, Rod Beck  wrote:

> I am curious about the distinction about the flow versus non-flow 
> architecture for data centers and I am also fascinated by the separate issue 
> of WAN architecture for these

Based on the context of the OP's question, he is talking about an
architecture where some components, potentially network devices, are
flow-aware: instead of doing an LPM lookup per packet, they do an LPM
lookup per flow.
This comes up every few years in various formats, because with
flow-lookup you have one expensive LPM lookup per flow and multiple
cheap LEM lookups. However, the LEM table size is unbounded and easily
abusable, leading to a set of very complex problems.
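
A toy Python model of that trade-off, with an invented two-entry FIB: the
first packet of each flow pays for the LPM lookup, later packets hit an
exact-match cache, and every new 5-tuple (e.g. from spoofed source ports)
adds unbounded state.

import ipaddress

FIB = {                                  # tiny example LPM table
    ipaddress.ip_network("203.0.113.0/24"): "nh-A",
    ipaddress.ip_network("0.0.0.0/0"): "nh-default",
}
flow_cache = {}                          # exact-match (LEM) table, grows per flow

def lpm(dst: str) -> str:
    addr = ipaddress.ip_address(dst)
    matches = [n for n in FIB if addr in n]
    return FIB[max(matches, key=lambda n: n.prefixlen)]   # longest prefix wins

def forward(flow_5tuple: tuple, dst: str) -> str:
    if flow_5tuple in flow_cache:        # cheap LEM hit
        return flow_cache[flow_5tuple]
    nh = lpm(dst)                        # expensive LPM, once per flow
    flow_cache[flow_5tuple] = nh
    return nh

# each spoofed source port below becomes a new cache entry: easy state exhaustion
for sport in range(3):
    forward(("198.51.100.1", sport, "203.0.113.9", 443, "tcp"), "203.0.113.9")
print(len(flow_cache))                   # 3 entries for what is "one" destination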

There is of course a lot of variation in what the OP might mean. The network
might, for example, be entirely LEM lookup with an extremely small table,
by using a stack of MPLS labels and zero LPM lookups. This architecture
could be made so that when a server needs to send something, say video, to
a client, it asks orchestration for permission, saying "I need to send
x GB to DADDR K at a rate of at least Z and no more than Y"; orchestration
could then tell the server to start sending at time T0 and impose an MPLS
label stack of [l1, l2, l3, l4, l5].

Orchestration would know exactly which links the traffic traverses, how
long they will be utilised and how much free capacity there is. The network
would be extremely dumb, no IP lookups ever, only thousands of MPLS
labels in the FIB, so entirely on-chip lookups of trivial cost.
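
A minimal sketch of that forwarding model, with invented nodes and labels:
each hop does only a tiny exact-match lookup on the outer label (simplified
here to pop-and-forward), and no IP lookup ever happens inside the network.

LABEL_FIB = {                      # per-node label FIB: top label -> outgoing link
    "P1": {101: "to-P2"},
    "P2": {202: "to-P3"},
    "P3": {303: "to-egress"},
}

def forward_on_stack(path_nodes, stack):
    hops = []
    for node in path_nodes:
        top = stack.pop(0)                         # pop the outer label
        hops.append((node, LABEL_FIB[node][top]))  # trivial exact-match lookup
    return hops

# orchestration told the server: start at T0, impose stack [101, 202, 303]
print(forward_on_stack(["P1", "P2", "P3"], [101, 202, 303]))
# [('P1', 'to-P2'), ('P2', 'to-P3'), ('P3', 'to-egress')]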

-- 
  ++ytti