Re: Azure Looking Glass

2024-06-28 Thread Lincoln Dale
Presumably nothing stops you spinning up an instance in Azure and doing
pings/traceroutes yourself.
But perhaps you could be doing this from your own IPs towards .

Have you configured your end in a manner that doesn't do MTU 1500 or that
relies on PMTUD to function?
If yes, well perhaps start there... you're not on a solid foundation.
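
A minimal way to sanity-check the PMTUD piece from your side (a sketch in
Python, assuming Linux iputils ping; the target hostname is a placeholder
for an address inside Azure):

#!/usr/bin/env python3
# Probe the largest ICMP payload that passes with DF set ("-M do" prohibits
# fragmentation). A path MTU below 1500 plus broken PMTUD is a common culprit.
import subprocess

def max_unfragmented_payload(target, candidates=(1472, 1452, 1432, 1400, 1380)):
    # 1472 bytes of payload + 28 bytes of IP/ICMP headers == a 1500-byte MTU
    for size in candidates:
        r = subprocess.run(["ping", "-M", "do", "-c", "3", "-s", str(size), target],
                           capture_output=True, text=True)
        if r.returncode == 0:
            return size + 28      # payload plus IP+ICMP header overhead
    return None

if __name__ == "__main__":
    # "example.invalid" is a placeholder - substitute your Azure-side address
    print("largest working path MTU:", max_unfragmented_payload("example.invalid"))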


On Sat, Jun 29, 2024 at 7:46 AM John Alcock  wrote:

> I have gotten pretty close to figuring out my issue with the Azure Cloud.
> When I advertise my routes through one specific upstream provider, I have
> an issue. If I pull my routes from them, all works well.
>
> I believe this is some type of MTU issue.  Could be filtering, but I doubt
> it.
>
> Is there an Azure Looking glass that I can use to originate pings and
> traceroutes?  My googlefu is weak and I haven't found it yet.
>
> With that information, I think I can help my upstream provider know where
> the problem lies.
>
> John Alcock
> AS395437
>


Re: Peering Contact at AS16509

2024-02-19 Thread Lincoln Dale
Even if you don’t meet the port speed requirements for a PNI, there is
likely something that could work via an IX.

On Tue, Feb 20, 2024 at 12:57 PM Tim Burke  wrote:

> We reached out some time ago using the contact on PeeringDB and had no
> issue, but the amount of transit consumed to get to 16509 is substantial
> enough to make responding worth their while.
>
> Their minimum peering is 100G, with 400G preferred, so it’s very possible
> that if you’re not consuming anywhere close to 100G, the lack of response
> could correlate to a lack of interest on their side.
>
> > On Feb 18, 2024, at 13:09, Peter Potvin via NANOG 
> wrote:
> >
> > 
> > If a contact who manages North American peering at AS16509 could reach
> out off-list, that would be appreciated. Myself and a few colleagues have
> attempted to reach out via the contacts listed on PeeringDB on multiple
> occasions over the last couple of months and have not been successful in
> reaching someone.
> >
> > Kind regards,
> > Peter Potvin
>


Re: Anyone have contacts at the Amazon or OpenAI web spiders?

2024-02-13 Thread Lincoln Dale
On Wed, Feb 14, 2024 at 1:36 PM John Levine  wrote:

> If anyone has contacts at either I would appreciate it.


https://developer.amazon.com/support/amazonbot
probably returned as a result of searching "amazonbot" on your favourite
search engine.


Re: Interesting Ali Express web server behavior...

2023-12-11 Thread Lincoln Dale
On Sun, Dec 10, 2023 at 7:09 PM Christopher Hawker 
wrote:

> How big would a network need to get, in order to come close to
> exhausting RFC1918 address space? There are a total of 17,891,328 IP
> addresses between the 10/8 prefix, 172.16/12 space and 192.168/16 space. If
> one was to allocate 10 addresses to each host, that means it would require
> 1,789,132 hosts to exhaust the space.
>

See 30 minute mark of https://youtu.be/ARlBHmPy7Zc?t=1787.
We talked about this in a NANOG88 presentation too, but had a bigger
timeslot at AusNOG so said a bit more about it there.
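
For what it's worth, the arithmetic in the quoted message checks out (a
quick sketch in Python; the 10-addresses-per-host figure is the OP's
assumption, not a standard):

import ipaddress

# Sum the sizes of the three RFC1918 blocks
rfc1918 = ["10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16"]
total = sum(ipaddress.ip_network(p).num_addresses for p in rfc1918)
print(total)          # 17891328
print(total // 10)    # ~1.79M hosts at 10 addresses per host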


Re: Alternative Re: ipv4/25s and above Re: 202211210951.AYC

2022-11-21 Thread Lincoln Dale
>
> > As someone who has been involved in the deployment of network gear
> > into class E space (extensively, for our own internal reasons, which
> > doesn't preclude public use of class E), "largely supported" !=
> > "universally supported".
> >
> > There remains hardware devices that blackhole class E traffic, for
> > which there is no fix. https://seclists.org/nanog/2021/Nov/272 is
> > where I list one of them. There are many, many other devices where we
> > have seen interesting behavior, some of which has been fixed, some of
> > which has not.
>
> And I am sure you would agree that un-reserving a decade ago would have
> more than likely resulted in a greatly improved situation now. Along the
> lines that doing so now could still result in a greatly improved
> situation a decade hence. Should we still need it.
>

It may well have helped had it been done a decade ago, but that isn't the
reality of today.

I've pointed out there is a non-zero number of existing devices, OSs,
things baked into silicon, even widely used BGP stacks today, that can't
currently use class E, and some of them will never be able to.
You seem to be suggesting that class E could be opened up as valid public
IPv4 space. My experience is that it would not be usable public IPv4
address space any time soon, if ever.

I'm not disputing that unreserving it today may address some of that. But it
will never address all of it.
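
As a small illustration of how deeply the reservation is baked into existing
software (a sketch using Python's standard library; other stacks behave
differently, and some worse):

import ipaddress

addr = ipaddress.ip_address("240.0.0.1")
print(addr.is_reserved)   # True  - 240.0.0.0/4 is flagged as reserved
print(addr.is_global)     # False - not treated as globally routable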


cheers,

lincoln.


>


Re: Alternative Re: ipv4/25s and above Re: 202211210951.AYC

2022-11-21 Thread Lincoln Dale
On Tue, Nov 22, 2022 at 11:20 AM Joe Maimon  wrote:

> Indeed that is exactly what has been happening since the initial
> proposals regarding 240/4. To the extent that it is now largely
> supported or available across a wide variety of gear, much of it not
> even modern in any way.
>

As someone who has been involved in the deployment of network gear into
class E space (extensively, for our own internal reasons, which doesn't
preclude public use of class E), "largely supported" != "universally
supported".

There remain hardware devices that blackhole class E traffic, for which
there is no fix. https://seclists.org/nanog/2021/Nov/272 is where I list
one of them. There are many, many other devices where we have seen
interesting behavior, some of which has been fixed, some of which has not.


cheers,

lincoln.

>
>


Re: Longest prepend( 255 times) as path found

2022-08-26 Thread Lincoln Dale
>
> If I was running an edge device with a limited FIB, perhaps I might drop
> it to save memory. If I had beefier devices, perhaps I would just depref
> it.
>

Note that if said prefix exists elsewhere with fewer prepends, such that the
shorter path 'wins' BGP best-path selection, then dropping the prepended path
would not result in any difference in the FIB.
The FIB is where the 'winning' prefixes go as fully-resolved entries from the
RIB, and the RIB would not carry the prepended path either, since the
alternative won in BGP.
And even if you depref'd it in BGP, it would still be there in the
control plane, consuming the same amount of RAM.

Rejecting it for excess prepends is likely the best choice.
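
In policy terms the check is simple; a sketch of that logic in Python
(vendor route-map/policy syntax varies, and the threshold of 16 is an
arbitrary example value):

from itertools import groupby

MAX_PREPENDS = 16

def longest_prepend_run(as_path):
    # longest run of the same ASN repeated consecutively in the AS_PATH
    return max((len(list(g)) for _, g in groupby(as_path)), default=0)

def accept(as_path):
    return longest_prepend_run(as_path) <= MAX_PREPENDS

# A path prepended 255 times by its origin would be rejected:
print(accept([64500, 64501] + [64512] * 255))   # False
print(accept([64500, 64501, 64512]))            # True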


Re: 400G forwarding - how does it work?

2022-07-25 Thread Lincoln Dale
On Mon, Jul 25, 2022 at 11:58 AM James Bensley 
wrote:

> On Mon, 25 Jul 2022 at 15:34, Lawrence Wobker  wrote:
> > This is the parallelism part.  I can take multiple instances of these
> memory/logic pipelines, and run them in parallel to increase the throughput.
> ...
> > I work on/with a chip that can forwarding about 10B packets per second…
> so if we go back to the order-of-magnitude number that I’m doing about
> “tens” of memory lookups for every one of those packets, we’re talking
> about something like a hundred BILLION total memory lookups… and since
> memory does NOT give me answers in 1 picoseconds… we get back to pipelining
> and parallelism.
>
> What level of parallelism is required to forward 10Bpps? Or 2Bpps like
> my J2 example :)
>

I suspect many folks know the exact answer for J2, but it's likely under NDA
to talk about the specific answer for a given device.

Without being platform- or device-specific, the core clock rate of many
network devices is often in a "goldilocks" zone of (today) 1 to 1.5GHz, with
a goal of 1 packet forwarded 'per clock'. As LJ described, a pipeline
doesn't mean a latency of 1 clock ingress-to-egress, but rather that every
clock there is a forwarding decision from one 'pipeline', and the MPPS/BPPS
packet rate is achieved by having enough pipelines in parallel to achieve
that.
The per-pipeline number is often "1" or "0.5", so you can work the number
backwards (e.g. it emits a packet every clock, or every 2nd clock).
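
Working the number backwards, as a back-of-the-envelope (a sketch; the clock
rate and packets-per-clock figures here are illustrative, not a statement
about J2 or any other specific device):

import math

def pipelines_needed(target_pps, clock_hz=1.25e9, pkts_per_clock=1.0):
    # each pipeline makes (clock_hz * pkts_per_clock) forwarding decisions/sec
    return math.ceil(target_pps / (clock_hz * pkts_per_clock))

print(pipelines_needed(10e9))                       # 8 pipelines at 1 pkt/clock
print(pipelines_needed(10e9, pkts_per_clock=0.5))   # 16 if it emits every 2nd clock
print(pipelines_needed(2e9))                        # 2 for a 2Bpps J2-style target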

It's possible to build an ASIC/NPU to run at a faster clock rate, but that
gets back to what I'm hand-wavingly describing as "goldilocks": look up power
vs frequency and you'll see it's non-linear.
Just as CPUs can scale by adding more cores (vs increasing frequency), much
the same holds true for network silicon: you can go wider, with multiple
pipelines. But it's not 10K parallel slices; there are some parallel parts,
and multiple 'stages' in each pipeline doing different things.

Using your CPU comparison, there are some analogies here that do work:
 - you have multiple CPU cores that can do things in parallel -- analogous
to pipelines
 - they often share some common I/O (e.g. CPUs have PCIe, maybe sharing
some DRAM or LLC) -- maybe some lookup engines, or centralized
buffer/memory
 - most modern CPUs are out-of-order execution, where under-the-covers a
cache-miss or DRAM fetch has a disproportionate hit on performance, so it's
hidden away from you as much as possible by speculative out-of-order
execution
    -- no direct analogy to this one - it's unlikely most forwarding
pipelines do speculative execution like a general-purpose CPU does - but
they definitely do 'other work' while waiting for a lookup to happen

A common-or-garden x86 is unlikely to achieve such a rate for a few
different reasons:
 - if packets-in or packets-out go via DRAM then you need sufficient DRAM
(page opens/sec, DRAM bandwidth) to sustain at least one write and one read
per packet. Look closer at DRAM and its speed; pay attention to page
opens/sec and what that consumes.
 - one 'trick' is to not DMA packets to DRAM but instead have them go into
SRAM of some form - e.g. Intel DDIO, ARM Cache Stashing - which at least
potentially saves you that DRAM write+read per packet
   - ... but then do e.g. an LPM lookup, and best case that is back to one
memory access per packet. Maybe it's in L1/L2/L3 cache, but at large
table sizes it likely isn't.
 - ... do more things to the packet (uRPF lookups, counters) and it's yet
more lookups.

Software can achieve high rates, but note that a typical ASIC/NPU does on
the order of >100 separate lookups per packet, and 100 counter updates per
packet.
Just as forwarding in an ASIC or NPU is a series of tradeoffs, forwarding in
software on generic CPUs is also a series of tradeoffs.
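
To put rough numbers on why those lookups and counter updates dominate the
design (a sketch; the per-packet figures are the order-of-magnitude numbers
above, and the 500-byte average packet size is an assumption):

target_pps       = 10e9      # 10B packets/sec
lookups_per_pkt  = 100       # ">100 separate lookups per packet"
counters_per_pkt = 100

print(target_pps * lookups_per_pkt)     # ~1e12 lookups/sec across the chip
print(target_pps * counters_per_pkt)    # ~1e12 counter updates/sec

# and the DRAM cost of staging every packet in main memory on a generic CPU:
avg_pkt_bytes = 500
dram_bytes_per_sec = target_pps * avg_pkt_bytes * 2    # one write + one read
print(dram_bytes_per_sec / 1e12, "TB/s")                # ~10 TB/s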


cheers,

lincoln.


Re: Amazon peering revisited

2022-02-03 Thread Lincoln Dale
On Thu, Jan 27, 2022 at 8:22 AM Kelly Littlepage via NANOG 
wrote:

> Hi all, a nanog thread started on November 23, 2018 discussed the
> challenges of getting Amazon peering sessions turned up. Has anyone had
> luck since/does anyone have a contact they could refer me to — off-list or
> otherwise? The process of getting PNI in place with other CSPs was
> straightforward, but I haven't heard back from AWS after a month and
> several follow-ups. Our customers would really benefit from us getting this
> sorted.
>

There are many folks here that are in AWS. Assuming you have followed
what is in https://aws.amazon.com/peering/ (and
https://aws.amazon.com/peering/policy/) then send me details privately
about what/when/who and I'll reach out internally to the relevant folks.


Re: Redeploying most of 127/8, 0/8, 240/4 and *.0 as unicast

2021-11-22 Thread Lincoln Dale
On Thu, Nov 18, 2021 at 1:21 PM John Gilmore  wrote:

> We have found no ASIC IP implementations that
> hardwire in assumptions about specific IP address ranges.  If you know
> of any, please let us know, otherwise, let's let that strawman rest.
>

There's at least one: Marvell Prestera CX (it's either Prestera CX or DX, I
forget which). It is in the Juniper EX4500, among others.
It has a hardware-based bogon filter, applied when L3 routing, that cannot be
disabled.


cheers,

lincoln.


Re: BGP peering strategies for smaller routers

2016-05-02 Thread lincoln dale
>
> You have to keep in mind there are two pools of memory on the router.


There are actually three.

1. Prefix (path) via BGP:  "show ip bgp ".  BGP will select the 'best' BGP
path (can be multiple if ECMP) and send that through to the RIB.
2. RIB: "show ip route ".  The routing table will show the path chosen - and
whether there are backup paths etc. - but it may be recursive, e.g. prefix
a.b.c.d points at next-hop e.f.g.h, which in turn points at i.j.k.l, etc.
3. FIB: basically fully-resolved prefixes.
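
A toy sketch of how those three pools relate (Python; the prefixes, next-hops
and the 'shortest AS path wins' tie-break are illustrative placeholders, not
any particular implementation):

bgp_paths = {                    # (1) all paths learned, possibly many per prefix
    "203.0.113.0/24": [
        {"nexthop": "198.51.100.1", "as_path_len": 3},
        {"nexthop": "192.0.2.1",    "as_path_len": 5},
    ],
}

rib = {"198.51.100.1/32": "10.0.0.2",   # learned via IGP, resolves further
       "10.0.0.2/32":     "eth1"}       # directly connected

# BGP installs its best path per prefix into the RIB (2)
for prefix, paths in bgp_paths.items():
    rib[prefix] = min(paths, key=lambda p: p["as_path_len"])["nexthop"]

def resolve(nexthop):
    # follow recursive next-hops until we reach an interface
    while not nexthop.startswith("eth"):
        nexthop = rib[nexthop + "/32"]
    return nexthop

fib = {prefix: resolve(nh) for prefix, nh in rib.items()}   # (3) fully resolved
print(fib["203.0.113.0/24"])    # eth1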

What you otherwise say is correct - with N transit providers at (1) you
could have lotsOfPaths x N, which ultimately resolve to lotsOfRoutes with up
to N next-hops.
Much design effort goes into the routing stack to store lotsOfPaths
efficiently.

Can't speak for what an ASR1K does but suggest the OP talk to Cisco.


cheers,

lincoln.


Re: Arista Routing Solutions

2016-04-27 Thread lincoln dale
On Wed, Apr 27, 2016 at 4:41 PM, Peter Kranz  wrote:

> Curious if you have any thoughts on the longevity of the 7500R's and
> 7280R's survival with IPv4 full tables? How full are you seeing the TCAM
> getting today (I'm assuming they are doing some form of selective
> download)? And if we are currently adding 100k routes a year, how much
> longer will it last?
>

I can't speak for Ryan or Netflix, but we (Arista) are stating our
technique is good for 1M+ prefixes of IPv4+IPv6 combined.  The Internet right
now is at between 575K and 635K IPv4 routes and between 28K and 35K IPv6
routes, it's taken many, many years to get there, and it's foreseeable
there's many years of growth left.
Note that we don't do static partitioning between IPv4 and IPv6, and how we
do it has more headroom in it than we state, so we're confident.  We're also
not doing "selective download"; this is every prefix in the current table.

What I can share is two different scenarios deployed today (Ryan's case
might be different to others):

1. a traditional internet edge router with multiple transit/peer providers,
carrying the Internet as of right now
2. a large hosting/cloud deployment with full tables plus hundreds of
thousands of internal prefixes

The former is at 575K IPv4 and 35K IPv6 of 'internet' as of a few weeks ago:

7500R# show ip route summary | grep Total
Total Routes  575127
7500R# show ipv6 route summary | grep Total
 Total Routes  35511
7500R# show hardware capacity | grep Routing
Forwarding Resources Usage

Table    Feature    Chip   Used     Used   Free     Committed  Best Case  High
                           Entries  (%)    Entries  Entries    Max        Watermark
-------- ---------- ------ -------- ------ -------- ---------- ---------- ----------
Routing  Resource1            815    39%      1233          0       2048        817
Routing  Resource2            469    45%       555          0       1024        471
Routing  Resource3          14074    42%     18694          0      32768      14098
Routing  V4Routes          696364    88%     89753          0     786432     697110
Routing  V6Routes               0     0%     89753          0     786432          0


The latter is at 854K IPv4 + 45K IPv6:

7500R# show ip route summary | grep Total
Total Routes  854393
7500R# show ipv6 route summary | grep Total
 Total Routes  45678
7500R# show hardware capacity | grep Routing
Forwarding Resources Usage

Table    Feature    Chip   Used     Used   Free     Committed  Best Case  High
                           Entries  (%)    Entries  Entries    Max        Watermark
-------- ---------- ------ -------- ------ -------- ---------- ---------- ----------
Routing  Resource1           1319    64%       729          0       2048       1320
Routing  Resource2            809    79%       215          0       1024        814
Routing  Resource3          24102    73%      8666          0      32768      24104
Routing  V4Routes          644336    83%    124302          0     786432     644364
Routing  V6Routes           17792    12%    124302          0     786432      17795


One could ask Geoff Huston when he thinks combined IPv4+v6 will exceed 1M
entries, but based on http://bgp.potaroo.net/ I would expect it to be many
years away, and we'd welcome discussions if you want to know our opinion [*]
on how what we're doing will scale.  What we're doing doesn't explode at 1M;
there's headroom in it, hence why we say "1M+". Again, we're happy to talk
about it - just ask your friendly Arista person, and if you don't know who to
ask, ask me and I'll put you in touch with the right folks.
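
As a rough back-of-the-envelope on the "how much longer will it last?"
question, using only the figures quoted in this thread (a sketch; the linear
100K-routes/year growth is the OP's assumption, not a forecast, and 1M is a
floor given the headroom mentioned above):

v4_now, v6_now  = 635_000, 35_000   # upper end of the counts quoted above
capacity        = 1_000_000         # stated as "1M+", i.e. a floor
growth_per_year = 100_000           # from the quoted question

years = (capacity - (v4_now + v6_now)) / growth_per_year
print(round(years, 1), "years before the combined table reaches 1M entries")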


cheers,

lincoln.  [*] l...@arista.com


Re: Arista Routing Solutions

2016-04-24 Thread lincoln dale
>
> > High Touch / Low Touch
>
> High touch means very general purpose NPU, with off-chip memory. Low
> touch means usually ASIC or otherwise simplified pipeline and on-chip
> memory. Granted Jericho can support off-chip memory too.
>
> L3 switches are canonical example of low touch. EZchip, Trio, Solar,
> FP3 etc are examples of canonical high touch NPUs. What low touch can
> do, it can do fast and economically.
>

Your analogy makes some sense, but what you classify as high-touch /
low-touch is just one dimension and could do with a more modern update.

I'd suggest a more modern analogy would be that historically the difference
between an L3 switch and a router is that the former has a fixed processing
pipeline, limited buffering (most are just on-chip buffer) and limited
table sizes.
But more modern packet processors with fixed pipelines often have blocks or
sections that are programmable or flexible: a flexible packet parser makes it
possible to support new overlay or tunnel mechanisms, flexible key generation
makes it possible to reuse different table resources in different ways, and a
flexible rewrite engine means egress encaps, tunnels or other logic can be
done.
There's also often more capacity for recirc or additional stages as
required.
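
To make "programmable blocks in a fixed pipeline" a bit more concrete, here
is a toy, table-driven parse graph (a conceptual sketch only; the field
offsets and the VXLAN example are illustrative, not how any particular
silicon encodes its parser):

PARSE_GRAPH = {
    "ethernet": {"len": 14, "next_field": (12, 2),          # ethertype
                 "next": {0x0800: "ipv4", 0x8100: "vlan"}},
    "vlan":     {"len": 4,  "next_field": (2, 2),
                 "next": {0x0800: "ipv4"}},
    "ipv4":     {"len": 20, "next_field": (9, 1),           # protocol
                 "next": {17: "udp"}},
    "udp":      {"len": 8,  "next_field": (2, 2),           # dest port
                 "next": {}},
}

# "adding VXLAN support" is just another node plus one edge from udp/4789,
# i.e. a table change rather than a pipeline redesign:
PARSE_GRAPH["udp"]["next"][4789] = "vxlan"
PARSE_GRAPH["vxlan"] = {"len": 8, "next_field": None, "next": {}}

def parse(packet: bytes, state="ethernet"):
    # walk the parse graph, returning the list of recognised headers
    headers, offset = [], 0
    while state and offset + PARSE_GRAPH[state]["len"] <= len(packet):
        node = PARSE_GRAPH[state]
        headers.append(state)
        nxt = None
        if node["next_field"]:
            off, width = node["next_field"]
            key = int.from_bytes(packet[offset + off:offset + off + width], "big")
            nxt = node["next"].get(key)
        offset += node["len"]
        state = nxt
    return headers

# eth + IPv4(proto 17) + UDP(dport 4789) + VXLAN, built as minimal dummy headers
frame = (b"\x00" * 12 + b"\x08\x00"
         + bytes(9) + b"\x11" + bytes(10)
         + b"\x00\x00" + (4789).to_bytes(2, "big") + bytes(4)
         + bytes(8))
print(parse(frame))    # ['ethernet', 'ipv4', 'udp', 'vxlan']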

Specific to Jericho, the underlying silicon has all these characteristics.
We [*] used the flexibility in all of the stages, both now and in previous
iterations (Arad), to add new features/functionality that wasn't natively
there to start with. And it uses a combination of on-chip & off-chip
buffering with VoQ.

It's also not only Arista that calls it a router - Cisco does too (NCS5K5).

Sure, using an NPU for packet processing essentially provides a 100%
programmable packet-forwarding pipeline, and maybe even a "run to
completion" kind of packet pipeline where processing can have a long tail.
However, engineering is a zero-sum game, and to do that means you sacrifice
power or density, or most often both.

I agree the lines have been blurred as to the characteristics, and we'd
openly state that it's not going to be useful in every use case where a
router is deployed, but for specific use cases it fits the bill and has
compelling density, performance and cost dynamics.


To the OP's question, there are people running with this in EFT and others
in production.
My suggestion would be that if you think it's of interest, reach out to your
friendly Arista person [*] and try it out, or talk through what it is you're
after. We are generally a friendly bunch and can often be quite creative in
enabling things in ways different to the old ones.



> Yeah they are certainly much behind in features, but if you don't
> need those features, it's probably actually an advantage. For my
> use-cases Arista's MPLS stack is not there.


We've historically had the data-plane but not the control-plane. That's a
work in progress.
Again, often there are creative solutions - ways of doing things that aren't
necessarily the same as the old ways but achieve the same end result.


cheers,

lincoln.
[*] disclosure: i work on said products described l...@arista.com.


Re: New Switches with Broadcom StrataDNX

2016-04-18 Thread lincoln dale
Yes. We also have 1M+ FIB support from day one - hence the letter 'R',
denoting the 3rd generation of its evolution toward internet edge/router use
cases.

Not sure what other vendors are doing but I doubt others are yet shipping
large table support.
(there's more to it than just the underlying native silicon)


cheers,

lincoln. (l...@arista.com)


On Mon, Apr 18, 2016 at 11:01 AM, Colton Conor 
wrote:

> As a follow up to this post, it look like the Arista 7500R series has this
> new chip inside of it.
>
> On Wed, Jan 20, 2016 at 9:34 AM, Jeff Tantsura  >
> wrote:
>
> > That's right, logic is in programming chips, not their property. You just
> > need to know what to program ;-)
> >
> > Regards,
> > Jeff
> >
> > > On Jan 19, 2016, at 10:10 PM, Mark Tinka  wrote:
> > >
> > >
> > >
> > >> On 20/Jan/16 00:17, Phil Bedard wrote:
> > >>
> > >> Good point, there are many people looking at what I call FIB
> > optimization right now.  The key is having the programmability on the
> > device to make it happen.  Juniper/Cisco support it using policies to
> > filter RIB->FIB and I believe both also do per-NPU/PFE localized FIBs
> now.
> > I am not sure if that’s something supported on this new Broadcom chipset.
> > Depends on your network of course and where you are looking to position
> the
> > router.
> > >
> > > I don't think the FIB needs to have specific support for selective
> > > programming.
> > >
> > > I think that comes in the code to instruct the control plane what it
> > > should download to the FIB.
> > >
> > > Cisco's and Juniper's support of this is on FIB that has been in
> > > production long before the feature became available. It was just added
> > > to code.
> > >
> > > Mark.
> >
>


Re: 10G switchrecommendaton

2012-02-09 Thread lincoln dale
hi George,

IGMPv3 snooping has been supported since EOS 4.7.  It's enabled by default
in EOS 4.8.x.

In terms of specifics, there is support for both IGMPv3 snooping & IGMPv3
querier. There isn't currently support for IGMPv3 snooping querier.


cheers,

lincoln.

On Fri, Feb 10, 2012 at 8:17 AM, George Bonser  wrote:

>  Feb  9 07:42:21 SJC-AGS-01 IgmpSnooping:
> %IGMPSNOOPING-4-IGMPV3_UNSUPPORTED: IGMPv3 querier detected on interface
> Port-Channel1 (message repeated 34 times in 625.028 secs)
>
> SJC-AGS-01#sho ver
>
> Arista DCS-7124S-F
>
> Hardware version:06.02
>
> Serial number:   JSH10130054
>
> System MAC address:  001c.7308.752f
>
> Software image version: 4.6.4
>
> Architecture:   i386
>
> Internal build version: 4.6.4-434606.EOS464
>
> Sure, we can discuss it.
>
> From: lincoln dale [mailto:l...@interlink.com.au]
> Sent: Thursday, February 09, 2012 1:13 PM
> To: George Bonser
> Cc: Leigh Porter; nanog list
> Subject: Re: 10G switchrecommendaton
>
> On Fri, Feb 10, 2012 at 7:24 AM, George Bonser  wrote:
> 
>
> It's pretty good gear.  The only problem I've had with it is the
> limitation of IGMP not working on mLAG VLANs.
>
>
> IGMP should work just fine with MLAG.  IGMP state is sync'd between the
> MLAG pair. Happy to talk about this more off-list if you wish.
>
>
> cheers,
>
> lincoln.
> (l...@aristanetworks.com)
>


Re: 10G switchrecommendaton

2012-02-09 Thread lincoln dale
On Fri, Feb 10, 2012 at 7:24 AM, George Bonser  wrote:

> It's pretty good gear.  The only problem I've had with it is the
> limitation of IGMP not working on mLAG VLANs.
>

IGMP should work just fine with MLAG.  IGMP state is sync'd between the
MLAG pair. Happy to talk about this more off-list if you wish.


cheers,

lincoln.
(l...@aristanetworks.com)


RE: NTP Md5 or AutoKey?

2008-11-04 Thread Lincoln Dale
> There is an emerging need to distribute highly accurate time
> information over IP and over MPLS packet switched networks (PSNs).

good of you to ask. it exists today.
http://ieee1588.nist.gov/


cheers,

lincoln.





RE: Best utilizing fat long pipes and large file transfer

2008-06-12 Thread Lincoln Dale

> I'm looking for input on the best practices for sending large files over
> a long fat pipe between facilities (gigabit private circuit, ~20ms RTT).

providing you have RFC1323 type extensions enabled on a semi-decent OS, a 4MB
TCP window should be more than sufficient to fill a GbE pipe over 30msec.
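
the arithmetic behind that window size is just the bandwidth-delay product
(a quick sketch in Python, plain arithmetic only):

link_bps = 1e9                            # GbE
for rtt in (0.020, 0.030):                # the OP's ~20ms, plus 30ms for margin
    bdp_bytes = link_bps / 8 * rtt
    print(f"{rtt * 1000:.0f}ms RTT -> BDP {bdp_bytes / 1e6:.2f} MB")
# 20ms -> 2.50 MB, 30ms -> 3.75 MB, so a 4MB window covers it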

with a modified TCP stack that uses TCP window sizes up to 32MB, I've worked
with numerous customers to achieve wire-rate GbE async replication for
storage arrays with FCIP.

the modifications to TCP were mostly to adjust how it reacts to packet loss,
e.g. don't "halve the window".
the intent of those modifications is that it isn't used across the "greater
internet" but is more suited to private connections within an enterprise
customer environment.

that is used in production today on many Cisco MDS 9xxx FC switch environments.


> I'd like to avoid modifying TCP windows and options on end hosts where
> possible (I have a lot of them). I've seen products that work as
> "transfer stations" using "reliable UDP" to get around the windowing
> problem.

given you don't want to modify all your hosts, you could 'proxy' said TCP
connections via 'socat' or 'netcat++'.


cheers,

lincoln.




RE: too many variables

2007-08-09 Thread Lincoln Dale

>  I asked this question to a couple of folks:
> 
>   "at the current churn rate/ration, at what size doe the FIB need to
>  be before it will not converge?"
> 
>  and got these answers:
> 
> - jabber log -
> a fine question, has been asked many times, and afaik noone has
> provided any empirically grounded answer.
> 
> a few realities hinder our ability to answer this question.
> 
> (1) there are technology factors we can't predict, e.g.,
> moore's law effects on hardware development

Moore's Law is only half of the equation. It is the part that deals with
route churn and the rate at which those changes can be processed (both peer
notification and the control plane programming the data plane in the form of
FIB changes).

Moore's Law has almost zero relevance to FIB sizes. It doesn't map to growth
in SRAM, or to innovations/mechanisms for reducing the SRAM requirements
while growing FIB sizes.


cheers,

lincoln.