Re: Azure Looking Glass
Presumably nothing stops you spinning up an instance in Azure and doing pings/traceroutes yourself. But perhaps you could be doing this from your own IPs towards .

Have you configured your end in a manner that doesn't do MTU 1500, or that relies on PMTUD to function? If yes, well, perhaps start there... you're not on a solid foundation.

On Sat, Jun 29, 2024 at 7:46 AM John Alcock wrote:
> I have gotten pretty close to figuring out my issue with the Azure Cloud.
> When I advertise my routes through one specific upstream provider, I have
> an issue. If I pull my routes from them, all works well.
>
> I believe this is some type of MTU issue. Could be filtering, but I doubt
> it.
>
> Is there an Azure looking glass that I can use to originate pings and
> traceroutes? My google-fu is weak and I haven't found it yet.
>
> With that information, I think I can help my upstream provider know where
> the problem lies.
>
> John Alcock
> AS395437
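If PMTUD is suspected, one quick check is to probe with the DF bit set at decreasing payload sizes until pings get through. A minimal sketch, assuming a Linux-style ping that supports "-M do"; the target address is a placeholder:

    # Rough path-MTU probe: send DF-set pings of decreasing size until one succeeds.
    # Assumes Linux iputils ping ("-M do" = don't fragment); TARGET is a placeholder.
    import subprocess

    TARGET = "198.51.100.1"  # replace with the Azure-side address being tested

    def ping_df(payload: int) -> bool:
        """Return True if a single DF-set ping with this payload size succeeds."""
        result = subprocess.run(
            ["ping", "-c", "1", "-W", "2", "-M", "do", "-s", str(payload), TARGET],
            capture_output=True,
        )
        return result.returncode == 0

    for payload in (1472, 1452, 1400, 1300, 1200):  # 1472 + 28 bytes of headers = 1500
        status = "ok" if ping_df(payload) else "blocked / too big"
        print(f"payload {payload} (IP packet {payload + 28}): {status}")

If 1472 fails but a smaller size works, the path (or one end's configuration) is not passing 1500-byte packets, and PMTUD signalling is worth checking next.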
Re: Peering Contact at AS16509
Even if you don’t meet the port speed requirements for a PNI, there is likely something that could work via an IX.

On Tue, Feb 20, 2024 at 12:57 PM Tim Burke wrote:
> We reached out some time ago using the contact on PeeringDB and had no
> issue, but the amount of transit consumed to get to 16509 is substantial
> enough to make responding worth their while.
>
> Their minimum peering is 100G, with 400G preferred, so it’s very possible
> that if you’re not consuming anywhere close to 100G, the lack of response
> could correlate to a lack of interest on their side.
>
> > On Feb 18, 2024, at 13:09, Peter Potvin via NANOG wrote:
> >
> > If a contact who manages North American peering at AS16509 could reach
> > out off-list, that would be appreciated. Myself and a few colleagues have
> > attempted to reach out via the contacts listed on PeeringDB on multiple
> > occasions over the last couple of months and have not been successful in
> > reaching someone.
> >
> > Kind regards,
> > Peter Potvin
Re: Anyone have contacts at the Amazon or OpenAI web spiders?
On Wed, Feb 14, 2024 at 1:36 PM John Levine wrote:
> If anyone has contacts at either I would appreciate it.

https://developer.amazon.com/support/amazonbot is probably returned as a result of searching "amazonbot" on your favourite search engine.
Re: Interesting Ali Express web server behavior...
On Sun, Dec 10, 2023 at 7:09 PM Christopher Hawker wrote:
> How big would a network need to get, in order to come close to
> exhausting RFC1918 address space? There are a total of 17,891,328 IP
> addresses between the 10/8 prefix, 172.16/12 space and 192.168/16 space. If
> one was to allocate 10 addresses to each host, that means it would require
> 1,789,132 hosts to exhaust the space.

See the 30 minute mark of https://youtu.be/ARlBHmPy7Zc?t=1787. We talked about this in a NANOG 88 presentation too, but had a bigger timeslot at AusNOG so we said a bit more about it there.
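For reference, the arithmetic behind the quoted figures checks out; a quick sketch using Python's ipaddress module:

    # Total RFC1918 space and the "10 addresses per host" division from the quoted post.
    import ipaddress

    rfc1918 = [
        ipaddress.ip_network("10.0.0.0/8"),      # 16,777,216 addresses
        ipaddress.ip_network("172.16.0.0/12"),   #  1,048,576 addresses
        ipaddress.ip_network("192.168.0.0/16"),  #     65,536 addresses
    ]

    total = sum(net.num_addresses for net in rfc1918)
    print(total)        # 17891328
    print(total // 10)  # 1789132 hosts at 10 addresses per host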
Re: Alternative Re: ipv4/25s and above Re: 202211210951.AYC
> > As someone who has been involved in the deployment of network gear
> > into class E space (extensively, for our own internal reasons, which
> > doesn't preclude public use of class E), "largely supported" !=
> > "universally supported".
> >
> > There remain hardware devices that blackhole class E traffic, for
> > which there is no fix. https://seclists.org/nanog/2021/Nov/272 is
> > where I list one of them. There are many, many other devices where we
> > have seen interesting behavior, some of which has been fixed, some of
> > which has not.
>
> And I am sure you would agree that un-reserving a decade ago would have
> more than likely resulted in a greatly improved situation now. Along the
> lines that doing so now could still result in a greatly improved
> situation a decade hence, should we still need it.

It may well have helped a decade ago, past tense, but that isn't the reality of today. I've pointed out there is a non-zero number of existing devices, OSs, things baked into silicon, even widely used BGP stacks today, that can't currently use class E, and some of them will never be able to.

You seem to be suggesting that class E could be opened up as valid public IPv4 space. My experience is that it would not be usable public IPv4 address space any time soon, if ever. Unreserving it today may address some of that, but it will never address all of it.

cheers,

lincoln.
Re: Alternative Re: ipv4/25s and above Re: 202211210951.AYC
On Tue, Nov 22, 2022 at 11:20 AM Joe Maimon wrote:
> Indeed that is exactly what has been happening since the initial
> proposals regarding 240/4. To the extent that it is now largely
> supported or available across a wide variety of gear, much of it not
> even modern in any way.

As someone who has been involved in the deployment of network gear into class E space (extensively, for our own internal reasons, which doesn't preclude public use of class E), "largely supported" != "universally supported".

There remain hardware devices that blackhole class E traffic, for which there is no fix. https://seclists.org/nanog/2021/Nov/272 is where I list one of them. There are many, many other devices where we have seen interesting behavior, some of which has been fixed, some of which has not.

cheers,

lincoln.
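As one small illustration of how widely the 240/4 reservation is baked into software, quite apart from the hardware cases above, even Python's standard ipaddress module still flags it as reserved:

    # 240/4 ("class E") is treated as reserved by Python's standard library -
    # one example of the reservation being baked into software far beyond routers.
    import ipaddress

    addr = ipaddress.ip_address("240.0.0.1")
    print(addr.is_reserved)  # True - 240.0.0.0/4 is still the reserved block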
Re: Longest prepend( 255 times) as path found
> If I was running an edge device with a limited FIB, perhaps I might drop
> it to save memory. If I had beefier devices, perhaps I would just depref
> it.

Note that if said prefix existed elsewhere with fewer prepends, such that the alternative 'won' BGP best-path selection, then dropping it would not make any difference to the FIB. The FIB is where the 'winning' prefixes go as fully resolved entries from the RIB, and the RIB would not hold the prepended path either, since the alternative won in BGP. And even if you depref'd it in BGP, it would still be there in the control plane, consuming the same amount of RAM. Rejecting it for excess prepends is likely the best choice.
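A minimal sketch of that point, with made-up ASNs and a toy comparison on AS-path length only (real best-path selection has many more tie-breakers):

    # Toy illustration: with a shorter alternative available, the heavily prepended
    # path never becomes best, so it never reaches the RIB or FIB - it only consumes
    # memory as an extra path in the BGP table unless it is rejected inbound.
    paths = {
        "via_upstream_A": [64500, 64501],           # normal path
        "via_upstream_B": [64510] + [64999] * 255,  # 255x prepended path
    }

    best = min(paths, key=lambda p: len(paths[p]))  # shorter AS path wins (toy rule)
    print(f"best path, installed in RIB/FIB: {best}")       # via_upstream_A
    print(f"still held in the BGP table: {sorted(paths)}")  # both paths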
Re: 400G forwarding - how does it work?
On Mon, Jul 25, 2022 at 11:58 AM James Bensley wrote:
> On Mon, 25 Jul 2022 at 15:34, Lawrence Wobker wrote:
> > This is the parallelism part. I can take multiple instances of these
> > memory/logic pipelines, and run them in parallel to increase the throughput.
> ...
> > I work on/with a chip that can forward about 10B packets per second…
> > so if we go back to the order-of-magnitude number that I’m doing about
> > “tens” of memory lookups for every one of those packets, we’re talking
> > about something like a hundred BILLION total memory lookups… and since
> > memory does NOT give me answers in 1 picosecond… we get back to pipelining
> > and parallelism.
>
> What level of parallelism is required to forward 10Bpps? Or 2Bpps like
> my J2 example :)

I suspect many folks know the exact answer for J2, but it's likely under NDA to talk about said specific answer for a given thing.

Without being platform or device-specific, the core clock rate of many network devices is often in a "goldilocks" zone of (today) 1 to 1.5GHz, with a goal of 1 packet forwarded 'per clock'. As LJ described the pipeline, that doesn't mean a latency of 1 clock ingress-to-egress, but rather that every clock there is a forwarding decision from one 'pipeline', and the Mpps/Bpps packet rate is achieved by having enough pipelines in parallel to reach it. The number here is often "1" or "0.5", so you can work the number backwards (e.g. it emits a packet every clock, or every 2nd clock).

It's possible to build an ASIC/NPU to run a faster clock rate, but that gets back to what I'm hand-wavingly describing as "goldilocks". Look up power vs frequency and you'll see it's non-linear. Just as CPUs can scale by adding more cores (vs increasing frequency), roughly the same holds true on network silicon: you can go wider, with multiple pipelines. But it's not 10K parallel slices; there are some parallel parts, but there are multiple 'stages' on each doing different things.

Using your CPU comparison, there are some analogies here that do work:
- you have multiple CPU cores that can do things in parallel -- analogous to pipelines
- they often share some common I/O (e.g. CPUs have PCIe, maybe sharing some DRAM or LLC) -- maybe some lookup engines, or centralized buffer/memory
- most modern CPUs are out-of-order execution, where under the covers a cache miss or DRAM fetch has a disproportionate hit on performance, so it's hidden away from you as much as possible by speculative out-of-order execution -- no direct analogy to this one; it's unlikely most forwarding pipelines do speculative execution like a general-purpose CPU does, but they definitely do 'other work' while waiting for a lookup to happen

A common-garden x86 is unlikely to achieve such a rate for a few different reasons:
- if packets-in or packets-out go via DRAM, then you need sufficient DRAM (page opens/sec, DRAM bandwidth) to sustain at least one write and one read per packet. Look closer at DRAM and see its speed; pay attention to page opens/sec and what that consumes.
- one 'trick' is to not DMA packets to DRAM but instead have them go into SRAM of some form - e.g. Intel DDIO, ARM cache stashing - which at least potentially saves you that DRAM write+read per packet
- ... but then do e.g. an LPM lookup, and best case that is back to a memory access per packet. Maybe it's in L1/L2/L3 cache, but at large table sizes it likely isn't.
- ... do more things to the packet (uRPF lookups, counters) and it's yet more lookups.
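Working the clock-rate arithmetic above through with purely illustrative numbers (not a statement about any specific chip):

    # Back-of-the-envelope: parallel pipelines needed for a target packet rate,
    # given a core clock and packets forwarded per clock per pipeline.
    # All numbers are illustrative, not any particular vendor's silicon.
    import math

    def pipelines_needed(target_pps: float, clock_hz: float, pkts_per_clock: float) -> int:
        return math.ceil(target_pps / (clock_hz * pkts_per_clock))

    print(pipelines_needed(10e9, 1.25e9, 1.0))  # 8 pipelines at 1 pkt/clock
    print(pipelines_needed(10e9, 1.25e9, 0.5))  # 16 pipelines at 1 pkt every 2nd clock
    print(pipelines_needed(2e9, 1.0e9, 1.0))    # 2 pipelines for the 2Bpps example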
Software can achieve high rates, but note that a typical ASIC/NPU does on the order of >100 separate lookups per packet, and 100 counter updates per packet. Just as forwarding in an ASIC or NPU is a series of tradeoffs, forwarding in software on generic CPUs is also a series of tradeoffs.

cheers,

lincoln.
Re: Amazon peering revisited
On Thu, Jan 27, 2022 at 8:22 AM Kelly Littlepage via NANOG wrote:
> Hi all, a nanog thread started on November 23, 2018 discussed the
> challenges of getting Amazon peering sessions turned up. Has anyone had
> luck since/does anyone have a contact they could refer me to — off-list or
> otherwise? The process of getting PNI in place with other CSPs was
> straightforward, but I haven't heard back from AWS after a month and
> several follow-ups. Our customers would really benefit from us getting this
> sorted.

There are many folks here that are in AWS. Assuming you have followed what is in https://aws.amazon.com/peering/ (and https://aws.amazon.com/peering/policy/), then send me details privately about what/when/who and I'll reach out internally to the relevant folks.
Re: Redeploying most of 127/8, 0/8, 240/4 and *.0 as unicast
On Thu, Nov 18, 2021 at 1:21 PM John Gilmore wrote:
> We have found no ASIC IP implementations that
> hardwire in assumptions about specific IP address ranges. If you know
> of any, please let us know, otherwise, let's let that strawman rest.

There's at least one: Marvell Prestera CX (it's either Prestera CX or DX, forget which). It is in the Juniper EX4500, among others. It has a hardware-based bogon filter when L3 routing that cannot be disabled.

cheers,

lincoln.
Re: BGP peering strategies for smaller routers
> You have to keep in mind there are two pools of memory on the router.

There are actually three:

1. Prefix (path) via BGP: "show ip bgp ". BGP will select the 'best' BGP path (can be multiple if ECMP) and send that through to the RIB.
2. RIB: "show ip route ". The routing table will show the path chosen - and backup paths etc. - but may be recursive, e.g. prefix a.b.c.d points at e.f.g.h which in turn points at i.j.k.l, etc.
3. FIB: basically fully resolved prefixes.

What you otherwise say is correct - you could have N transit providers at (1) providing lotsOfPaths x N providers, which ultimately resolve to lotsOfRoutes with up to N next-hops. Much design effort goes into the routing stack to efficiently store lotsOfPaths.

Can't speak for what an ASR1K does, but suggest the OP talk to Cisco.

cheers,

lincoln.
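A minimal sketch of the BGP table -> RIB -> FIB relationship described above, with toy data structures and made-up prefixes (not any vendor's implementation):

    # Toy model of the three memory pools: every received BGP path, the best path
    # per prefix promoted to the RIB, and the FIB holding fully resolved entries.
    # Prefixes, next-hops and the "fewest AS hops wins" rule are illustrative only.
    bgp_table = {
        "203.0.113.0/24": [
            {"next_hop": "192.0.2.1", "as_path": [64500, 64501]},
            {"next_hop": "198.51.100.1", "as_path": [64510, 64511, 64501]},
        ],
    }

    # RIB: one best path per prefix (toy tie-break on AS-path length)
    rib = {p: min(paths, key=lambda x: len(x["as_path"]))
           for p, paths in bgp_table.items()}

    # FIB: resolve the (possibly recursive) next-hop to an egress interface
    connected = {"192.0.2.1": "Ethernet1", "198.51.100.1": "Ethernet2"}
    fib = {p: (entry["next_hop"], connected[entry["next_hop"]])
           for p, entry in rib.items()}

    print(fib)  # {'203.0.113.0/24': ('192.0.2.1', 'Ethernet1')}

The BGP table holds all N paths per prefix; only the winner is promoted downwards, which is why rejecting unwanted paths at the BGP edge is what actually saves control-plane memory.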
Re: Arista Routing Solutions
On Wed, Apr 27, 2016 at 4:41 PM, Peter Kranz wrote:
> Curious if you have any thoughts on the longevity of the 7500R and
> 7280R's survival with IPv4 full tables? How full are you seeing the TCAM
> getting today (I'm assuming they are doing some form of selective
> download)? And if we are currently adding 100k routes/year, how much
> longer will it last?

I can't speak for Ryan or Netflix, but we (Arista) are stating our technique is good for 1M+ prefixes of IPv4+v6 combined. The Internet right now is at between 575K and 635K IPv4 and between 28K and 35K IPv6, and it has taken many, many years to get there, so it's foreseeable there are many years of growth left. Note that we don't do static partitioning between IPv4 and IPv6, and how we do it has more headroom in it than we state, so we're confident. We're also not doing "selective download"; this is every prefix in the current table.

What I can share is two different scenarios deployed today:
1. a traditional internet edge router with multiple transit/peer providers and the Internet as of right now, and
2. a cloud customer that also has hundreds of thousands of prefixes internally.

Ryan's case might be different to others. The former is at 575K IPv4 and 35K IPv6 of 'internet' as of a few weeks ago:

7500R# show ip route summary | grep Total
   Total Routes  575127
7500R# show ipv6 route summary | grep Total
   Total Routes  35511
7500R# show hardware capacity | grep Routing
Forwarding Resources Usage

Table     Feature     Chip   Used      Used   Free      Committed   Best Case     High
                             Entries   (%)    Entries   Entries     Max Entries   Watermark
--------  ----------  -----  --------  -----  --------  ----------  ------------  ----------
Routing   Resource1              815    39%      1233           0          2048         817
Routing   Resource2              469    45%       555           0          1024         471
Routing   Resource3            14074    42%     18694           0         32768       14098
Routing   V4Routes            696364    88%     89753           0        786432      697110
Routing   V6Routes                 0     0%     89753           0        786432           0

The latter is at 854K IPv4 + 45K IPv6:

7500R# show ip route summary | grep Total
   Total Routes  854393
7500R# show ipv6 route summary | grep Total
   Total Routes  45678
7500R# show hardware capacity | grep Routing
Forwarding Resources Usage

Table     Feature     Chip   Used      Used   Free      Committed   Best Case     High
                             Entries   (%)    Entries   Entries     Max Entries   Watermark
--------  ----------  -----  --------  -----  --------  ----------  ------------  ----------
Routing   Resource1             1319    64%       729           0          2048        1320
Routing   Resource2              809    79%       215           0          1024         814
Routing   Resource3            24102    73%      8666           0         32768       24104
Routing   V4Routes            644336    83%    124302           0        786432      644364
Routing   V6Routes             17792    12%    124302           0        786432       17795

One could ask Geoff Huston when he thinks combined IPv4+v6 will exceed 1M entries, but I would expect it to be many years away based on http://bgp.potaroo.net/, and we'd welcome discussions about it if you want to know our opinion [*] on how what we're doing will scale. What we're doing doesn't explode at 1M; there's headroom in it, hence why we say "1M+". Again, we're happy to talk about it - just ask your friendly Arista person, and if you don't know who to ask, ask me and I'll put you in touch with the right folks.

cheers,

lincoln.
[*] l...@arista.com
Re: Arista Routing Solutions
> High Touch / Low Touch
>
> High touch means very general purpose NPU, with off-chip memory. Low
> touch means usually ASIC or otherwise simplified pipeline and on-chip
> memory. Granted Jericho can support off-chip memory too.
>
> L3 switches are the canonical example of low touch. EZchip, Trio, Solar,
> FP3 etc. are examples of canonical high touch NPUs. What low touch can
> do, it can do fast and economically.

Your analogy makes some sense, but what you classify as high-touch / low-touch is just one dimension and could do with a more modern update.

I'd suggest a more modern analogy would be that historically the difference between an L3 switch and a router is that the former has a fixed processing pipeline, limited buffering (most have just on-chip buffer) and limited table sizes. But more modern packet processors with fixed pipelines often have blocks or sections that are programmable or flexible. e.g. with a flexible packet parser, it's possible to support new overlay or tunnel mechanisms; flexible key generation makes it possible to reuse different table resources in different ways; a flexible rewrite engine means egress encap or tunnel logic can be done. There's also often more capacity for recirculation or additional stages as required.

Specific to Jericho, the underlying silicon has all these characteristics. We [*] used the flexibility in all of the stages, both now and in previous iterations (Arad), to add new features/functionality that wasn't natively there to start with. And it uses a combination of on-chip & off-chip buffering with VoQ. It's also not only Arista that calls it a router; Cisco does too (NCS5K5).

Sure, using an NPU for packet processing essentially provides a 100% programmable packet forwarding pipeline, and maybe even a "run to completion" kind of packet pipeline where the pipeline could have a long tail of processing. However, engineering is a zero-sum game, and to do that means you sacrifice power or density, or most often, both.

I agree the lines have been blurred as to the characteristics, and we'd openly state that it's not going to be useful in every use case where a router is deployed, but for specific use cases it fits the bill and has compelling density, performance and cost dynamics.

To the OP's question, there are people running with this in EFT and others in production. My suggestion would be that if you think it's of interest, reach out to your friendly Arista person [*] and try it out, or talk through what it is you're after. We are generally a friendly bunch, and often we can be quite creative in enabling things in different ways to old.

> Yeah they are certainly much behind in features, but if you don't
> need those features, it's probably actually an advantage. For my
> use-cases Arista's MPLS stack is not there.

We've historically had the data-plane but not the control-plane. That's a work in progress. Again, often there are creative solutions and ways of doing things that aren't necessarily the same as the old ways but achieve the same end result.

cheers,

lincoln.
[*] disclosure: i work on said products described l...@arista.com.
Re: New Switches with Broadcom StrataDNX
Yes. We also have 1M+ FIB support from day one - hence the letter 'R', denoting the 3rd generation of its evolution to internet edge/router use cases. Not sure what other vendors are doing, but I doubt others are yet shipping large table support. (There's more to it than just the underlying native silicon.)

cheers,

lincoln.
(l...@arista.com)

On Mon, Apr 18, 2016 at 11:01 AM, Colton Conor wrote:
> As a follow up to this post, it looks like the Arista 7500R series has this
> new chip inside of it.
>
> On Wed, Jan 20, 2016 at 9:34 AM, Jeff Tantsura wrote:
>
> > That's right, logic is in programming chips, not their property. You just
> > need to know what to program ;-)
> >
> > Regards,
> > Jeff
> >
> > > On Jan 19, 2016, at 10:10 PM, Mark Tinka wrote:
> > >
> > >> On 20/Jan/16 00:17, Phil Bedard wrote:
> > >>
> > >> Good point, there are many people looking at what I call FIB
> > >> optimization right now. The key is having the programmability on the
> > >> device to make it happen. Juniper/Cisco support it using policies to
> > >> filter RIB->FIB and I believe both also do per-NPU/PFE localized FIBs now.
> > >> I am not sure if that’s something supported on this new Broadcom chipset.
> > >> Depends on your network of course and where you are looking to position
> > >> the router.
> > >
> > > I don't think the FIB needs to have specific support for selective
> > > programming.
> > >
> > > I think that comes in the code to instruct the control plane what it
> > > should download to the FIB.
> > >
> > > Cisco's and Juniper's support of this is on FIB that has been in
> > > production long before the feature became available. It was just added
> > > to code.
> > >
> > > Mark.
Re: 10G switchrecommendaton
hi George,

IGMPv3 snooping has been supported since EOS 4.7. It's enabled by default in EOS 4.8.x. In terms of specifics, there is support for both IGMPv3 snooping & IGMPv3 querier. There isn't currently support for IGMPv3 snooping querier.

cheers,

lincoln.

On Fri, Feb 10, 2012 at 8:17 AM, George Bonser wrote:
> Feb 9 07:42:21 SJC-AGS-01 IgmpSnooping:
> %IGMPSNOOPING-4-IGMPV3_UNSUPPORTED: IGMPv3 querier detected on interface
> Port-Channel1 (message repeated 34 times in 625.028 secs)
>
> SJC-AGS-01#sho ver
> Arista DCS-7124S-F
> Hardware version:       06.02
> Serial number:          JSH10130054
> System MAC address:     001c.7308.752f
>
> Software image version: 4.6.4
> Architecture:           i386
> Internal build version: 4.6.4-434606.EOS464
>
> Sure, we can discuss it.
>
> From: lincoln dale [mailto:l...@interlink.com.au]
> Sent: Thursday, February 09, 2012 1:13 PM
> To: George Bonser
> Cc: Leigh Porter; nanog list
> Subject: Re: 10G switchrecommendaton
>
> On Fri, Feb 10, 2012 at 7:24 AM, George Bonser wrote:
>
> > It's pretty good gear. The only problem I've had with it is the
> > limitation of IGMP not working on mLAG VLANs.
>
> IGMP should work just fine with MLAG. IGMP state is sync'd between the
> MLAG pair. Happy to talk about this more off-list if you wish.
>
> cheers,
>
> lincoln.
> (l...@aristanetworks.com)
Re: 10G switchrecommendaton
On Fri, Feb 10, 2012 at 7:24 AM, George Bonser wrote:
> It's pretty good gear. The only problem I've had with it is the
> limitation of IGMP not working on mLAG VLANs.

IGMP should work just fine with MLAG. IGMP state is sync'd between the MLAG pair. Happy to talk about this more off-list if you wish.

cheers,

lincoln.
(l...@aristanetworks.com)
RE: NTP Md5 or AutoKey?
> There is an emerging need to distribute highly accurate time
> information over IP and over MPLS packet switched networks (PSNs).

good of you to ask. it exists today.

http://ieee1588.nist.gov/

cheers,

lincoln.
RE: Best utilizing fat long pipes and large file transfer
> I'm looking for input on the best practices for sending large files over
> a long fat pipe between facilities (gigabit private circuit, ~20ms RTT).

providing you have RFC1323 type extensions enabled on a semi-decent OS, a 4MB TCP window should be more than sufficient to fill a GbE pipe over 30msec.

with a modified TCP stack that uses TCP window sizes up to 32MB, i've worked with numerous customers to achieve wire-rate GbE async replication for storage arrays with FCIP. the modifications to TCP were mostly to adjust how it reacts to packet loss, e.g. don't "halve the window". the intent of those modifications is that it isn't used on the "greater internet" but is more suited to private connections within an enterprise customer environment. that is used in production today on many Cisco MDS 9xxx FC switch environments.

> I'd like to avoid modifying TCP windows and options on end hosts where
> possible (I have a lot of them). I've seen products that work as
> "transfer stations" using "reliable UDP" to get around the windowing
> problem.

given you don't want to modify all your hosts, you could 'proxy' said TCP connections via 'socat' or 'netcat++'.

cheers,

lincoln.
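the window-size numbers fall straight out of the bandwidth-delay product; a quick sanity check with values matching the thread:

    # Bandwidth-delay product: how much unacknowledged data must be "in flight"
    # to keep a pipe full. Values match the thread's GbE / tens-of-ms case.
    def bdp_bytes(bandwidth_bps: float, rtt_seconds: float) -> float:
        return bandwidth_bps * rtt_seconds / 8

    print(bdp_bytes(1e9, 0.020) / 1e6)  # 2.5  MB needed for 1 Gb/s at 20 ms RTT
    print(bdp_bytes(1e9, 0.030) / 1e6)  # 3.75 MB at 30 ms - hence a 4MB window suffices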
RE: too many variables
> I asked this question to a couple of folks:
>
> "at the current churn rate/ratio, at what size does the FIB need to
> be before it will not converge?"
>
> and got these answers:
>
> - jabber log -
> a fine question, has been asked many times, and afaik noone has
> provided any empirically grounded answer.
>
> a few realities hinder our ability to answer this question.
>
> (1) there are technology factors we can't predict, e.g.,
> moore's law effects on hardware development

Moore's Law is only half of the equation. It is the part that deals with route churn and the rate at which those changes can be processed (both peer notification and the control plane programming the data plane in the form of FIB changes).

Moore's Law has almost zero relevance to FIB sizes. It doesn't map to growth in SRAM, or to innovations/mechanisms that reduce the SRAM required while FIB sizes grow.