Re: Input for Draft Document on Terminology in BGP/Global Routing
On Thu, 3 Oct 2024 at 07:04, Jeff Behrns via NANOG wrote: > This seems like a total misuse of the RFC framework / process and more a grab > at publicity, but I'll play along...bogon. You should include the term > "bogon". Someday when I'm done keeping actual production networks alive, I > may wade into the morass of IETF & IEEE and work on trimming the fat...or > maybe just retire. It's a tough call. Not saying I agree or disagree, but what is the definition of appropriate use, and how does this particular draft violate it in comparison to the existing corpus? Someone might want to argue that RFCs are for technical implementations, but there are increasingly many RFCs with no relevance to technical implementation at all, which are significantly more softball than the work proposed here. -- ++ytti
Re: Server rental inside of One Wilshire in Los Angeles
On Wed, 7 Aug 2024 at 20:05, Christopher Morrow wrote: > I'd bet the real answer is that someone wants to connect a commodity > server to an IX and pretend to be > some network/asn and then do some not terrific things with that setup :( > > seen this in AMSIX and DECIX ... don't know that I've not seen it also > at 1-wilshire ;( This seems very plausible, considering the chosen demo. Thanks. -- ++ytti
Re: Server rental inside of One Wilshire in Los Angeles
On Wed, 7 Aug 2024 at 17:41, Brandon Martin wrote: > Among the other reasons folks have given, the 10GBASE-T PHY has added > latency beyond the basic packetization/serialization delay inherent to > Ethernet due to the use of a relatively long line code plus LDPC. It's > not much (2-4us which is still less than 1000BASE-T > serialization+packetization latency with larger packets), but it's more > than 10GBASE-R PHYs. The HFT guys may care, but most other folks > probably don't give a hoot. I think this is the least bad explanation. Some explanations are that copper may not be available, but that doesn't explain a preference. Nor do I think wattage/heat explains the preference, as it's hosted, so customers probably shouldn't care. Latency could very well explain the preference, but it seems doubtful when the hardware is so underspecified; surely, if you are talking about a budget of single microseconds or nanoseconds, the actual hardware becomes very important, so I think the lack of specificity there implies it's not about latency. -- ++ytti
Re: Server rental inside of One Wilshire in Los Angeles
I can't help you, but I'm just awfully curious and must ask: why specifically optical ports? It seems like a strange and limiting requirement, for an upside my imagination struggles to find. On Tue, 6 Aug 2024 at 21:51, Walt wrote: > > Asking for a friend, please contact me off list. > > > > The ask: > > > > Multi-core server + 32G memory (or 64G) > > more than 1T storage space. > > At least 4 10GE optical ports. > > Linux OS > > 1 year term > > > > Thanks > > > > Walt > > > > -- ++ytti
Re: TCP-AO for BGP Peering?
I don't think that URL explains how commonly it is used. In my experience, TCP-AO use is extremely limited, partly because it's so new in practice. Juniper had it for a long time, but it was pre-standard even years after the standard was published, which probably didn't matter much, as no one else had it at all until somewhat recently. I suspect this order of events led many people to look into TCP-AO early on, decide, correctly, that it was not operationally feasible at the time, and that conclusion has stuck. On Wed, 12 Jun 2024 at 13:57, Marco Paesani wrote: > > Hi, > you can start from here: > https://www.juniper.net/documentation/us/en/software/junos/transport-ip/topics/topic-map/tcp-configure-ao-bgp-ldp.html > > Regards, > > > - > > Marco Paesani > > > > > Skype: mpaesani > Mobile: +39 348 6019349 > Success depends on the right choice ! > Email: ma...@paesani.it > > > > > Il giorno mer 12 giu 2024 alle ore 12:52 7ri...@gmail.com <7ri...@gmail.com> > ha scritto: >> >> Y'all -- >> >> Does anyone know of a survey or study showing the rate of uptake for BGP >> over TCP-AO? I've poked around some and asked in a few places and not found >> anything, but I probably missed something out there. >> >> If there's no studies, does anyone have any experiences possibly indicating >> BGP over TCP-AO usage they can share? >> >> :-) /r -- ++ytti
Re: Free(opensource) Ticketing solutions
This thread is interesting to me, but I'm surprised that no requirements or use cases are mentioned. What is the OP trying to do? Looking at some of the proposals, it's obvious some of them are intended for use cases where one side is an external party with email, and the other side is an internal party with the application. Such a system would be obviously terrible for internal use, where teams ask each other to do things via the system. I find the customer-facing ticketing system a far easier problem than the internal one. For the internal one, my MVP requirements would be:
- Everyone has their own ticket view, and sees just the tickets that are actionable to them, right now
- Tickets can have dependencies. Maybe I've been assigned a ticket, and I figure out what I need to do, and what I need from others to do it, so I can create new tickets and have my ticket depend on them. This way I can get my ticket out of my view until the dependencies are solved, at which point I get it back
- Tickets could have parent and child relationships, where the parent automatically tracks progress through its children. Perhaps my parent ticket is 'enable ISIS' and I have a child ticket for each device.
- API for users, not just developers
- I strongly believe the market has understood UX wrong. WebUX is great for problems where users use the UX rarely, but CliUX is actually desirable by users and management for problems where users use the UX every day, hours on end. The Cli/Curses UX is blazing fast with a predictable layout, so after onboarding, users don't even look at the screen and are exceedingly efficient. When I check in to a hotel, buy a SIM or rent a car, I often have to wait silently while the clerk spends literally 10 minutes clicking and typing on the most common use case they have.
I don't expect the market to ever agree, but at least if it has an API, I can write my own CliUX to be fast on the couple of things I need to do. The commercial solutions I've used, Remedy and ServiceNow, are absolutely horrible, and it shocks me that this is the state of the art. At companies where those are used, you have to force employees to use them, and if you are senior enough that the rules don't apply to you, you don't use them, because the UX is so bad. Both would basically require an internal team to develop them actively, at which point you might wonder, why didn't we just NIH this? But usually this internal team doesn't exist due to cost reasons, and the outcome is really poor and expensive. Someone is going to do what Slack did to chat, and make a ticketing system that people actually want to use rather than email, because they think it makes their work easier. -- ++ytti
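The dependency requirement above can be sketched in a few lines (a hypothetical data model, not any real product's schema): a ticket appears in your view only when it is open and every ticket it depends on is closed.

```python
from dataclasses import dataclass, field

@dataclass
class Ticket:
    id: str
    assignee: str
    closed: bool = False
    depends_on: list = field(default_factory=list)  # blocking Ticket objects

    def actionable(self) -> bool:
        # Open, and every dependency already closed.
        return not self.closed and all(d.closed for d in self.depends_on)

def my_view(tickets, user):
    # Each person sees only tickets assigned to them that are actionable now.
    return [t for t in tickets if t.assignee == user and t.actionable()]

# My ticket leaves my view while its dependency is open, and returns
# automatically when the dependency closes.
dep = Ticket("T2", "alice")
mine = Ticket("T1", "bob", depends_on=[dep])
assert my_view([mine, dep], "bob") == []
dep.closed = True
assert my_view([mine, dep], "bob") == [mine]
```

The parent/child progress tracking would be the same idea inverted: a parent's completion is derived from its children rather than set by hand.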
Re: Cogent-TATA peering dispute?
On Sat, 18 May 2024 at 10:38, Bill Woodcock wrote: > So, yes, I think having an open peering policy should be a requirement for > operating a root nameserver. I don’t think there’s any defensible rationale > that would support root nameservers being a private benefit to be used to > worsen the digital divide or create leverage in commercial disputes. They > should, indeed, all be accessible to all networks. What type of network reach is required? Is a single PoP enough? That is, as long as you have a single PoP, and an open policy to peer with anyone who wants to connect to your PoP, do you qualify? -- ++ytti
Re: Cogent-TATA peering dispute?
On Sat, 18 May 2024 at 01:07, William Herrin wrote: > I don't understand why Cogent is allowed to operate one of the root > servers. Doesn't ICANN do any kind of technical background check on > companies when letting the contract? > > For those who haven't been around long enough, this isn't Cogent's > first depeering argument. Nor their second. And they're behaving > unreasonably. I don't know any of the details -this time- but > historically speaking Cogent is behaving badly -again- and you can > take that to the bank. This seems awfully simplistic: 'Cogent at 100% fault, in each case'. It doesn't match my understanding, and therein lies the problem. In my understanding of the issues, in a few of them I would rate the other side at 100% fault. What are we asking, in terms of your proposed policy change, of anyone allowed to host a root DNS server? That they must peer with everyone and anyone, on any terms? I think we would struggle to form a policy that captures the problem in a fair and equitable manner. As long as our toolbox only has a capitalist hammer, peering disputes are going to be a thing. Cogent has outlived many of the partners in its peering-dispute history. They are the grandfather of disruptive transit pricing, which many others have struggled to meet profitably, and I believe they are a big reason current transit pricing is as low as it is in the US and EU. -- ++ytti
Re: Opengear alternatives that support 5g?
On Fri, 26 Apr 2024 at 19:43, Warren Kumari wrote: > I've been on the same quest, and I have some additional requests / features. > Ideally it: > > 1: would be small - my particular use-case is for a "traveling rack", and so > 0U is preferred. > 2: would be fairly cheap. > 3: would not be a Raspberry-Pi, a USB hub and USB-to-serial cables. We tried > that for a while, and it was clunky — the SD card died a few times (and > jumped out entirely once!), people kept futzing with the OS and fighting over > which console software to use, installing other packages, etc. > 4: support modern SSH clients (it seems like you shouldn't have to say this, > but… ) > 5: actually be designed as a termserver - the current thing we are using > doesn't really understand terminals, and so we need to use 'socat > -,raw,echo=0,escape=0x1d TCP::' to get things like > tab-completion and "up-arrow for last command" to work. > 6: support logging of serial (e.g crash-messages) to some sort of log / > buffer / similar (it's useful to be able to see what a device barfed all over > the console when it crashes. Decouple your needs: use whatever hardware to translate RS232 into SSH, and then use 'conserver' to maintain 24/7 logging and to multiplex SSH sessions to each console port. Then you have your logs on your existing NMS box filesystem, and a consistent UX, independent of the hardware used to reach, monitor and multiplex consoles. For me Cisco is great here, because it's something an organisation already knows how to source, turn up, upgrade, troubleshoot and maintain. And you get a broad set of features you might want: IPSEC, DMVPN, BGP, ISIS, and so forth. I keep wondering why everyone is so focused on OOB hardware cost, when in my experience the ethernet connection is ~200-300 USD MRC (150 USD can be just the xconn). So in 10 years, you'll pay 24k to 36k just for the OOB WAN, dwarfing the hardware price. And 10 years, to me, doesn't sound like a particularly long time for a console setup.
> The Get Console Airconsole TS series meets many of these requirements, but it > doesn't do #6. It also doesn't really feel like they have been updating / > maintaining these. > > Yes, I fully acknowledge that #3 falls into the "Doctor, Doctor, it hurts > when I do this" camp, but, well… > > W -- ++ytti
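A minimal sketch of the conserver side of that split (an illustrative conserver.cf fragment; hostnames, ports and paths are invented): each console is logged 24/7 to a file on the NMS box, and users multiplex onto the same session regardless of which hardware terminates the RS232.

```
# illustrative conserver.cf sketch; hostnames, ports and paths are assumptions
default * {
        master localhost;
        logfile /var/log/consoles/&;   # '&' expands to the console name
        timestamp 5m;                  # periodic timestamps in the log
        rw *;                          # who may attach read-write
}
console edge1-con {                    # a router console behind a termserver
        type host;
        host ts1.oob.example.net;
        port 3001;
}
```

With this shape, swapping the terminal-server hardware underneath changes only the host/port lines, while logging and the user-facing `console edge1-con` UX stay the same.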
Re: Opengear alternatives that support 5g?
On Fri, 26 Apr 2024 at 19:43, Warren Kumari wrote: >> Curious if anyone has particular hardware they like for OOB / serial >> management, similar to OpenGear, but preferably with 5G support, maybe even >> T-Mobile support? It’s becoming increasingly difficult to get static IP 4g >> machine accounts out of Verizon, and the added speed would be nice too. Or >> do you separate the serial from the access device (cell+firewall, etc.)? Does it? To me the OP implied they need 5G because they can get a static IP on the 5G product, but not on 4G. So if the need for a static IP is solved, they can keep their existing investments. -- ++ytti
Re: Opengear alternatives that support 5g?
On Fri, 26 Apr 2024 at 03:11, David H wrote: > Curious if anyone has particular hardware they like for OOB / serial > management, similar to OpenGear, but preferably with 5G support, maybe even > T-Mobile support? It’s becoming increasingly difficult to get static IP 4g > machine accounts out of Verizon, and the added speed would be nice too. Or > do you separate the serial from the access device (cell+firewall, etc.)? You could get a 5G Catalyst with an async NIM or SM. But I think you're setting yourself up for unnecessary costs and failures by designing your OOB to require a static IP. You could design it so that the OOB spokes dial in to the central OOB hub, and the OOB hub doesn't care what IP they come from, using certificates or a PSK for identity instead of the IP. -- ++ytti
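The identity part can be sketched as a toy (this is just an HMAC/PSK challenge-response to illustrate the principle, not any vendor's dial-in protocol): the hub authenticates whoever connects by proof of key knowledge, so the spoke's source address never matters and dynamic IPs are fine.

```python
import hmac, hashlib, os

# Hypothetical per-spoke pre-shared keys; in a real DMVPN/IPsec design you
# would use certificates or IKE PSKs, but the principle is the same.
PSKS = {"oob-spoke-syd": b"s3cret-syd", "oob-spoke-fra": b"s3cret-fra"}

def challenge() -> bytes:
    return os.urandom(16)

def spoke_response(name: str, psk: bytes, nonce: bytes) -> bytes:
    return hmac.new(psk, name.encode() + nonce, hashlib.sha256).digest()

def hub_accepts(name: str, nonce: bytes, response: bytes) -> bool:
    # Identity is proven by knowledge of the key, not by source IP,
    # so the spoke can dial in from any dynamic address.
    psk = PSKS.get(name)
    return psk is not None and hmac.compare_digest(
        spoke_response(name, psk, nonce), response)

nonce = challenge()
assert hub_accepts("oob-spoke-syd", nonce,
                   spoke_response("oob-spoke-syd", PSKS["oob-spoke-syd"], nonce))
assert not hub_accepts("oob-spoke-syd", nonce, b"\x00" * 32)
```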
Re: constant FEC errors juniper mpc10e 400g
On Sun, 21 Apr 2024 at 09:05, Mark Tinka wrote: > Technically, what you are describing is EoS (Ethernet over SONET, Ethernet > over SDH), which is not the same as WAN-PHY (although the working groups that > developed these nearly confused each other in the process, ANSI/ITU for the > former vs. IEEE for the latter). > > WAN-PHY was developed to be operated across multiple vendors over different > media... SONET/SDH, DWDM, IP/MPLS/Ethernet devices and even dark fibre. The > goal of WAN-PHY was to deliver a low-cost Ethernet interface that was > SONET/SDH-compatible, as EoS interfaces were too costly for operators and > their customers. > > As we saw in real life, 10GE ports out-sold STM-64/OC-192 ports, as networks > replaced SONET/SDH backbones with DWDM and OTN. The key difference is that WAN-PHY does not provide synchronous timing, so it's not SDH/SONET-compatible in the strict sense, but it does have the frame format. And the optical systems which could regenerate SONET/SDH framing didn't care about timing; they just wanted to be able to parse and generate those frames, which they could, but could not do for ethernet frames. I think it is pretty clear the driver was to support long-haul regeneration, so it was always going to be a stop-gap solution. Even though I know some networks who specifically wanted WAN-PHY for its error-reporting capabilities, I don't think this was the majority driver; the majority driver almost certainly was 'that's the only thing we can put on this circuit'. -- ++ytti
Re: constant FEC errors juniper mpc10e 400g
On Sat, 20 Apr 2024 at 14:35, Mark Tinka wrote: > Even when our market seeks OTN from European backhaul providers to extend > submarine access into Europe and Asia-Pac, it is often for structured > capacity grooming, and not for OAM benefit. > > It would be interesting to learn whether other markets in the world still > make a preference for OTN in lieu of Ethernet, for the OAM benefit, en masse. > When I worked in Malaysia back in the day (2007 - 2012), WAN-PHY was > generally asked for for 10G services, until about 2010; when folk started to > choose LAN-PHY. The reason, back then, was to get that extra 1% of pipe > bandwidth :-). Oh, I don't think OTN or WAN-PHY has any large deployment future; the cheapest option is 'good enough', and whatever value you could extract from OTN or WAN-PHY will be difficult to capitalise on, since people usually don't even capitalise on the capabilities they already pay for in the cheaper technologies. Of course WAN-PHY is dead post-10GE; a big reason for it to exist was very old optical systems which simply could not regenerate ethernet framing, not any features or functional benefits. -- ++ytti
Re: constant FEC errors juniper mpc10e 400g
On Sat, 20 Apr 2024 at 10:00, Mark Tinka wrote: > This would only matter on ultra long haul optical spans where the signal > would need to be regenerated, where - among many other values - FEC would > need to be decoded, corrected and re-applied. In most cases, modern optical long haul has a transponder, which terminates your FEC, because clients offer gray, and you'd like something a bit less depressing, like 1570.42nm. This is not just FEC-terminating, but to a degree also autonego-terminating: an RFI signal would be between you and the transponder. So these connections can be, and regularly are, provided without proper end-to-end hardware liveliness, and even if they were delivered and tested to have proper end-to-end HW liveliness, that may change during operation. So line faults may or may not be propagated to both ends as an RFI assertion, and even if they are, they may be delayed, to allow optical protection to engage first, which may be undesirable, as it eats into your convergence budget. Of course, the higher we go in the abstraction, the less likely you are to get things like HW liveliness detection; I don't really see anyone asking for this in their pseudowire services, even though it's something that actually can be delivered. In Junos it's a single config stanza on the interface to assert RFI to the client port if the pseudowire goes down in the operator network. -- ++ytti
Re: constant FEC errors juniper mpc10e 400g
On Fri, 19 Apr 2024 at 10:55, Mark Tinka wrote: > FEC is amazing. > At higher data rates (100G and 400G) for long and ultra long haul optical > networks, SD-FEC (Soft Decision FEC) carries a higher overhead penalty > compared to HD-FEC (Hard Decision FEC), but the net OSNR gain more than > compensates for that, and makes it worth it to increase transmission distance > without compromising throughput. Of course there are limits to this, as FEC is hop-by-hop, so in long haul you'll know the circuit quality to the transponder, not end-to-end, unlike in WAN-PHY or OTN, where you know both. Technically, optical transport could induce FEC errors toward the client if there are FEC errors on any hop, so consumers of optical networks would not need access to the optical network to know whether it's end-to-end clean. Much like cut-through switching can induce errors, via special symbols, to communicate that a CRC error happened earlier, so the receiver doesn't have to worry about problems on their end. -- ++ytti
Re: constant FEC errors juniper mpc10e 400g
On Thu, 18 Apr 2024 at 21:49, Aaron Gould wrote: > Thanks. What "all the ethernet control frame juju" might you be referring > to? I don't recall Ethernet, in and of itself, just sending stuff back and > forth. Does anyone know if this FEC stuff I see occurring is actually > contained in Ethernet Frames? If so, please send a link to show the ethernet > frame structure as it pertains to this 400g fec stuff. If so, I'd really > like to know the header format, etc. The frames you see with FEC are idle frames between actual ethernet frames. So you recall correctly: without FEC, you won't see this idle traffic. This is very good, because now you actually know, before putting the circuit in production, whether the circuit works or not. A lot of people have processes to ping from router to router for N time, trying to determine circuit correctness before putting traffic on it, which looks absolutely childish compared to FEC, both in terms of how reliable the presumed outcome is and how long it takes to get to that presumed outcome. -- ++ytti
Re: TFTP over anycast
On Sat, 6 Apr 2024 at 12:00, Bill Woodcock wrote: > That’s been the normal way of doing it for some 35 years now. iBGP > advertise, or don’t advertise, the service address, which is attached to the > loopback, depending whether you’re ready to service traffic. If we are talking about eBGP, then pulling routes makes sense. If we are talking about iBGP and a controlled environment, you should never pull anycast routes, because eventually you will have a failure mode where the check mechanism itself is broken, and you'll pull all the routes. If, instead of pulling the routes, you make them inferior, you are covered for the failure mode of the check itself being broken. -- ++ytti
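A sketch of the 'make them inferior' approach (a hypothetical ExaBGP-style announcer; the MED values are invented): a failed health check worsens the route rather than withdrawing it, so a globally broken check degrades to 'a less-preferred site still serves' instead of 'no routes anywhere'.

```python
def announcement(prefix: str, nexthop: str, healthy: bool) -> str:
    # Healthy instances advertise with a good MED; unhealthy ones stay
    # advertised but with a worse MED, so they attract traffic only if
    # every instance (or the check itself, everywhere) is broken.
    med = 100 if healthy else 1000
    return f"announce route {prefix} next-hop {nexthop} med {med}"

print(announcement("192.0.2.53/32", "10.0.0.1", healthy=True))
print(announcement("192.0.2.53/32", "10.0.0.2", healthy=False))
```

The same logic works with local-preference or IGP metric instead of MED; the point is only that the failure action is 'demote', never 'withdraw'.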
Re: Open source Netflow analysis for monitoring AS-to-AS traffic
On Fri, 29 Mar 2024 at 20:10, Steven Bakker wrote: > To top it off, both the sFlow and IPFIX specs are sufficiently vague about > the meaning of the "frame size", so vendors can implement whatever they want > (include/exclude padding, include/exclude FCS). This implies that you > shouldn't trust these fields. I share this concern, but in my experience the market simply does not care at all what the data means. People happily graph L3 rate from Junos and L2 rate from other boxes, using them interchangeably, as well as using them to determine whether or not there is congestion. While in reality, what you really want is the L1 rate, so you can actually see whether the interface is full or not. Luckily we are starting to see more and more devices also support peak-buffer-util over the previous N seconds, which is far more useful for congestion monitoring; unfortunately it is not in IF-MIB, so most will never collect it. Note, it is possible to get most Juniper gear to report L2 rate as IF-MIB specifies, but it's a non-standard configuration option, therefore very rarely used. I also wholeheartedly agree on inline templates being near peak insanity: huge complexity for an upside that is completely beyond my understanding. If I decide to collect a new metric, then punching in the metric number+name somewhere is the least of my worries. The idea that costs are lowered by having machines dynamically determine what is being collected and monitored is just bizarre. Most of the cost of starting to collect a new metric is figuring out how it is actionable, what needs to happen to the metric to trigger a given action, and how exactly we are extracting value from this action. Netflow v9/v10 definitely should have done out-of-band templates, and left it as an operator concern to communicate to the collector what it is seeing.
Even exceedingly trivial things in v9/v10 entities can be broken for years before anyone notices. For example, the original sampling entities are deprecated and replaced with new entities which communicate 'every N packets, sample C packets'. This is very good, because it allows you to do stateless sampling while still filling the export packet to MTU or larger size, keeping the export PPS rate the same before and after axing the cache. However, by the time I was looking into this, only pmacct correctly understood how to use these entities; nfcapd and arbor either didn't understand them, or understood them incorrectly (both were fixed in a timely manner by the responsible maintainers, thank you). -- ++ytti
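On the L2-versus-L1 point above, the gap is mechanical: every Ethernet frame occupies 20 extra bytes on the wire (7B preamble, 1B SFD, 12B inter-frame gap) that L2 byte counters never show. A small sketch, assuming the octet counters already include the FCS:

```python
PER_FRAME_L1_OVERHEAD_BITS = 20 * 8  # preamble + SFD + inter-frame gap

def l1_bps(l2_bps: float, pps: float) -> float:
    """Wire (L1) rate recovered from an L2 byte-counter rate plus packets/sec."""
    return l2_bps + pps * PER_FRAME_L1_OVERHEAD_BITS

# 10GE saturated with minimum-size (64B) frames: the L2 counters show
# only ~7.62 Gbps, yet the interface is 100% full on the wire.
pps = 10e9 / ((64 + 20) * 8)        # ~14.88 Mpps at line rate
l2 = pps * 64 * 8                   # ~7.62 Gbps as seen by IF-MIB octets
print(round(l1_bps(l2, pps)))       # -> 10000000000
```

This is why graphing raw L2 (or L3) octet rates against interface speed systematically under-reports utilization, worst with small packets.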
Re: Open source Netflow analysis for monitoring AS-to-AS traffic
On Fri, 29 Mar 2024 at 02:15, Nick Hilliard wrote: > Overall, sflow has one major advantage over netflow/ipfix, namely that > it's a stateless sampling mechanism. Once you have hardware that can > Obviously, not all netflow/ipfix implementations implement flow state, > but most do; some implement stateless sampling ala sflow. Also many > Tools should be chosen to fit the job. There are plenty of situations > where sflow is ideal. There are others where netflow is preferable. This seems like a long-winded way of saying sFlow is a perfect subset of IPFIX. We will increasingly see IPFIX implementations omit state, because state doesn't do anything any more in high-volume networks: you only ever create the flow in the cache, then delay exporting the information for some seconds, but the flow is never hit twice, so you pay a massive cost for caching without getting anything out of it. Anyone who actually needs caching will have to buy specialised devices, as it will no longer be economical for peering routers to offer the memory bandwidth and cache sizes needed for caches to actually do something. In a particular network we tried 1:5000 and 1:500, and in both cases flow records were 1 packet long, at which point we hit the record export policer limit and couldn't determine at which sampling rate we would start to see the cache being useful. I've wondered for a long time what a graph would look like where you plot sampling ratio against the percentage of flows observed; it will be linear to very high sampling ratios, but eventually it will start to taper off, I just don't have any intuitive idea when. And I don't think anyone really knows what share of flows they are observing in sFlow/IPFIX. If you keep the sampling ratio static over a period of time, say a decade, you will continuously reduce your resolution, seeing a smaller percentage of flows.
This worries me a lot, because a statistician would say that you need this share of volume, or this share of flows, if you want to use the data like this with this confidence. Therefore, if we think about the problem formally, we should constantly adjust our sampling ratios to fit our statistical model, to keep the same promises about data quality. -- ++ytti
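Under the usual independence assumption, the taper is easy to write down: with 1:N random sampling, an n-packet flow is observed with probability 1 - (1 - 1/N)^n, which is nearly linear in n for short flows and flattens once n approaches N. A small sketch:

```python
def p_observed(n_packets: int, sampling_ratio: int) -> float:
    # Probability that at least one packet of an n-packet flow is
    # sampled, assuming independent 1:N packet sampling.
    return 1.0 - (1.0 - 1.0 / sampling_ratio) ** n_packets

# Short flows scale roughly linearly with packet count...
print(round(p_observed(1, 5000), 6))      # -> 0.0002
# ...but even a flow as long as the sampling interval is seen only
# about 63% of the time (1 - 1/e), so 'big' flows are routinely missed.
print(round(p_observed(5000, 5000), 3))   # -> 0.632
```

This is also why a static sampling ratio silently loses resolution as traffic grows: the flow-size distribution shifts relative to N, and the observed share of flows drops with it.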
Re: Open source Netflow analysis for monitoring AS-to-AS traffic
On Thu, 28 Mar 2024 at 20:36, Peter Phaal wrote: > The documentation for IOS-XR suggests that enabling extended-router in the > sFlow configuration should export "Autonomous system path to the > destination", at least on the 8000 series routers: > https://www.cisco.com/c/en/us/td/docs/iosxr/cisco8000/netflow/command/reference/b-netflow-cr-cisco8k/m-sflow-commands.html > I couldn't find a similar option in the NetFlow/IPFIX configuration guide, > but I might have missed it. Hope this clarifies.
---
https://www.cisco.com/c/en/us/td/docs/routers/asr9000/software/asr9k-r7-9/configuration/guide/b-netflow-cg-asr9k-79x/configuring-netflow.html
Use the record ipv4 [peer-as] command to record the peer AS. Here, you collect and export the peer AS numbers.
Note: Ensure that the bgp attribute-download command is configured. Else, no AS is collected when the record ipv4 or record ipv4 peer-as command is configured.
---
-- ++ytti
Re: Open source Netflow analysis for monitoring AS-to-AS traffic
Hey, On Thu, 28 Mar 2024 at 17:49, Peter Phaal wrote: > sFlow was mentioned because I believe Brian's routers support the feature and > may well export the as-path data directly via sFlow (I am not aware that it > is a feature widely supported in vendor NetFlow/IPFIX implementations?). Exporting AS information is a wire-format-agnostic feature; if it's supported, it can equally be injected into sFlow, NetFlow v5 (src and dst AS only), NetFlow v9 and IPFIX. The cost is that you need to program the information into FIB entries, so that it is available at lookup time for record creation. In the OP's case (IOS-XR) this means enabling 'attribute-download' for BGP, and I believe IOS-XR will never download any ASNs other than src and dst, so the full AS path cannot be injected into any emitted wire-format. -- ++ytti
Re: Open source Netflow analysis for monitoring AS-to-AS traffic
On Wed, 27 Mar 2024 at 21:02, Peter Phaal wrote: > Brian, you may want to see if your routers support sFlow (vendors have added > the feature over the last few years). Why is this a solution; what does it solve for the OP? Why is it meaningful what the wire format of the records is? I read the OP's question at a much higher level, about how to interact with and reason about the data, rather than how to emit it. Ultimately sFlow is a perfect subset of IPFIX: when you run IPFIX without caching, you get the functional equivalent of sFlow (there is an IPFIX entity for emitting n bytes of the frame, as well as other data). -- ++ytti
Re: Best TAC Services from Equipment Vendors
On Wed, 6 Mar 2024 at 22:57, michael brooks - ESC wrote: > Funny you should mention this now, we were just discussing (more like > lamenting...) if support is a dying industry. It seems as though vendor > budgets are shrinking to the point they only have a Sales/Pre-Sales > department, and from Day Two on you are on your own. Dramatic take of course, > but if we are speaking in trajectories My personal experience, spanning three different decades, is that there is no meaningful change in support quality or in the number of issues encountered. Support quality has always been very modest, unless you specifically pay for access to named engineers. And this is not because the quality of the engineers changes; it is because the vast majority of support cases are useless cases, and to handle this massive volume, support tries to guess which support cases are legitimate problems, which are PEBKAC, and in which cases the user already solved their problem by the time you read their ticket and will never respond back. The last case is so common that every first line adopts the strategy of 'pinging' you: regardless of how good and clear the information you provide is, they ask some softball question to see if you're still engaged. Having a named engineer changes this process, because the engineer quickly learns that you don't open useless cases, that the issue you're having is legitimate, and will actually read the ticket and think about the problem. To me this seems an inevitable outcome: if your product is popular, most of its users are users who don't do their homework and do not respect the support line's time, which ends up being a disservice to the whole ecosystem, because legitimate problems take longer to fix, or, in the case of open-source software, the authors just burn out and kill the project.
What shocks me more than the low-quality support is the low-quality software. Decades pass, and everyone is still having show-stopper issues in basic functions on a regular basis; the software quality is absolutely abysmal. I fear low software quality is organically market-driven: no one is trying to make a poor NOS, it's just that market incentives drive poor-quality NOS. When no one has a high-quality NOS, there is no reason to develop one, because most of your revenue is support contracts, not hardware sales, and if the NOS weren't outright broken, needing to be recompiled regularly to get basic things working, a lot of users might stop buying support, because they don't need the hand-holding part of it, they just need working software. This is not something that vendors actively drive; I'm sure most companies believe they are making an honest attempt to improve quality, but it is visible in where the investments are put. One vendor had a very promising project to take a holistic look into their NOS quality issues, staffed by senior subject-matter experts; this project was killed (I'm sure the funding was needed somewhere with better returns), and the responsible senior person went to Amazon instead. > > > > > michael brooks > Sr. Network Engineer > Adams 12 Five Star Schools > michael.bro...@adams12.org > > "flying is learning how to throw yourself at the ground and miss" > > > > On Wed, Mar 6, 2024 at 11:25 AM Pascal Masha wrote: >> >> Thought about it but so far I believe companies from China provide better >> and fast TAC responses to their customers than the likes of Cisco and >> perhaps that’s why some companies(where there are no restrictions)prefer >> them for critical services. >> >> For a short period in TAC call you can have over 10 R&D engineers and >> solutions provided in a matter of hours even if it involves software >> changes..
while these other companies even before you get in a call with a >> TAC engineer it’s hours and when they join you hear something like “my shift >> ended 15 minutes ago, hold let me look for another engineer”. WHY? Thoughts > > > This is a staff email account managed by Adams 12 Five Star Schools. This > email and any files transmitted with it are confidential and intended solely > for the use of the individual or entity to whom they are addressed. If you > have received this email in error please notify the sender. -- ++ytti
Re: Network chatter generator
On Fri, 23 Feb 2024 at 19:42, Brandon Martin wrote: > Before I go to the trouble of making one myself, does anybody happen to > know of a pre-canned program to generate realistic and scalable amounts > of broadcast/broad-multicast network background "chatter" seen on > typical consumer and business networks? This would be things like lots > of ARP traffic to/from various sources/destinations within a subnet, > SSDP, MDNS-SD, SMB browser traffic, DHCP requests, etc.? For protocol fuzzing I've used 'Codenomicon', which has since been acquired by Synopsys (this is about offering various types of bad PDUs to a protocol): https://www.synopsys.com/software-integrity/security-testing/fuzz-testing.html For volumetric protocol testing I've used 'Spirent Avalanche' (this is more like https or imaps users etc): https://www.spirent.com/products/avalanche-security-testing There are other commercial options in this space, and I'm not familiar with recent developments. Not sure if either really fits your bill. I guess you could ask someone with a chatty LAN to record it, and play the pcap back. -- ++ytti
Re: Twelve99 / AWS usw2 significant loss
On Fri, 26 Jan 2024 at 10:23, Phil Lavin via NANOG wrote:
> 88.99.88.67 to 216.147.3.209:
> Host                                       Loss%   Snt   Last    Avg   Best   Wrst  StDev
>  1. 10.88.10.254                            0.0%   176    0.2    0.1    0.1    0.3    0.1
>  7. nug-b1-link.ip.twelve99.net             0.0%   176    3.3    3.5    3.1   24.1    1.6
>  8. hbg-bb2-link.ip.twelve99.net           86.9%   175   18.9   18.9   18.7   19.2    0.1
>  9. ldn-bb2-link.ip.twelve99.net           92.0%   175   30.5   30.6   30.4   30.8    0.1
> 10. nyk-bb1-link.ip.twelve99.net            4.6%   175   99.5   99.5   99.3  100.1    0.2
> 11. sjo-b23-link.ip.twelve99.net           56.3%   175  296.8  306.0  289.7  315.0    5.5
> 12. amazon-ic-366608.ip.twelve99-cust.net  80.5%   175  510.0  513.5  500.7  539.7    8.4

This implies the problem is not on this path: #10 is not experiencing the loss, possibly because its return traffic happens to take another path, but it certainly shows the problem had not yet occurred in this direction by #10. Because #8 and #9 did see loss, they must have seen it in the other direction.

> 44.236.47.236 to 178.63.26.145:
> Host                                              Loss%   Snt   Last    Avg   Best   Wrst  StDev
>  1. ip-10-96-50-153.us-west-2.compute.internal     0.0%   267    0.2    0.2    0.2    0.4    0.0
> 11. port-b3-link.ip.twelve99.net                   0.0%   267    5.8    5.9    5.6   11.8    0.5
> 12. palo-b24-link.ip.twelve99.net                  4.9%   267   21.1   21.5   21.0   58.4    3.1
> 13. sjo-b23-link.ip.twelve99.net                   0.0%   266   21.4   22.7   21.3   86.2    6.5
> 14. nyk-bb1-link.ip.twelve99.net                  58.1%   266  432.7  422.7  407.2  438.5    6.5
> 15. ldn-bb2-link.ip.twelve99.net                  98.1%   266  485.6  485.4  481.6  491.1    3.9
> 16. hbg-bb2-link.ip.twelve99.net                  92.5%   266  504.1  499.8  489.8  510.1    5.9
> 17. nug-b1-link.ip.twelve99.net                   55.5%   266  523.5  519.6  504.4  561.7    7.6
> 18. hetzner-ic-340780.ip.twelve99-cust.net        53.6%   266  524.4  519.2  506.0  545.5    6.9
> 19. core22.fsn1.hetzner.com                       70.2%   266  521.7  519.2  498.5  531.7    6.6
> 20. static.213-239-254-150.clients.your-server.de 33.2%   266  382.4  375.4  364.9  396.5    4.1
> 21. static.145.26.63.178.clients.your-server.de   62.0%   266  529.9  518.4  506.9  531.3    6.1

This suggests the congestion point is sjo to nyk, in 1299, not AWS at all.
You could try fixing SPORT/DPORT and cycling through several SPORT options, to see if the loss goes away with some of them, to determine whether all LAG members are full or just one. At any rate, this seems like business as usual; sometimes the internet is very lossy. You should contact your service provider, which I guess is AWS here, so they can contact their provider, 1299.

-- ++ytti
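The SPORT sweep works because LAG/ECMP members are picked by hashing the 5-tuple. A toy sketch of the idea; the CRC32 hash here is purely illustrative, real routers use vendor-specific hash functions and seeds:

```python
import zlib

def lag_member(src, dst, sport, dport, proto, n_links):
    """Map a 5-tuple onto one LAG member, as per-flow hashing does."""
    key = f"{src}|{dst}|{sport}|{dport}|{proto}".encode()
    return zlib.crc32(key) % n_links

# Sweeping the source port moves the probe flow across members; if loss
# disappears for some ports, one member is congested rather than all.
hits = {lag_member("88.99.88.67", "216.147.3.209", sp, 33434, 17, 4)
        for sp in range(33000, 33128)}
```

With enough source ports the probes land on every member of a 4-way bundle, which is exactly why fixing the ports and varying only SPORT isolates a single congested link.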
Re: "Hypothetical" Datacenter Overheating
On Wed, 17 Jan 2024 at 03:18, wrote:
> Others have pointed to references, I found some others, it's all
> pretty boring but perhaps one should embrace the general point that
> some equipment may not like abrupt temperature changes.

Can you share them? The only one I've found is:

https://www.ashrae.org/file%20library/technical%20resources/bookstore/supplemental%20files/referencecard_2021thermalguidelines.pdf

which quotes 20C/h, a much higher rate than almost anyone has the ability to produce in their DC ambient. But it gives no explanation of where this number comes from. I believe in reality there is immense complexity here:
- Gradient tolerance depends on the processes and materials used in manufacturing (pre- and post-RoHS devices will certainly differ)
- Gradient has directionality, unlike the ASHRAE quote: devices are engineered to go from 20C to 90C in a very short moment when turned on, but there was less engineering pressure for similar cooling rates
- Gradient has positionality: going 20C between any two temperature points does not mean equal risk

And likely no one knows this well, because no one has had to know it well; it's not expensive enough to derisk. But what we do know well:
- ASHRAE quotes a rate you are unlikely to be able to hit
- Devices that travel with you regularly see 50C instant ambient gradients, in both directions, multiple times a day
- Devices see large, fast gradients when turned on, but slower ones when turned off
- Compute people quote ASHRAE, networking people appear not to; perhaps, as you say, spindles are the ultimate reason for the limits to exist

I think generally we have a bias in that we like to identify risks and then add them as organisational knowledge, but ultimately all these new rules and exceptions increase cost and complexity and reduce efficiency and productivity, so we should be very critical about them. It is fine to realise risks, and to use realised risks as data to analyse whether avoiding those risks makes sense.
It's very easy to build poorly defined rules on top of poorly defined rules and arrive at high-cost, low-efficiency operations. This 'few centigrade per hour' is an exceedingly palatable rule of thumb: it sounds good, unless you stop to think about it. I would not recommend spending any time or money derisking gradients; I would hope that the rules that derisk condensation are enough to cover gradients as well, and I would re-evaluate after sufficient realised risks.

-- ++ytti
Re: "Hypothetical" Datacenter Overheating
On Tue, 16 Jan 2024 at 12:22, Nathan Ward wrote:
> Here’s some manufacturer specs:
> https://www.dell.com/support/manuals/en-nz/poweredge-r6515/per6515_ts_pub/environmental-specifications?guid=guid-debd273c-0dc8-40d8-abbc-be059a0ce59c&lang=en-us
>
> 3rd section, “Maximum temperature gradient”.

Thanks. It seems quite a few compute contexts quote the ASHRAE gradients, but in a networking-kit context they seem very rarely quoted (except indirectly via NEBS), while intuitively I wouldn't expect the tolerances to be significantly different.

-- ++ytti
Re: "Hypothetical" Datacenter Overheating
On Tue, 16 Jan 2024 at 11:00, William Herrin wrote:
> You have a computer room humidified to 40% and you inject cold air
> below the dew point. The surfaces in the room will get wet.

I think humidity and condensation are well understood and indeed documented as verboten, both by NEBS and by vendors. I am more interested in temperature changes that do not cause condensation and water damage.

We could theorise that some solder joint will expand or contract too fast and break, or various other scenarios one might guess at without context, yet electronics regularly have to experience large temperature gradients and appear to survive. When you turn these things on, various parts rapidly heat from ambient to 80-90C. So I have some doubts whether this is actually a problem you need to consider, in the absence of condensation.

-- ++ytti
Re: "Hypothetical" Datacenter Overheating
On Tue, 16 Jan 2024 at 08:51, wrote:
> A rule of thumb is a few degrees per hour change but YMMV, depends on
> the equipment. Sometimes manufacturer's specs include this.

Is this common sense, or do you have a reference for it, like a paper showing what temperature change at what rate causes what damage?

I regularly bring fine electronics, say an iPhone, through significant temperature gradients, as do most people who live in places where inside and outside can be wildly different temperatures, with no particular observable effect. The iPhone does go into 'thermometer' mode when it overheats, though.

Manufacturers, say Juniper and Cisco, describe humidity, storage and operating temperatures, but do not define a temperature change rate. Does NEBS have an opinion on this, or is this just common sense of yours?

-- ++ytti
Re: IPv6 Traffic Re: IPv6? Re: Where to Use 240/4 Re: 202401100645.AYC Re: IPv4 address block
On Mon, 15 Jan 2024 at 21:08, Michael Thomas wrote:
> An ipv4 free network would be nice, but is hardly needed. There will
> always be a long tail of ipv4 and so what? You deal with it at your

I mean an IPv4-free Internet DFZ, so that everyone is not forced to maintain two stacks at extra cost, fragility and time. Any protocols inside networks are fine, as long as you're meeting the Internet with an IPv6-only stack. I'm sure there are still CLNS, IPX, AppleTalk etc. networks out there, but those don't impose a cost on everyone wanting to play.

-- ++ytti
Re: IPv6 Traffic Re: IPv6? Re: Where to Use 240/4 Re: 202401100645.AYC Re: IPv4 address block
On Mon, 15 Jan 2024 at 10:59, jordi.palet--- via NANOG wrote: > No, I’m not saying that. I’m saying "in actual deployments", which doesn’t > mean that everyone is deploying, we are missing many ISPs, we are missing > many enterprises. Because of low entropy of A-B pairs in bps volume, seeing massive amounts of IPv6 in IPv6 enabled networks is not indicative of IPv6 success. I don't disagree with your assertion, I just think it's damaging, because readers without context will form an idea that things are going smoothly. We should rightly be in panic mode and forget all the IPv4 extension crap and start thinking how do we ensure IPv6 happens and how do we ensure we get back to single stack Internet. IPv6 is very much an afterthought, a 2nd class citizen today. You can deploy new features and software without IPv6, and it's fine. IPv6 can be broken, and it's not an all-hands-on-deck problem, no one is calling. -- ++ytti
Re: IPv6 Traffic Re: IPv6? Re: Where to Use 240/4 Re: 202401100645.AYC Re: IPv4 address block
On Mon, 15 Jan 2024 at 10:05, jordi.palet--- via NANOG wrote: > In actual customer deployments I see the same levels, even up to 85% of IPv6 > traffic. It basically depends on the usage of the caches and the % of > residential vs corporate customers. You think you are contributing to the IPv6 cause, by explaining how positive the situation is. But in reality you are damaging it greatly, because you're not communicating that we are not on a path to IPv4 free Internet. If we had been on such a path, we would have been IPv4 free for more than a decade. And unless we admit we are not on that path, we will not work to get on that path. -- ++ytti
Re: IPv6 Traffic Re: IPv6? Re: Where to Use 240/4 Re: 202401100645.AYC Re: IPv4 address block
On Mon, 15 Jan 2024 at 06:18, Forrest Christian (List Account) <li...@packetflux.com> wrote:
> If 50% of the servers and 50% of the clients can do IPv6, the amount of
> IPv6 traffic will be around 25% since both ends have to do IPv6.

This assumes the cosmological principle applies to the Internet, but Internet traffic is not uniformly distributed. It is entirely possible, and even reasonable, that AMSIX's ~5% and GOOG's 40% bps shares are both correct: AMSIX sees large entropy between A-B end-points, GOOG sees very low entropy, it being always the B.

A certain tier 1 transit network could see traffic being >50% IPv6 between two specific PoPs. Great IPv6 adoption? Except it was a single CDN sending traffic from itself to itself; if you excluded that CDN's flows between the PoPs, the IPv6 traffic share was in the low single-digit percentages.

I am not saying IPv6 traffic is not increasing. I am saying we are not doing anyone any favours by pretending we are on track, that this will happen, and that there are organic drivers which will ensure we end up with an IPv6-only Internet.

-- ++ytti
Re: 202401100645.AYC Re: IPv4 address block
On Thu, 11 Jan 2024 at 12:57, Christopher Hawker wrote:
> Reclassifying this space, would add 10+ years onto the free pool for each
> RIR. Looking at the APNIC free pool, I would estimate there is about 1/6th of
> a /8 pool available for delegation, another 1/6th reserved. Reclassification
> would see available pool volumes return to pre-2010 levels.

Just enough time for us to retire comfortably and let some other fool fix the mess we built?

We don't need to extend IPv4; we need to figure out why we are in this dual-stack mess, which was never intended, and how to get out of it. We've created this stupid anti-competitive IPv4 market, and as far as I can foresee, we will never organically stop using IPv4. We've added CAPEX and OPEX costs and a lot of useless work, for no other reason than our failure to provide a reasonable path from IPv4 to IPv6.

I can't come up with a less stupid way to fix this than the major players jointly signing a pledge to drop IPv4 at their edges on 2040-01-01, or some such: to finally create an incentive, and a date by which you need to get your IPv6 affairs in order, and to fix the IPv4 antitrust issue. The only reason people need IPv4 to offer a service is that people offering connectivity have no incentive to offer IPv6. In fact, if you've done any IPv6 at all, you're wasting money and acting against the best interest of your shareholders, because there is no good reason to spend time and money on IPv6. But there should be.

-- ++ytti
Re: Sufficient Buffer Sizes
On Wed, 3 Jan 2024 at 01:05, Mike Hammett wrote:
> It suggests that 60 meg is what you need at 10G. Is that per interface? Would
> it be linear in that I would need 600 meg at 100G?

Not at all. You need to understand WHY buffering is needed, to determine how much buffering you want to offer.

Big buffers are needed when:
- the sender is faster than the receiver
- the receiver wants to receive a single flow at maximum rate
- the sender sends window growth at sender rate, instead of estimated receiver rate (the common case, but easy to change, as Linux already estimates receiver rate and the 'tc' command can change this behaviour)

The amount of buffering needed depends on how much the window can grow when it grows. Windows grow exponentially, so you need (RTT * receiver-rate)/2; the /2 is because when the window grows, the first half is already in flight and is dropping in at receiver rate as the ACKs come by.

Let's imagine your sender is 100GE connected and your receiver is 10GE connected, and you want to achieve a 10Gbps single-flow rate.

10ms RTT: 12.5MB window size; worst case you need to absorb 6.25MB of growth, minus ~10%, because some of the growth drains to the receiver instead of all of it being buffered, so you'd need 5.5-6MB.
100ms RTT would be ~60MB.
200ms RTT would be ~120MB.

Now decide the answer you want to give in your products for these: at what RTT do you want to guarantee what single-flow maximum rate?

I do believe many of the CDNs already use estimated receiver rate to grow windows, which basically removes the need for buffering. But any standard cubic without tuning (i.e. every OS) will burst window growth at line rate, causing the need for buffering.

-- ++ytti
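The arithmetic above can be written down as a small sketch (it assumes the halve-the-window and ~10% drain reasoning from the text; the function name is mine):

```python
def worst_case_buffer_bytes(rtt_s, receiver_bps, drain=0.10):
    """Buffer to absorb a TCP window-growth burst toward a slower receiver:
    the window at full rate is rtt * rate; only the second half of the
    growth needs buffering, and ~10% of that drains to the receiver
    while the burst arrives."""
    window = rtt_s * receiver_bps / 8          # window in bytes
    return window / 2 * (1 - drain)

for rtt in (0.010, 0.100, 0.200):
    mb = worst_case_buffer_bytes(rtt, 10e9) / 1e6
    print(f"{rtt*1000:.0f} ms RTT -> ~{mb:.0f} MB")   # 6, 56, 112 MB
```

The requirement is linear in RTT for a fixed receiver rate, which is why doubling the RTT from 100ms to 200ms only doubles the buffer.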
Re: CPE/NID options
On Mon, 27 Nov 2023 at 21:45, Josh Luthman wrote:
> Can you have an ethernet switch with dying gasp?
> Our ONTs (Calix, PON) have it but I don't see how you'd do it with
> ethernet.

At least via EFM OAM you can have a dying gasp. You could probably also add it to autonegotiation by sending some symbol. There is already something similar in autonegotiation: it can inform the far end when the local end is administratively shut down. That is, if I have an A-B link and B does 'shutdown' on the interface, A could emit a syslog 'far-end administratively down'. This is supported by many common PHYs, but for some reason I've never seen a software implementation. Of course this same 'admin down' signal could be abused by sending it whenever you know you are going down. So an adventurous operator who controls their environment could add this today with just code.

-- ++ytti
Re: swedish dns zone enumerator
On Thu, 2 Nov 2023 at 10:32, Mark Andrews wrote: > You missed the point I was trying to make. While I think that that source is > trying to enumerate some part of the namespace. NS queries by themselves > don’t indicate an attack. Others would probably see the series of NS queries > as a signature of an attack when they are NOT. There needs to be much more > than that to make that conclusion. I might be reading this wrong, but I don't think the point Randy was trying to make was 'NS queries are an attack', 'UDP packets are an attack' or 'IP packets are an attack' . I base this on the list of queries Randy decided to include as relevant to the thesis Randy was trying to make, instead of wholesale warning of IP, UDP or NS queries. -- ++ytti
Re: Congestion/latency-aware routing for MPLS?
On Wed, 18 Oct 2023 at 17:39, Tom Beecher wrote: > Auto-bandwidth won't help here if the bandwidth reduction is 'silent' as > stated in the first message. A 1G interface , as far as RSVP is concerned, is > a 1G interface, even if radio interference across it means it's effectively a > 500M link. Jason also explained the TWAMP + latency solution, which is an active solution and doesn't rely on operator or automatic bandwidth providing information, but network automatically measures latency and encodes this information in ISIS, allowing automatic traffic engineering for LSP to choose the lowest latency path. I believe Jason's proposal is exactly what OP is looking for. -- ++ytti
Re: MX204 tunnel services BW
On Mon, 16 Oct 2023 at 22:49, wrote: > JTAC says we must disable a physical port to allocate BW for tunnel-services. > Also leaving tunnel-services bandwidth unspecified is not possible on the > 204. I haven't independently tested / validated in lab yet, but this is what > they have told me. I advised JTAC to update the MX204 "port-checker" tool > with a tunnel-services knob to make this caveat more apparent. Did they explain why you need to disable the physical port? I'd love to hear that explanation. The MX204 is single Trio EA, so you can't even waste serdes sending the packet to remote PFE after first lookup, it would only bounce between local XM/MQ and LU/XL, wasting that serdes. -- ++ytti
Re: MX204 tunnel services BW
On Tue, 17 Oct 2023 at 00:28, Delong.com wrote: > The MX-204 appears to be an entirely fixed configuration chassis and looks > from the literature like it is based on pre-trio chipset technology. > Interesting that there are 100Gbe interfaces implemented with this seemingly > older technology, but yes, looks like the PFE on the MX-204 has all the same > restrictions as a DPC-based line card in other MX-series routers. It is 100% normal Trio EA. -- ++ytti
Re: Add communities on direct routes in Juniper
Unfortunately not yet, as far as I know. A long time ago I gave this to my account team:

Title: Direct routes must support tag and/or community
Platform: Trio, priority MX80, MPC2
JunOS: 12.4Rx
Command: 'set interface ge-4/2.0 family inet address 10.42.42.1/24 tag|community X'
JTAC: n/a
ER:
- Router must be able to add tags/communities to direct routes directly, like it does for static routes
Usage Case: Trivial way to signal route information to BGP. Often a tag/community is used by service providers to signal 'this is a PI/PA prefix, leak it to the internet' or 'this is a backup route, reduce its MED'. However, for some reason this is only supported for static routes, while the usage scenario and benefits are exactly the same for direct routes.

On Sun, 15 Oct 2023 at 15:27, Stanislav Datskevych via NANOG wrote:
>
> Dear all,
>
> Is there a way to add BGP communities on direct (interface) routes in
> Junipers? The task looks to be simple but the solution eludes me.
> In Cisco/Arista, for example, I could use "network 192.0.2.0/24 route-map ".
>
> In Juniper it seems to be impossible. I even tried putting interface-routes
> into rib-group with an import policy.
> But it seems the import policy only works on importing routes into Secondary
> routing tables (e.g. inet.50), and not into the Primary one (inet.0).
>
> I know it's possible to add communities on later stage while announcing
> networks to peers, in [protocols bgp group export]. But I'd better
> slap the community on the routes right when they're imported into RIB, not
> when they announced to peers.
>
> Thanks in advance.

-- ++ytti
Re: Using RFC1918 on Global table as Loopbacks
On Thu, 5 Oct 2023 at 20:45, Niels Bakker wrote:
> The recommendation is to make Router-IDs globally unique. They're used
> in collision detection. What if you and a peer pick the same non
> globally unique address? Any session will never come up.

https://datatracker.ietf.org/doc/html/rfc6286

  "If the BGP Identifiers of the peers involved in the connection
   collision are identical, then the connection initiated by the BGP
   speaker with the larger AS number is preserved."

-- ++ytti
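The RFC 6286 tie-break, plus the underlying RFC 4271 rule, fit in a few lines. A sketch with BGP Identifiers as plain integers; the function name is mine:

```python
def keep_local_initiated(local_as, remote_as, local_id, remote_id):
    """Connection-collision resolution: RFC 4271 keeps the connection
    initiated by the speaker with the larger BGP Identifier; RFC 6286
    adds an AS-number tie-break for identical Identifiers, so identical
    (even non-unique RFC1918-derived) IDs no longer deadlock the session."""
    if local_id != remote_id:
        return local_id > remote_id
    return local_as > remote_as
```

So with RFC 6286 behaviour, two peers picking the same Router-ID still converge on exactly one surviving connection, which is the point being made above.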
Re: MX204 tunnel services BW
On Mon, 2 Oct 2023 at 20:21, Jeff Behrns via NANOG wrote:
> Encountered an issue with an MX204 using all 4x100G ports and a logical
> tunnel to hairpin a VRF. The tunnel started dropping packets around 8Gbps.
> I bumped up tunnel-services BW from 10G to 100G which made the problem
> worse; the tunnel was now limited to around 1.3Gbps. To my knowledge with
> Trio PFE you shouldn't have to disable a physical port to allocate bandwidth
> for tunnel-services. Any helpful info is appreciated.

You might have more luck in j-nsp. But yes, you don't need any physical interface in Trio to do tunneling. I can't explain your problem, and you probably need JTAC's help. I would appreciate it if you'd circle back and tell us what the problem was.

How it works is that when a PPE decides it needs to tunnel the packet, the packet is sent back to the MQ via serdes (which will then send it again to some PPE, not the same one). I think what that bandwidth command does is change the stream allocation; you should see it in 'show <#> stream'.

In theory, because a PPE can process a packet forever (well, until the watchdog kills the PPE for looking stuck), you could very cheaply do outer+inner at the local PPE, but I think that would mean certain features like QoS would not work on the inner interface. So I think all this expensive recirculation and serdes consumption exists to satisfy a quite limited need, and it should be possible to implement a 'performance mode' for tunneling where these MQ/XM-provided features are not available, since the performance cost in most cases is negligible.

In parallel to opening the JTAC case, you might want to experiment with which FPC/PIC you assign the tunneling bandwidth to. I don't understand how the tunneling would work if the MQ/XM is remote: would you then also steal fabric capacity every time you tunnel, not just MQ>LU>MQ>LU serdes, but MQ>LU>MQ>FAB>MQ>LU?
So intuitively I would recommend ensuring you have the bandwidth configured at the local PFE; if you don't know which PFE is local, just configure it everywhere. You could also consult various counters to see whether some stream or fabric is congested, and whether these tunneled packets are being sent over congested fabric every time at lower fabric QoS.

I don't understand why the bandwidth command is a thing at all, or why you can choose where to configure it. To me it seems obvious tunneling should always be handled strictly locally, never over fabric, because you always end up stealing more capacity if you send packets to a remote MQ. That is, it should implicitly be on for every MQ, and every PPE should tunnel via its local MQ.

-- ++ytti
Re: maximum ipv4 bgp prefix length of /24 ?
On Sun, 1 Oct 2023 at 21:19, Matthew Petach wrote: > Unfortunately, many coders today have not read Godel, Escher, Bach: An > Eternal Golden Braid, > and like the unfortunate Crab, consider their FIB compression algorithms to > be unbreakable[0]. > > In short: if you count on FIB compression working at a compression ratio > greater than 1 in order for your network to function, you had better have a > good plan for what to do when your phone rings at 3am because your FIB has > just become incompressible. ^_^; I think if we make the argument 'devices must always work' no device satisfies it today. There are already a lot of assumptions and compromises which cause them to work 'highly likely in most practical scenarios'. Certainly if we were to try to formally prove, we could prove that everything is terrible, PPS under the worst-case situation is beyond useless on devices people intuitively consider 'wire speed'. I fully agree fundamentally FIB compression is not safe, but also that ship has sailed, nothing we do is safe. But is it marketable? Likely answer is resoundingly yes. I do feel that often people underestimate the amount of risk they carry, and overestimate the importance of the risks they understand. Since the vast majority of risks are carried without understanding them. But intuitively we like to think we have good visibility into our risks and any recognised risk therefore automatically is an important risk. -- ++ytti
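The 3am failure mode Matthew describes can be shown with a toy: sibling prefixes sharing a next-hop compress into their parent, and the saving evaporates the moment next-hops diverge. A sketch with prefixes as (int, length) pairs; real schemes are far more involved and must also handle covering-prefix interactions this toy ignores:

```python
def compress(fib):
    """Toy FIB compression: repeatedly merge sibling prefixes that share
    a next-hop into their parent prefix. Not safe for real FIBs, which
    must account for longer-match interactions."""
    fib = dict(fib)
    changed = True
    while changed:
        changed = False
        # Snapshot sorted longest-prefix-first so children merge before parents.
        for (p, l), nh in sorted(fib.items(), key=lambda kv: -kv[0][1]):
            if l == 0 or (p, l) not in fib:
                continue
            bit = 1 << (32 - l)
            sib = (p ^ bit, l)           # sibling differs in the last prefix bit
            if fib.get(sib) == nh:
                del fib[(p, l)], fib[sib]
                fib[(p & ~bit, l - 1)] = nh
                changed = True
    return fib

net = int.from_bytes(bytes([192, 0, 2, 0]), "big")
same = {(net, 25): "A", (net | 128, 25): "A"}   # compresses to one /24
diff = {(net, 25): "A", (net | 128, 25): "B"}   # incompressible as-is
```

One routing change flipping a sibling's next-hop turns `same` into `diff`, and the 2:1 ratio you sized the hardware around is gone, which is exactly the phone-rings-at-3am scenario.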
Re: maximum ipv4 bgp prefix length of /24 ?
On Sun, 1 Oct 2023 at 06:07, Owen DeLong via NANOG wrote:
> Not sure why you think FIB compression is a risk or will be a mess. It’s a
> pretty straightforward task.

Also, people falsely assume that the parts they don't know about are risk-free and simple, while in reality there are tons of proprietary engineering choices made so that devices perform in expected environments, not in arbitrary environments. Already today you could in many cases construct a specific FIB which exposes these compromises and makes devices not perform. There are dragons everywhere, but we can remain largely ignorant of them, as these engineering choices tend to be reasonable. Sometimes they are abused by shops like EANTC and Miercom for marketing reasons in ostensibly 'independent' tests.

I think this compression is part of the same continuum: magic inside the box that I hope works, because I can't begin to have a comprehensive understanding of exactly how much risk I am carrying. Pretty much no performant box any longer has the bandwidth to store all packets in memory (partial buffering), and many have 'hot' and 'cold' prefixes. You just have to hope; you're not going to be able to prove anything, and by trying to do so, you're more likely to increase your costs through false positives than to find an actionable problem. Most problems don't matter, and figuring out which problem needs to be fixed is hard.

-- ++ytti
Re: maximum ipv4 bgp prefix length of /24 ?
On Sat, 30 Sept 2023 at 09:42, Mark Tinka wrote:
> > But when everybody upgrades, memory and processor unit prices
> > decrease.. Vendors gain from demand.
>
> I am yet to see that trend...

Indeed. If you look at Juniper's 10-K/10-Q filings, their business is fairly stable in revenue and ports sold, so a 1GE port costs about the same as a 1TE port, no more, no less. If there were a reduction in port prices over time, then revenue would have to go down or ports sold would have to go up.

Of course all this makes perfect sense: the sand order doesn't affect the sand price. All the cost is in people thinking about how the sand should be ordered, and then designing the machines which put the sand together.

-- ++ytti
Re: maximum ipv4 bgp prefix length of /24 ?
On Fri, 29 Sept 2023 at 23:43, William Herrin wrote:
> My understanding of Juniper's approach to the problem is that instead
> of employing TCAMs for next-hop lookup, they use general purpose CPUs
> operating on a radix tree, exactly as you would for an all-software

They use proprietary NPUs with a proprietary ISA, called 'Trio'. A single Trio can have hundreds of PPEs, packet processing engines, which are all identical. Packets are sprayed across the PPEs, and since PPEs do not run in constant time, reordering always occurs.

Juniper is a pioneer in FIB-in-DRAM, and has patented it to a degree. It takes a very, very long time to get an answer from memory; to amortise this, PPEs have a lot of threads, and while one packet waits for memory, another packet is worked on. But there is no pre-emption, and there is no shuffling of registers/memory around, no cache misses as a function of FIB size. A PPE does all the work it has, requests an answer from memory, goes to sleep, and comes back when the answer arrives to do all the work it has, never pre-empted.

There is a lot more complexity here, though. The memory in the original Trio was RLDRAM, which was a fairly simple setup. Once they changed to HMC, they added a cache in front of the memory, a proprietary chip called CAE. IFLs were dynamically allocated to one of multiple CAEs used to access memory, and a single CAE didn't have 'wire rate' performance. So with a pathological setup, like 2 IFLs, if you got unlucky, on some boots both IFLs would be assigned to the same CAE instead of being spread across two, and on those boots you would see lower PPS performance than on others, because you were hot-banking the CAE. That is the only type of cache problem I can recall related to Juniper. But these devices are entirely proprietary, things move relatively fast, and complexity increases all the time.

> router. This makes each lookup much slower than a TCAM can achieve.
> However, that doesn't matter much: the lookup delays are much shorter
> than the transmission delays so it's not noticeable to the user. To

In DRAM lookups, like what Juniper does, most of the time you're waiting for the memory. With DRAM, FIB size is a trivial engineering problem; memory bandwidth and latency are the hard problems. Juniper does not do TCAMs on its service-provider-class devices.

-- ++ytti
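The many-threads-per-PPE trade can be seen in a back-of-envelope model. This is my own illustrative sketch, not Trio specifications; all numbers are made up:

```python
def pkts_per_sec(work_cycles, mem_latency_cycles, threads, clock_hz=1e9):
    """Toy model of latency hiding: while one thread sleeps waiting on a
    memory reply, the others run. Throughput saturates once enough
    threads exist to cover the latency (roughly 1 + latency/work), after
    which the engine is compute-bound rather than memory-bound."""
    effective_cycles = max(work_cycles,
                           (work_cycles + mem_latency_cycles) / threads)
    return clock_hz / effective_cycles

single = pkts_per_sec(100, 1000, threads=1)    # dominated by memory latency
many = pkts_per_sec(100, 1000, threads=16)     # latency fully hidden
```

This is why FIB size barely matters in such a design while memory latency and bandwidth matter enormously: more threads hide latency, but nothing hides insufficient bandwidth.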
Re: maximum ipv4 bgp prefix length of /24 ?
On Fri, 29 Sept 2023 at 08:24, William Herrin wrote: > Maybe. That's where my comment about CPU cache starvation comes into > play. I haven't delved into the Juniper line cards recently so I could > easily be wrong, but if the number of routes being actively used > pushes past the CPU data cache, the cache miss rate will go way up and > it'll start thrashing main memory. The net result is that the > achievable PPS drops by at least an order of magnitude. When you say, you've not delved into the Juniper line cards recently, to which specific Juniper linecard your comment applies to? -- ++ytti
Re: what is acceptible jitter for voip and videoconferencing?
On Wed, 20 Sept 2023 at 19:06, Chris Boyd wrote: > We run Teams Telephony in $DAYJOB, and it does use SILK. > > https://learn.microsoft.com/en-us/microsoftteams/platform/bots/calls-and-meetings/real-time-media-concepts Looks like codecs still are rapidly evolving in walled gardens. I just learned about 'Satin'. https://en.wikipedia.org/wiki/Satin_(codec) https://ibb.co/jfrD6yk - notice 'payload description' from Teams admin portal. So at least in some cases Teams switches from Silk to Satin, wiki suggests 1on1 only, but I can't confirm or deny this. -- ++ytti
Re: what is acceptible jitter for voip and videoconferencing?
On Wed, 20 Sept 2023 at 03:15, Dave Taht wrote:
> I go back many, many years as to baseline numbers for managing voip networks,
> including things like CISCO LLQ, diffserv, fqm prioritizing vlans, and running
> voip networks entirely separately... I worked on codecs, such as oslec, and
> early sip stacks, but that was over 20 years ago.

I don't believe LLQ has utility in hardware-based routers; packets stay inside hardware-based routers for single-digit microseconds, with nanoseconds of jitter. For software-based devices, I'm sure the situation is different.

Practical example: a tier 1 network running 3 vendors, with no LLQ, can go across the globe with lower jitter (microseconds) than I see pinging 127.0.0.1 on my M1 laptop, because I have to do context switches and the network does not. This is in the BE queue, measured in real operation over long periods, without any engineering effort to achieve low jitter.

> The thing is, I have been unable to find much research (as yet) as to why my
> number exists. Over here I am taking a poll as to what number is most correct
> (10ms, 30ms, 100ms, 200ms),

I know there are academic papers as well as vendor graphs showing the impact of jitter on quality. Here is one:
https://scholarworks.gsu.edu/cgi/viewcontent.cgi?article=1043&context=cs_theses
This appears to roughly say that 20ms with G.711 is fine. But I'm sure this is actually very complex to answer well, and the choice of codec greatly impacts the answer; for example WhatsApp uses Opus and Skype uses SILK (maybe Teams too?). And there are many rarer, more exotic codecs optimised for very specific scenarios, like massive packet loss.

-- ++ytti
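It's worth pinning down what number is actually being polled about: most VoIP gear reports the RFC 3550 interarrival jitter, a running smoothed estimate of transit-time variation. A sketch; the transit times below are illustrative, not measurements:

```python
def rfc3550_jitter(transit_ms):
    """RFC 3550 interarrival jitter: J += (|D| - J) / 16, where D is the
    difference in one-way transit time between consecutive packets."""
    j = 0.0
    for prev, cur in zip(transit_ms, transit_ms[1:]):
        j += (abs(cur - prev) - j) / 16.0
    return j

steady = [50.0] * 50           # constant transit: zero jitter
wobbly = [50.0, 55.0] * 25     # alternating +/-5 ms transit: jitter -> 5 ms
```

Because of the 1/16 smoothing, the estimate converges toward the typical packet-to-packet variation rather than reporting worst-case spikes, which matters when comparing quoted thresholds like 10ms vs 30ms.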
Re: Lossy cogent p2p experiences?
On Sat, 9 Sept 2023 at 21:36, Benny Lyne Amorsen wrote:
> The Linux TCP stack does not immediately start backing off when it
> encounters packet reordering. In the server world, packet-based
> round-robin is a fairly common interface bonding strategy, with the
> accompanying reordering, and generally it performs great.

If you have Linux - 1RU cat-or-such - Router - Internet, then round-robin between the Linux box and the 1RU is mostly going to work, because it satisfies the requirements of being a) non-congested, b) equal RTT, and c) non-distributed (a single-pipeline ASIC switch, honoring ingress order on egress). But that is quite a special case, and of course there is round-robin on only one link in one direction.

Between 3.6 and 4.4 all multipath in Linux was broken, and to this day I still help people with multipath problems who complain it doesn't perform (in a LAN!):
- 3.6 introduced the FIB to replace the flow cache, and made multipath essentially random
- 4.4 replaced random with hashing

When I ask them 'do you see reordering', people mostly reply 'no', because they look at a PCAP and it doesn't look important to a human observer; it is such an insignificant amount. Invariably the problem goes away with hashing. (netstat -s is better than intuition about a PCAP.)

-- ++ytti
Re: Lossy cogent p2p experiences?
On Fri, 8 Sept 2023 at 09:17, Mark Tinka wrote: > > Unfortunately that is not strict round-robin load balancing. > > Oh? What is it then, if it's not spraying successive packets across > member links? I believe the suggestion is that round-robin out-performs random spray. Random spray is what the HPC world is asking for, not round-robin. Now I've not operated a network where per-packet is useful, so I'm not sure why you'd want round-robin over random spray, but I can easily see why you'd want either a) random traffic or b) random spray. If neither is true, that is, if you have strict round-robin and non-random traffic, say every other packet is a big data delivery and every other packet is a small ACK, you can easily synchronise one link to 100% utilisation and another to near 0%; that happens with true round-robin, but not with random spray. I don't see a downside random spray would have over round-robin, but I wouldn't be shocked if there is one. I see this thread is mostly starting to loop around two debates:

1) Reordering is not a problem
 - if you control the application, you can make it zero problem
 - if you use the TCP stacks shipping in Android, iOS, macOS, Windows, Linux, BSD, reordering is in practice as bad as packet loss
 - the people on this list who know this don't know it because they read it; they know it because they got caught with their pants down, had reordering, and saw TCP performance destroyed, even at very low reorder rates
 - we could design a TCP congestion control that is very tolerant of reordering, but I cannot say if it would be an overall win or loss

2) Reordering won't happen with per-packet, if there is no congestion and latencies are equal
 - the receiving distributed routers (~all of them) have no global synchronisation; they make no guarantee that ingress order is honored on egress when ingress is >1 interface, and the amount of reordering this alone causes will destroy customer expectations of TCP performance
 - we could quite easily guarantee order as long as interfaces are in the same hardware complex, but it would be very difficult to guarantee between hardware complexes

-- ++ytti
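The synchronisation argument above is easy to simulate; a sketch with invented traffic, alternating 1500-byte data packets and 64-byte ACKs over two links:

```python
# Hedged sketch: strict round-robin vs random spray over two links when
# traffic alternates big data packets and small ACKs. Sizes, counts and
# the two-link topology are invented for illustration.
import random

def split_bytes(sizes, chooser):
    """Distribute packets over two links; return bytes sent per link."""
    links = [0, 0]
    for i, size in enumerate(sizes):
        links[chooser(i)] += size
    return links

traffic = [1500 if i % 2 == 0 else 64 for i in range(10000)]  # data, ACK, ...

rr = split_bytes(traffic, lambda i: i % 2)                    # strict round-robin
random.seed(0)
spray = split_bytes(traffic, lambda i: random.randrange(2))   # random spray
```

With round-robin every 1500-byte packet lands on link 0 and every ACK on link 1; random spray leaves the byte counts close to even.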
Re: Lossy cogent p2p experiences?
On Thu, 7 Sept 2023 at 15:45, Benny Lyne Amorsen wrote: > Juniper's solution will cause way too much packet reordering for TCP to > handle. I am arguing that strict round-robin load balancing will > function better than hash-based in a lot of real-world > scenarios. And you will be wrong. A packet arriving out of order will be considered a loss of the previous packet by the host, and the host will signal the need for a resend. -- ++ytti
Re: Lossy cogent p2p experiences?
On Thu, 7 Sept 2023 at 00:00, David Bass wrote: > Per packet LB is one of those ideas that at a conceptual level are great, but > in practice are obvious that they’re out of touch with reality. Kind of like > the EIGRP protocol from Cisco and using the load, reliability, and MTU > metrics. Those multiple metrics are in IS-IS as well (if you don't use wide metrics). And I agree those are not for common cases, but I wouldn't be shocked if someone has a legitimate multi-topology routing use case where different metric-type topologies are very useful. But as long as we keep the context as the Internet, true: 100% reordering does not work for the Internet, not without changing all end hosts. And by changing those, it's not immediately obvious how we'd end up in a better place; if we wait a bit longer to signal packet loss, we likely end up in a worse place, as reordering is just so dang rare today, because congestion-control choices have made sure no one reorders (or customers will yell at you), yet packet loss remains common. Perhaps if congestion control used latency or FEC instead of loss, we could tolerate reordering while not underperforming under loss, but I'm sure in the decades following that decision we'd learn new ways in which we don't understand any of this. But for non-Internet applications, where you control the hosts, per-packet is used and needed; I think HPC applications, GPU farms etc. are the users who asked JNPR to implement this. -- ++ytti
Re: Lossy cogent p2p experiences?
On Wed, 6 Sept 2023 at 19:28, Mark Tinka wrote: > Yes, this has been my understanding of, specifically, Juniper's > forwarding complex. Correct, the packet is sprayed to some PPE, and PPEs do not run in deterministic time; after the PPEs there is a reorder block that restores flow order, if it has to. EZchip is the same with its TOPs. > Packets are chopped into near-same-size cells, sprayed across all > available fabric links by the PFE logic, given a sequence number, and > protocol engines ensure oversubscription is managed by a request-grant > mechanism between PFE's. This isn't the mechanism that causes reordering; it's the ingress and egress lookup, where the packet or packet head is sprayed to some PPE, that it can occur. You can find some patents on it: https://www.freepatentsonline.com/8799909.html When a PPE 315 has finished processing a header, it notifies a Reorder Block 321. The Reorder Block 321 is responsible for maintaining order for headers belonging to the same flow, and pulls a header from a PPE 315 when that header is at the front of the queue for its reorder flow. Note this reorder happens even when you have exactly 1 ingress interface and exactly 1 egress interface; as long as you have enough PPS, you will reorder outside flows, even without the fabric being involved. -- ++ytti
Re: Lossy cogent p2p experiences?
On Wed, 6 Sept 2023 at 17:10, Benny Lyne Amorsen wrote: > TCP looks quite different in 2023 than it did in 1998. It should handle > packet reordering quite gracefully; in the best case the NIC will I think the opposite is true; TCP was designed to be order-agnostic. But everyone uses CUBIC, and for CUBIC reordering is the same as packet loss. This is a good trade-off: you need to decide whether you want to recover fast from occasional packet loss, or to be tolerant of reordering. The moment the receiver gets a segment one past the one it expects, it ACKs the previous in-order segment again, signalling loss, causing an unnecessary resend and a window-size reduction. > will never even know they were reordered. Unfortunately current > equipment does not seem to offer per-packet load balancing, so we cannot > test how well it works. For example Juniper offers true per-packet; I think it's mostly used in high-performance computing. -- ++ytti
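A toy model of the duplicate-ACK behaviour described above, assuming the classic three-dup-ACK fast-retransmit threshold (no real TCP stack; segment numbers are simplified):

```python
# Hedged sketch: a cumulative-ACK receiver. Every arrival that is not the
# next expected segment re-ACKs the last in-order segment; three duplicate
# ACKs are what triggers fast retransmit in loss-based stacks like CUBIC.
def dup_acks(arrival_order):
    """Count duplicate ACKs the receiver would emit for this arrival order."""
    expected, dups = 0, 0
    received = set()
    for seg in arrival_order:
        received.add(seg)
        if seg == expected:
            while expected in received:   # cumulative ACK advances
                expected += 1
        else:
            dups += 1                     # re-ACK of last in-order segment
    return dups
```

A single segment delayed behind three later ones, e.g. [0, 2, 3, 4, 1], already produces three duplicate ACKs: enough for the sender to resend and shrink its window, even though nothing was lost.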
Re: Lossy cogent p2p experiences?
On Wed, 6 Sept 2023 at 10:27, Mark Tinka wrote: > I recognize what happens in the real world, not in the lab or text books. Fun fact about the real world: devices do not internally guarantee order. That is, even if you have identical-latency links and zero congestion, order is not guaranteed between packet1 coming from interface I1 and packet2 coming from interface I2; which packet goes out of interface E1 first is unspecified. This is because packets inside the forwarding complex can be sprayed to multiple lookup engines, and order is lost even for packets coming exclusively from interface1. However, after the lookup the order is restored per _flow_; it is not restored between flows, so packets coming from interface1 with random ports won't leave interface2 in the same order. So order is only restored inside a single lookup complex (interfaces are not guaranteed to be in the same complex) and only for actual flows. It is designed this way because no one runs networks which rely on order outside these parameters, and no one even knows their kit works like this, because they don't have to. -- ++ytti
Re: Lossy cogent p2p experiences?
On Fri, 1 Sept 2023 at 22:56, Mark Tinka wrote: > PTX1000/10001 (Express) offers no real configurable options for load > balancing the same way MX (Trio) does. This is what took us by surprise. What in particular are you missing? As I explained, PTX and MX both allow, for example, speculating on transit pseudowires having a CW on them, which is non-default and requires 'zero-control-word'. You should be looking at 'hash-key' on PTX and 'enhanced-hash-key' on MX. You don't appear to have a single stanza configured, but I do wonder what you wanted to configure when you noticed the missing ability to do so. -- ++ytti
Re: Lossy cogent p2p experiences?
On Fri, 1 Sept 2023 at 18:37, Lukas Tribus wrote: > On the hand a workaround at the edge at least for EoMPLS would be to > enable control-word. Juniper LSR can actually do heuristics on pseudowires with CW. -- ++ytti
Re: Lossy cogent p2p experiences?
On Fri, 1 Sept 2023 at 16:46, Mark Tinka wrote: > Yes, this was our conclusion as well after moving our core to PTX1000/10001. Personally I would recommend turning off LSR payload heuristics, because there is no accurate way for an LSR to tell what the label is carrying, and a wrong guess, while rare, will be extremely hard to root-cause, because you will never hear of it: the person suffering from it is too many hops away for the problem to appear on your horizon. I strongly believe the edge imposing entropy or FAT labels is the right way to give LSRs hashing hints. -- ++ytti
Re: Lossy cogent p2p experiences?
On Fri, 1 Sept 2023 at 14:54, Mark Tinka wrote: > When we switched our P devices to PTX1000 and PTX10001, we've had > surprisingly good performance of all manner of traffic across native > IP/MPLS and 802.1AX links, even without explicitly configuring FAT for > EoMPLS traffic. PTX and MX as LSRs look inside the pseudowire to see if it's IP (a dangerous guess for an LSR to make); CSR/ASR9k does not. So PTX and MX LSRs will balance your pseudowire even without FAT. I've had no problem having an ASR9k LSR balance FAT PWs. However, this is a bit of a sidebar, because the original problem is about elephant flows, which FAT does not help with. But adaptive balancing does. -- ++ytti
Re: Lossy cogent p2p experiences?
On Thu, 31 Aug 2023 at 23:56, Eric Kuhnke wrote: > The best working theory that several people I know in the neteng community > have come up with is because Cogent does not want to adversely impact all > other customers on their router in some sites, where the site's upstreams and > links to neighboring POPs are implemented as something like 4 x 10 Gbps. In > places where they have not upgraded that specific router to a full 100 Gbps > upstream. Moving large flows >2Gbps could result in flat topping a traffic > chart on just 1 of those 10Gbps circuits. It is a very plausible theory, and everyone has this problem to a greater or lesser degree. There was a time when edge interfaces were much lower capacity than backbone interfaces, but I don't think that time will ever come back, so this problem is systemic. Luckily there is quite a reasonable solution to the problem, called 'adaptive load balancing', where software monitors balancing and biases the hash_result => egress_interface tables to improve balancing when dealing with elephant flows. -- ++ytti
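A minimal sketch of what one greedy step of such adaptive rebalancing could look like. Bucket counts, per-bucket rates and the two-link setup are invented; real implementations live in vendor software and are certainly more subtle:

```python
# Hedged sketch: watch per-link load implied by the hash_result => egress
# mapping, and move the heaviest bucket off the hottest link when it helps.
def link_loads(bucket_bytes, bucket_to_link, n_links):
    load = [0] * n_links
    for b, link in enumerate(bucket_to_link):
        load[link] += bucket_bytes[b]
    return load

def rebalance_step(bucket_bytes, bucket_to_link, n_links):
    """Move the heaviest bucket off the hottest link if that lowers the peak."""
    load = link_loads(bucket_bytes, bucket_to_link, n_links)
    src = max(range(n_links), key=load.__getitem__)   # hottest link
    dst = min(range(n_links), key=load.__getitem__)   # coolest link
    b = max((b for b, l in enumerate(bucket_to_link) if l == src),
            key=lambda b: bucket_bytes[b])
    if load[src] - load[dst] > bucket_bytes[b]:       # only move if it helps
        bucket_to_link[b] = dst
    return bucket_to_link

# An elephant bucket (100) sharing a link with another big flow (50):
per_bucket = [100, 50, 1, 1]                   # invented rates per hash bucket
mapping = rebalance_step(per_bucket, [0, 0, 1, 1], 2)
```

The point is that the hash stays flow-preserving (no reordering) while the bucket-to-link table is biased away from the hot link.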
Re: JunOS config yacc grammar?
On Tue, 22 Aug 2023 at 03:30, Lyndon Nerenberg (VE7TFX/VE6BBM) wrote: > Because I've been writing yacc grammars for decades. I just wanted to > see if someone had already done it, as that would save me some time. > But if there's nothing out there I'll just roll one myself. I sympathise with your problem, and I've always wanted vendors to publish their parsers; there are many use cases. But as no such thing exists, this avenue of attack seems very problematic, unless this whole network lives and dies with you. If not, your feature velocity now depends on someone adding support for new keywords to the grammar, and no one who comes after you will thank you for adding this dependency to the process. But they might call you and pay stupid money for a 5-minute job, so maybe it is a great idea. -- ++ytti
rfc5837 in the wild?
Does anyone have a traceroute path example where a transit hop responds with RFC 5837 extension objects? https://github.com/8enet/traceroute/blob/master/traceroute/extension.c#L101 Output should be '2/x: ' At least JNPR seems to support this: https://www.juniper.net/documentation/us/en/software/junos/transport-ip/topics/topic-map/icmp.html - although support may be limited to QFX5100; the documentation is ambiguous. There is also a patch ( https://lore.kernel.org/all/6a7f33a5-13ca-e009-24ac-fde59fb1c...@gmail.com/T/ ) for Linux, but it has not been merged into the kernel. -- ++ytti
Re: Test Dual Queue L4S (if you are on Comcast)
This seems worse :) 'we are collecting data about you, but didn't bother thinking if it is needed' On Fri, 16 Jun 2023 at 22:55, Livingood, Jason via NANOG wrote: > > In the meantime please just select some unrelated industry on the form. We > don’t care – it seems to be boilerplate. > > > > From: "Livingood, Jason" > Date: Friday, June 16, 2023 at 15:46 > To: "Eric C. Miller" , nanog > Subject: Re: [EXTERNAL] RE: Test Dual Queue L4S (if you are on Comcast) > > > > We’re working to fix that. Sorry! > > > > From: "Eric C. Miller" > Date: Friday, June 16, 2023 at 15:18 > To: Jason Livingood , nanog > Subject: [EXTERNAL] RE: Test Dual Queue L4S (if you are on Comcast) > > > > FYI, when trying to sign up, it tells me that my input isn’t required because > I work in the telco industry. > > > > Eric > > > > From: NANOG On Behalf Of > Livingood, Jason via NANOG > Sent: Friday, June 16, 2023 2:30 PM > To: nanog > Subject: Test Dual Queue L4S (if you are on Comcast) > > > > FYI that today we (Comcast) have announced the start of low latency > networking (L4S) field trials. If you are a customer and would like to > volunteer, please visit this page. > > > > For more info, there is a blog post that just went up at > https://corporate.comcast.com/stories/comcast-kicks-off-industrys-first-low-latency-docsis-field-trials > > > > We anticipate testing with several different cable modems and a range of > applications that are marking. We plan to share detailed results of the trial > at IETF-118 in November. > > > > Any app developers interested in working with us can either email me > direction or low-latency-partner-inter...@comcast.com. > > > > Thanks! > Jason > > > > > > > > > > -- ++ytti
Re: Do ISP's collect and analyze traffic of users?
I can't tell what 'large' is. But I've worked for enterprise and consumer ISPs, and none of the shops I worked for had the capability to monetise the information they had. And the information they had was of increasingly low resolution. Infrastructure providers are notoriously bad at monetising even their infra. I'm sure some do monetise. But generally service providers either are not interesting or don't have active shareholders, so there is very little pressure to make more money; hence firesales happen all the time, as infrastructure is increasingly seen as a liability, not an asset. They are generally boring companies, and internally no one has an incentive to monetise data, as it wouldn't improve their personal compensation. And regulations like GDPR create problems people would rather not solve, unless pressured. Technically, most people started 20 years ago with some netflow sampling ratio, and they still use the same sampling ratio, despite many orders of magnitude more packets. Meaning the share of flows captured was previously magnitudes higher than today; today only very few flows are seen in very typical applications, so netflow is largely for volumetric DDoS and high-level ingressAS=>egressAS metrics. Hardware on offer increasingly does IPFIX as if it were sflow, that is, zero cache, exported immediately after sampling, because you'd need something like 1:100 or better resolution to have any significant luck in hitting the same flow twice. PTX has stopped supporting flow-cache entirely because of this: at a sampling rate where the cache would do something, the cache would overflow. Of course there are other monetisation opportunities via mechanisms other than data-on-the-wire, like DNS. On Tue, 16 May 2023 at 15:57, Tom Beecher wrote: > > Two simple rules for most large ISPs. > > 1. If they can see it, as long as they are not legally prohibited, they'll > collect it. > 2. If they can legally profit from that information, in any way, they will. 
> > Now, ther privacy policies will always include lots of nice sounding clauses, > such as 'We don't see your personally identifiable information'. This of > course allows them to sell 'anonymized' sets of that data, which sounds great > , except as researchers have proven, it's pretty trivial to scoop up > multiple, discrete anonymized data sets, and cross reference to identify > individuals. Netflow data may not be as directly 'valuable' as other types of > data, but it can be used in the blender too. > > Information is the currency of the realm. > > > > On Mon, May 15, 2023 at 7:00 PM Michael Thomas wrote: >> >> >> And maybe try to monetize it? I'm pretty sure that they can be compelled >> to do that, but do they do it for their own reasons too? Or is this way >> too much overhead to be doing en mass? (I vaguely recall that netflow, >> for example, can make routers unhappy if there is too much "flow"). >> >> Obviously this is likely to depend on local laws but since this is NANOG >> we can limit it to here. >> >> Mike >> -- ++ytti
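The fixed-sampling-ratio point above reduces to simple probability; a sketch assuming independent 1:N packet sampling, with invented flow sizes and ratios:

```python
# Hedged sketch: chance that a k-packet flow is sampled at least `times`
# under independent 1:n packet sampling. Flow size (50 packets) and the
# ratios compared below are invented illustrations.
from math import comb

def p_sampled_at_least(k_packets, n_ratio, times=1):
    """P(a k-packet flow contributes >= `times` sampled packets)."""
    p = 1.0 / n_ratio
    miss = sum(comb(k_packets, i) * p**i * (1 - p)**(k_packets - i)
               for i in range(times))
    return 1.0 - miss
```

At 1:100 a 50-packet flow is seen at all roughly 40% of the time and seen twice under 10% of the time; at 1:10000 it is almost never seen, which is why a cache that only earns its keep on repeat hits stops making sense.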
Re: Reverse DNS for eyeballs?
On Fri, 21 Apr 2023 at 20:44, Jason Healy via NANOG wrote: > This is not intended as snark: what do people recommend for IPv6? I try to > maintain forward/reverse for all my server/infrastructure equipment. But > clients? They're making up temporary addresses all day long. So far, I've > given up on trying to keep track of those addresses, even though it's a > network under my direct control. Stateless generation at query time - https://github.com/cmouse/pdns-v6-autorev/blob/master/rev.pl I wrote some POCs quite a while ago: http://p.ip.fi/L5PK - base36 http://p.ip.fi/CAtB - rfc2289 -- ++ytti
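A minimal sketch of the stateless-generation idea, loosely in the spirit of the base36 POC linked above. The zone name, label scheme and helper names here are invented for illustration, not necessarily what the linked POCs do:

```python
# Hedged sketch: derive a PTR name purely from the address, so the
# nameserver needs no per-host state and the matching forward record can
# be synthesised from the label alone. Zone 'dyn.example.net' is invented.
import ipaddress

ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyz"

def to_base36(n):
    digits = ""
    while True:
        n, r = divmod(n, 36)
        digits = ALPHABET[r] + digits
        if n == 0:
            return digits

def from_base36(s):
    n = 0
    for ch in s:
        n = n * 36 + ALPHABET.index(ch)
    return n

def stateless_name(addr, zone="dyn.example.net"):
    """Encode the whole IPv6 address into one DNS label under `zone`."""
    return "%s.%s" % (to_base36(int(ipaddress.IPv6Address(addr))), zone)

name = stateless_name("2001:db8::1")
```

Because the encoding round-trips, the forward zone can answer AAAA queries for these labels by decoding them, keeping forward and reverse consistent with zero tracking of temporary addresses.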
Re: 1.1.1.1 support?
On Wed, 22 Mar 2023 at 16:04, Alexander Huynh via NANOG wrote: > I'll take this feedback to our developers. Many thanks. > I took a look at the above tickets, and it seems that one of the egress > ranges from that datacenter cannot connect to the authoritative > nameservers of `www.moi.gov.cy`: `ns01.gov.cy` and `ns02.gov.cy`. > > Here's a redacted pcap for those who like details, showing no response: > > IP a.b.c.d.56552 > 212.31.118.19.53: 51873+ [1au] A? www.moi.gov.cy. (55) > IP a.b.c.d.51718 > 212.31.118.20.53: 31021+ [1au] A? www.moi.gov.cy. (55) > > TCP behaves similarly. The recursor response suggests a loop, so a network problem is highly likely. > I'm filing an internal ticket right now to investigate, but I'd > appreciate if you could also help us on your end for any possible > solutions regarding this connectivity failure. Sure, you might also want to look into the NLNOG RING, which allows a broad perspective on issues. > As a general note regarding the two community posts: the straight deep > dive into technical information makes it more difficult for others to > interpret the request. As you said in a later post here: This is a very difficult subject: how to get help. If I had made it more generic, we could refute it as not containing the needed information. If I had made it longer, we could refute it as not terse enough. However we submit it, we can argue it wasn't the right way. As seen in the original post, I fully appreciate that almost every single case about 1.1.1.1 is incorrect and user error. But I proposed a mechanism to bypass community forums and reach people who are able to help and understand. If 1.1.1.1, 8.8.8.8 and 9.9.9.9 disagree, then let humans analyse it. The ticket volume would be trivial, if we look at the community forums and see how many 1.1.1.1 complaints would bypass this filter. 
> Not everyone in the Community Forum (nor our company) can pull out the > specific datacenter used, the specific machine(s) used, and the source > ASN from the `my.ip.fi` curl. I gave the specific unicast ID for the DNS server in addition to my IP. I cannot glean any other information. I don't think we can fairly fault either of the cases in the community forum. We must fault the process itself and look for ways to improve. -- ++ytti
Re: 1.1.1.1 support?
Yes, it works in every other CF except LCA-CF. Thank you for the additional data point. You can use `dig CHAOS TXT id.server @1.1.1.1 +nsid` to get two unicast identifiers for the server you got the response from. On Wed, 22 Mar 2023 at 15:49, Josh Luthman wrote: > > Try asking dns-operati...@lists.dns-oarc.net for someone at CloudFlare. > > For what it's worth, it works for me. I'm in Troy, OH. > > C:\Users\jluthman>dig www.moi.gov.cy @1.1.1.1 +short > 212.31.118.26 > > > On Wed, Mar 22, 2023 at 9:43 AM Saku Ytti wrote: >> >> >> >> On Wed, 22 Mar 2023 at 15:26, Matt Harris wrote: >> >>> >>> When something is provided at no cost, I don't see how it can be unethical >>> unless they are explicitly lying about the ways in which they use the data >>> they gather. >>> Ultimately, you're asking them to provide a costly service (support for >>> end-users, the vast majority of whom will not ask informed, intelligent >>> questions like the members of this list would be able to, but would still >>> demand the same level of support) on top of a service they are already >>> providing at no cost. That's both unrealistic and unnecessary. There's an >>> exceedingly simple solution, here, after all: if you don't like their >>> service or it isn't working for you as an end-user, don't use it. >> >> >> Thank you for the philosophical perspective, but currently my interest is >> not to debate merits or lack thereof in laissez-faire economics. >> >> The problem is, a large number of people will use 1.1.1.1, 8.8.8.8 or >> 9.9.9.9 despite my or your position about it. There is incentive for >> providers to provide it 'for free', as it adds value to their products as >> users are compensating providers with the data. >> >> Occasionally things don't work and when they do not, we need a way to inform >> the provider 'hey you have a problem'. You could be anywhere in this chain, >> with no ability to impact any of the decisions. 
>> >> I know there is a real problem, I know real users are impacted, I know >> almost none of them will have the ability to understand why there is a >> problem or remediate it. >> >> -- >> ++ytti -- ++ytti
Re: 1.1.1.1 support?
On Wed, 22 Mar 2023 at 15:26, Matt Harris wrote: > When something is provided at no cost, I don't see how it can be unethical > unless they are explicitly lying about the ways in which they use the data > they gather. > Ultimately, you're asking them to provide a costly service (support for > end-users, the vast majority of whom will not ask informed, intelligent > questions like the members of this list would be able to, but would still > demand the same level of support) on top of a service they are already > providing at no cost. That's both unrealistic and unnecessary. There's an > exceedingly simple solution, here, after all: if you don't like their > service or it isn't working for you as an end-user, don't use it. > Thank you for the philosophical perspective, but currently my interest is not to debate merits or lack thereof in laissez-faire economics. The problem is, a large number of people will use 1.1.1.1, 8.8.8.8 or 9.9.9.9 despite my or your position about it. There is incentive for providers to provide it 'for free', as it adds value to their products as users are compensating providers with the data. Occasionally things don't work and when they do not, we need a way to inform the provider 'hey you have a problem'. You could be anywhere in this chain, with no ability to impact any of the decisions. I know there is a real problem, I know real users are impacted, I know almost none of them will have the ability to understand why there is a problem or remediate it. -- ++ytti
Re: 1.1.1.1 support?
If you wish to consult people on how to configure DNS, please reach out to the responsible folk. I am discussing a specific recursor in an anycasted setup failing to resolve a domain, and a provider offering no remediation channel. These are two entirely different classes of problem, and collapsing them into a single problem is not going to help in either case. On Wed, 22 Mar 2023 at 12:25, Mark Andrews wrote: > > What about the zone not having a single point of failure? Both servers > are covered by the same /24. > > % dig www.moi.gov.cy @212.31.118.19 +norec +dnssec > > ; <<>> DiG 9.19.11-dev <<>> www.moi.gov.cy @212.31.118.19 +norec +dnssec > ;; global options: +cmd > ;; Got answer: > ;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 17380 > ;; flags: qr aa; QUERY: 1, ANSWER: 1, AUTHORITY: 2, ADDITIONAL: 3 > > ;; OPT PSEUDOSECTION: > ; EDNS: version: 0, flags:; udp: 4096 > ; COOKIE: 6387183a6031ef182fa6ade7641ad4ff2a078213f4e24fc9 (good) > ;; QUESTION SECTION: > ;www.moi.gov.cy. IN A > > ;; ANSWER SECTION: > www.moi.gov.cy. 3600 IN A 212.31.118.26 > > ;; AUTHORITY SECTION: > moi.gov.cy. 3600 IN NS ns01.gov.cy. > moi.gov.cy. 3600 IN NS ns02.gov.cy. > > ;; ADDITIONAL SECTION: > ns02.gov.cy. 86400 IN A 212.31.118.20 > ns01.gov.cy. 86400 IN A 212.31.118.19 > > ;; Query time: 374 msec > ;; SERVER: 212.31.118.19#53(212.31.118.19) (UDP) > ;; WHEN: Wed Mar 22 21:14:23 AEDT 2023 > ;; MSG SIZE rcvd: 157 > > % > > > On 22 Mar 2023, at 19:36, Saku Ytti wrote: > > > > Am I correct to understand that 1.1.1.1 only does support via community > > forum? > > > > They had just enough interest in the service to collect user data to > > monetise, but 0 interest in trying to figure out how to detect and > > solve problems? > > > > Why not build a web form where they ask you to explain what is not > > working, in terms of automatically testable. Like no A record for X. 
> > Then after you submit this form, they test against all 1.1.1.1 and > > some 9.9.9.9 and 8.8.8.8 and if they find a difference in behaviour, > > the ticket is accepted and sent to someone who understands DNS? If > > there is no difference in behaviour, direct people to community > > forums. > > This trivial, cheap and fast to produce support channel would ensure > > virtually 0 trash support cases, so you wouldn't even have to hire > > people to support your data collection enterprise. > > The number of times that 8.8.8.8 “works” but there is an actual error > is enormous. 8.8.8.8 tolerates lots of protocol errors which ends up > causing support cases for others where the result is “the servers are > broken in this way”. You then try to report the issue but the report > is ignored because “It works with 8.8.8.8”. > > > Very obviously they selfishly had no interest in ensuring 1.1.1.1 > > actually works, as long as they are getting the data. I do not know > > how to characterise this as anything but unethical. > > > > https://community.cloudflare.com/t/1-1-1-1-wont-resolve-www-moi-gov-cy-in-lca-235m3/487469 > > https://community.cloudflare.com/t/1-1-1-1-failing-to-resolve/474228 > > > > If you can't due to resources or competence support DNS, do not offer one. > > > > -- > > ++ytti, cake having and cake eating user > > -- > Mark Andrews, ISC > 1 Seymour St., Dundas Valley, NSW 2117, Australia > PHONE: +61 2 9871 4742 INTERNET: ma...@isc.org > -- ++ytti
1.1.1.1 support?
Am I correct to understand that 1.1.1.1 only does support via a community forum? They had just enough interest in the service to collect user data to monetise, but zero interest in figuring out how to detect and solve problems? Why not build a web form where they ask you to explain what is not working, in automatically testable terms, like 'no A record for X'. Then after you submit this form, they test it against all 1.1.1.1 instances and some 9.9.9.9 and 8.8.8.8, and if they find a difference in behaviour, the ticket is accepted and sent to someone who understands DNS. If there is no difference in behaviour, direct people to the community forums. This trivial, cheap and fast-to-produce support channel would ensure virtually zero trash support cases, so you wouldn't even have to hire people to support your data-collection enterprise. Very obviously they selfishly had no interest in ensuring 1.1.1.1 actually works, as long as they are getting the data. I do not know how to characterise this as anything but unethical. https://community.cloudflare.com/t/1-1-1-1-wont-resolve-www-moi-gov-cy-in-lca-235m3/487469 https://community.cloudflare.com/t/1-1-1-1-failing-to-resolve/474228 If you can't support DNS, whether due to resources or competence, do not offer it. -- ++ytti, cake having and cake eating user
Re: Reverse Traceroute
On Mon, 27 Feb 2023 at 10:16, Rolf Winter wrote: > "https://downforeveryoneorjustme.com/";. But, somebody might use your > server for this. How do people feel about this? Restrict the reverse > traceroute operation to be done back to the source or allow it more > freely to go anywhere? What are the pros and cons of this? Let's call it the destination TLV. If I am someone who wants to do a volumetric attack, I won't set any destination TLV, because without a destination TLV and by spoofing my source, I get more leverage. If my source and destination TLV differ, then I have less leverage. So in this sense it adds no security implications, but it adds a massive amount of diagnostic power, as one very common request is a traceroute between nodes you have no access to. What it would allow is port-knocking the ports used, through a proxy; whether this matters or not might be debatable. Perhaps the standard should consider some abilities to be default-on and others default-off, and let the operator decide if they want to turn some default-off abilities on, such as honoring the destination TLV. -- ++ytti
Re: intuit DNS
╰─ dig NS intuit.com | grep ^intuit | ruby -nae 'puts $F[-1]' | while read dns; do echo $dns:; dig smartlinks.intuit.com @$dns | grep CNAME; done
a7-66.akam.net.: smartlinks.intuit.com. 30 IN CNAME cegnotificationsvc.intuit.com.
a11-64.akam.net.: smartlinks.intuit.com. 30 IN CNAME cegnotificationsvc.intuit.com.
a24-67.akam.net.: smartlinks.intuit.com. 30 IN CNAME cegnotificationsvc.intuit.com.
a1-182.akam.net.: smartlinks.intuit.com. 30 IN CNAME cegnotificationsvc.intuit.com.
a6-66.akam.net.: smartlinks.intuit.com. 30 IN CNAME cegnotificationsvc.intuit.com.
a18-64.akam.net.: smartlinks.intuit.com. 30 IN CNAME cegnotificationsvc.intuit.com.
dns1.p01.nsone.net.:
dns2.p01.nsone.net.:
dns3.p01.nsone.net.:
dns4.p01.nsone.net.:
╭─ ytti@ytti ~ 0|0|0|1 ↵ 09:58:40
On Sat, 11 Feb 2023 at 23:01, Daniel Sterling wrote: > > Someone at Intuit please look into why your DNS for this A record > hasn't been consistently resolving, this has been going on for several > days if not weeks > > https://dnschecker.org/#A/smartlinks.intuit.com > > -- Dan -- ++ytti
Re: Typical last mile battery runtime (protecting against power cuts)
On Sun, 5 Feb 2023 at 07:50, Chris Adams wrote: > Electric heat pumps are great for power efficiency until the temperature > drops and they switch over to pure electric heat. Here is a graph for a popular air-source heat pump, the Mitsubishi MSZ/MUZ 25: https://scanoffice.fi/wp-content/uploads/2022/09/rw-vtt-tuntikeskiarvo.jpg https://scanoffice.fi/wp-content/uploads/2022/09/rw-vttn-testitulos.png At -30C external, with +20C internal, the units produce heat at approximately 2x the electric input. But many other units do not perform that well even at -20C external. And these units are premium-priced. Modern R32 units consistently outperform old R410A units. -- ++ytti
Re: Typical last mile battery runtime (protecting against power cuts)
On Fri, 3 Feb 2023 at 16:15, Israel G. Lugo wrote: > Could anyone with last mile experience help with some ballpark figures? > I.e. 15 min vs 8h or 8 days. This would be highly market-specific. In many cases, probably most cases, there is no regulatory requirement for availability of internet service whatsoever. One specific case where it is regulated is Finland; the regulation is available in Finnish, Swedish and English, the English document at: https://www.finlex.fi/data/normit/47143/05_Regulation_on_resilience_of_communications_networks_and_services_and_of_synchronisation_of_communications_networks.pdf It classifies services into five priorities with different availability requirements. From your ballpark, 8h would be the closest fit, but in theory the higher priorities have indefinite availability, backed by generation. In practice I would default to expecting 0 min of availability during a power outage, regardless of how resilient my CPE is. We can scarcely make the Internet work at the best of times. -- ++ytti
Re: MX204 and MPC7E-MRATE EoL - REVOKED
On Sat, 28 Jan 2023 at 08:48, Mark Tinka wrote: > Apparently, the shortage of chips for the MX204 and MPC7E is now resolved, > and there is no longer any need to force customers to move to the MX304. There is still just Micron for HMC, and as far as I can find, they've not revoked their EOL. You can't find the HMC product page under Micron 'products' anymore, and there are hardly any mentions anywhere; everyone is now focusing on HBM3. https://www.micron.com/about/blog/2018/august/micron-announces-shift-in-high-performance-memory-roadmap-strategy Whatever led to this problem, and what led to this EOL revocation, is not something Juniper has communicated. If I had to stab in the dark based on nothing, I'd imagine they forgot HMC is no longer shipping, panicked and EOLd all HMC boxes, until someone did more work and gathered they can probably support a few HMC platforms with the existing HMC parts they have. I would be very uneasy committing to HMC gear unless I had a better understanding of what the problem was, and why it is no longer a problem. My concern would be: if they were wrong once in EOLing everything, then wrong again in revoking some of the EOLs, can I trust them now to have HMC parts for any RMAs over the box's life expectancy? It is not at all uncommon to run a box for a decade in an SP network, and Juniper released all-new HMC gear after Micron announced HMC EOL. For HBM there is Samsung, Hynix and Micron coming up, so HBM seems safe. It's unclear how safe HBM2 is now that HBM3 is shipping, over the life expectancy SP gear has. Obviously most of the market moves faster; no one is going to run HBM2 GPUs a decade from now. We are kind of a shitty market: few units, long sales times, long cycles. -- ++ytti
Re: Large RTT or Why doesn't my ping traffic get discarded?
On Thu, 22 Dec 2022 at 08:41, William Herrin wrote:

> Suppose you have a loose network cable between your Linux server and a
> switch. Layer 1. That RJ45 just isn't quite solid. It's mostly working
> but not quite right. What does it look like at layer 2? One thing it
> can look like is a periodic carrier flash where the NIC thinks it has
> no carrier, then immediately thinks it has enough of a carrier to
> negotiate speed and duplex. How does layer 3 respond to that?

Agreed. But then once the resolve happens and Linux floods the queued pings out, the responses would come back ~immediately, so the delta between the RTTs would remain at the send interval, in this case 1s. Here instead we see the RTT decreasing as if a buffer is being purged, until it seems to fill again, up until 5s or so. I don't exclude the rationale, I just think it's unlikely based on the latencies observed. But at any rate, with so little data, my confidence to include or exclude any specific explanation is low.

> 1s: send ping toward default router
> 1.1s: ping response from remote server
> 2s: send ping toward default router
> 2.1s: ping response from remote server
> 2.5s: carrier down
> 2.501s: carrier up
> 3s: queue ping, arp for default router, no response
> 4s: queue ping, arp for default router, no response
> 5s: queue ping, arp for default router, no response
> 6s: queue ping, arp for default router, no response
> 7s: queue ping, arp for default router
> 7.01s: arp response, send all 5 queued pings but note that the
> earliest is more than 4 seconds old.
> 7.1s: response from all 5 queued pings.
>
> Cable still isn't right though, so in a few seconds or a few minutes
> you're going to get another carrier flash and the pattern will repeat.
>
> I've also seen some cheap switches get stuck doing this even after the
> faulty cable connection is repaired, not clearing until a reboot.
>
> Regards,
> Bill Herrin
>
> --
> For hire. https://bill.herrin.us/resume/

-- ++ytti
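To make the distinction concrete, here is a sketch of the ARP-queue hypothesis using the timeline's numbers (times are illustrative): all queued pings are flushed at resolve time, so successive RTTs differ by exactly the 1s send interval, instead of decaying gradually like a draining buffer.

```ruby
# Pings queued while ARP is unresolved (sent at 3s..7s, per the timeline),
# all flushed when the ARP response arrives at 7.01s.
send_times = [3.0, 4.0, 5.0, 6.0, 7.0]
flush_at   = 7.01
path_rtt   = 0.09   # assumed real path RTT once the queue is flushed

rtts   = send_times.map { |t| ((flush_at + path_rtt) - t).round(2) }
deltas = rtts.each_cons(2).map { |a, b| (a - b).round(2) }
puts rtts.inspect    # oldest queued ping shows the largest RTT
puts deltas.inspect  # every delta equals the 1s send interval
```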
Re: Large RTT or Why doesn't my ping traffic get discarded?
There certainly aren't any temporal buffers in SP gear limiting the buffer to 100ms, nor are there any mechanisms to temporally decrease TTL or hop-limit. Some devices may expose a temporal knob in the UX, but that is just a multiplier for max_buffer_bytes: what is programmed is a fixed amount of bytes, not a temporal limit as a function of observed traffic rate.

This matters because HW may support tens or even hundreds of thousands of queues, to cover a large number of logical interfaces with HQoS and multiple queues each. If such a device is run with a single logical interface which is low speed, either physically or shaped, you may end up with very, very long temporal queues. Not because people intend to queue long, but because understanding all of this requires a lot of context and platform information which isn't readily available, nor is it solved by 'just remove those buffers from devices physically, it's bufferbloat'.

Like others have pointed out, there is not much information to go on and this could be many things. One of them is 'buffer bloat' like Taht pointed out, which might be true given the cyclical nature of the ping, the buffer getting filled and drained. I don't really think ARP/ND is a good candidate like Herrin suggested, because the pattern is cyclical rather than a single event, but it's not impossible.

We'd really need to see full mtr output, whether this affects other destinations, whether it affects just ICMP or also DNS, and ideally a reverse traceroute as well. I can tell that I'm not observing the issue, nor did I expect to, as I expect the problem to be close to your network, and therefore affecting a lot of destinations.

On Thu, 22 Dec 2022 at 07:35, Jerry Cloe wrote:
>
> Because there is no standard for discarding "old" traffic, only discard is
> for packets that hop too many times.
> There is, however, a standard for decrementing TTL by 1 if a packet sits on
> a device for more than 1000ms, and of course we all know what happens when
> TTL hits zero. Based on that, your packet could have floated around for
> another 53 seconds. Having said that, I'm not sure many devices actually do
> this (but its not likely it would have had a significant impact on this
> traffic anyway).
>
> -Original message-
> From: Jason Iannone
> Sent: Wed 12-21-2022 11:11 am
> Subject: Large RTT or Why doesn't my ping traffic get discarded?
> To: North American Network Operators' Group ;
>
> Here's a question I haven't bothered to ask until now. Can someone please
> help me understand why I receive a ping reply after almost 5 seconds? As I
> understand it, buffers in SP gear are generally 100ms. According to my math
> this round trip should have been discarded around the 1 second mark, even in
> a long path. Maybe I should buy a lottery ticket. I don't get it. What is
> happening here?
>
> Jason
>
> 64 bytes from 4.2.2.2: icmp_seq=392 ttl=54 time=4834.737 ms
> 64 bytes from 4.2.2.2: icmp_seq=393 ttl=54 time=4301.243 ms
> 64 bytes from 4.2.2.2: icmp_seq=394 ttl=54 time=3300.328 ms
> 64 bytes from 4.2.2.2: icmp_seq=396 ttl=54 time=1289.723 ms
> Request timeout for icmp_seq 400
> Request timeout for icmp_seq 401
> 64 bytes from 4.2.2.2: icmp_seq=398 ttl=54 time=4915.096 ms
> 64 bytes from 4.2.2.2: icmp_seq=399 ttl=54 time=4310.575 ms
> 64 bytes from 4.2.2.2: icmp_seq=400 ttl=54 time=4196.075 ms
> 64 bytes from 4.2.2.2: icmp_seq=401 ttl=54 time=4287.048 ms
> 64 bytes from 4.2.2.2: icmp_seq=403 ttl=54 time=2280.466 ms
> 64 bytes from 4.2.2.2: icmp_seq=404 ttl=54 time=1279.348 ms
> 64 bytes from 4.2.2.2: icmp_seq=405 ttl=54 time=276.669 ms

-- ++ytti
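The point about temporal limits really being byte counts can be sketched with illustrative numbers: size a queue as '100 ms' against a 10G port rate, then drain the same byte count behind a 10 Mb/s shaper.

```ruby
# "100 ms of buffer" gets programmed as a fixed byte count at some reference
# rate; the actual temporal depth depends entirely on the real drain rate.
def buffer_bytes(rate_bps, ms)
  rate_bps / 8 * ms / 1000
end

bytes = buffer_bytes(10_000_000_000, 100)   # sized as "100 ms" at 10G
puts bytes                                  # 125_000_000 bytes
puts bytes * 8.0 / 10_000_000               # seconds of queue at 10 Mb/s: 100.0
```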
Re: Large prefix lists/sets on IOS-XR
On Fri, 9 Dec 2022 at 20:19, t...@pelican.org wrote:

Hey Tim,

> Or at least, you've moved the problem from "generate config" to "have
> complete and correct data". Which statement should probably come with some
> kind of trigger-warning...

I think it's a lot easier than you think. I understand that all older networks and practical access networks have this problem: the data is in the network. It's of course not the right way to do it, but it's the way they are. But there is no reason to get discouraged.

First you have to ignore the waterfall model; you can never order something ready-made and get utility out of it, because there is no data. What you can do, day 1:

a) copy configs as-is, as templates
b) only ever edit the templates
c) push templates to the network

Boom, now you are far along, and that took an hour or a day depending on the person. Maybe you feel like you've not accomplished much, but you have. Now you can start modelling data out of the templates into the database, and keep shrinking the 'blobs'. You can do this at whatever pace is convenient, and you can trivially measure which blob to do next, i.e. which one will reduce total blob bytes the most. You will see constant, measurable progress. And you always know the network state is what is in your files, as you are now always replacing the entire config with the generated config.

-- ++ytti
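The day-1 workflow above can be sketched with plain templates. A minimal ERB illustration (hostnames and config syntax are invented for the example): everything not yet modelled lives in an opaque blob, which shrinks as fields are carved out into structured data.

```ruby
require 'erb'

# Day 1: the whole config is one blob copied from the device. Over time,
# fields migrate from the blob into structured data. Names are hypothetical.
template = <<~ERB
  hostname <%= device[:hostname] %>
  <%= device[:blob] %>
ERB

device = {
  hostname: 'r1.example.net',                         # already modelled
  blob: "interface lo0\n ipv4 address 192.0.2.1/32",  # not yet modelled
}

config = ERB.new(template).result(binding)
puts config   # the full, generated config; push this, never a delta
```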
Re: Large prefix lists/sets on IOS-XR
On Fri, 9 Dec 2022 at 17:58, Joshua Miller wrote: > In terms of structured vs unstructured data, sure, assembling text is not a > huge lift. Though, when you're talking about layering on complex use cases, > then it gets more complicated. Especially if you want to compute the inverse > configuration to remove service instances that are no longer needed. In terms > of vendor support, I'd hope that if you're paying that kind of money, you're > getting a product that meets your requirements. Something that should be > assessed during vendor selection and procurement. That's just my preference; > do whatever works best for your use cases. Deltas are _super_ hard. But you never need to do them. Always produce a complete config, and let the vendor deal with the problem. We've done this with Junos, IOS-XR, EOS (compass, not Arista, RIP) and SR OS (MD-CLI) for years. If you remove the need for deltas, the whole problem becomes extremely trivial: fill in all the templates with data, push it. -- ++ytti
Re: Large prefix lists/sets on IOS-XR
On Fri, 9 Dec 2022 at 17:30, Tom Beecher wrote: > Pushing thousands of lines via CLI/expect automation is def not a great idea, > no. Putting everything into a file, copying that to the device, and loading > from there is generally best regardless. The slowness you refer to is almost > certainly just because of how XR handles config application. If I'm following > correctly, that seems to be the crux of your question. If you read carefully, that is what Steffann is doing: 'load location:file' + 'commit'. He is not punching anything in by hand. So the answer we are looking for is how to make that go faster. In Junos the answer would be 'ephemeral config', but in IOS-XR, as far as I know, the only thing you can do is improve the 'load' part by moving the server closer; other than that, you get what you get. -- ++ytti
Re: Large prefix lists/sets on IOS-XR
On Fri, 9 Dec 2022 at 17:07, Joshua Miller wrote: > I don't know that Netconf or gRPC are any faster than loading cli. Those > protocols facilitate automation so that the time it takes to load any one > device is not a significant factor, especially when you can roll out changes > to devices in parallel. Also, it's easier to build the changes into a > structured format than assemble the right syntax to interact with the CLI. As a programmer I don't really find the output format to be a significant cost. If I have the source of data, how I emit it doesn't matter much. I accept the preferences people have, but I don't think the format is an important part of the solution. Andrian mentioned paramiko, and if we imagine paramiko logging into IOS-XR and doing 'load http://...' + 'commit', we've automated the task. Depending on your platform, netconf/YANG/gRPC can be an asset or a liability. I put IOS-XR strongly on the liability side, because they don't have a proper data-first infrastructure, they don't even have a proper module for handling configurations; configurations are owned by individual component teams (e.g. the tunnel team owns the GRE config and so forth). Contrast with Juniper, which is data-first, where even the CLI is a 2nd-class citizen taking formal data from the XML RPC. In IOS-XR you will find all kinds of gaps where you can't rely on netconf/YANG, and you will spend cycles dealing with the vendor, compared to people who use the first-class-citizen approach, the CLI format, who are already done. I did not read Steffann as though he'd be punching in anything manually; he wants to make the process itself faster, without any delays introduced by humans. And I have personally nothing to offer him, except: put your server closer to the router, so you can deal with the limited TCP window sizes that hurt transfer speed. -- ++ytti
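The 'move the server closer' advice is just bandwidth-delay arithmetic. A sketch, assuming a small unscaled TCP window (the window size and RTTs below are illustrative):

```ruby
# Single-stream throughput is bounded by window / RTT, so with a fixed small
# window the only remaining lever is the RTT to the file server.
WINDOW_BYTES = 64 * 1024

def max_mbps(rtt_seconds)
  (WINDOW_BYTES * 8 / rtt_seconds) / 1_000_000.0
end

puts max_mbps(0.150).round(1)   # intercontinental server: ~3.5 Mb/s
puts max_mbps(0.001).round(1)   # server next to the router: ~524.3 Mb/s
```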
Re: Large prefix lists/sets on IOS-XR
Can Andrian and Joshua explain what they specifically mean, and how they expect it to perform compared to what Steffann is already doing (e.g. load https://nms/cfg/router.txt)? How much faster will it be, and why?

Can Steffann explain how large a file they are copying, over what protocol, how long the copy takes, and how long the commit takes? We used to have configurations in excess of a million lines before 'or-longer' halved them, and we've seen much longer times than 30min to get a new config pushed+committed. We use FTP, and while the FTP does take its sweet time, the commit itself is very long as well.

I refrain from expressing my disillusionment with the utility of doing IRR based filtering.

On Fri, 9 Dec 2022 at 15:38, Andrian Visnevschi via NANOG wrote:
>
> Two options:
> - gRPC
> - Netconf
>
> You can use tools like paramiko, netmiko or napalm that are widely used to
> programmatically configure and manage your XR router.
>
> On Fri, Dec 9, 2022 at 2:24 AM Joshua Miller wrote:
>>
>> Netconf is really nice for atomic changes to network devices, though it
>> would still take some time for the device to process such a large change.
>>
>> On Thu, Dec 8, 2022 at 6:05 PM Sander Steffann wrote:
>>>
>>> Hi,
>>>
>>> What is the best/most efficient/most convenient way to push large prefix
>>> lists or sets to an XR router for BGP prefix filtering? Pushing thousands
>>> of lines through the CLI seems foolish, I tried using the load command but
>>> it seems horribly slow. What am I missing? :)
>>>
>>> Cheers!
>>> Sander
>>>
>>> ---
>>> for every complex problem, there's a solution that is simple, neat, and
>>> wrong
>
> --
> Cheers,
> Andrian Visnevschi

-- ++ytti
Re: Newbie Concern: (BGP) AS-Path Oscillation
I don't think this is normal; I think this is a fault and needs to be addressed. There should be significant reachability problems, because rerouting is neither immediate, nor lock-step between SW and HW, nor synchronous between nodes. What exactly needs to be done, I can't tell without looking at the specific case. I'm not sure 'tail-end' and 'origin announcer' work as synonyms: tail to me means receiver, head means advertiser, but origin announcer to me means advertiser, so I'm not sure which position you are in. If you are the source of this prefix, then you can probably fix the situation; if you are not, then you probably cannot. On Mon, 28 Nov 2022 at 07:56, Pirawat WATANAPONGSE via NANOG wrote: > > Dear Guru(s), > > > My apologies upfront if this question has already been asked. > If that’s the case, please kindly point me to the solution|thread so that the > mailing list bandwidth is not wasted. > > Situation: > On one of our prefixes, we are detecting continuous “BGP AS-Path Changes” in > the order of 1,000 announcements per hour---practically one every 3-4 seconds. > Those paths oscillate between two of our immediate upstreams. > > Questions: > 1. Is this number of events “normal” for a prefix? > 2. Is there any way we, as the tail-end (Origin Announcer), can do to reduce > it? Or should I just “let it be”? > 3. [Extra] Is this kind of oscillation affecting user experience, say, > throughput and/or latency? > > Thank you in advance for all the pointers and help. > > > Best Regards, > > Pirawat. > -- ++ytti
Re: Random Early Detect and streaming video
Hey, On Mon, 7 Nov 2022 at 21:58, Graham Johnston wrote: > I've been involved in service provider networks, small retail ISPs, for 20+ > years now. Largely though, we've never needed complex QoS, as at > $OLD_DAY_JOB, we had been consistently positioned to avoid regular link > congestion by having sufficient capacity. In the few instances when we've > had link congestion, egress priority queuing met our needs. What does 'egress priority queueing' mean? Do you mean 'send all X before any Y, send all Y before any Z'? If so, this must have been quite some time ago, as this hasn't been available since traffic managers moved into hardware ages ago. The only thing that has been available is 'X has guaranteed rate X1, Y has Y1 and Z has Z1', and love it or hate it, that's the QoS tool the industry has decided you need. > combine that with the buffering and we should adjust the drop profile to kick > in at a higher percentage. Today we use 70% to start triggering the drop > behavior, but my head tells me it should be higher. The reason I am saying > this is that we are dropping packets ahead of full link congestion, yes that > is what RED was designed to do, but I surmise that we are making this > application worse than is actually intended. I wager almost no one knows what their RED curve is, and different vendors have different default curves, which are then the curves almost everyone uses. Some use a RED curve such that everything is basically tail drop (Juniper: 0% drop at 96% fill and 100% drop at 98% fill). Some are linear. Some allow defining just two points, some allow defining 64 points. And almost no one has any idea what their curve is, i.e. mostly it doesn't matter; if it usually mattered, we'd all know what the curve is and why. In your case, I assume you have at least two points: 0% drop at 69% fill, then a linear curve from 70% to 100% fill with 1% to 100% drop. 
It doesn't seem outright wrong to me. You have 2-3 goals here: to avoid synchronising TCP flows, so that you get steady fill instead of wave-like behaviour, and to reduce queueing delay for the packets not dropped, which under tail drop would experience as long a delay as there is queue. A possible 3rd goal: if you map more than one class of packet into the same queue, you can still give them different curves, so congestion in a single queue can show two different behaviours depending on the packet. So what is the problem you're trying to fix? Can you measure it? I suspect in a modern high-speed network with massive amounts of flows, wave-like synchronisation is not a problem. If you can't measure it, or if your only goal is to reduce queueing delay because you have 'strategic' congestion, then perhaps instead of worrying about RED, use tail drop only and reduce the queue size to something tolerable, 1ms-5ms max? -- ++ytti
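The two-point curve discussed above can be written down directly. A hedged sketch: the 70% knee comes from the thread, the linear shape up to 100% drop at full queue is an assumption about the platform's behaviour.

```ruby
# Linear RED curve: no drops below the knee, then drop probability rising
# linearly to 100% at a full queue.
def red_drop_prob(fill, knee = 0.70)
  return 0.0 if fill < knee
  (fill - knee) / (1.0 - knee)
end

puts red_drop_prob(0.50)            # 0.0, below the knee
puts red_drop_prob(0.85).round(2)   # 0.5, halfway up the curve
puts red_drop_prob(1.00).round(2)   # 1.0, effectively tail drop
```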
Re: Router ID on IPv6-Only
On Fri, 9 Sept 2022 at 09:31, Crist Clark wrote: > As I said in the original email, I realize router IDs just need to be > unique in > an AS. We could have done random ones with IPv4, but using a well chosen In some far future this will be true. We meet eBGP speakers across the world, and not everyone supports even route refresh _today_, I suspect mostly because internally developed eBGP implementations were written by developers not very familiar with how real-life BGP works. RFC 6286 is not supported by all common implementations, much less uncommon ones, and even for common implementations it requires a very new image (20.4 for Junos; many are still on 17.4). So while we can consider the BGP router-id to be only locally significant once RFC 6286 is implemented, in practice you want to be defensive in your router-id strategy: at least avoid a scheme of 1, 2, 3, 4, 5, 6..., on the thesis that it will be a common scheme and liable to increase support costs down the line due to a higher collision probability. (A low router-id might also add a commercial advantage for transit providers, winning tie-breaks and thus billable traffic.) > And to get even a little more specific about our particular use case and > the > suggestion here to build the device location into the ID, we're > generally not I would strongly advise against any information-to-ID mapping schemes. This adds complexity, reduces flexibility, and requires you to know the complete problem ahead of time, which is difficult; only have the rules you absolutely must have. I am sure most people here have experienced a too-cutesy addressing scheme at some point in their past, where forming an IP address had unnecessary rules in it, which just created complexity and cost later. If you can add an arbitrary 32b ID to your database, this problem becomes very easy. If not, it's tricky. -- ++ytti
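On collision probability: the birthday bound shows why uniformly random 32-bit IDs are a defensive choice, while everyone converging on 1, 2, 3... collides with neighbouring ASes by construction. A sketch, with an assumed fleet of a thousand routers:

```ruby
# Approximate chance that any two of n independently chosen random 32-bit
# router IDs collide: n*(n-1)/2 pairs, each colliding with probability 2^-32.
n = 1000
p_collision = n * (n - 1) / 2.0 / 2**32
puts p_collision   # roughly 1.2e-4 for a thousand routers
```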
Re: Router ID on IPv6-Only
On Thu, 8 Sept 2022 at 10:22, Bjørn Mork wrote: > I'm not used to punching anything, so I probably have too simple a view > of the world. > > But I still don't understand how this changes the ID allocation scheme, > which is how I understood the question. I assume the punched value was > based on input from somewhere? Today: 1. Don't punch - won't work, you have to (Junos) 2. Punch IPv4 - won't work So what to do tomorrow? -- ++ytti
Re: Router ID on IPv6-Only
On Thu, 8 Sept 2022 at 10:01, Bjørn Mork wrote: > Why would you do it differently than for dual-stack routers, except that > you skip the step where you configure the ID as a loopback address? Because you may not have that option: if you're IPv6-only, vendors (e.g. Junos) may expect you to punch it in manually. Of course most of us punch it in manually as the loopback0 IPv4 anyway, to have more control over the outcome. The question is legitimate and represents a change where previously used mechanisms no longer apply, therefore the OP is right to ask 'well, what should I do now?'. -- ++ytti
Re: Router ID on IPv6-Only
Hey, > Well, now there is no IPv4. But BGP, OSPFv3, and other routing protocols > still use 32-bit router IDs for IPv6. On the one hand, there are plenty of > 32-bit numbers to use. Generally speaking, router IDs just need to be unique > inside of an AS to do their job, but (a) for humans or automation to generate > them and (b) to easily recognize them, it's convenient to have some algorithm > or methodology for assigning them. Second-hand knowledge, but when this was discussed early on in standardisation, someone argued against a 128b ID because it would require too much bandwidth in their OSPF network. The joys of everyone-plays standardisation. > Has anyone thought about this or have a good way to do it? We had ideas like > use bits 32-63 from an interface. Seems like it could work, but also could > totally break down if we're using >64-bit prefixes for things like > router-to-router links or pulling router loopbacks out of a common /64. If your data is in a database, I think the best bet is to algorithmically generate multiple forms of IDs in your device and interface rows, to satisfy the various restrictions on which forms of IDs are accepted, and then use those IDs. If your data is in configs, you don't have really good solutions, but you could choose 32b from the right side of your IPv6 loopback :/. -- ++ytti
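The 'choose 32b from the right side of your IPv6 loopback' idea can be sketched like this, using a documentation-prefix address as an example. Caveat: two loopbacks sharing their low 32 bits would collide, so this only works with a sane loopback plan.

```ruby
require 'ipaddr'

# Derive a 32-bit router ID from the low 32 bits of the IPv6 loopback,
# rendered dotted-quad as router IDs conventionally are.
def router_id_from_loopback(v6)
  low32 = IPAddr.new(v6).to_i & 0xFFFF_FFFF
  [24, 16, 8, 0].map { |shift| (low32 >> shift) & 0xFF }.join('.')
end

puts router_id_from_loopback('2001:db8::c000:201')   # => 192.0.2.1
```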
Re: End of Cogent-Sprint peering wars?
On Thu, 8 Sept 2022 at 01:06, Jawaid Bazyar wrote: > $1 deals usually come with an operation in the red, or assumption of > significant debts. To me this looks like a continuation of the game of attrition among infrastructure players. No one seems to know how to capitalise infrastructure, and ostensibly cheap deals have brought shops down before through naive buyers (GTT). But I do think this makes sense for both TMUS and CCOI: for TMUS, infrastructure is a bad risk, and they can always afford to procure the service at market price, passing costs to customers. CCOI doesn't have much choice but to figure out how to turn infrastructure into money; if they can't, they're dead anyhow, now just a little bit sooner, so it seems like a good risk for them. I am a little more optimistic about CCOI leadership's ability to capitalise this than I was about GTT's, and wish them good luck. -- ++ytti
Re: IoT - The end of the internet
On Wed, 10 Aug 2022 at 12:48, Pascal Thubert (pthubert) wrote: Hey, > I do not share that view: I'm not sure how you read my view; I was not attempting to communicate anything negative about IPv6. What I attempted to communicate: - the near future looks to improve IoT security posture significantly, as the IoT LAN won't share a network with your user LAN; you'll go via a GW - Thread+Matter gives me optimism that IoT is being taken seriously and good progress is being made, and the standards look largely well thought out > 1) Thread uses 6LoWPAN so nodes are effectively IPv6 even though it doesn’t > show in the air. I believe I implied that strongly, considering the 'forced marketing of IPv6' comment on the Thread addressing scheme. Mind you, I don't think it is a big deal, it might even be positive, but I would probably have used an inline PDU field to decide roles. -- ++ytti
Re: 400G forwarding - how does it work?
On Wed, 10 Aug 2022 at 06:48, wrote: > How do you propose to fairly distribute market data feeds to the market if > not multicast? I expected your aggressive support for small packets was for fintech. An anecdote: one of the largest exchanges in the world used MX for multicast replication, which is btree (or today utree) replication, i.e. each NPU gets the replicated packet at wildly different times, and therefore so do the receivers. This wasn't a problem for them, because they didn't know that's how it works and suffered no negative consequence of it, which arguably should have been a show-stopper if receivers must receive it at a remotely similar time. Also, it is not in disagreement with my statement that it is not an addressable market, because this market can use products which do not do 64B wire-rate, for two separate reasons, either/and: a) the port is nowhere near congested, b) the market is not cost sensitive; they buy the device with many WAN ports and don't provision it so densely that they can't get 64B on the ports actually used. -- ++ytti
Re: IoT - The end of the internet
On Wed, 10 Aug 2022 at 07:54, Pascal Thubert (pthubert) via NANOG wrote: > On a more positive note, the IPv6 IoT can be seen as an experiment on how we > can scale the internet another order of magnitude or 2 without taking the > power or the spectrum consumption to the parallel levels. I think at least the next 20 years of IoT is Thread (and WiFi for high BW) + Matter, and IoT devices won't have an IP that is addressable even from the user LAN; you go via a GW, none of which you configure. Some bits of it look unnecessarily forced, like the addressing scheme: instead of inlining your role in the PDU, we use this cutesy addressing scheme, which looks like a bit of forced marketing of IPv6; it doesn't seem necessary, but it's not really an important decision either way. Overall I think Thread+Matter are well designed and they make me quite optimistic about reasonable IoT outcomes. -- ++ytti
Re: 400G forwarding - how does it work?
On Mon, 8 Aug 2022 at 14:37, Masataka Ohta wrote: > With such an imaginary assumption, according to the end to end > principle, the customers (the ends) should use paced TCP instead I fully agree, unfortunately I do not control the whole problem domain, and the solutions available with partial control over the domain are less than elegant. -- ++ytti
Re: 400G forwarding - how does it work?
On Mon, 8 Aug 2022 at 14:02, Masataka Ohta wrote: > which is, unlike Yttinet, the reality. Yttinet has pesky customers who care about single TCP performance over long fat links, and observe poor performance with shallow buffers at the provider end. Yttinet is cost sensitive and does not want to do work, unless sufficiently motivated by paying customers. -- ++ytti
Re: 400G forwarding - how does it work?
On Mon, 8 Aug 2022 at 13:03, Masataka Ohta wrote: > If RTT is large, your 100G runs over several 100/400G > backbone links with many other traffic, which makes the > burst much slower than 10G. In Ohtanet, I presume. -- ++ytti
Re: 400G forwarding - how does it work?
On Sun, 7 Aug 2022 at 14:16, Masataka Ohta wrote: > When many TCPs are running, burst is averaged and traffic > is poisson. If you grow a window, and the sender sends the delta at 100G, and receiver is 10G, eventually you'll hit that 10G port at 100G rate. It's largely an edge problem, not a core problem. > People who use irrationally small packets will suffer, which is > not a problem for the rest of us. Quite, unfortunately, the problem I have exists in the Internet, the problem you're solving exists in Ohtanet, Ohtanet is much more civilized and allows for elegant solutions. The Internet just has a different shade of bad solution to pick from. -- ++ytti
Re: 400G forwarding - how does it work?
On Sun, 7 Aug 2022 at 17:58, wrote: > There are MANY real world use cases which require high throughput at 64 byte > packet size. Denying those use cases because they don’t fit your world view > is short sighted. The world of networking is not all I-Mix. Yes, but it's not an addressable market. Such a market will just buy silly putty for 2 bucks and modify the existing faceplate to do 64B. No one will ship that box for you, because the addressable market will gladly take more WAN ports as a trade-off for a larger minimum mean packet size. -- ++ytti
Re: 400G forwarding - how does it work?
On Sun, 7 Aug 2022 at 12:16, Masataka Ohta wrote:

> I'm afraid you imply too much buffer bloat only to cause
> unnecessary and unpleasant delay.
>
> With 99% load M/M/1, 500 packets (750kB for 1500B MTU) of
> buffer is enough to make packet drop probability less than
> 1%. With 98% load, the probability is 0.0041%.

I feel like I'll live to regret asking: which congestion control algorithm are you thinking of?

If we estimate BW and pace TCP window growth at the estimated BW, we don't need much buffering at all. But Cubic and Reno will burst the TCP window growth at sender rate, which may be much more than receiver rate; someone has to store that growth and pace it out at receiver rate, otherwise the window won't grow and receiver rate won't be achieved. So in an ideal scenario, no, we don't need a lot of buffer; in practical situations today, yes, we need quite a bit. Now add to this multiple logical interfaces, each having 4-8 queues, and it adds up. 'Big buffers are bad, mkay' is frankly simplistic and inaccurate.

Also, the shallow ingress buffers discussed in this thread are not delay buffers, and the problem is complex because no marketable device can accept wire rate at minimum packet size. So what trade-offs do we carry when we get bad traffic at wire rate at small packet size? We can't empty the ingress buffers fast enough; do we have physical memory for each port, do we share, and how do we share?

-- ++ytti
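A rough sketch of the window-growth point, with illustrative numbers: when the sender's line rate far exceeds the bottleneck rate, most of a back-to-back window-growth burst has to sit in a buffer somewhere while it drains at the bottleneck rate.

```ruby
# A sender bursting at 100G into a 10G bottleneck: while the burst lasts,
# the buffer absorbs the rate difference, i.e. 90% of the burst bytes.
burst_bytes     = 1_250_000   # assumed window-growth burst, ~1.25 MB
sender_gbps     = 100
bottleneck_gbps = 10

stored = burst_bytes * (1 - bottleneck_gbps.fdiv(sender_gbps))
puts stored.round   # bytes queued from this single flow's burst
```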