Re: Lossy cogent p2p experiences?
Saku Ytti wrote:

> And you will be wrong. Packet arriving out of order, will be
> considered previous packet lost by host, and host will signal need
> for resend.

As I already quoted from the very old and fundamental paper on the E2E argument:

    End-To-End Arguments in System Design
    https://groups.csail.mit.edu/ana/Publications/PubPDFs/End-to-End%20Arguments%20in%20System%20Design.pdf

    3.4 Guaranteeing FIFO Message Delivery

and as is described in rfc2001:

    Since TCP does not know whether a duplicate ACK is caused by a lost
    segment or just a reordering of segments, it waits for a small number
    of duplicate ACKs to be received. It is assumed that if there is
    just a reordering of the segments, there will be only one or two
    duplicate ACKs before the reordered segment is processed, which
    will then generate a new ACK. If three or more duplicate ACKs are
    received in a row, it is a strong indication that a segment has
    been lost.

in networking, it is well known that "Guaranteeing FIFO Message Delivery" by the network is impossible, because packets arriving out of order without packet loss is inevitable and not uncommon.

As such, slight reordering is *NOT* interpreted as previous packet loss. The allowed amount of reordering depends on the TCP implementation and can be controlled by upgrading TCP.

						Masataka Ohta
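The rfc2001 behaviour quoted above can be sketched as a simple counter: retransmission is signalled only after three duplicate ACKs, so a one- or two-packet reordering is absorbed without being treated as loss. A minimal model (not a full TCP implementation):

```python
DUP_ACK_THRESHOLD = 3  # fast retransmit trigger, per rfc2001


def needs_fast_retransmit(acks):
    """Scan a stream of cumulative ACK numbers and report whether any
    ACK value repeats often enough to signal a lost segment."""
    dup_count = 0
    last_ack = None
    for ack in acks:
        if ack == last_ack:
            dup_count += 1
            if dup_count >= DUP_ACK_THRESHOLD:
                return True
        else:
            dup_count = 0
            last_ack = ack
    return False


# Slight reordering: only one duplicate ACK, so no retransmission.
assert not needs_fast_retransmit([1, 2, 2, 4])

# A genuinely lost segment: three duplicate ACKs in a row.
assert needs_fast_retransmit([1, 2, 2, 2, 2])
```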
Re: Lossy cogent p2p experiences?
Tom Beecher wrote:

> Well, not exactly the same thing. (But it's my mistake, I was
> referring to L3 balancing, not L2 interface stuff.)

That should be a correct reference.

> load-balance per-packet will cause massive reordering,

If the buffering delays of the ECMP paths cannot be controlled, yes.

> because it's random spray, caring about nothing except equal loading
> of the members.

Equal loading on point-to-point links between two routers by (weighted) round robin means mostly the same buffering delay, which won't cause massive reordering.

						Masataka Ohta
Re: Lossy cogent p2p experiences?
Benny Lyne Amorsen wrote:

> TCP looks quite different in 2023 than it did in 1998. It should
> handle packet reordering quite gracefully;

Maybe and, even if it doesn't, TCP may be modified. But that is not my primary point.

ECMP, in general, means paths consisting of multiple routers and links. The links have various bandwidths, and other traffic may be merged at multi-access links or on routers. Then, it is hopeless for the load-balancing points to control the buffers of the routers in the paths, and the delays caused by those buffers, which makes per-packet load balancing hopeless.

However, as I wrote to Mark Tinka:

: If you have multiple parallel links over which many slow
: TCP connections are running, which should be your assumption,

with "multiple parallel links", which are single-hop paths, it is possible for the load-balancing point to keep the buffer occupancy of the links, and the delays caused by the buffers, almost the same, which should eliminate packet reordering within a flow, especially when "many slow TCP connections are running".

And simple round robin should be good enough for most cases (no lab testing at all, yet).

A little more aggressive approach is to fully share a single buffer among all the parallel links. But as it is not compatible with today's router architecture, I did not propose that approach.

						Masataka Ohta
Re: Lossy cogent p2p experiences?
William Herrin wrote:

> I recognize what happens in the real world, not in the lab or text
> books.
>
> What's the difference between theory and practice?

W.r.t. the fact that there are so many wrong theories and wrong practices, there is no difference.

> In theory, there is no difference.

Especially because the real world includes labs and textbooks and, as such, all the theories, including all the wrong ones, exist in the real world.

						Masataka Ohta
Re: Lossy cogent p2p experiences?
Saku Ytti wrote:

> Fun fact about the real world, devices do not internally guarantee
> order. That is, even if you have identical latency links, 0
> congestion, order is not guaranteed between packet1 coming from
> interfaceI1 and packet2 coming from interfaceI2, which packet first
> goes to interfaceE1 is unspecified.

So, you lack fundamental knowledge of the E2E argument, which is fully applicable to situations in the real-world Internet.

In the very basic paper on the E2E argument published in 1984:

    End-To-End Arguments in System Design
    https://groups.csail.mit.edu/ana/Publications/PubPDFs/End-to-End%20Arguments%20in%20System%20Design.pdf

reordering is recognized in both the real and the theoretical world as:

    3.4 Guaranteeing FIFO Message Delivery
    Ensuring that messages arrive at the receiver in the same order in
    which they are sent is another function usually assigned to the
    communication subsystem.

which means that, according to the paper, the "function" of ordering by the network cannot be complete or correct, and, unlike you, I'm fully aware of it.

> This is because packets inside lookup engine can be sprayed to
> multiple lookup engines, and order is lost even for packets coming
> from interface1 exclusively, however after the lookup the order is
> restored for _flow_, it is not restored between flows, so packets
> coming from interface1 with random ports won't be same order going out
> from interface2.

That is a broken argument for how identification of flows by intelligent intermediate entities could work, against the E2E argument and against the reality that initiated this thread.

In the real world, according to the E2E argument, attempts to identify flows by intelligent intermediate entities are just harmful from the beginning, which is why flow-driven architecture, including that of MPLS, is broken and hopeless.

I really hope you understand the meaning of "intelligent intermediate entities" in the context of the E2E argument.

						Masataka Ohta
Re: Lossy cogent p2p experiences?
Mark Tinka wrote:

>> Are you saying you thought a 100G Ethernet link actually consisting
>> of 4 parallel 25G links, which is an example of "equal speed multi
>> parallel point to point links", was relying on hashing?
>
> No...

So, though you wrote:

>> If you have multiple parallel links over which many slow
>> TCP connections are running, which should be your assumption,
>> the proper thing to do is to use the links with round robin
>> fashion without hashing. Without buffer bloat, packet
>> reordering probability within each TCP connection is
>> negligible.
>
> So you mean, what... per-packet load balancing, in lieu of per-flow
> load balancing?

you now recognize that per-flow load balancing is not a very good idea. Good.

> you are saying that.

See above to find my statement of "without hashing".

						Masataka Ohta
Re: Lossy cogent p2p experiences?
Nick Hilliard wrote:

>> Are you saying you thought a 100G Ethernet link actually consisting
>> of 4 parallel 25G links, which is an example of "equal speed multi
>> parallel point to point links", was relying on hashing?
>
> this is an excellent example of what we're not talking about in this
> thread.

Not "we", but "you".

> A 100G serdes is an unbuffered mechanism which includes a PLL, and
> this allows the style of clock/signal synchronisation required for
> the deserialised 4x25G lanes to be reserialised at the far end. This
> is one of the mechanisms used for packet / cell / bit spray, and it
> works really well.

That's why I, instead of a fully shared buffer, mentioned round robin as the proper solution for this case.

> This thread is talking about buffered transmission links on routers /
> switches on systems which provide no clocking synchronisation and not
> even a guarantee that the bearer circuits have comparable latencies.
> ECMP / hash based load balancing is a crock, no doubt about it;

See the first three lines of this mail to find that I explicitly mentioned "equal speed multi parallel point to point links" as the context for round robin.

As I already told you:

: In theory, you can always fabricate unrealistic counter examples
: against theories by ignoring essential assumptions of the theories.

you keep ignoring essential assumptions for no good purpose.

						Masataka Ohta
Re: Lossy cogent p2p experiences?
William Herrin wrote:

> Well it doesn't show up in long slow pipes because the low
> transmission speed spaces out the packets,

Wrong. That is a phenomenon with slow access and a fast backbone, which has nothing to do with this thread. If the backbone is as slow as the access, no "space out" is possible.

> and it doesn't show up in short fat pipes because there's not enough
> delay to cause the burstiness.

A short pipe means the burst proceeds at full speed continuously, without interruption.

> So I don't know how you figure it has nothing to do with
> long fat pipes,

That's your problem.

						Masataka Ohta
Re: Lossy cogent p2p experiences?
William Herrin wrote:

>> No, not at all. First, though you explain slow start, it has nothing
>> to do with long fat pipes. The long-fat-pipe problem is addressed by
>> window scaling (and SACK).
>
> So, I've actually studied this in real-world conditions and TCP
> behaves exactly as I described in my previous email for exactly the
> reasons I explained.

Yes of course, which is my point. Your problem is that your point about slow start has nothing to do with long fat pipes.

> Window scaling and SACK makes it possible for TCP to grow to consume
> the entire whole end-to-end pipe when the pipe is at least as large as
> the originating interface and -empty- of other traffic.

Totally wrong. Unless the pipe is long and fat, a plain TCP without window scaling or SACK grows to consume the entire end-to-end pipe when the pipe is at least as large as the originating interface and -empty- of other traffic.

> Those
> conditions are rarely found in the real world.

It is usual for TCP to consume all the available bandwidth. The exceptions, not so rare in the real world, are plain TCPs over long fat pipes.

						Masataka Ohta
Re: Lossy cogent p2p experiences?
Mark Tinka wrote:

>> ECMP, surely, is too abstract a concept to properly manage/operate
>> simple situations with equal-speed multi-parallel point-to-point
>> links.
>
> I must have been doing something wrong for the last 25 years.

Are you saying you thought a 100G Ethernet link actually consisting of 4 parallel 25G links, which is an example of "equal speed multi parallel point to point links", was relying on hashing?

						Masataka Ohta
Re: Lossy cogent p2p experiences?
William Herrin wrote:

> Hi David,
>
> That sounds like normal TCP behavior over a long fat pipe.

No, not at all. First, though you explain slow start, it has nothing to do with long fat pipes. The long-fat-pipe problem is addressed by window scaling (and SACK).

As David Hubbard wrote:

: I've got a non-rate-limited 10gig circuit

and

: The initial and recurring packet loss occurs on any flow of
: more than ~140 Mbit.

the problem is caused not by the wire-speed limitation of a "fat" pipe but by artificial policing at 140M.

						Masataka Ohta
Re: Lossy cogent p2p experiences?
Nick Hilliard wrote:

>> In this case, "Without buffer bloat" is an essential assumption.
>
> I can see how this conclusion could potentially be reached in
> specific styles of lab configs,

I'm not interested in how poorly you configure your lab.

> but the real world is more complicated and

And this thread was initiated because of unreasonable behavior apparently caused by stupid attempts at automatic flow detection followed by policing. That is the real world.

Moreover, it has been well known, both in theory and practice, that flow-driven architecture relying on automatic detection of flows does not scale and is no good, though MPLS relies on that broken flow-driven architecture.

> Generally in real world situations on the internet, packet reordering
> will happen if you use round robin, and this will impact performance
> for higher speed flows.

That is my point, already stated by me. You don't have to repeat it again.

> It's true that per-hash load
> balancing is a nuisance, but it works better in practice on larger
> heterogeneous networks than RR.

Here, you implicitly assume a large number of slower-speed flows, against your own statement of "higher speed flows".

						Masataka Ohta
Re: Lossy cogent p2p experiences?
Nick Hilliard wrote:

>> the proper thing to do is to use the links with round robin
>> fashion without hashing. Without buffer bloat, packet
>> reordering probability within each TCP connection is
>> negligible.
>
> Can you provide some real world data to back this position up?

See, for example, the famous paper "Sizing Router Buffers". With the thousands of TCP connections at the backbone recognized by the paper, buffers holding thousands of packets won't cause packet reordering.

> What you said reminds me of the old saying: in theory, there's no
> difference between theory and practice, but in practice there is.

In theory, you can always fabricate unrealistic counter examples against theories by ignoring essential assumptions of the theories. In this case, "Without buffer bloat" is an essential assumption.

						Masataka Ohta
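The "Sizing Router Buffers" paper argues that a link carrying N desynchronized long-lived TCP flows needs a buffer of roughly RTT x C / sqrt(N) rather than the classic RTT x C. A quick illustrative calculation (the link speed, RTT and flow count below are hypothetical, chosen only to show the scale):

```python
from math import sqrt


def buffer_packets(rtt_s, capacity_bps, n_flows, pkt_bits=12000.0):
    """Rule-of-thumb buffer size from "Sizing Router Buffers":
    B = RTT * C / sqrt(N), expressed here in 1500-byte packets."""
    return rtt_s * capacity_bps / sqrt(n_flows) / pkt_bits


# A hypothetical 10 Gb/s backbone link with 250 ms RTT:
full = buffer_packets(0.25, 10e9, 1)        # classic RTT*C rule
shared = buffer_packets(0.25, 10e9, 10000)  # 10,000 multiplexed flows

print(round(full), round(shared))  # ~208333 vs ~2083 packets
```

With thousands of flows the required buffer drops by two orders of magnitude, into the "thousands of packets" range mentioned above.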
Re: Lossy cogent p2p experiences?
Mark Tinka wrote:

> So you mean, what... per-packet load balancing, in lieu of per-flow
> load balancing?

Why, do you think, can you rely on the existence of flows?

>> So, if you internally have 10 parallel 1G circuits expecting perfect
>> hashing over them, it is not "non-rate-limited 10gig".
>
> It is understood in the operator space that "rate limiting" generally
> refers to policing at the edge/access.

And nothing beyond, of course.

> The core is always abstracted, and that is just capacity planning and
> management by the operator.

ECMP, surely, is too abstract a concept to properly manage/operate simple situations with equal-speed multi-parallel point-to-point links.

						Masataka Ohta
Re: Lossy cogent p2p experiences?
Mark Tinka wrote:

>> Wrong. It can be performed only at the edges by policing total
>> incoming traffic without detecting flows.
>
> I am not talking about policing in the core, I am talking about
> detection in the core.

I'm not talking about detection at all.

> Policing at the edge is pretty standard. You can police a 50Gbps
> EoMPLS flow coming in from a customer port in the edge. If you've got
> N x 10Gbps links in the core and the core is unable to detect that
> flow in depth to hash it across all those 10Gbps links, you can end
> up putting all or a good chunk of that 50Gbps of EoMPLS traffic into
> a single 10Gbps link in the core, despite all other 10Gbps links
> having ample capacity available.

Relying on a hash is a poor way to offer wide bandwidth. If you have multiple parallel links over which many slow TCP connections are running, which should be your assumption, the proper thing to do is to use the links in round-robin fashion without hashing. Without buffer bloat, the packet reordering probability within each TCP connection is negligible.

A faster TCP may suffer from packet reordering during slight congestion, but the effect is like that of RED.

Anyway, in this case, the situation is:

: Moreover, as David Hubbard wrote:
: > I've got a non-rate-limited 10gig circuit

So, if you internally have 10 parallel 1G circuits expecting perfect hashing over them, it is not a "non-rate-limited 10gig".

						Masataka Ohta
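The difference between the two link-selection policies under discussion can be sketched as a toy model (not any vendor's implementation): hash-based selection pins every packet of a 5-tuple flow to one member link, so one elephant flow can saturate a single member, while per-packet round robin loads the members equally regardless of how the traffic is divided into flows.

```python
import itertools
from collections import Counter
from hashlib import blake2b

LINKS = 4  # hypothetical number of equal-speed member links


def hash_select(src, dst, sport, dport, proto=6):
    """Per-flow: every packet of the same 5-tuple uses the same link."""
    key = f"{src}|{dst}|{sport}|{dport}|{proto}".encode()
    return int.from_bytes(blake2b(key, digest_size=8).digest(), "big") % LINKS


_rr = itertools.cycle(range(LINKS))


def round_robin_select():
    """Per-packet: member links are used in strict rotation."""
    return next(_rr)


# Under hashing, all packets of one flow land on a single link:
links_used = {hash_select("10.0.0.1", "10.0.0.2", 1234, 80) for _ in range(100)}
assert len(links_used) == 1

# Under round robin, every link carries exactly its share:
counts = Counter(round_robin_select() for _ in range(400))
assert all(c == 100 for c in counts.values())
```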
Re: Lossy cogent p2p experiences?
Mark Tinka wrote:

> it is the core's ability to balance the Layer 2 payload across
> multiple links effectively.

Wrong. It can be performed only at the edges, by policing total incoming traffic without detecting flows.

> While some vendors have implemented adaptive load balancing
> algorithms

There are no such algorithms because, as I wrote:

: 100 50Mbps flows are as harmful as 1 5Gbps flow.

						Masataka Ohta
Re: Lossy cogent p2p experiences?
Mark Tinka wrote:

> On 9/1/23 15:59, Mike Hammett wrote:
>> I wouldn't call 50 megabit/s an elephant flow
>
> Fair point.

Both of you are totally wrong, because the proper thing to do here is to police, if at *ALL*, based on total traffic without detecting any flow. 100 50Mbps flows are as harmful as 1 5Gbps flow.

Moreover, as David Hubbard wrote:

> I've got a non-rate-limited 10gig circuit

there is no point in policing.

Detection of elephant flows was wrongly considered useful with flow-driven architecture to automatically bypass L3 processing for the flows, back when L3 processing capability was wrongly considered limited.

Then, the topology-driven architecture of MPLS appeared, even though topology driven is flow driven (you can't put inner labels on MPLS packets without knowing detailed routing information at the destinations, which is hidden at the source through route aggregation, except on demand after detecting flows).

						Masataka Ohta
Re: NTP Sync Issue Across Tata (Europe)
Forrest Christian (List Account) wrote:

> There are lots of ways to improve a GPS-based NTP server. Better
> antenna positioning. Better GPS chipset. Paying attention to antenna
> patterns. Adding notch filters to the GPS feed. And so on.

They are not a very meaningful improvement.

> But, in the end, there is nothing better than adding a second GPS
> source at a diverse location as far as improving reliability,
> provided that's an option based on timing needs.

You keep ignoring DOS attacks. Though you wrote:

: If I just want to deny you time, it gets cheaper and
: easier. All I need is a 1.2 GHz oscillator coupled to an
: antenna. There are units like this available for under $10,
: delivered. These block GPS trackers on trucks and/or private
: automobiles. Build your own and you can get a watt or two
: to shove into a tiny antenna for not a lot more. Guaranteed
: to Jam anything within a couple of blocks.

you don't understand that a DOS is similarly effective.

> I can also attest that there is at least one overlap between
> time-nuts and NANOG

See above.

						Masataka Ohta
Re: NTP Sync Issue Across Tata (Europe)
Mike Hammett wrote:

> "As such, the ultimate (a little expensive) solution is to have your
> own Rb clocks locally."
>
> Yeah, that's a reasonable course of action for most networks.

For most data centers with time-sensitive transactions, at least.

> *sigh*

https://en.wikipedia.org/wiki/Atomic_clock

    Modern rubidium standard tubes last more than ten years, and can
    cost as little as US$50.

https://www.ebay.com/sch/i.html?_nkw=rubidium

						Masataka Ohta
Re: NTP Sync Issue Across Tata (Europe)
John Gilmore wrote:

> Subsequent conversation has shown that you are both right here. Yes,
> many public NTP servers ARE using GPS-derived time. Yes, some public
> NTP servers ARE NOT using GPS-derived time.

The point is whether:

: 2) Run a set of internal NTPd servers, and configure them to pull
: time from all of your GPS-derived NTP servers, AND trusted public
: NTP servers

is a proper recommendation against total GPS failure or not.

> At one point I proposed that some big NTP server pools be segregated
> by names, to distinguish between GPS-derived time and
> national-standard derived time. For example, two domain names could
> be e.g.:
>
>   fromnist.pool.tick.tock
>   fromgps.pool.tick.tock

One problem is that a public NTP server, which is not necessarily stratum 1, may depend on both.

Another problem is that domain name management is not so trustworthy. An NTP server once relying on NIST may now be relying on GPS, but the administrator of the server may not change its domain name.

"Trusted public NTP servers" is not a trustworthy or verifiable concept.

> PS: When we say "GPS", do we really mean any GNSS (global navigation
> satellite system)? There are now four such systems that have global
> coverage, plus regionals. While they attempt to coordinate their
> time-bases and reference-frames, they are using different hardware
> and systems, and are under different administration, so there are
> some differences in the clock values returned by each GNSS. These
> differences and discontinuities have ranged up to 100ns in normal
> operation, and higher amounts in the past. See:

Because of relativity, 100ns of time difference between locations more than 30m apart cannot be a problem for correct transaction processing or ordering of events.

						Masataka Ohta
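The 30 m figure above follows directly from the light travel time: 100 ns is how long a signal takes to cross roughly 30 m, so no causal ordering between sites farther apart than that can hinge on a smaller offset. A one-line check:

```python
# Speed of light in vacuum (m/s); distance covered in 100 ns.
C = 299_792_458
distance_m = C * 100e-9
print(distance_m)  # just under 30 m
```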
Re: NTP Sync Issue Across Tata (Europe)
Forrest Christian (List Account) wrote:

> The NIST time servers do NOT get their time from GPS.

No, of course. I know that very well. However, as I wrote:

> But, additionally relying on remote servers (including those
> provided by NIST) is subject to DOS attacks.

the (mostly wired) Internet is just as secure/insecure as wireless GPS, and over it the NIST servers cannot be reliably accessed.

Just as many people who only know the wired Internet blindly think wireless channels are secure, you cannot recognize the various attack modes on the mostly wired Internet.

> These are physical realizations of UTC... that is, a phase-aligned
> 1PPS pulse and a high precision clock signal. These realizations are
> used to directly drive the NIST NTP servers at each location. GPS is
> not involved.

UTC??? You are totally wrong.

Just as many other people, you are purposelessly seeking meaningless accuracy assuming the inertial frame of UTC, which is *NOT* required for correct transactions.

Because of relativity, we can assume *ANY* inertial frame for simultaneity, which means the simultaneity requirement is not so strong. Moreover, the information cone allows even less simultaneity for correct transactions.

> These two timescales are within a few ns of each other, also verified
> with GNSS common view technology, so one can consider them the same
> for most purposes.

You don't understand the simultaneity of the theory of relativity at all. 10ns of time difference cannot be physically or logically meaningful between locations 3m apart.

> Note that a similar process is used to derive UTC(NICT) in Japan.

Depending on the inertial system, times in the US and JP can differ by a lot more than 1ms, which means the timing error between mainland US and Japan can be a lot more than 1ms.

> As far as a rubidium clock goes, I'd much rather see it disciplined
> regularly to a GPS time source, but that comes from the fact that I
> like my 1PPS to be within a microsecond or so of UTC due to the
> precision I need in the lab.
As I already wrote:

: For millisecond accuracy, Rb clocks do not need any synchronization
: for centuries.
: Rb clocks on GPS are a lot more frequently synchronized, because
: a lot more accuracy is required for positioning (10ns of timing
: error means 3m of positioning error).

you didn't understand the accuracy actually required by Internet operators, which is your problem.

> Note that some of the high end appliances I'm referring to just use
> GPS over days and weeks to discipline a precision oscillator
> (sometimes rubidium) which is essentially an automatic calibrating
> version of what you're proposing.

That has nothing to do with the much broader accuracy bound allowed by the theory of special relativity for proper causality.

						Masataka Ohta
Re: NTP Sync Issue Across Tata (Europe)
Forrest Christian (List Account) wrote:

> The recommendation tends to be the following:
>
> 1) Run your GPS-derived NTP appliances, but DO NOT point end-user
> clients at it.
> 2) Run a set of internal NTPd servers, and configure them to pull
> time from all of your GPS-derived NTP servers, AND trusted public
> NTP servers
> 3) Point your clients at the internal NTPd servers.

That is not a very good recommendation. See below.

> At some point, using publicly available NTP sources is redundant
> unless one wants to mitigate away the risks behind failure of the GPS
> system itself.

Your assumption that public NTP servers are not GPS-derived NTP servers is just wrong.

> What I'm advocating against is the seemingly common practice to go
> buy an off-the-shelf lower-cost GPS-NTP appliance (under $1K or so),
> stick an antenna in a window or maybe on the rooftop, and point all
> your devices at that device.

Relying on a local expensive GPS appliance does not improve security much either, and is the worst thing to do. But additionally relying on remote servers (including those provided by NIST) is subject to DOS attacks.

As such, the ultimate (a little expensive) solution is to have your own Rb clocks locally.

						Masataka Ohta
Re: NTP Sync Issue Across Tata (Europe)
John Gilmore wrote:

>> I was also speaking specifically about installing GPS antennas in
>> viable places, not using a facility-provided GPS or NTP service.
>
> Am I confused? Getting the time over a multi-gigabit Internet from a
> national time standard agency such as NIST (or your local country's
> equivalent) should produce far better accuracy and stability than
> relying on locally received GPS signals.

When the (wrong) question is "how to build a stratum 1 server?", that cannot be an answer.

> GPS uses very weak radio signals which are regularly spoofed by all
> sorts of bad actors:

The question, seemingly, is not "how to build a secure stratum 1 server?".

BTW, the proper question should be "how to obtain secure time?".

						Masataka Ohta
Re: NTP Sync Issue Across Tata (Europe)
Forrest Christian (List Account) wrote:

> Depends on how synchronized you need to be.

Sure. But we should be assuming NTP is mostly enough.

> A rubidium oscillator or Chip Scale Atomic Clock is in the price
> range you quote. However, these can drift enough that you should
> occasionally synchronize with a reference time source. This is to
> ensure continued millisecond accuracy.
>
> Of course it all depends on how much drift you'll tolerate, and if
> you're OK with being within a second, then a rubidium might be ok.

For millisecond accuracy, Rb clocks do not need any synchronization for centuries.

Rb clocks on GPS are synchronized a lot more frequently, because a lot more accuracy is required for positioning (10ns of timing error means 3m of positioning error).

						Masataka Ohta
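How long a free-running oscillator stays within a given tolerance is just the tolerance divided by its fractional frequency offset, so the disagreement above comes down to what offset one assumes for a calibrated Rb standard. A small sketch (the 1e-12 offset used below is illustrative, not any particular device's specification):

```python
def holdover_seconds(tolerance_s, fractional_offset):
    """Seconds until a free-running clock with a constant fractional
    frequency offset accumulates `tolerance_s` of time error."""
    return tolerance_s / fractional_offset


YEAR = 365.25 * 24 * 3600

# 1 ms tolerance at an assumed 1e-12 fractional offset: decades.
print(holdover_seconds(1e-3, 1e-12) / YEAR)   # ~31.7 years

# GPS positioning needs ~10 ns; at the same offset that is only
# a few hours, hence the far more frequent steering of GPS clocks.
print(holdover_seconds(10e-9, 1e-12) / 3600)  # ~2.8 hours
```

Whether the answer is "years" or "centuries" thus depends entirely on the oscillator's long-term frequency stability after calibration.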
Re: NTP Sync Issue Across Tata (Europe)
Mel Beckman wrote:

> To be useful, any atomic clocks you operate must be synchronized
> to a Stratum Zero time source, such as GPS.

Only initially.

> Precise time is crucial to a variety of economic activities around
> the world. Communication systems, electrical power grids, and
> financial networks all rely on precision timing for synchronization
> and operational efficiency. The free availability of GPS time has
> enabled cost savings for companies that depend on precise time and
> has led to significant advances in capability.

FYI, a time difference between two points is not noticeable, that is, does not affect the correctness of any distributed algorithm, if the difference is below the communication delay between the points, which means rough synchronization by NTP is good enough.

That is an information-theoretic version of the relativity of simultaneity:

    https://en.wikipedia.org/wiki/Relativity_of_simultaneity

For information-theoretic simultaneity, you can consider, instead of the light cone, an information cone.

						Masataka Ohta
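The claim can be illustrated with a two-node toy model (a sketch, not a proof): if every message takes at least `min_delay` to arrive, a clock offset smaller than that delay can never make a message appear to be received before it was sent, so causal ordering of events is preserved.

```python
def receive_appears_before_send(offset_s, min_delay_s):
    """True if a receiver whose clock runs `offset_s` ahead of the
    sender's (negative = behind) could timestamp a reception earlier
    than the sender's send time.  The message is sent at t=0 on the
    sender's clock and arrives no sooner than `min_delay_s` later in
    true time, so it is stamped min_delay_s + offset_s."""
    return min_delay_s + offset_s < 0


# Offset (4 ms) below the one-way delay (5 ms): order is intact.
assert not receive_appears_before_send(-0.004, 0.005)

# Offset (6 ms) exceeding the delay: apparent order can invert.
assert receive_appears_before_send(-0.006, 0.005)
```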
Re: NTP Sync Issue Across Tata (Europe)
Forrest Christian (List Account) wrote:

> In the middle tends to be a more moderate solution which involves a
> mix of time transmission methods from a variety of geographically
> and/or network diverse sources. Taking time from the public trusted
> ntp servers and adding lower cost GPS receivers at diverse points in
> your network seems like a good compromise in the middle. That way,
> only coordinated attacks will be successful.

Instead, just rely on atomic clocks operated by you. They are not so expensive (several thousand dollars) and should be accurate enough, without adjustment, for hundreds of years. There can be no coordinated attacks.

They may be remotely accessed through secured NTP.

						Masataka Ohta
Re: New addresses for b.root-servers.net
Mark Andrews wrote:

>> If an end and another end directly share a secret
>> key without involving untrustworthy trusted third
>> parties, the ends are secure end to end.

>> An untrustworthy but light weight and inexpensive (or free)
>> PKI may worth its price and may be useful to make IP address
>> based security a little better.

> Which you can do with DNSSEC but the key management will be enormous.

Which part of my message are you responding to? The first part?

Though you might have forgotten, my initial proposal for DNSSEC actually allowed the use of both public and shared keys. With hierarchical KDCs (Key Distribution Centers) instead of hierarchical CAs, key management is not enormous.

A shared key is better than a public key, because revocation is instantaneous. Instead, the root KDCs receive a large number of requests. But the situation is similar to that of the DNS root servers today and is manageable.

Kerberos relies on KDCs. However, its shared keys are shared by the ends and the intermediate systems of the KDCs, which is not end-to-end security.

						Masataka Ohta
Re: New addresses for b.root-servers.net
Matt Corallo wrote:

>> As PKI, including DNSSEC, is subject to MitM attacks, is not
>> cryptographically secure, does not provide end to end security and
>> is not actually workable, why do you bother?
>
> It sounds like you think nothing is workable, we simply cannot make
> anything secure

If an end and another end directly share a secret key without involving untrustworthy trusted third parties, the ends are secure end to end.

> - if we should give up on WebPKI (and all its faults) and DNSSEC (and
> all its faults) and RPKI (and all its faults), what do we have left?

An untrustworthy but lightweight and inexpensive (or free) PKI may be worth its price and may be useful to make IP-address-based security a little better.

						Masataka Ohta
Re: New addresses for b.root-servers.net
Matt Corallo wrote:

>> So, let's recognize ISPs as trusted authorities and we are
>> reasonably safe without excessive cost to support DNSSEC with all
>> the untrustworthy hypes of HSMs and four-eyes principle.
>
> I think this list probably has a few things to say about "ISPs as
> trusted authorities"

I'm afraid you miss the point. My point is that trusted third parties, the CAs, including DNSSEC providers, are at least as untrustworthy as ISPs.

> - is everyone on this list already announcing and enforcing an exact
> ASPA policy (or BGPSec or so) and ensuring the full path for each
> packet they send is secure and robust to ensure it gets to its proper
> destination?

I'm afraid that is a hype as bad as HSMs and the four-eyes principle.

> Somehow I don't think this model is workable,

As PKI, including DNSSEC, is subject to MitM attacks, is not cryptographically secure, does not provide end-to-end security and is not actually workable, why do you bother?

						Masataka Ohta
Re: New addresses for b.root-servers.net
Matt Corallo wrote:

>> Note that diginotar was advertised to be operated with HSMs and
>> four-eyes principle, which means both of them were proven to be
>> untrustworthy marketing hypes.
>
> Even more reason to do DNSSEC stapling!

See the hypes of HSMs and four-eyes from DNSSEC operators.

> This is totally unrelated to the question at hand. There wasn't a
> question about whether a user relying on trusted authorities can
> maybe be whacked by said trusted authorities (though there's been a
> ton of work in this space, most notably requiring CT these days),

So, let's recognize ISPs as trusted authorities, and we are reasonably safe without the excessive cost of supporting DNSSEC with all the untrustworthy hypes of HSMs and the four-eyes principle.

> it was purely about whether we can rely on pure "I sent a packet to
> IP X, did it get to IP X", which *is* solved by DNSSEC.

That's overkill. See above for the proper solution.

						Masataka Ohta
Re: New addresses for b.root-servers.net
Matt Corallo wrote:

>> Both in theory and practice, DNSSEC is not secure end to end
>
> Indeed, but (a) there's active work in the IETF to change that
> (DNSSEC stapling to TLS certs)

TLS? What?

As was demonstrated by diginotar, PKI is NOT cryptographically secure and is vulnerable to MitM attacks on the intermediate intelligent entities, the CAs.

Note that diginotar was advertised to be operated with HSMs and the four-eyes principle, which means both were proven to be untrustworthy marketing hypes.

> and (b) that wasn't the point - the above post said "It's not like
> you can really trust your packets going to B _today_ are going to and
> from the real B (or Bs)." which is exactly what DNSSEC protects
> against!

As long as root key rollover is performed in time and intermediate zones such as ccTLDs are not compromised, maybe, which is why it is not very useful or secure.

The following description:

    https://en.wikipedia.org/wiki/DigiNotar
    Secondly, they issued certificates for the Dutch government's
    PKIoverheid ("PKIgovernment") program. This issuance was via two
    intermediate certificates, each of which chained up to one of the
    two "Staat der Nederlanden" root CAs. National and local Dutch
    authorities and organisations offering services for the government
    who want to use certificates for secure internet communication can
    request such a certificate. Some of the most-used electronic
    services offered by Dutch governments used certificates from
    DigiNotar. Examples were the authentication infrastructure DigiD
    and the central car-registration organisation Netherlands Vehicle
    Authority [nl] (RDW).

makes it clear that entities operating ccTLDs may also be compromised.

> If its not useful, please describe a mechanism by which an average
> recursive resolver can be protected against someone hijacking C root
> on Hurricane Electric (which doesn't otherwise have the announcement
> at all, last I heard) and responding with bogus data?
As DNSSEC capable resolvers are not very secure, you don't have to make plain resolvers so secure. For example, root key rollover is as easy/difficult as updating IP addresses for b.root-servers.net. Then maybe read the rest of this thread, cause lots of folks pointed out issues with *just* updating the IP and not bothering to give it some time to settle :) In this thread, I'm the first to have pointed out that old IP addresses of root servers must be reserved (for 50 years). Masataka Ohta
Re: New addresses for b.root-servers.net
Matt Corallo wrote: That's great in theory, and folks should be using DNSSEC [1], Wrong. Both in theory and practice, DNSSEC is not secure end to end and is not very useful. For example, root key rollover is as easy/difficult as updating IP addresses for b.root-servers.net. Masataka Ohta
Re: New addresses for b.root-servers.net
Mark Andrews wrote: The commitment to maintain service for 1 year after the new LACNIC addresses are switched in to the root.hints from IANA does not mean that this is a cutoff date and that we intend to turn off service on the older addresses after a year. We currently have no plans to do so for the foreseeable future. In fact, the possibility has not even been suggested or discussed at all. Such total lack of advance and public discussion and preparation on a substantial change on critical infrastructure is a serious problem, I'm afraid. I'm curious about what more discussion you want to happen than has happen in the past. Over the last 20 years there have been lots of address changes. If such changes are performed without proper transition plans even after DNS became critical infrastructure (when?), they also are serious problems. None of them have caused operational problems. Thank you for a devil's proof. That you haven't noticed any problem does not mean there actually was no problem. Masataka Ohta
Re: New addresses for b.root-servers.net
Robert Story wrote: The commitment to maintain service for 1 year after the new LACNIC addresses are switched in to the root.hints from IANA does not mean that this is a cutoff date and that we intend to turn off service on the older addresses after a year. We currently have no plans to do so for the foreseeable future. In fact, the possibility has not even been suggested or discussed at all. Such total lack of advance and public discussion and preparation on a substantial change on critical infrastructure is a serious problem, I'm afraid. Masataka Ohta
Re: New addresses for b.root-servers.net
Mark Andrews wrote: It announces itself to an address which remains under the control of USC/ISI the current and on going root server operator for b.root-servers.net. So apart from leaking that the root hints have not been updated I don’t see a big risk here. The address block, as has been stated, is in a reserved range for critical infrastructure and, I suspect, has special controls placed on it by ARIN regarding its re-use should USC/ISI ever release it / cease to be a root-server operator. I would hope that ARIN and all the RIRs have the list of current and old root-server addresses and that any block that are being transferred that have one of these addresses are flagged for special consideration. I'm afraid that "old root-server addresses" will not be considered for "critical infrastructure" at least by those people who can't see operational difficulties to change the addresses. Masataka Ohta
Re: New addresses for b.root-servers.net
William Herrin wrote: Certainly we would appreciate other opinions about what the right length of a change-over time would be, especially from the operational communities that will be most impacted by this change. Considering the possibility that, in the long run, the remaining 12 sets (4 and 6) of IP addresses will also change, the proper length should be determined assuming all the 13 sets of addresses will change (not necessarily at the same time). A server generation is about 3 years before it's obsolete and is generally replaced. I suggest making the old address operable for two generations (6 years) and black-holed for another generation (3 more years). You are assuming managed servers under Moore's law. But, after Moore, a server generation will be longer. Moreover, a linux-based black box whose vendor has disappeared may be used for 10 or 20 years without being managed. Then, another important period is the period to reserve the IP addresses once used for root servers. If the addresses are reused by some bad guys, systems depending on them can easily be compromised. As for the reservation period, the 50-year reservation period of ISO 3166 country codes seems reasonable. And, if the addresses are reserved, there is no reason not to keep using the addresses as alternative addresses of active root name servers. Masataka Ohta PS First of all, it is a bad idea to change the addresses of root servers. For political ceremony, it is enough to transfer the address blocks to LACNIC.
Re: Spectrum (legacy TWC) Infrastructure - Contact Off List
Mike Hammett wrote: In no way is what I said wrong. Incumbent operators (coax or copper pairs) screw things up constantly (whether technically or in the business side of things), prompting a sea of independent operators to overbuild them (or fill in where they haven't). See below: : https://en.wikipedia.org/wiki/Incumbent_local_exchange_carrier : Various regional independents also held incumbent monopolies : in their respective regions. to see that many independent operators are incumbent operators. I don't mean non-RBOC ILECs. I mean WISPs, regional fiber operators, I'm afraid "non-RBOC" is a synonym of "independent". Anyway, ILECs, including both RBOCs and thousands of non-RBOC ones, should be the regional fiber operators, as I already wrote: : Many ILECs enjoying regional monopoly should be 100+ years old: : https://en.wikipedia.org/wiki/Independent_telephone_company : By 1903 while the Bell system had 1,278,000 subscribers on : 1,514 main exchanges, the independents, excluding non-profit : rural cooperatives, claimed about 2 million subscribers on : 6,150 exchanges.[1] : The size ranged from small mom and pop companies run by a : husband and wife team, to large independent companies, : many of which should now be PON operators still enjoying regional : monopoly. > Bob from down the street that retired and built a fiber company to > serve his small town. I mean companies with less than 10,000 > customers and are younger than 20 years. There are literally > thousands of them in the US and they're only getting more formidable > in the face of lousy incumbents. See above: : The size ranged from small mom and pop companies run by a : husband and wife team Thousands of Bobs from down the street retired and built telephone companies, now recognized as non-RBOC ILECs, to serve their small towns 100+ years ago. Newly coming Bobs can survive as regional fiber operators only in regions not served by ILECs as PON providers. Masataka Ohta
Re: Smaller than a /24 for BGP?
Michael Bolton via NANOG wrote: > We would benefit from advertising /25's but it hurts more > than it helps. That is, IPv6 really hurts. I'm in the alarm industry and they still haven't started adopting IPv6. If we allow /25 subnets, some industries will never change. In a sense, we have to “force” them to change. FYI, WRT routing table bloat, IPv6, having a much longer minimum allocation prefix than /24 (which forbids operators from cutting IPv6 prefixes longer than /24), that is, far beyond direct SRAM look up, and, worse, needing a longer TCAM word size (64 or 128 bits?) than IPv4, is, in a not so long run, far worse than IPv4. Masataka Ohta
Re: Spectrum (legacy TWC) Infrastructure - Contact Off List
Mike Hammett wrote: Where did you think that condescension was going to get you in this conversation? I was involved in this thread because of your totally wrong statement of: : I selfishly hope they don't because that's where independent : operators will succeed. ;-) First of all, "Spectrum (legacy TWC)" is not a small company. Moreover, as is stated in wikipedia: >https://en.wikipedia.org/wiki/Incumbent_local_exchange_carrier >Various regional independents also held incumbent monopolies >in their respective regions. many independent operators have kept succeeding for 100+ years not because they unreasonably cut maintenance cost but because they have achieved regional monopoly. Masataka Ohta
Re: Spectrum (legacy TWC) Infrastructure - Contact Off List
Mike Hammett wrote: Except there are literally thousands of independent ISPs in the US, > many 10+ years old that aren't likely to be going anywhere and > they are moving to constructing their own wireline. Many ILECs enjoying regional monopoly should be 100+ years old: https://en.wikipedia.org/wiki/Incumbent_local_exchange_carrier Various regional independents also held incumbent monopolies in their respective regions. https://en.wikipedia.org/wiki/Independent_telephone_company By 1903 while the Bell system had 1,278,000 subscribers on 1,514 main exchanges, the independents, excluding non-profit rural cooperatives, claimed about 2 million subscribers on 6,150 exchanges.[1] The size ranged from small mom and pop companies run by a husband and wife team, to large independent companies, many of which should now be PON operators still enjoying regional monopoly. So? Masataka Ohta
Re: Spectrum (legacy TWC) Infrastructure - Contact Off List
Mike Hammett wrote: Maybe it's not as hard as everyone says? That's exactly the way of thinking of investors during a bubble. It should be noted that the corona virus not only caused a depression, against which the QE policy was chosen, but also forced people to stay at home. As such, investing in internet access seemed promising and some money was also invested in high-speed, inexpensive satellite internet, even though satellite internet must be low speed or expensive. Masataka Ohta
Re: Spectrum (legacy TWC) Infrastructure - Contact Off List
Mike Hammett wrote: Yet the independents are doing it anyway. Petit bubble caused by quantitative easing, perhaps. Masataka Ohta Eric Kuhnke wrote: It might look low cost until you look at a post-1980s suburb in the USA or Canada where 100% of the utilities are underground. There may be no fiber or duct routes. Just old coax used for DOCSIS3 owned/run by the local cable incumbent and copper POTS wiring belonging to the ILEC. The cost to retrofit such a neighborhood and reach every house with a fiber architecture can be quite high in construction and labor. Forrest Christian (List Account) wrote: The cost to build physical layer in much of the suburban and somewhat rural US is low enough anymore that lots of smaller, independent, ISPs are overbuilding the incumbent with fiber and taking a big chunk of their customer base because they are local and care. And making money while doing it.
Re: Spectrum (legacy TWC) Infrastructure - Contact Off List
Mike Hammett wrote: I selfishly hope they don't because that's where independent operators will succeed. ;-) Because of natural regional monopoly at physical layer (cabling cost for a certain region is same between competitors but their revenues are proportional to their regional market shares), they can't succeed unless the physical layer is regulated to be unbundled, which is hard with PON. But, in US where regional telephone network has been operated by, unlike Europe/Japan, a private company enjoying natural regional monopoly, economic situation today should be no worse than that at that time. Masataka Ohta
Re: Smaller than a /24 for BGP?
I wrote: So, another way of multihoming critically depends on replacing the layer-4 protocols with something that doesn't intermingle the IP address with the connection identifier. Wrong. As is stated in my ID: On the other hand, with end to end multihoming, multihoming is supported by transport (TCP) or application layer (UDP etc.) of end systems and does not introduce any problem in the network and works as long as there is some connectivity between the end systems. end to end multihoming may be supported at the application layer by trying all the available addresses, which is what DNS and SMTP are actually doing. To my surprise, I've found that the current (2017) Happy Eyeballs already does so, as is stated in rfc8305: : Appendix A. Differences from RFC 6555 :o how to handle multiple addresses from each address family So, we are ready for end to end multihoming, for which multiple PA addresses are enough and /24 is not necessary. Though not all the application protocols may support it, DNS, SMTP and HTTP(S) should be good enough as a starter. It should be noted that Happy Eyeballs strongly depends on DNS, even though someone might think DNS is not guaranteed. Your web server is multihomed if you assign it PA addresses from multiple ISPs and register the addresses in DNS. You don't have to manage BGP. TCP modification is just an option useful for long-lasting TCP connections. A major obstacle for it, as most of you can see, is that there are people who can't distinguish IP address changes by mobility from those by multihoming. Such people will keep reinventing MPTCP. Masataka Ohta
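The application-layer fallback described above (try every available address, as DNS and SMTP clients do, and as RFC 8305 refines by racing addresses concurrently) can be sketched in a few lines. This is a minimal illustration of the idea, not any particular stack's implementation; the function name is made up:

```python
import socket

def connect_any(host, port, timeout=5):
    """Try every address DNS returns for host, in order, until one
    connects. Serial fallback; Happy Eyeballs (RFC 8305) improves on
    this by attempting addresses concurrently with short staggers."""
    last_err = None
    for family, socktype, proto, _name, sockaddr in socket.getaddrinfo(
            host, port, type=socket.SOCK_STREAM):
        s = socket.socket(family, socktype, proto)
        s.settimeout(timeout)
        try:
            s.connect(sockaddr)
            return s  # first reachable address wins
        except OSError as err:
            last_err = err
            s.close()
    raise last_err or OSError("no addresses for %s" % host)
```

A multihomed server that registers one PA address per ISP in DNS is reached by such a client as long as any one of the addresses works, with no BGP involvement.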
Re: Smaller than a /24 for BGP?
William Herrin wrote: The easiest way for applications to know all the addresses of the destination is to use DNS. With DNS reverse, followed by forward, lookup, applications can get a list of all the addresses of the destination from an address of the destination. The DNS provides no such guarantee. Guarantee for what? Remember that we have been enjoying secure confirmation that a certain IP address belongs to a certain hostname by DNS reverse look up without any guarantee. > Moreover, the DNS does guarantee > its information to be correct until the TTL expires, making it > unsuitable for communicating address information which may change > sooner. I'm afraid you know very little about DNS operation. See rfc1034: If a change can be anticipated, the TTL can be reduced prior to the change to minimize inconsistency during the change, and then increased back to its former value following the change. which is the way to operate DNS when host addresses are changing, for example, by multihoming configuration changes. In addition, when a dual-homed site with end to end multihoming changes one of its ISPs, it is a good idea to offer all three addresses by DNS during the change. Make before break. With TCP, applications must be able to pass multiple addresses to the transport layer (e.g. BSD socket). which implies addresses are supplied from applications by DNS look up. Which is a bit of hand-waving since the protocol can't do anything with that information regardless of whether you expand the API to provide it. Read my draft, which explains how TCP should be modified. Masataka Ohta
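The RFC 1034 practice quoted above looks roughly like this in a zone file; the names, addresses, and TTL values are invented for illustration:

```
; normal operation: a long TTL caches well
www.example.com.  86400  IN A  192.0.2.10     ; address from ISP-A
www.example.com.  86400  IN A  198.51.100.10  ; address from ISP-B

; at least one old-TTL interval before the planned change, shrink the TTL
www.example.com.    300  IN A  192.0.2.10
www.example.com.    300  IN A  198.51.100.10

; during the ISP change, offer all three addresses (make before break),
; then drop the old ISP-B address and restore the long TTL
www.example.com.    300  IN A  192.0.2.10
www.example.com.    300  IN A  198.51.100.10
www.example.com.    300  IN A  203.0.113.10   ; address from the new ISP
```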
Re: Smaller than a /24 for BGP?
William Herrin wrote: Use Multipath TCP https://datatracker.ietf.org/group/mptcp/documents/ Doesn't work well. Has security problems (mismatch between reported IP addresses used and actual addresses in use) and it can't reacquire the opposing endpoint if an address is lost before a new one is communicated. It merely means MPTCP is wrongly architected. Dynamically changing IP addresses is for mobility (if you don't mind location privacy), not for multihoming. The following way in my ID: The easiest way for applications to know all the addresses of the destination is to use DNS. With DNS reverse, followed by forward, lookup, applications can get a list of all the addresses of the destination from an address of the destination. does not have any such problem and should be as safe as Happy Eyeballs for two or more IPv4/IPv6 addresses. As for (long-lasting) TCP, my ID says: With TCP, applications must be able to pass multiple addresses to the transport layer (e.g. BSD socket). which implies addresses are supplied from applications by DNS look up. Though a client may, at the time a TCP connection is established, send a list of its IP addresses to a server, which may have some security complications, it is simpler to let the server just rely on DNS: With DNS reverse, followed by forward, lookup, applications can get a list of all the addresses of the destination from an address of the destination. As I pointed out in the previous mail, DNS already supports end to end multihoming at the application layer by trying all the addresses of name servers, on which other applications can safely rely. Masataka Ohta
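The "DNS reverse, followed by forward, lookup" the ID describes maps directly onto the standard resolver API. A minimal sketch, assuming the peer's PTR record and forward records are kept consistent (the function name is made up):

```python
import socket

def peer_addresses(addr):
    """From one address of the peer, learn all of its addresses:
    reverse (PTR) lookup to get the name, then a forward lookup of
    that name for the full address list (IPv4 and IPv6)."""
    name, _aliases, _addrs = socket.gethostbyaddr(addr)   # reverse lookup
    infos = socket.getaddrinfo(name, None)                # forward lookup
    return {info[4][0] for info in infos}
```

A server can thus discover a client's alternate addresses from the one address the connection arrived on, without the client self-reporting a list in-band as MPTCP does.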
Re: Smaller than a /24 for BGP?
William Herrin wrote: That multihomed sites are relying on the entire Internet for computation of the best ways to reach them is not a healthy way of multihoming. This was studied in the IRTF RRG about a decade ago. There aren't any > other workable ways of multihoming compatible with the TCP protocol, > not even in theory. A decade? The problem and the solution were thoroughly studied by me long ago and the first ID was available already in 2000. The 5th version is here: https://datatracker.ietf.org/doc/html/draft-ohta-e2e-multihoming-05.txt I've found that you can access the first one by the "Compare versions" feature of the web page. So, another way of multihoming critically depends on replacing the layer-4 protocols with something that doesn't intermingle the IP address with the connection identifier. Wrong. As is stated in my ID: On the other hand, with end to end multihoming, multihoming is supported by transport (TCP) or application layer (UDP etc.) of end systems and does not introduce any problem in the network and works as long as there is some connectivity between the end systems. end to end multihoming may be supported at the application layer by trying all the available addresses, which is what DNS and SMTP are actually doing. TCP modification is just an option useful for long-lasting TCP connections. Masataka Ohta
Re: Smaller than a /24 for BGP?
Lars Prehn wrote: Accepting and globally redistributing all hyper-specifics increases the routing table size by >100K routes (according to what route collectors see). That figure is a guaranteed minimum, but there should be 10 or 100 times more desire for hyper-specifics suppressed by the established practice (since the early days with class C). That multihomed sites are relying on the entire Internet for computation of the best ways to reach them is not a healthy way of multihoming. Masataka Ohta
Re: Smaller than a /24 for BGP?
Jon Lewis wrote: Yeah, but in another couple years we'll breach the 1M mark and everybody will have fresh routers with lots of TCAM for a while. If that were the only issue, it'd be a matter of timing the change well. Everybody will need them. Not all will get (or be able to get) them. Wrong. For /24, direct look up of a 16M-entry SRAM is enough. Updating 64K entries for a /8 should not be a problem, though you may also have a 64K-entry SRAM for /16. In addition, for a small number of local smaller-than-/24 prefixes, another lookup of a radix tree in a smaller SRAM (with 64K entries, we can subdivide 256 /24s into /32s) should be possible. But, there is no need for costly and power-wasting TCAM. So far, I ignore IPv6, of course. Masataka Ohta
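The direct-lookup idea can be modeled in a few lines. The table below plays the role of the 16M-entry SRAM indexed by the top 24 bits of the destination (one byte per /24 slot here; a real FIB entry would be wider), and the next-hop IDs are invented for illustration:

```python
import ipaddress

# 2^24 one-byte entries, i.e. 16 MB: one slot per /24.
# Next-hop id 0 plays the role of the default route.
FIB = bytearray(1 << 24)

def install(prefix, nexthop):
    """Expand a route of length <= /24 into all the /24 slots it
    covers; a /8 touches 2^16 slots, matching the post's figure."""
    net = ipaddress.ip_network(prefix)
    assert net.prefixlen <= 24
    base = int(net.network_address) >> 8
    for slot in range(base, base + (1 << (24 - net.prefixlen))):
        FIB[slot] = nexthop

def lookup(addr):
    """One memory read per packet: no TCAM, no tree walk."""
    return FIB[int(ipaddress.ip_address(addr)) >> 8]

install("198.51.100.0/24", 7)
install("203.0.113.0/24", 9)
```

The price of the flat table is paid at update time (a short /8 writes many slots) rather than at lookup time, which is the trade the post argues for.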
Re: Starlink routing
Jorge Amodio wrote: You, seemingly, do not have much knowledge on UUNET. Of course I don't :-) atina agomar(DAILY), antar(DAILY), biotlp(DAILY), cab(HOURLY), cedro(EVENING), cenep(DAILY), cneaint(DAILY), cnea(EVENING), cnielf(DAILY), colimpo(DAILY), confein(DAILY), criba(EVENING), curbre(EVENING), dacfyb(DEMAND), dcfcen(DEMAND), ecord(DEMAND), enace(DAILY), epfrn(EVENING), fb1(DAILY), fcys(DAILY), fecic(DAILY), gagcha(EVENING), getinfo(DAILY), hasar(DAILY), iaros(DAILY), intiar(DAILY), invapba(DAILY/2), invapqq(DAILY/2), isoft(DAILY), itcgi(DAILY), labdig(DAILY), lasbe(DAILY), licmdp(EVENING), lis(EVENING), ludo(DAILY), maap(DAILY), meyosp(DAILY), minerva(DAILY), minjus(DAILY), mlearn(DAILY), occam(EVENING), oceanar(DAILY), onba(DAILY), opsarg(DEMAND), pnud009(EVENING), sadio(DAILY), saravia(DAILY), sdinam(DAILY), secyt(DEMAND), spok(DAILY), sykes(DAILY), tandil(DAILY), tsgfred(WEEKLY), ulatar(EVENING), unisel(EVENING), uunet(DEMAND) So, you now remember that UUCP links were scheduled. Masataka Ohta
Re: Starlink routing
Jorge Amodio wrote: This gets sort of merged with DTN (Delay/Disruption Tolerant Networking.) I have been saying that DTN is a reinvention of UUNET. Hmmm, nope not even close. You, seemingly, do not have much knowledge on UUNET. As such, it should be noted that, in UUNET, availability of phone links between computers was scheduled. You must be talking about UUCP, UUNET was a company. Why, do you think, was the company named UUNET? It was an organization to offer connectivity to UseNET but some used the word to just mean UseNET. See, for example: https://docs.oracle.com/cd/E19957-01/805-4368/gavzo/index.html UUNET (n.) A network that carries electronic newsgroups, aggregates of many electronic messages that are sorted by topic, to thousands of users on hundreds of workstations worldwide. Availability of links was declared not scheduled, Declared in map files used by pathalias? But that's not my point. UUCP links were not permanent but scheduled. See, for example: https://www.ibm.com/docs/en/zos/2.4.0?topic=systems-schedule-periodic-uucp-transfers-cron Schedule periodic UUCP transfers with cron so pathalias was able to figure out the best UUCP path from a given UUCP node. Such initial attempts were not so elegant or scalable. UUCP networks as DTN were brought to perfection through integration with the Internet relying on DNS MX RRs. Masataka Ohta
Re: Starlink routing
Jorge Amodio wrote: We are in the process of starting a new Working Group at IETF, Timer Variant Routing or TVR. https://datatracker.ietf.org/group/tvr/about/ Some of the uses cases are for space applications where you can predict or schedule the availability and capacity of "links" (radio, optical) Even though the current routing protocols have no difficulty to treat unpredictable/unscheduled changes on links? This gets sort of merged with DTN (Delay/Disruption Tolerant Networking.) I have been saying that DTN is a reinvention of UUNET. As such, it should be noted that, in UUNET, availability of phone links between computers was scheduled. Masataka Ohta
Re: Starlink routing
Matthew Petach wrote: Unlike most terrestrial links, the distances between satellites are not fixed, and thus the latency between nodes is variable, making the concept of "Shortest Path First" calculation a much more dynamic and challenging one to keep current, as the latency along a path may be constantly changing as the satellite nodes move relative to each other, without any link state actually changing to trigger a new SPF calculation. As LEO satellites should be leaves of a network of MEO satellites, an update period of 1 minute between MEO satellites should be enough, which is not so dynamic. The physical layer of MEO communications must (to save power and to prevent broadcast storms) be point to point with known orbital elements, and the link layer should be some point to point protocol, perhaps with ARQ. As the only meaningful metric between satellites is physical distance, the 16-bit metric of OSPF should be enough. The most annoying part is to have multiple ground stations, which, as usual, makes the MEO network a DFZ with more than 1M routing table entries. Masataka Ohta
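As a rough illustration of why a 16-bit metric suffices when the metric is physical distance: at 1 km granularity, even the worst-case chord between two MEO satellites (under 30,000 km for roughly 8,000 km altitude, i.e. about a 14,400 km geocentric radius) fits well within 65,535. A sketch with made-up positions and an assumed 1 km/unit scaling:

```python
import math

MAX_METRIC = 0xFFFF  # 16-bit OSPF-style interface cost field

def isl_metric(pos_a, pos_b, km_per_unit=1.0):
    """Map the straight-line inter-satellite distance (positions in
    geocentric km) onto a 16-bit link cost. Illustrative scaling only."""
    d = math.dist(pos_a, pos_b)
    return min(MAX_METRIC, max(1, round(d / km_per_unit)))

# Two MEO satellites 90 degrees apart in the same orbital plane,
# at a geocentric radius of ~14,400 km (~8,000 km altitude).
sat_a = (14400.0, 0.0, 0.0)
sat_b = (0.0, 14400.0, 0.0)
```

Recomputing such costs from known orbital elements once a minute is cheap, and the metric changes smoothly rather than flapping.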
Re: A straightforward transition plan (was: Re: V6 still not supported)
Pascal Thubert (pthubert) wrote: Hi, Solutions must first avoid broadcast as much as possible, because there's also the cost of it. Though I'm not saying all the broadcast must be repeated, if you think moderate broadcast is costly, just say, CATENET. I remember the old days when the entire network of CERN, with thousands of hosts, was managed as a single Ethernet, several years after we learned that dividing a network by routers can prevent various problems caused by broadcast. It was, at least partly, because operating multi-protocol routers is painful. Unlike most sites at that time, non-IP protocols such as DECnet were popular at CERN. As IPv4 became dominant, the problems went away. Then you want zerotrust, ND is so easy to attack from inside and even outside. This is RFC 8928. As many people are saying, zerotrust relies on PKI, which blindly trusts CAs as TTPs (trusted third parties), third parties confirmed to be untrustworthy by Diginotar, so zerotrust is not very meaningful beyond marketing hype. Anyway, relying on link broadcast implies that the link is trusted to some extent, which is not ND specific. Ethernet in enterprise networks is largely virtualized. We cannot offer fast and reliable broadcast services on a worldwide overlay. Unlike CERN in the past, today, I can see no point in having a large Ethernet, though some operators may be hyped into deploying an expensive telco service for nothing. Add to that the desire by the device to own more and more addresses. What? How can that happen with IPv4? You want a contract between the host and the network that the host owns an address and is reachable at that address. Like any contract, that must be a negotiation. ND is not like that. RFC 8505 is exactly that. Ignoring poor IPv6, I'm afraid it is a property not of ARP but of DHCP. It may be more constructive to work for a proxy ARP suitable for Wifi, which may be enforced by the Wifi alliance. An RFC may be published if the Wifi industry requests IETF to do so.
This is effectively done already for ND. I agree with you but my point is that it is more constructive for ARP. I guess the design can be easily retrofitted to ARP. ND is really designed exactly as ARP. The differences were for the show, the real steps that should have been made were not. But now with RFC 8505 we have a modern solution. The problem is no more on the standard side, it is adoption. People will not move if it does not hurt enough. And they can bear a lot. But, for adoption, some formal document, not necessarily a (standard track) rfc, is necessary. Masataka Ohta
Re: A straightforward transition plan (was: Re: V6 still not supported)
Pascal Thubert (pthubert) wrote: Hi, For that issue at least there was some effort. Though ATM and FR appear to be long gone, the problem got even worse with pseudo wires / overlays and wireless. It was tackled in the IoT community 10+ years ago and we ended up with RFC 8505 and 8928. This is implemented in LoWPAN devices and deployed by millions. Allowing IPv6 subnets of thousands on constrained radios. When I mentioned the problem for the first time on the IPng or IPv6 list (I can't find any archive, are there any?), Christian Huitema mentioned it could be solved by ND over NBMA, but the problem is not NB(MA) but that broadcast over Wifi is unreliable. As such, the solutions should be based on the fact that repeated unreliable broadcast is reliable. I spent a bit of time explaining the architecture issue (in mild terms) and solutions in https://datatracker.ietf.org/doc/html/draft-thubert-6man-ipv6-over-wireless-12. Though you wrote in the draft: Reducing the speed at the physical (PHY) layer for broadcast transmissions can increase the reliability longer packets mean a higher probability of collision (with hidden terminals) and less reliability. A link broadcast domain must be the same for all the members of the link and should be defined as the set of terminals which can receive broadcast from a central station (or stations) with a certain probability, which is why Wifi broadcast is relayed by a central station. So far we failed to get those RFCs implemented on the major stacks for WiFi or Ethernet. Ethernet? Even though its broadcast is reliable? Though Wifi bridged by Ethernet may have its own problems, they are Wifi-specific problems. There’s a new thread at IETF 6MAN just now on adopting just the draft above - not even the solution. It is facing the same old opposition from the same few and a lot of silence. You can't expect much from people still insisting on IPv6 as is.
My suggestion is still to fix IPv6 as opposed to drop it, because I don’t see that we have another bullet to fire after that one. For that particular issue of fixing ND, new comments and support at the 6MAN on the draft above may help. It may be more constructive to work for proxy ARP suitable for Wifi, which may be enforced by Wifi alliance. An RFC may be published if Wifi industry request IETF to do so. Masataka Ohta
Re: A straightforward transition plan (was: Re: V6 still not supported)
Randy Bush wrote: three of the promises of ipng which ipv6 did not deliver o compatibility/transition, o security, and o routing & renumbering You miss a promise of o ND over ATM/NBMA which caused IPv6 to lack a notion of link broadcast. Masataka Ohta
Re: SDN Internet Router (sir)
Mike Hammett wrote: " With plain IP routers?" Yes, or, well, relatively plain, depending on the implementation. As completely plain routers have no difficulty to treat a default route, it is a waste of money and effort to try to have not so plain routers to do so regardless of whether the routers are SDN ones or not. Masataka Ohta
Re: SDN Internet Router (sir)
Matthew Walster wrote: No... It's action based. You can send it a different route, you can replicate it, you can drop it, you can mutate it... Replication is a poor alternative for multicast. You conveniently ignore things like IDS, port mirroring, things like that. Wrong. Instead, you conveniently ignore that such forwarding requires that a link between an SDN router and a monitoring device have the same or a larger MTU than an incoming link of the SDN router, which means the router and the monitoring device must be tightly coupled, effectively to be a single device. Sometimes, the possibility of packet loss between them requires that they actually be the same device. No. There are far more actions than for prioritisation. Just for fun? I'm afraid I already mentioned so. What if you want to make sure certain classes of traffic do not flow over a link, because it is unencrypted and/or sensitive, but you're happy to send as much TLS wrapped data as you like? You are wrongly assuming TLS wrapped packets can be identified packet by packet, as I wrote: >> Unless pattern is as simple as having certain port number, >> stateful filtering almost always needs all packets including >> those matching expected pattern, I'm afraid. So? What if you want to sample some flows in an ERSPAN like mechanism? See above for MTU issues. What if you want to urgently drop a set of flows based on a known DDOS signature? Urgently? Even though a DDOS signature is known in advance? Why? Unless the pattern is as simple as having a certain port number, stateful filtering almost always needs all packets, including those matching the expected pattern, I'm afraid. Or a certain set of IP addresses. Policy based routing. That's even simpler than port numbers and can be treated by having or not having proper routing table entries. If a default route is acceptable, just rely on it along with 50 non-default routes on plain IP routers. That's what OP is suggesting. With plain IP routers? That's what SIR is.
Classifying prefixes by traffic and only keeping the ones with the highest volume of traffic, discarding the rest, relying on the default route to infill. Given the connectionless nature of the Internet, route changes based on volume of traffic averaged over some period of time are more harmful than useful. Masataka Ohta
Re: SDN Internet Router (sir)
Matthew Walster wrote: No... It's action based. You can send it a different route, you can replicate it, you can drop it, you can mutate it... Replication is a poor alternative for multicast. For the other actions, why, do you think, are they performed? Just for fun? Or to differentiate the treatment of some packets, that is, prioritization? You can send it to a different destination for stateful filtering when it doesn't match an expected pattern! Unless the pattern is as simple as a certain port number, stateful filtering almost always needs all packets, including those matching the expected pattern, I'm afraid. SDN is not just QoS routing, please stop saying that. See above. Nope, not true. Had 1000 routes, only 100 available in FIB. So you filter to the top 50 doing traffic and default route the rest of the traffic. Less entries. If a default route is acceptable, just rely on it along with 50 non-default routes on plain IP routers. Masataka Ohta
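The top-50-plus-default idea discussed above can be sketched in a few lines. The function name and traffic figures below are purely illustrative, not from SIR itself:

```python
# Sketch of FIB compression: keep only the busiest prefixes as specific
# routes and let a default route cover everything else.

def compress_fib(traffic_by_prefix, keep=50):
    """Return the top-`keep` prefixes by traffic volume; all other
    destinations are assumed to follow the default route."""
    ranked = sorted(traffic_by_prefix.items(),
                    key=lambda kv: kv[1], reverse=True)
    return [prefix for prefix, _ in ranked[:keep]]

if __name__ == "__main__":
    # Made-up traffic counters for 1000 prefixes.
    sample = {f"198.51.{i % 256}.0/24": i for i in range(1000)}
    fib = compress_fib(sample, keep=50)
    print(len(fib))  # 50 specific routes; the rest use the default
```

As the thread notes, nothing here requires SDN: the same effect can be had with 50 ordinary routes plus a default on a plain IP router.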
Re: SDN Internet Router (sir)
Matthew Walster wrote: SDN does not imply QoS routing, As long as the shortest path is comfortable enough, no, it does not have to. it's just one aspect of it. Some use it for classifying guest traffic etc. If a special path is provided for guest or otherwise prioritized traffic, that's QoS routing. Anyway, prioritization needs more, not fewer, routing table entries. Masataka Ohta
Re: SDN Internet Router (sir)
Christopher Morrow wrote: Some of the reasoning behind 'i need/want to do SDN things' is 'low fib device' sort of reasonings. What? SDN is a poor alternative for those who can't construct a network with a fully automated QoS guarantee. Even with SDN, a QoS guarantee implies QoS routing, requiring a dedicated routing table entry for each flow, which will not shrink but bloat routing tables regardless of whether you call it a FIB or not. Masataka Ohta
Re: Large RTT or Why doesn't my ping traffic get discarded?
Jerry Cloe wrote: Because there is no standard for discarding "old" traffic, only discard is for packets that hop too many times. There is, however, a standard for decrementing TTL by 1 if a packet sits on a device for more than 1000ms, and of course we all know what happens when TTL hits zero. Based on that, your packet could have floated around for another 53 seconds. Totally wrong, as the standard says TTL MUST be decremented by at least one on every hop and MAY, but need not, be decremented further, as is specified by the standard on IPv4 router requirements (rfc1812): When a router forwards a packet, it MUST reduce the TTL by at least one. If it holds a packet for more than one second, it MAY decrement the TTL by one for each second. As for IPv6, Unlike IPv4, IPv6 nodes are not required to enforce maximum packet lifetime. That is the reason the IPv4 "Time to Live" field was renamed "Hop Limit" in IPv6. In practice, very few, if any, IPv4 implementations conform to the requirement that they limit packet lifetime, so this is not a change in practice. Masataka Ohta
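The RFC 1812 rule quoted above can be sketched as follows. The function name and structure are ours, not from any real IP stack:

```python
# Sketch of RFC 1812 TTL handling: a router MUST decrement TTL by at
# least one per hop, and MAY decrement it by one more for each full
# second it holds the packet. The optional rule is rarely implemented.

def decrement_ttl(ttl, held_seconds=0.0, count_holding_time=False):
    ttl -= 1                        # MUST: at least one per hop
    if count_holding_time:          # MAY: optional holding-time rule
        ttl -= int(held_seconds)
    return max(ttl, 0)

# A packet held 53 s on a conforming router that skips the optional
# rule still loses only one TTL:
print(decrement_ttl(64, held_seconds=53))  # -> 63
```

This is why, in practice, TTL behaves as a hop limit rather than a lifetime, matching the IPv6 rename quoted above.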
Re: Alternative Re: ipv4/25s and above Re: 202211232221.AYC
Vasilenko Eduard via NANOG wrote: Big OTTs installed caches all over the world. Big OTTs support IPv6. As the network operational cost to support IPv6 is negligible for OTTs, which spend a lot more money at the application layer, they may. Hosts prefer IPv6. No. As many retail ISPs cannot afford the operational cost of IPv6, they are IPv4 only, which makes the hosts served by them IPv4 only. Possible exceptions are ISPs offering price (not necessarily value) added network services in a noncompetitive environment. But, end users suffer from the added price. Masataka Ohta
Re: Jon Postel Re: 202210301538.AYC
William Allen Simpson wrote: Something similar happened with IPv6. Cisco favored a design where only they had the hardware mechanism for high speed forwarding. So we're stuck with 128-bit addresses and separate ASNs. Really? Given that high speed forwarding at that time meant TCAM, the difference between 128-bit and 64-bit addresses should mean merely twice the TCAM capacity. I think the primary motivation for 128 bits was to somehow encode NSAP addresses into IPng ones, as is exemplified by RFC1888. Though the motivation does not make any engineering sense, neither does IPv6. Masataka Ohta
Re: 400G forwarding - how does it work?
sro...@ronan-online.com wrote: How do you propose to fairly distribute market data feeds to the market if not multicast? Unicast with randomized order. To minimize latency, bloated buffers should be avoided and TCP with a configured small (initial) RTT should be used. Masataka Ohta
Re: 400G forwarding - how does it work?
Dave Taht wrote: But as fair queuing does not scale at all, they disappeared long ago. What do you mean by FQ, exactly? Fair queuing is "fair queuing", not some queuing idea that someone considers "fair". See, for example, https://en.wikipedia.org/wiki/Fair_queuing "5 tuple FQ" is scaling today Fair queuing does not scale w.r.t. the number of queues. Masataka Ohta
Re: 400G forwarding - how does it work?
Matthew Huff wrote: Also, for data center traffic, especially real-time market data and other UDP multicast traffic, micro-bursting is one of the biggest issues especially as you scale out your backbone. Are you saying you rely on multicast even though the loss of a packet means the loss of a large amount of money? Is that the reason you use large buffers, to eliminate the possibility of packet drops caused by buffer overflow, but not by other reasons? Masataka Ohta
Re: 400G forwarding - how does it work?
Saku Ytti wrote: With such an imaginary assumption, according to the end to end principle, the customers (the ends) should use paced TCP instead I fully agree, unfortunately I do not control the whole problem domain, and the solutions available with partial control over the domain are less than elegant. OK. But, you should be aware that, with a bloated buffer, all the customers sharing the buffer will suffer from delay. Masataka Ohta
Re: 400G forwarding - how does it work?
Saku Ytti wrote: which is, unlike Yttinet, the reality. Yttinet has pesky customers who care about single TCP performance over long fat links, and observe poor performance with shallow buffers at the provider end. With such an imaginary assumption, according to the end to end principle, the customers (the ends) should use paced TCP instead of paying an unnecessarily bloated amount of money to the intelligent intermediate entities of ISPs using expensive routers with bloated buffers. Yttinet is cost sensitive and does not want to do work, unless sufficiently motivated by paying customers. I understand that if customers follow the end to end principle, the revenue of "intelligent" ISPs will be reduced. Masataka Ohta
Re: 400G forwarding - how does it work?
Saku Ytti wrote: If RTT is large, your 100G runs over several 100/400G backbone links with many other traffic, which makes the burst much slower than 10G. In Ohtanet, I presume. which is, unlike Yttinet, the reality. Masataka Ohta
Re: 400G forwarding - how does it work?
Saku Ytti wrote: When many TCPs are running, burst is averaged and traffic is poisson. If you grow a window, and the sender sends the delta at 100G, and receiver is 10G, eventually you'll hit that 10G port at 100G rate. Wrong. If it's local communication where RTT is small, the window is not so large, smaller than an unbloated router buffer. If RTT is large, your 100G runs over several 100/400G backbone links with much other traffic, which makes the burst much slower than 10G. Masataka Ohta
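The window-size argument above is just bandwidth-delay-product arithmetic; a quick sketch with illustrative numbers (ours, not from the thread):

```python
# The TCP window needed to fill a path is rate * RTT (the BDP).
# On a small-RTT LAN the window fits in a modest buffer; over a
# long-RTT WAN it is orders of magnitude larger.

def bdp_bytes(rate_bps, rtt_s):
    """Bandwidth-delay product in bytes."""
    return rate_bps * rtt_s / 8

lan = bdp_bytes(10e9, 100e-6)   # 10G receiver, 100 us LAN RTT
wan = bdp_bytes(10e9, 100e-3)   # same rate, 100 ms WAN RTT
print(lan, wan)                 # 125 kB vs 125 MB
```

The 125 kB LAN window fits comfortably under the 750 kB buffer figure cited elsewhere in the thread, while the WAN case does not, which is the distinction the email is drawing.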
Re: 400G forwarding - how does it work?
dip wrote: I have seen cases where traffic behaves more like self-similar. That could happen if there are a small number of TCP streams or multiple TCPs are synchronized through interactions on bloated buffers, which is one reason why we should avoid bloated buffers. Do you have any good pointers where the research has been done that today's internet traffic can be modeled accurately by Poisson? For as many papers supporting Poisson, I have seen as many papers saying it's not Poisson. https://www.icir.org/vern/papers/poisson.TON.pdf It is based on observations between 1989 and 1994, when the Internet backbone was slow and the number of users was small, which means the number of TCP streams running in parallel was small. For example, merely 124M packets over 36 days of observation [LBL-1] is slower than 500kbps, which could be filled up by a single TCP connection even by computers of that time and is not a meaningful measurement. https://www.cs.wustl.edu/~jain/cse567-06/ftp/traffic_models2/#sec1.2 It merely states that some use non-Poisson traffic models. Masataka Ohta
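The [LBL-1] figure cited above is easy to check with back-of-the-envelope arithmetic (the 1500-byte full-MTU packet size is our generous assumption; the real trace average would be smaller, making the rate even lower):

```python
# 124M packets over 36 days of observation: what average rate is that?
packets = 124e6
seconds = 36 * 86400                # 36 days in seconds
pps = packets / seconds             # ~40 packets per second
bps = pps * 1500 * 8                # even at full 1500B MTU
print(round(pps, 1), round(bps / 1e3), "kbps")
```

Even assuming every packet was maximum size, the trace averages under 500 kbps, supporting the email's point that a single contemporary TCP connection could have filled it.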
Re: 400G forwarding - how does it work?
sro...@ronan-online.com wrote: There are MANY real world use cases which require high throughput at 64 byte packet size. Certainly, there were imaginary world use cases which required guaranteeing throughput as high as 64kbps with a 48B payload size, for which a 20(40)B IP header was obviously painful and a 5B header was used. At that time, poor fair queuing was assumed, which requires a small packet size for short delay. But as fair queuing does not scale at all, they disappeared long ago. > Denying those use cases because they don’t fit > your world view is short sighted. That could have been a valid argument 20 years ago. Masataka Ohta
Re: 400G forwarding - how does it work?
Saku Ytti wrote: I'm afraid you imply too much buffer bloat only to cause unnecessary and unpleasant delay. With 99% load M/M/1, 500 packets (750kB for 1500B MTU) of buffer is enough to make packet drop probability less than 1%. With 98% load, the probability is 0.0041%. I feel like I'll live to regret asking. Which congestion control algorithm are you thinking of? I'm not assuming a LAN environment, for which paced TCP may be desirable (if the bandwidth requirement is tight, which is unlikely in a LAN). But Cubic and Reno will burst tcp window growth at sender rate, which may be much more than receiver rate, someone has to store that growth and pace it out at receiver rate, otherwise window won't grow, and receiver rate won't be achieved. When many TCPs are running, bursts are averaged and traffic is Poisson. So in an ideal scenario, no we don't need a lot of buffer, in practical situations today, yes we need quite a bit of buffer. That is an old theory known to be invalid (Ethernet switches with small buffers are enough for IXes) and theoretically denied by: Sizing router buffers https://dl.acm.org/doi/10.1145/1030194.1015499 after which paced TCP was developed for the unimportant exceptional case of LANs. > Now add to this multiple logical interfaces, each having 4-8 queues, > it adds up. Having so many queues requires sorting of the queues to properly prioritize them, which costs a lot of computation (and performance loss) for no benefit and is a bad idea. > Also the shallow ingress buffers discussed in the thread are not delay > buffers and the problem is complex because no device is marketable > that can accept wire rate of minimum packet size, so what trade-offs > do we carry, when we get bad traffic at wire rate at small packet > size? We can't empty the ingress buffers fast enough, do we have > physical memory for each port, do we share, how do we share? People who use irrationally small packets will suffer, which is not a problem for the rest of us. Masataka Ohta
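The M/M/1 figures quoted in this exchange are easy to reproduce. For an M/M/1 queue at utilization rho, the probability that the queue holds at least B packets (so a B-packet buffer overflows) is rho**B:

```python
# Reproduce the thread's M/M/1 buffer-sizing numbers:
# P(queue length >= B) = rho ** B for utilization rho.

def overflow_probability(rho, buffer_packets):
    return rho ** buffer_packets

p99 = overflow_probability(0.99, 500)   # ~0.0066, i.e. under 1%
p98 = overflow_probability(0.98, 500)   # ~4.1e-5, i.e. ~0.0041%
print(f"{p99:.4f} {p98:.6f}")
```

Both results match the email's claims: 500 packets of buffer at 99% load drops less than 1% of packets, and at 98% load about 0.0041%.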
Re: 400G forwarding - how does it work?
ljwob...@gmail.com wrote: Buffer designs are *really* hard in modern high speed chips, and there are always lots and lots of tradeoffs. The "ideal" answer is an extremely large block of memory that ALL of the forwarding/queueing elements have fair/equal access to... but this physically looks more or less like a full mesh between the memory/buffering subsystem and all the forwarding engines, which becomes really unwieldy (expensive!) from a design standpoint. The amount of memory you can practically put on the main NPU die is on the order of 20-200 **mega** bytes, where a single stack of HBM memory comes in at 4GB -- it's literally 100x the size. I'm afraid you imply too much buffer bloat only to cause unnecessary and unpleasant delay. With 99% load M/M/1, 500 packets (750kB for 1500B MTU) of buffer is enough to make packet drop probability less than 1%. With 98% load, the probability is 0.0041%. But, there are so many router engineers who think that, with bloated buffers, packet drop probability can be zero, which is wrong. For example, https://www.broadcom.com/products/ethernet-connectivity/switching/stratadnx/bcm88690 Jericho2 delivers a complete set of advanced features for the most demanding carrier, campus and cloud environments. The device supports low power, high bandwidth HBM packet memory offering up to 160X more traffic buffering compared with on-chip memory, enabling zero-packet-loss in heavily congested networks. Masataka Ohta
Re: 400G forwarding - how does it work?
James Bensley wrote: The BCM16K documentation suggests that it uses TCAM for exact matching (e.g., for ACLs) in something called the "Database Array" (with 2M 40b entries?), and SRAM for LPM (e.g., IP lookups) in something called the "User Data Array" (with 16M 32b entries?). Which documentation? According to: https://docs.broadcom.com/docs/16000-DS1-PUB figure 1 and related explanations: Database records 40b: 2048k/1024k. Table width configurable as 80/160/320/480/640 bits. User Data Array for associated data, width configurable as 32/64/128/256 bits. means that the header extracted by the 88690 is analyzed by the 16K, finally resulting in 40b (a lot shorter than IPv6 addresses, but still perhaps enough for an IPv6 backbone to identify sites) of information by a "database" lookup, which is obviously by CAM, because 40b is painful for SRAM, and converted to "32/64/128/256 bits" of data. 1 second / 164473684 packets = 1 packet every 6.08 nanoseconds, which is within the access time of TCAM and SRAM As high speed TCAM and SRAM should be pipelined, the cycle time, which is what matters, is shorter than the access time. Finally, it should be pointed out that most, if not all, performance figures such as MIPS and Flops are merely guaranteed not to be exceeded. In this case, if deep packet inspection of lengthy headers for some complicated routing scheme or to satisfy NSA requirements is required, the communication speed between the 88690 and the 16K will be the limiting factor for PPS, resulting in a lot less than the maximum possible PPS. Masataka Ohta
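The 6.08 ns figure above is simple line-rate arithmetic. As a sketch, assuming (our assumption, not stated in the thread) that the 164,473,684 pps number comes from 100 Gbps at 76 bytes per minimum packet on the wire:

```python
# Per-packet time budget at line rate: packets per second is
# line rate divided by bits per packet on the wire.
line_rate_bps = 100e9
wire_bits_per_packet = 76 * 8          # assumed per-packet wire overhead
pps = line_rate_bps / wire_bits_per_packet
ns_per_packet = 1e9 / pps
print(int(pps), round(ns_per_packet, 2))
```

That reproduces 164,473,684 pps and a 6.08 ns budget per lookup, which is why pipelining (cycle time rather than access time) is what matters.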
Re: Upstream bandwidth usage
Michael Thomas wrote: If it's so tiny, why shape it aggressively? Why shouldn't I be able to burst to whatever is available at the moment? I would think most users would be happy with that. Seemingly, to distinguish inexpensive economy and expensive business class services. Masataka Ohta
Re: [EXTERNAL] FCC proposes higher speed goals (100/20 Mbps) for USF providers
David Conrad wrote: I'm with Jason. If even a small percentage of the "representative use cases" that came out of the ITU's Network 2030 Focus Group or other similar efforts comes to pass, bandwidth demand will continue to grow. As Moore's law has ended, it means users must pay a lot, which is favorable to the telcos constituting the ITU. Masataka Ohta
Re: FCC proposes higher speed goals (100/20 Mbps) for USF providers
Dave Taht wrote: "New Zealand is approximately 268,838 sq km, while United States is approximately 9,833,517 sq km, making United States 3,558% larger than New Zealand. Meanwhile, the population of New Zealand is ~4.9 million people (327.7 million more people live in United States)." That NZ has a lower population density than the US means the last mile problem is more severe in NZ than in the US, though the actual severity depends on the detailed population distribution. Masataka Ohta
Re: FCC proposes higher speed goals (100/20 Mbps) for USF providers
Dave Taht wrote: Looking back 10 years, I was saying the same things, only then I felt it was 25Mbit circa mike belshe's paper. So real bandwidth requirements only doubling every decade might be a new equation to think about... The required resolution of pictures is bounded by the resolution of our eyes, which is fixed. For TVs at home, IMHO, baseband 2k should be enough, the quality of which may be better than highly compressed 4k. Masataka Ohta
Re: FCC proposes higher speed goals (100/20 Mbps) for USF providers
Livingood, Jason via NANOG wrote: That shows up as increased user demand (usage), which means that the CAGR will rise and get factored into future year projections. You should recognize that Moore's law has ended. Masataka Ohta
Re: FCC proposes higher speed goals (100/20 Mbps) for USF providers
Owen DeLong wrote: USF is great for rural, but it has turned medium density and suburban areas into connectivity wastelands. Carrier & cable lobbying organizations say that free market competition by multiple providers provide adequate service in those areas. That's simply untrue, because of natural regional monopoly. Lobbyists lie? Say it isn’t so. You seem somehow surprised by this. No, not at all. So? Masataka Ohta
Re: FCC proposes higher speed goals (100/20 Mbps) for USF providers
Sean Donelan wrote: USF is great for rural, but it has turned medium density and suburban areas into connectivity wastelands. Carrier & cable lobbying organizations say that free market competition by multiple providers provides adequate service in those areas. That's simply untrue, because of natural regional monopoly. Competitive providers must invest the same amount of money to cover a certain area with their cables, but their revenues are proportional to their local market shares, which means only the provider with the largest share can survive. In urban areas where local backbone costs, which are proportional to market shares, exceed cabling costs, there may be some competition. But, natural regional monopoly is still possible. Still, providers relying on older technologies will be competitively replaced by other providers using newer technologies, which is why DSL providers have been disappearing and cable providers will disappear. In the long run, only fiber providers will survive. The problem, then, is that, with PON, there is no local competition even if fibers are unbundled, because providers with a smaller share can find only a smaller number of subscribers around PON splitters while, usually, the fiber cost between the splitters and stations is the same, which is why fiber providers prefer PON over SS. But, such a preference is deadly for rural areas where only one or two homes exist around PON splitters, in which case SS is less costly. Masataka Ohta
Re: Question re prevention of enumeration with DNSSEC (NSEC3, etc.)
John McCormac wrote: There are various ways, such as crawling the web, to enumerate domain names. That is not an efficient method. Not a problem for large companies or botnets. So, only small legal players suffer from hiding zone information. For example, large companies such as google can obtain an enumerated list of all the currently most active domains in the world, which can then be used to access whois. What Google might obtain would be a list of domain names with websites. The problem is that the web usage rate for TLDs varies, with some ccTLDs seeing a web usage rate of over 40% (40% of domain names having developed websites) but some of the new gTLDs have web usage rates below 10%. Some of the ccTLDs have high web usage rates. You misunderstand my statement. Domain names not offering HTTP service can also be collected by web crawling. Hiding DNS zone information from the public is beneficial to powerful entities such as google. In some respects, yes. Google can also use gmail to collect domain names used by sent or received e-mails. But there is a problem with that because of all the FUD about websites linking to "bad" websites that had been pushed in the media a few years ago. Is your concern the privacy of "bad" websites? Another factor that is often missed is the renewal rate of domain names. That's not a problem related to enumeration of domain names. A lot of personal data such as e-mail addresses, phone numbers and even postal addresses have been removed from gTLD records because of the fear of GDPR. As I have been saying, the problem, *if* *any*, is whois. So? The zones change. New domain names are registered and domain names are deleted. For many TLDs, the old WHOIS model of registrant name, e-mail and phone number no longer exists. And there are also WHOIS privacy services which have obscured ownership.
As I wrote: : Moreover, because making ownership information of lands and : domain names publicly available promotes public welfare : and domain name owners approve publication of such : information in advance, there shouldn't be any concern : of privacy breach forbidden by local law of DE. that is not a healthy movement. Masataka Ohta
Re: Question re prevention of enumeration with DNSSEC (NSEC3, etc.)
As I wrote: But some spam actors deliberately compared zone file editions to single out additions, and then harass the owners of newly registered domains, both by e-mail and phone. If that is a serious concern, stop whois. There are various ways, such as crawling the web, to enumerate domain names. For example, large companies such as google can obtain an enumerated list of all the currently most active domains in the world, which can then be used to access whois. Hiding DNS zone information from the public is beneficial to powerful entities such as google. As such A wrench can be a tool or a weapon, depending on how one uses it. The wrench is whois. However, something like trust banks may be able to protect the privacy of domain name owners who want some privacy, if such entities can be regulated properly. Masataka Ohta
Re: Question re prevention of enumeration with DNSSEC (NSEC3, etc.)
Rubens Kuhl wrote: But some spam actors deliberately compared zone file editions to single out additions, and then harass the owners of newly registered domains, both by e-mail and phone. If that is a serious concern, stop whois. A wrench can be a tool or a weapon, depending on how one uses it. The wrench is whois. Masataka Ohta
Re: Question re prevention of enumeration with DNSSEC (NSEC3, etc.)
Rubens Kuhl wrote: Is there any case law where someone has asserted a database right for a DNS zone? German law has something that goes somewhat near it, although closer to a mandate rather than a right: https://www.denic.de/en/faqs/faqs-for-domain-holders/#code-154 Similar regulation also exists in Japan. However... Considering that, with a detailed map of a town, one can enumerate the addresses of all the houses in the town and owner information of the houses can be obtained from the land registry office operated by government (I know the complications in the US with such registries), such regulation is not very meaningful. As privacy breach is caused not by enumeration but by registry, there is little, if any, reason to avoid enumeration. Moreover, because making ownership information of lands and domain names publicly available promotes public welfare and domain name owners approve publication of such information in advance, there shouldn't be any concern of privacy breach forbidden by local law of DE. Masataka Ohta
Re: Court orders for blocking of streaming services
Philip Loenneker wrote: I have a tongue-in-cheek question... if the documentation provided by the plaintiff to the court, and/or the court documentation including the final ruling, includes the specific URLs to the websites to block, does that constitute transmitting links to illegal content? Doing something authorized by law in a way specified by the law can not be illegal. So? Masataka Ohta
Re: Court orders for blocking of streaming services
Mel Beckman wrote: You are confusing "illegal" and "guilty". The first party publicly transmitting illegal contents or links to the contents is guilty, which means the links themselves are illegal. But, DMCA makes some third party providers providing illegal contents or illegal links guilty only if some condition of DMCA is met. Same for civil liability. You're incorrect about the DMCA when you say "DMCA treats 'linking' to illegal contents as illegal as the contents themselves". See above. You > must knowingly link to works that clearly infringe somebody's copyright. Same is true if you are transmitting not links but the contents themselves. > A link to the Israel.TV websites themselves is not to a specific > work, so it's not covered by DMCA. So first, as long as you don't > know that a work is infringing someone's copyright, You totally miss the point of the order, though I wrote: : As the order is to those "having actual knowledge of this Default : Judgment and Permanent Injunction Order", Masataka Ohta
Re: Court orders for blocking of streaming services
Mel Beckman wrote: But the phrase "or linking to the domain" Includes hundreds, possibly thousands, of unwitting certain parties: DMCA treats "linking" to illegal contents as illegal as the contents themselves, which is why I wrote: : In addition, it seems to me that name server operators "having : actual knowledge" that some domain names are used for copyright : infringements are not protected by DMCA. > I think I am simply right. So, you know nothing about DMCA. Read it. > The lawsuit is contradictory and overreaching. As for transit ISPs enjoying a safe harbor of DMCA, yes, as I already said so. Masataka Ohta
Re: Court orders for blocking of streaming services
Mel Beckman wrote: The plaintiff’s won a default judgement, because the defendants didn’t show up in court. But they could not have shown up in court, because they were only listed as "John Does" in the lawsuit. Thus no defendant could have "actual knowledge" that they were sued, As the defendants are those identified as "d/b/a Israel.tv, as the owners and operators of the website, service and/or applications (the “Website”) located at or linking to the domain www.Israel.TV;", you are simply wrong. > For the court to then > approve sanctions against innocent non-parties to the suit is a > logical contradiction. Wrong. Those knowingly actively cooperating with the defendants are not innocent at all though DMCA makes some passive cooperation innocent. Masataka Ohta
Re: Court orders for blocking of streaming services
John Levine wrote: I agree that the rest of the language demanding that every ISP, hosting provider, credit union, bank, and presumably nail salon and coin laundry in the US stop serving the defendants is nuts. As the order is to those "having actual knowledge of this Default Judgment and Permanent Injunction Order", according to DMCA, that should be a reasonable order for hosting providers of illegal contents but not for transit ISPs. In addition, it seems to me that name server operators "having actual knowledge" that some domain names are used for copyright infringements are not protected by DMCA. Masataka Ohta
Re: how networking happens in Hawaii
William Herrin wrote: Countries whose law derives from English Common law have a concept of adverse possession. Details vary but mainly if you can hold the land for 20 years against the owner's wishes then it's your land. Conceptually it applies to nations just as surely as individuals. Such interpretation of English Common law is against Zionism promoted by British government and should be wrong. Masataka Ohta
Re: Any sign of supply chain returning to normal?
Randy Bush wrote: i suspect that, in years of overabundant late stage capitalism, folk went nuts. and we are now paying for it. one of my fave quotes I thought of it in a slightly different way--like a space that we were exploring and, in the early days, we figured out this consistent path through the space: IP, TCP, and so on. What's been happening over the last few years is that the IETF is filling the rest of the space with every alternative approach, not necessarily any better. Every possible alternative is now being written down. And it's not useful. -- Jon Postel And Steve Deering agreed with Jon saying "Exactly". That's so funny because the statement was published in Oct. 1998 and the first rfc on IPv6 was published in Dec. 1995. Masataka Ohta
Re: V4 via V6 and IGP routing protocols
Mark Tinka wrote: MPLS with nested labels, which is claimed to scale because nesting represents route hierarchy, just does not scale because source hosts are required to provide nested labels, which means the source hosts have the most current routing tables at destinations, which requires flat routing without hierarchy or on demand, that is, flow driven, look up of detailed routing tables of destinations at a distance. This detail is limited to PE devices (ingress/egress). As it requires >> flat routing without hierarchy or on >> demand, that is, flow driven, look up of detailed routing tables >> of destinations at a distance. MPLS is just broken. You don't need to carry a BGP table in the P devices (core), as only label swapping is required. So? Fair point, it is a little heavy for an edge box, Requiring >> flat routing without hierarchy means it is fatally heavy for intermediate boxes. >> or on >> demand, that is, flow driven, look up of detailed routing tables >> of destinations at a distance. means it is fatally heavy for edge boxes. > In the end, having a flat L2 domain was just simpler. That's totally against the CATENET model. Why, do you think, was NHRP abandoned? > we've never ran into an issue carrying > thousands of IS-IS IPv4/IPv6 routes this way. Thousands of? Today, with such powerful CPUs, that is a small network. So? Masataka Ohta
Re: V4 via V6 and IGP routing protocols
Dave Taht wrote: Are MPLS or SR too heavy a bat? MPLS was not an option at the time. It might become one. MPLS with nested labels, which is claimed to scale because nesting represents route hierarchy, just does not scale because source hosts are required to provide nested labels, which means the source hosts have the most current routing tables at destinations, which requires flat routing without hierarchy or on demand, that is, flow driven, look up of detailed routing tables of destinations at a distance. Masataka Ohta
Re: V4 via V6 and IGP routing protocols
Pascal Thubert (pthubert) wrote: Hello Ohta-san Hi, it is hopeless. If you look at it, LS - as OSPF and ISIS use it - My team developed our own. Hierarchical QoS Link Information Protocol (HQLIP) https://datatracker.ietf.org/doc/draft-ohta-ric-hqlip/ which supports 256 levels of hierarchy with hierarchical thinning of link information, including available QoS. depends on the fact that all nodes get the same information and react the same way. Isn't that hopeless too? If you insist on OSPF or ISIS, yes. Clearly, the above limits LS applicability to stable links and topologies, and powered devices. This is discussed at length in https://datatracker.ietf.org/doc/html/draft-ietf-roll-protocols-survey. OLSRv2 pushes the model to its limit, don't drive it any faster. You don't have to say "low power" to notice OSPF is not so good. With just a quick look at OSPF, I noticed that OSPF's reliance on link-local reliable multicast is hopeless (as a basis for constructing a hierarchical QoS routing system). Worse, the minimum hello interval of OSPF is too long for quick recovery (low power is not required, for example, at the backbone), which is why the additional complication of an optical layer was considered useful. RIFT (https://datatracker.ietf.org/doc/draft-ietf-rift-rift/) shows that evolution outside that box is possible. OK. RIFT is "for Clos and fat-tree network topologies" of data centers. > RIFT develops > anisotropic routing concepts (arguably from RPL) and couples DV and > LS to get the best of both worlds. It usually results in the worst of both, I'm afraid. But none of the above allow a source router to decide once and for all what it will get. As there are not so many alternative routes with Clos and fat-tree network topologies of data centers, pure source routing combined with some transport protocol to simultaneously try multiple routes should be the best solution, IMO, because avoiding link saturation is an important goal.
> When you drive and the street is blocked, you can U-turn around the
> block and rapidly restore the shortest path. The protocols above
> will not do that; this is why technologies such as LFA were needed
> on top. But then the redundancy is an add-on as opposed to a native
> feature of the protocol.

What if the network is not very large and the minimum hello interval
of OSPF is 1ms?

> Thinking outside that box would then mean:
> - To your end-to-end principle point, let the source decide the
>   packet treatment (including path) based on packet needs

To apply the E2E argument to LS routing, all the routers are *dumb*
intermediate systems that quickly flood LS. At the same time, all the
routers are ends that initiate flooding of local LS, receive flooded
LS, and compute the best routes to destinations in a way consistent
with the other routers, because they share the same flooded LS except
during short transition periods.

Masataka Ohta
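To put numbers on the 1ms-hello question above: OSPF declares a neighbor down only after the RouterDeadInterval, conventionally four times the HelloInterval. A minimal sketch, noting that standard OSPF (RFC 2328) carries HelloInterval in whole seconds, so a 1ms hello is a thought experiment rather than something stock implementations can be configured for.

```python
# Failure detection in OSPF is bounded by RouterDeadInterval,
# conventionally 4 x HelloInterval.  Standard OSPF (RFC 2328)
# expresses HelloInterval in whole seconds, so the 1 ms case below is
# a hypothetical, not a configurable value on stock implementations.

def dead_interval(hello_interval_s, multiplier=4):
    """Worst-case time (seconds) to declare a neighbor down."""
    return hello_interval_s * multiplier

# Default broadcast-network timer: 10 s hello -> 40 s detection.
print(dead_interval(10))      # 40

# Hypothetical 1 ms hello -> 4 ms detection, well inside the ~50 ms
# protection-switching target usually cited for optical layers.
print(dead_interval(0.001))
```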
Re: V4 via V6 and IGP routing protocols
Dave Taht wrote:

> Periodically I still do some work on routing protocols. 12? years
> ago I had kind of given up on ospf and isis, and picked the babel
> protocol as an IGP for meshy networks because I felt link-state had
> gone as far as it could and somehow unifying BGP DV with an IGP that
> was also DV (distance vector) seemed like a path forward.

As DV depends on other routers to choose the best path from several
candidates updated asynchronously, which means it is against the E2E
principle, and as decisions by other routers are delayed a lot waiting
until all the candidates are updated, it is hopeless.

OTOH, as LS only has routers distribute the current link states
instantaneously and lets the end systems, the individual routers,
compute the best paths, LS converges quickly.

BGP is DV because there is no way to describe the policies of various
domains and, even if there were, most, if not all, domains do not want
to publish their policies in full detail.

> My question for this list is basically, has anyone noticed or
> fiddled with babel?

No.

Masataka Ohta
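The LS consistency claim above can be sketched concretely: every router runs shortest-path computation over the same flooded link-state database, so (outside transition periods) their results agree. A minimal Dijkstra over an assumed, made-up topology:

```python
import heapq

# Sketch of the LS model: every router runs Dijkstra over the SAME
# flooded link-state database, so their shortest-path results are
# mutually consistent.  The topology and costs are illustrative.

LSDB = {  # link-state database: router -> {neighbor: link cost}
    "A": {"B": 1, "C": 4},
    "B": {"A": 1, "C": 2, "D": 5},
    "C": {"A": 4, "B": 2, "D": 1},
    "D": {"B": 5, "C": 1},
}

def dijkstra(lsdb, source):
    """Shortest-path costs from `source` over the shared LSDB."""
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, cost in lsdb[u].items():
            nd = d + cost
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist

# Each router computes independently, yet the answers agree because
# the input (the flooded LSDB) is identical everywhere.
print(dijkstra(LSDB, "A"))  # {'A': 0, 'B': 1, 'C': 3, 'D': 4}
```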
Re: V6 still not supported
Matthew Petach wrote:

> Hi Masataka,

Hi,

> One quick question. If every host is granted a range of public port
> numbers on the static stateful NAT device, what happens when two
> customers need access to the same port number?

I mean the static outgoing port numbers, but your concern should be
the well-known incoming port numbers, which is an issue not specific
to "static stateful" NAT.

> Because there's no way in a DNS NS entry to specify a port number,
> if I need to run a DNS server behind this static NAT, I *have* to be
> given port 53 in my range; there's no other way to make DNS work.

And SMTP, as is explained in draft-ohta-e2e-nat-00:

   A server port number different from well known ones may be
   specified through mechanisms to specify an address of the server,
   which is the case of URLs. However, port numbers for DNS and SMTP
   are, in general, implicitly assumed by DNS and are not changeable.

   Or, a NAT gateway may receive packets to certain ports and behave
   as an application gateway to end hosts, if request messages to the
   server contains information, such as domain names, which is the
   case with DNS, SMTP and HTTP, to demultiplex the request messages
   to end hosts.

   However, for an ISP operating the NAT gateway, it may be easier to
   operate independent servers at default port for DNS, SMTP, HTTP
   and other applications for their customers than operating
   application relays.

Though the draft is for E2ENAT, the situation is the same for any kind
of NAT.

> This means that if I have two customers that each need to run a DNS
> server, I have to put them on separate static NAT boxes--because
> they can't both get access to port 53.

See above for other possibilities.

> This limits the effectiveness of a stateful static NAT box

For incoming ports, static stateful NAT is no worse than dynamic NAT.
Both may be configured to map certain incoming ports to certain local
ports and addresses, statically or dynamically with, say, UPnP.

The point of static stateful NAT is that, for outgoing ports, it does
not require logging.
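The application-gateway option quoted from the draft can be sketched: a gateway listening on one well-known port routes requests to the right internal host using an in-band name (the HTTP Host header here; a DNS QNAME or SMTP RCPT address would work similarly). The name-to-host mapping below is an illustrative assumption.

```python
# Sketch of application-gateway demultiplexing: one public port, many
# customers, with the internal host chosen by the domain name carried
# inside the request.  Names and addresses are illustrative.

HOST_MAP = {
    "alice.example": "10.0.0.1",
    "bob.example":   "10.0.0.2",
}

def demux_http(request_text):
    """Pick the internal host for a raw HTTP/1.1 request by Host header."""
    for line in request_text.split("\r\n"):
        if line.lower().startswith("host:"):
            name = line.split(":", 1)[1].strip()
            return HOST_MAP.get(name)
    return None  # no Host header: cannot demultiplex

req = "GET / HTTP/1.1\r\nHost: bob.example\r\n\r\n"
print(demux_http(req))  # 10.0.0.2
```

DNS and SMTP carry the equivalent name in the query name and the RCPT address respectively, which is why the draft singles out those three protocols as demultiplexable.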
> tl;dr -- "if only we'd thought of putting a port number field in
> the NS records in DNS back in 1983..."

And, MX. As named has a "-p" option, I think some people were already
aware of the uselessness of the option in 1983. But putting in a port
number field at that time would have been overkill.

Masataka Ohta
Re: V6 still not supported
Pascal Thubert (pthubert) via NANOG wrote:

> - Stateful NATs the size of the Internet not doable,

Stateful NATs are necessary only near the leaf edges of ISPs, for
hundreds of customers or maybe a little more than that, and are
doable.

If you make the stateful NATs static, that is, give each private
address a statically configured range of public port numbers, it is
extremely easy, because no logging is necessary for police-grade audit
trail opacity.

Masataka Ohta
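The static scheme described above can be sketched: with a fixed public port slice per private address, the public port alone identifies the customer, so no per-connection log is needed for an audit trail. Range sizes and addresses below are illustrative assumptions.

```python
# Sketch of static stateful NAT: each private address is statically
# assigned a fixed slice of the public port space.  An outgoing
# connection's public port alone identifies the customer, so no
# per-connection logging is needed for an audit trail.
# Range size and addresses are illustrative assumptions.

PORTS_PER_HOST = 256
BASE_PORT = 1024  # ports below this stay reserved

PRIVATE_HOSTS = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]

def port_range(host):
    """Public port range statically assigned to a private host."""
    i = PRIVATE_HOSTS.index(host)
    start = BASE_PORT + i * PORTS_PER_HOST
    return range(start, start + PORTS_PER_HOST)

def host_for_port(public_port):
    """Audit lookup: recover the customer from a public port alone."""
    i = (public_port - BASE_PORT) // PORTS_PER_HOST
    return PRIVATE_HOSTS[i]

# 10.0.0.2 owns ports 1280-1535, so any connection seen from, say,
# public port 1300 is attributable without consulting any log.
print(host_for_port(1300))  # 10.0.0.2
```

The mapping is a pure function of configuration, which is the sense in which "no logging is necessary": the audit question "who used port P at time T?" is answered by arithmetic, not by a connection log.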