Re: 60 ms cross-continent

2020-06-21 Thread Eric Kuhnke
Serious HFT moved to shortwave years ago. The Chicago-NYC routes by
microwave still exist, but are only for things that need higher data rates
(as measured in kbps). It's hard to hide a giant log-periodic or Yagi-Uda
antenna. The sites near Chicago that are aimed at London are well known to
those in the industry.



On Sun, Jun 21, 2020 at 10:53 AM Brett Frankenberger 
wrote:

> On Sun, Jun 21, 2020 at 02:17:08PM -0300, Rubens Kuhl wrote:
> > On Sat, Jun 20, 2020 at 5:05 PM Marshall Eubanks <
> marshall.euba...@gmail.com>
> > wrote:
> >
> > > This was also pitched as one of the killer-apps for the SpaceX
> > > Starlink satellite array, particularly for cross-Atlantic and
> > > cross-Pacific trading.
> > >
> > >
> > >
> https://blogs.cfainstitute.org/marketintegrity/2019/06/25/fspacex-is-opening-up-the-next-frontier-for-hft/
> > >
> > > "Several commentators quickly caught onto the fact that an extremely
> > > expensive network whose main selling point is long-distance,
> > > low-latency coverage has a unique chance to fund its growth by
> > > addressing the needs of a wealthy market that has a high willingness
> > > to pay — high-frequency traders."
> > >
> > >
> > This is a nice plot for a movie, but not how HFT is really done. It's so
> > much easier to colocate in the same datacenter as the exchange and run
> > algorithms from there; while those algorithms need humans to guide their
> > strategy, the human thought process takes a couple of seconds anyways. So
> > the real HFTs keep using the defined strategy while the human controller
> > doesn't tell it otherwise.
>
> For faster access to one exchange, yes, absolutely, colocate at the
> exchange.  But there's more than one exchange.
>
> As one example, many index futures trade in Chicago.  The stocks that
> make up those indices mostly trade in New York.  There's money to be
> made on the arbitrage, if your Chicago algorithms get faster
> information from New York (and vice versa) than everyone else's
> algorithms.
>
> More expensive but shorter fiber routes have been built between NYC and
> Chicago for this reason, as have microwave paths (to get
> speed-of-light in air rather than in glass).  There's competition to
> have the microwave towers as close as possible to the data centers,
> because the last mile is fiber so the longer your last mile, the less
> valuable your network.
>
>
> https://www.bloomberg.com/news/features/2019-03-08/the-gazillion-dollar-standoff-over-two-high-frequency-trading-towers
>
>  -- Brett
>


Re: Devil's Advocate - Segment Routing, Why?

2020-06-21 Thread Mark Tinka



On 21/Jun/20 23:01, Robert Raszuk wrote:

>
> Nope. You need to get to the PQ node via potentially many hops. So you
> need to have either ordered or independent label distribution to its
> loopback in place.

I have some testing I want to do with IS-IS only announcing the Loopback
from a set of routers to the rest of the backbone, and LDP allocating
labels for it accordingly, to solve a particular problem.

I'll test this out and see what happens re: LDP LFA.

Mark.


Re: Devil's Advocate - Segment Routing, Why?

2020-06-21 Thread Robert Raszuk
> Wouldn't T-LDP fix this, since LDP LFA is a targeted session?

Nope. You need to get to the PQ node via potentially many hops. So you need to
have either ordered or independent label distribution to its loopback in
place.

Best,
R.

On Sun, Jun 21, 2020 at 10:58 PM Mark Tinka  wrote:

>
>
> On 21/Jun/20 22:21, Robert Raszuk wrote:
>
>
> Well this is true for one company :) Name starts with j 
>
> Other company name starting with c - at least some time back by default
> allocated labels for all routes in the RIB either connected or static or
> sourced from IGP. Sure you could always limit that with a knob if desired.
>
>
>
> Juniper allocates labels to the Loopback only.
>
> Cisco allocates labels to all IGP and interface routes.
>
> Neither allocate labels to BGP routes for the global table.
>
>
>
> The issue with allocating labels only for BGP next hops is that your
> IP/MPLS LFA breaks (or more directly is not possible) as you do not have a
> label to PQ node upon failure.  Hint: PQ node is not even running BGP :).
>
>
> Wouldn't T-LDP fix this, since LDP LFA is a targeted session?
>
> Need to test.
>
>
>
> Sure, select folks still count on "IGP Convergence" to restore
> connectivity. But I hope those will move to much faster connectivity
> restoration techniques soon.
>
>
> We are happy :-).
>
> Mark.
>


Re: Devil's Advocate - Segment Routing, Why?

2020-06-21 Thread Mark Tinka


On 21/Jun/20 22:21, Robert Raszuk wrote:

>
> Well this is true for one company :) Name starts with j  
>
> Other company name starting with c - at least some time back by
> default allocated labels for all routes in the RIB either connected or
> static or sourced from IGP. Sure you could always limit that with a
> knob if desired.


Juniper allocates labels to the Loopback only.

Cisco allocates labels to all IGP and interface routes.

Neither allocate labels to BGP routes for the global table.


>
> The issue with allocating labels only for BGP next hops is that your
> IP/MPLS LFA breaks (or more directly is not possible) as you do not
> have a label to PQ node upon failure.  Hint: PQ node is not even
> running BGP :).

Wouldn't T-LDP fix this, since LDP LFA is a targeted session?

Need to test.


>
> Sure, select folks still count on "IGP Convergence" to restore
> connectivity. But I hope those will move to much faster connectivity
> restoration techniques soon.

We are happy :-).

Mark.


Re: Devil's Advocate - Segment Routing, Why?

2020-06-21 Thread Mark Tinka



On 21/Jun/20 21:15, adamv0...@netconsultings.com wrote:

> I wouldn't say it's known to many as not many folks are actually limited by 
> only up to ~1M customer connections, or next level up, only up to ~1M 
> customer VPNs.   

It's probably less of a problem now than it was 10 years ago. But, yes,
I don't have any real-world experience.



> Well yeah, things work differently in VRFs, not a big surprise.  
> And what about an example of bad flowspec routes/filters cutting the boxes 
> off net -where having those flowspec routes/filters contained within an 
> Internet VRF would not have such an effect.
> See, it goes either way.  
> Would be interesting to see a comparison of good vs bad for the Internet 
> routes in VRF vs in Internet routes in global/default routing table.

Well, the global table is the basics, and VRF's is where sexy lives :-).


> No, that's just a result of having a finite FIB/RIB size -if you want to cut 
> these resources into virtual pieces you'll naturally get your equations above.
> But if you actually construct your testing to showcase the delta between how 
> much FIB/RIB space is taken by x prefixes with each in a VRF as opposed to 
> all in a single default VRF (global routing table) the delta is negligible. 
> (Yes negligible even in case of per prefix VPN label allocation method -which 
> I'm assuming no one is using anyways as it inherently doesn't scale and would 
> limit you to ~1M VPN prefixes though per-CE/per-next-hop VPN label allocation 
> method gives one the same functionality as per-prefix one while pushing the 
> limit to ~1M PE-CE links/IFLs which from my experience is sufficient for most 
> folks out there). 

Like I said, with today's CPU's and memory, probably not an issue. But
it's not an area I play in, so those with more experience - like
yourself - would know better.

Mark.


Re: Devil's Advocate - Segment Routing, Why?

2020-06-21 Thread Robert Raszuk
>
> I should point out that all of my input here is based on simple MPLS
> forwarding of IP traffic in the global table. In this scenario, labels
> are only assigned to BGP next-hops, which is typically an IGP Loopback
> address.
>

Well this is true for one company :) Name starts with j 

The other company, name starting with c, at least some time back by default
allocated labels for all routes in the RIB, whether connected, static or
sourced from the IGP. Sure, you could always limit that with a knob if desired.

The issue with allocating labels only for BGP next hops is that your
IP/MPLS LFA breaks (or, more directly, is not possible), as you do not have a
label to the PQ node upon failure.  Hint: the PQ node is not even running BGP :).
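For anyone less familiar with the PQ-node concept, here is a minimal,
illustrative Python sketch of how remote-LFA repair nodes fall out of IGP
shortest-path distances. It is a simplified, node-avoiding approximation of
the P-space/Q-space computation in RFC 7490, and the topology, metrics and
router names below are invented:

# Minimal remote-LFA PQ-node sketch (simplified approximation of RFC 7490).
# Topology, metrics and node names are invented for illustration.
INF = float("inf")

# Undirected IGP topology: (node_a, node_b): metric
links = {
    ("S", "E"): 10,    # the protected link
    ("S", "N1"): 10,
    ("N1", "N2"): 10,
    ("N2", "N3"): 10,
    ("N3", "E"): 10,
    ("E", "D"): 10,    # destination D sits behind E
}

nodes = sorted({n for link in links for n in link})
dist = {(a, b): (0 if a == b else INF) for a in nodes for b in nodes}
for (a, b), m in links.items():
    dist[(a, b)] = dist[(b, a)] = m

# All-pairs shortest paths (Floyd-Warshall) -- stands in for the IGP SPF.
for k in nodes:
    for i in nodes:
        for j in nodes:
            if dist[(i, k)] + dist[(k, j)] < dist[(i, j)]:
                dist[(i, j)] = dist[(i, k)] + dist[(k, j)]

S, E = "S", "E"  # protect the S-E link from S's point of view

# P-space: nodes S reaches by shortest path without going through E.
p_space = {n for n in nodes if n not in (S, E)
           and dist[(S, n)] < dist[(S, E)] + dist[(E, n)]}

# Q-space: nodes that reach E by shortest path without coming back through S.
q_space = {n for n in nodes if n not in (S, E)
           and dist[(n, E)] < dist[(n, S)] + dist[(S, E)]}

pq_nodes = p_space & q_space
print("P-space:", p_space, "Q-space:", q_space, "PQ nodes:", pq_nodes)
# In this toy ring the PQ node is N2. To use it as the repair point, S needs
# a label (LDP FEC or SR node SID) for N2's loopback -- which is the point above.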

Sure, select folks still count on "IGP Convergence" to restore
connectivity. But I hope those will move to much faster connectivity
restoration techniques soon.


> Labels don't get assigned to BGP routes in a global table. There is no
> use for that.
>

Sure - True.

Cheers,
R,


Re: Devil's Advocate - Segment Routing, Why?

2020-06-21 Thread Mark Tinka



On 21/Jun/20 19:34, Robert Raszuk wrote:
>
> That is true for P routers ... not so much for PEs. 
>
> Please observe that label space in each PE router is divided for IGP
> and BGP as well as other label hungry services ... there are many
> consumers of local label block. 
>
> So it is always the case that LFIB table (max 2^20 entries - 1M) on
> PEs is much larger than the LFIB on P nodes.

I should point out that all of my input here is based on simple MPLS
forwarding of IP traffic in the global table. In this scenario, labels
are only assigned to BGP next-hops, which is typically an IGP Loopback
address.

Labels don't get assigned to BGP routes in a global table. There is no
use for that.

Of course, as this is needed in VRF's and other BGP-based VPN services,
the extra premium customers pay for that privilege may be considered
warranted :-).

Mark.


RE: Devil's Advocate - Segment Routing, Why?

2020-06-21 Thread adamv0025



> From: NANOG  On Behalf Of Mark Tinka
> Sent: Friday, June 19, 2020 7:28 PM
> 
> 
> On 19/Jun/20 17:13, Robert Raszuk wrote:
> 
> >
> > So I think Ohta-san's point is about scalability services not flat
> > underlay RIB and FIB sizes. Many years ago we had requests to support
> > 5M L3VPN routes while underlay was just 500K IPv4.
> 
> Ah, if the context, then, was l3vpn scaling, yes, that is a known issue.
> 
I wouldn't say it's known to many, as not many folks are actually limited by
it: only up to ~1M customer connections, or, at the next level up, only up to
~1M customer VPNs.

> Apart from the global table vs. VRF parity concerns I've always had (one of
> which was illustrated earlier this week, on this list, with RPKI in a VRF),
>
Well yeah, things work differently in VRFs, not a big surprise.
And what about an example of bad flowspec routes/filters cutting boxes off
the net, where having those flowspec routes/filters contained within an Internet
VRF would not have such an effect?
See, it goes either way.
It would be interesting to see a comparison of good vs. bad for Internet routes
in a VRF vs. Internet routes in the global/default routing table.


> the
> other reason I don't do Internet in a VRF is because it was always a 
> trade-off:
> 
> - More routes per VRF = fewer VRF's.
> - More VRF's  = fewer routes per VRF.
> 
No, that's just a result of having a finite FIB/RIB size: if you want to cut
these resources into virtual pieces, you'll naturally get the equations above.
But if you actually construct your testing to showcase the delta between how
much FIB/RIB space is taken by x prefixes with each in its own VRF, as opposed
to all in a single default VRF (global routing table), the delta is negligible.
(Yes, negligible even in the case of the per-prefix VPN label allocation method,
which I'm assuming no one is using anyway, as it inherently doesn't scale and
would limit you to ~1M VPN prefixes; the per-CE/per-next-hop VPN label allocation
method gives one the same functionality as the per-prefix one while pushing the
limit to ~1M PE-CE links/IFLs, which in my experience is sufficient for most
folks out there.)
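To put rough numbers on the allocation methods above, a quick back-of-envelope
sketch in Python; the VRF, PE-CE link and prefix counts are invented for
illustration, only the 2^20 label-space ceiling is fixed:

# Back-of-envelope comparison of VPN label allocation methods on a single PE.
# All counts below are invented for illustration.
LABEL_SPACE = 2**20          # 20-bit label field => ~1M labels per box
vrfs = 2_000                 # VRFs configured on this PE
pe_ce_links = 8_000          # attachment circuits (IFLs) on this PE
vpn_prefixes = 900_000       # VPN prefixes carried across those VRFs

methods = {
    "per-prefix":        vpn_prefixes,   # one label per VPN prefix
    "per-CE / next-hop": pe_ce_links,    # one label per PE-CE link
    "per-VRF":           vrfs,           # one aggregate label per VRF
}

for name, labels in methods.items():
    pct = 100 * labels / LABEL_SPACE
    print(f"{name:>18}: {labels:>9,} labels ({pct:5.1f}% of the 1M label space)")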
  
adam




Re: 60 ms cross-continent

2020-06-21 Thread Rubens Kuhl
> > This is a nice plot for a movie, but not how HFT is really done. It's so
> > much easier to colocate in the same datacenter as the exchange and run
> > algorithms from there; while those algorithms need humans to guide their
> > strategy, the human thought process takes a couple of seconds anyways. So
> > the real HFTs keep using the defined strategy while the human controller
> > doesn't tell it otherwise.
>
> For faster access to one exchange, yes, absolutely, colocate at the
> exchange.  But there's more than one exchange.
>

Yes, but to do real HFT you will need to colocate at each exchange.
Otherwise your competitors have a head start on you.


>
> As one example, many index futures trade in Chicago.  The stocks that
> make up those indices mostly trade in New York.  There's money to be
> made on the arbitrage, if your Chicago algorithms get faster
> information from New York (and vice versa) than everyone else's
> algorithms.
>

Most traded index futures expire well beyond that day's close, usually
months to a year in advance.
They are influenced mostly by traders' perception of the economic outlook, and
current stock valuations are a poor proxy for that.
There is a better chance in reading the news feeds and speculating on their
impact on perception than in tracking the stocks themselves.

Rubens


Re: 60 ms cross-continent

2020-06-21 Thread Alejandro Acosta



On 6/21/20 1:53 PM, Brett Frankenberger wrote:

On Sun, Jun 21, 2020 at 02:17:08PM -0300, Rubens Kuhl wrote:

On Sat, Jun 20, 2020 at 5:05 PM Marshall Eubanks 
wrote:


This was also pitched as one of the killer-apps for the SpaceX
Starlink satellite array, particularly for cross-Atlantic and
cross-Pacific trading.


https://blogs.cfainstitute.org/marketintegrity/2019/06/25/fspacex-is-opening-up-the-next-frontier-for-hft/

"Several commentators quickly caught onto the fact that an extremely
expensive network whose main selling point is long-distance,
low-latency coverage has a unique chance to fund its growth by
addressing the needs of a wealthy market that has a high willingness
to pay — high-frequency traders."



This is a nice plot for a movie, but not how HFT is really done. It's so
much easier to colocate in the same datacenter as the exchange and run
algorithms from there; while those algorithms need humans to guide their
strategy, the human thought process takes a couple of seconds anyways. So
the real HFTs keep using the defined strategy while the human controller
doesn't tell it otherwise.

For faster access to one exchange, yes, absolutely, colocate at the
exchange.  But there's more than one exchange.

As one example, many index futures trade in Chicago.  The stocks that
make up those indices mostly trade in New York.  There's money to be
made on the arbitrage, if your Chicago algorithms get faster
information from New York (and vice versa) than everyone else's
algorithms.

More expensive but shorter fiber routes have been built between NYC and
Chicago for this reason, as have microwave paths (to get
speed-of-light in air rather than in glass).  There's competition to
have the microwave towers as close as possible to the data centers,
because the last mile is fiber so the longer your last mile, the less
valuable your network.

https://www.bloomberg.com/news/features/2019-03-08/the-gazillion-dollar-standoff-over-two-high-frequency-trading-towers



... and similar to this: 
https://www.extremetech.com/extreme/122989-1-5-billion-the-cost-of-cutting-london-toyko-latency-by-60ms





  -- Brett


Re: 60 ms cross-continent

2020-06-21 Thread Brett Frankenberger
On Sun, Jun 21, 2020 at 02:17:08PM -0300, Rubens Kuhl wrote:
> On Sat, Jun 20, 2020 at 5:05 PM Marshall Eubanks 
> wrote:
> 
> > This was also pitched as one of the killer-apps for the SpaceX
> > Starlink satellite array, particularly for cross-Atlantic and
> > cross-Pacific trading.
> >
> >
> > https://blogs.cfainstitute.org/marketintegrity/2019/06/25/fspacex-is-opening-up-the-next-frontier-for-hft/
> >
> > "Several commentators quickly caught onto the fact that an extremely
> > expensive network whose main selling point is long-distance,
> > low-latency coverage has a unique chance to fund its growth by
> > addressing the needs of a wealthy market that has a high willingness
> > to pay — high-frequency traders."
> >
> >
> This is a nice plot for a movie, but not how HFT is really done. It's so
> much easier to colocate in the same datacenter as the exchange and run
> algorithms from there; while those algorithms need humans to guide their
> strategy, the human thought process takes a couple of seconds anyways. So
> the real HFTs keep using the defined strategy while the human controller
> doesn't tell it otherwise.

For faster access to one exchange, yes, absolutely, colocate at the
exchange.  But there's more than one exchange.

As one example, many index futures trade in Chicago.  The stocks that
make up those indices mostly trade in New York.  There's money to be
made on the arbitrage, if your Chicago algorithms get faster
information from New York (and vice versa) than everyone else's
algorithms.

More expensive but shorter fiber routes have been built between NYC and
Chicago for this reason, as have microwave paths (to get
speed-of-light in air rather than in glass).  There's competition to
have the microwave towers as close as possible to the data centers,
because the last mile is fiber, so the longer your last mile, the less
valuable your network.

https://www.bloomberg.com/news/features/2019-03-08/the-gazillion-dollar-standoff-over-two-high-frequency-trading-towers
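As a rough illustration of the speed-of-light argument, a quick calculation;
the distance is an approximate great-circle figure and real fiber routes are
longer than the straight line, so the real-world gap is bigger than shown:

# Rough one-way Chicago <-> New York propagation-delay comparison.
# Distance is an approximate great-circle figure; real fiber routes are longer.
C_VACUUM_KM_S = 299_792.458      # speed of light in vacuum, km/s
FIBER_INDEX = 1.468              # typical refractive index of silica fiber

distance_km = 1_150              # ~great-circle Chicago - New York

t_air = distance_km / C_VACUUM_KM_S * 1e3             # microwave ~ c in air
t_fiber = distance_km * FIBER_INDEX / C_VACUUM_KM_S * 1e3

print(f"straight-line in air  : {t_air:.2f} ms one-way")
print(f"straight-line in glass: {t_fiber:.2f} ms one-way")
print(f"difference            : {(t_fiber - t_air) * 1000:.0f} microseconds")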

 -- Brett


Re: Devil's Advocate - Segment Routing, Why?

2020-06-21 Thread Robert Raszuk
> The LFIB in each node need only be as large as the number of LDP-enabled
routers in the network.

That is true for P routers ... not so much for PEs.

Please observe that the label space in each PE router is divided between IGP and
BGP as well as other label hungry services ... there are many consumers of
the local label block.

So it is always the case that the LFIB table (max 2^20 entries, ~1M) on PEs is
much larger than the LFIB on P nodes.
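The 2^20 (~1M) ceiling comes directly from the 20-bit label field in the MPLS
label stack entry (RFC 3032). A small sketch of that 32-bit layout:

import struct

def encode_label_entry(label: int, tc: int = 0, bottom: bool = True, ttl: int = 64) -> bytes:
    """Pack one MPLS label stack entry: 20-bit label, 3-bit TC, 1-bit S, 8-bit TTL."""
    assert 0 <= label < 2**20, "label space is 2^20 entries (0..1048575)"
    word = (label << 12) | (tc << 9) | (int(bottom) << 8) | ttl
    return struct.pack("!I", word)

def decode_label_entry(entry: bytes):
    (word,) = struct.unpack("!I", entry)
    return {"label": word >> 12, "tc": (word >> 9) & 0x7,
            "bottom_of_stack": bool((word >> 8) & 0x1), "ttl": word & 0xFF}

print(decode_label_entry(encode_label_entry(299_776, tc=5, bottom=True, ttl=255)))
# -> {'label': 299776, 'tc': 5, 'bottom_of_stack': True, 'ttl': 255}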

Thx,
R.




On Sun, Jun 21, 2020 at 6:01 PM Mark Tinka  wrote:

>
>
> On 21/Jun/20 15:48, Robert Raszuk wrote:
>
>
>
> Actually when IGP changes LSPs are not recomputed with LDP or SR-MPLS
> (when used without TE :).
>
> "LSP" term is perhaps what drives your confusion --- in LDP MPLS there is
> no "Path" - in spite of the acronym (Labeled Switch *Path*). Labels are
> locally significant and swapped at each LSR - resulting essentially with a
> bunch of one hop crossconnects.
>
> In other words MPLS LDP strictly follows IGP SPT at each LSR hop.
>
>
> Yep, which is what I tried to explain as well. With LDP, MPLS-enabled
> hosts simply push, swap and pop. There is not concept of an "end-to-end
> LSP" as such. We just use the term "LSP" to define an FEC. But really, each
> node in the FEC's path is making its own push, swap and pop decisions.
>
> The LFIB in each node need only be as large as the number of LDP-enabled
> routers in the network. You can get scenarios where FEC's are also created
> for infrastructure links, but if you employ filtering to save on FIB slots,
> you really just need to allocate labels to Loopback addresses only.
>
> Mark.
>


Re: 60 ms cross-continent

2020-06-21 Thread Rubens Kuhl
On Sat, Jun 20, 2020 at 5:05 PM Marshall Eubanks 
wrote:

> This was also pitched as one of the killer-apps for the SpaceX
> Starlink satellite array, particularly for cross-Atlantic and
> cross-Pacific trading.
>
>
> https://blogs.cfainstitute.org/marketintegrity/2019/06/25/fspacex-is-opening-up-the-next-frontier-for-hft/
>
> "Several commentators quickly caught onto the fact that an extremely
> expensive network whose main selling point is long-distance,
> low-latency coverage has a unique chance to fund its growth by
> addressing the needs of a wealthy market that has a high willingness
> to pay — high-frequency traders."
>
>
This is a nice plot for a movie, but not how HFT is really done. It's so
much easier to colocate in the same datacenter as the exchange and run
algorithms from there; while those algorithms need humans to guide their
strategy, the human thought process takes a couple of seconds anyway. So
the real HFTs keep running the defined strategy as long as the human
controller doesn't tell them otherwise.

And in order to preserve equality among traders, each exchange already adds
physical delay (loops of fiber or copper cable, amounting to some ns) to the
closer racks, so everyone gets to the system at the same time.

And then comes the really high added latency of the trade risk controller,
which limits a trader's exposure to what is deposited or agreed with the
exchange. This comes with both latency and jitter due to its implementation,
making even the fastest HFT only faster on average, not faster on every
transaction.


Rubens


Re: 60 ms cross-continent

2020-06-21 Thread Tony Finch
Mel Beckman  wrote:

> An intriguing development in fiber optic media is hollow core optical
> fiber, which achieves 99.7% of the speed of light in a vacuum.
>
> https://www.extremetech.com/computing/151498-researchers-create-fiber-network-that-operates-at-99-7-speed-of-light-smashes-speed-and-latency-records

Here's an update from 7 years after that article which hints at the
downside of hollow core fibre:

https://phys.org/news/2020-03-hollow-core-fiber-technology-mainstream-optical.html

It sounds like attenuation was a big problem: "in the space of 18 months
the attenuation in data-transmitting hollow-core fibers has been reduced
by over a factor of 10, from 3.5dB/km to only 0.28 dB/km, within a factor
of two of the attenuation of conventional all-glass fiber technology."
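For a sense of what those numbers trade off against each other, a rough
sketch; the 1000 km span and the 0.2 dB/km figure for conventional fibre are
assumptions for illustration:

# Rough trade-off between hollow-core and conventional fibre on a long span.
# Span length and the conventional-fibre attenuation are assumptions.
C = 299_792.458          # km/s, speed of light in vacuum
span_km = 1_000

# Latency side: 99.7% of c in hollow core vs ~c/1.468 in solid silica.
t_hollow = span_km / (0.997 * C) * 1e3      # ms
t_solid = span_km * 1.468 / C * 1e3         # ms

# Loss side: the figures quoted above vs a typical conventional fibre.
loss_hollow_db = 0.28 * span_km
loss_solid_db = 0.20 * span_km              # assumed typical value

print(f"one-way latency: hollow {t_hollow:.2f} ms vs solid {t_solid:.2f} ms "
      f"(saves {t_solid - t_hollow:.2f} ms per 1000 km)")
print(f"attenuation    : hollow {loss_hollow_db:.0f} dB vs solid {loss_solid_db:.0f} dB "
      f"-> more or closer amplifier sites for hollow core")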

Tony.
-- 
f.anthony.n.finch    http://dotat.at/
Shetland Isles: Southeasterly 5 or 6, veering southerly or southwesterly 3 or
4, then backing southeasterly 5 later in southwest. Slight or moderate,
occasionally rough later in far west. Occasional rain then mainly fair, but
showers far in east. Good, occasionally moderate.


Re: Devil's Advocate - Segment Routing, Why?

2020-06-21 Thread Mark Tinka


On 21/Jun/20 15:48, Robert Raszuk wrote:

>
>
> Actually when IGP changes LSPs are not recomputed with LDP or SR-MPLS
> (when used without TE :). 
>
> "LSP" term is perhaps what drives your confusion --- in LDP MPLS there
> is no "Path" - in spite of the acronym (Labeled Switch *Path*). Labels
> are locally significant and swapped at each LSR - resulting
> essentially with a bunch of one hop crossconnects. 
>
> In other words MPLS LDP strictly follows IGP SPT at each LSR hop.

Yep, which is what I tried to explain as well. With LDP, MPLS-enabled
hosts simply push, swap and pop. There is no concept of an "end-to-end
LSP" as such. We just use the term "LSP" to define an FEC. But really,
each node in the FEC's path is making its own push, swap and pop decisions.

The LFIB in each node need only be as large as the number of LDP-enabled
routers in the network. You can get scenarios where FEC's are also
created for infrastructure links, but if you employ filtering to save on
FIB slots, you really just need to allocate labels to Loopback addresses
only.
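A toy sketch of the per-hop behaviour described above: each LSR consults only
its own LFIB and makes a local push/swap/pop decision; no end-to-end path
object exists in the forwarding plane. The labels, router names and FEC below
are invented:

# Toy per-hop LDP forwarding: each LSR consults only its local LFIB.
# Labels, router names and the FEC (egress loopback 192.0.2.1/32) are invented.

# ingress FIB: FEC -> (next hop, label to push)
INGRESS_FIB = {"192.0.2.1/32": ("P1", 100)}

# per-router LFIB: incoming label -> (next hop, action, outgoing label or None)
LFIB = {
    "P1": {100: ("P2", "swap", 200)},
    "P2": {200: ("PE2", "pop", None)},   # penultimate-hop popping (PHP)
}

def forward(fec: str, payload: str) -> None:
    hop, label = INGRESS_FIB[fec]
    print(f"PE1: push {label}, send to {hop}")
    while label is not None:
        next_hop, action, out_label = LFIB[hop][label]
        print(f"{hop}: {action} {label}" +
              (f" -> {out_label}" if out_label is not None else "") +
              f", send to {next_hop}")
        hop, label = next_hop, out_label
    print(f"{hop}: plain IP lookup for {payload}")

forward("192.0.2.1/32", "packet to 192.0.2.1")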

Mark.


Re: Devil's Advocate - Segment Routing, Why?

2020-06-21 Thread Mark Tinka


On 21/Jun/20 14:58, Baldur Norddahl wrote:

>
> Not really the same. Lets say the best path is through transit 1 but
> the customer thinks transit 1 sucks balls and wants his egress traffic
> to go through your transit 2. Only the VRF approach lets every BGP
> customer, even single homed ones, make his own choices about upstream
> traffic.
>
> You would be more like a transit broker than a traditional ISP with a
> routing mix. Your service is to buy one place, but get the exact same
> product as you would have if you bought from top X transits in your
> area. Delivered as X distinct BGP sessions to give you total freedom
> to send traffic via any of the transit providers.

We received such requests years ago, and calculated the cost of
complexity vs. BGP communities. In the end, if the customer wants to use
a particular upstream on our side, we'd rather set up an EoMPLS circuit
between them and that upstream, and they can have their own contract.

Practically, 90% of our traffic is peering. We don't do that much with
upstream providers.


>
> This is also the reason you do not actually need any routes in the FIB
> for each of those transit VRFs. Just a default route because all
> traffic will unconditionally go to said transit provider. The customer
> routes would still be there of course.

Glad it works for you. We just found it too complex, not just for the
problems it would solve, but also for the parity issues between VRF's
and the global table.

Mark.


Re: why am i in this handbasket? (was Devil's Advocate - Segment Routing, Why?)

2020-06-21 Thread Mark Tinka



On 21/Jun/20 14:36, Masataka Ohta wrote:

>  
>
> That is a tragedy.

Well...


> If all the link-wise (or, worse, host-wise) information of possible
> destinations is distributed in advance to all the possible sources,
> it is not hierarchical but flat (host) routing, which scales poorly.
>
> Right?

Host NLRI is summarized in iBGP within the domain, and eBGP outside the
domain.

It's no longer the norm to distribute end-user NLRI in the IGP. If folk are
still doing that, I can't feel sympathy for the pain they may experience.


>
> Why, do you think, flat routing does not but hierarchical
> routing does scale?
>
> It is because detailed information to reach destinations
> below certain level is advertised not globally but only for
> small part of the network around the destinations.
>
> That is, with hierarchical routing, detailed information
> around destinations is actively hidden from sources.
>
> So, with hierarchical routing, routing protocols can
> carry only rough information around destinations, from
> which, source side can not construct detailed (often
> purposelessly nested) labels required for MPLS.

But hosts often point default to a clever router.

That clever router could also either point default to the provider, or
carry a full BGP table from the provider.

Neither the host nor their first-hop gateway need to be MPLS-aware.

There are use-cases where a customer CPE can be MPLS-aware, but I'd say
that in 99.999% of all cases, CPE are not MPLS-aware.


> According to your theory to ignore routing traffic, we can be happy
> with global *host* routing table with 4G entries for IPv4 and a lot
> lot lot more than that for IPv6. CIDR should be unnecessary
> complication to the Internet

Not sure what Internet you're running, but I, generally, accept
aggregate IPv4 and IPv6 BGP routes from other AS's. I don't need to know
every /32 or /128 host that sits behind them.


>
> With nested labels, you don't need so much labels at certain nesting
> level, which was the point of Yakov, which does not mean you don't
> need so much information to create entire nested labels at or near
> the sources.

I don't know what Yakov advertised back in the day, but looking at what
I and a ton of others are running in practice, in the real world, today,
I don't see what you're talking about.

Again, if you can identify an actual scenario today, in a live, large
scale (or even small scale) network, I'd like to know.

I'm talking about what's in practice, not theory.


>
> The problem is that we can't afford traffic (and associated processing
> by all the related routers or things like those) and storage (at or
> near source) for routing (or MPLS, SR* or whatever) with such detailed
> routing at the destinations.

Again, I disagree, as I mentioned earlier, because you won't be able to
buy a router today that does only IP any cheaper than one that does both IP
and MPLS.

MPLS has become so mainstream that its economies of scale have made the
choice between it and IP a non-starter. Heck, you can even do it
in Linux...

Mark.



Re: Devil's Advocate - Segment Routing, Why?

2020-06-21 Thread Robert Raszuk
> I'm saying that, if some failure occurs and IGP changes, a
> lot of LSPs must be recomputed, which does not scale
> if # of LSPs is large, especially in a large network
> where IGP needs hierarchy (such as OSPF area).
>
> Masataka Ohta
>


Actually, when the IGP changes, LSPs are not recomputed with LDP or SR-MPLS
(when used without TE :).

The term "LSP" is perhaps what drives your confusion --- in LDP MPLS there is
no "Path" - in spite of the acronym (Label Switched *Path*). Labels are
locally significant and swapped at each LSR - resulting essentially in a
bunch of one-hop crossconnects.

In other words, MPLS LDP strictly follows the IGP SPT at each LSR hop.

Many thx,
R.


Re: why am i in this handbasket? (was Devil's Advocate - Segment Routing, Why?)

2020-06-21 Thread Robert Raszuk
Let's clarify a few things ...

On Sun, Jun 21, 2020 at 2:39 PM Masataka Ohta <
mo...@necom830.hpcl.titech.ac.jp> wrote:

If all the link-wise (or, worse, host-wise) information of possible
> destinations is distributed in advance to all the possible sources,
> it is not hierarchical but flat (host) routing, which scales poorly.
>
> Right?
>

Neither link-wise nor host-wise information is required to accomplish, say,
L3VPN services. Imagine you have three sites which would like to
interconnect, each with 1000s of users.

So all you are exchanging as part of the VPN overlay is three subnets.

Moreover, if you have 1000 PEs and those three sites are attached to only 6
of them, only those 6 PEs will need to learn those routes (Hint: RTC -
RFC 4684).
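A minimal sketch of the RT-Constrain idea (RFC 4684) referenced here: the
route reflector only sends a VPN route to PEs that have advertised interest
in that route's route-target. The PE names, route targets and prefixes are
invented:

# Toy RT-Constrain (RFC 4684) filtering on a route reflector.
# PE names, route targets and prefixes are invented for illustration.

# Interest advertised by each PE (RT membership NLRI).
rt_interest = {
    "PE1": {"65000:10"}, "PE2": {"65000:10"}, "PE3": {"65000:20"},
    # ... the remaining PEs import neither RT and advertise no interest
}

# VPN routes held by the RR: (prefix, route_target, originating PE)
vpn_routes = [
    ("10.1.0.0/24", "65000:10", "PE1"),
    ("10.2.0.0/24", "65000:10", "PE2"),
    ("172.16.0.0/24", "65000:20", "PE3"),
]

def advertise(rr_clients):
    """Return {PE: [prefixes]} the RR would send with RTC in place."""
    out = {pe: [] for pe in rr_clients}
    for prefix, rt, origin in vpn_routes:
        for pe, wanted in rr_clients.items():
            if rt in wanted and pe != origin:
                out[pe].append(prefix)
    return out

print(advertise(rt_interest))
# Only the PEs that import 65000:10 learn the 10.x routes; the other ~994 PEs
# in the example above never even receive them.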

It is because detailed information to reach destinations
> below certain level is advertised not globally but only for
> small part of the network around the destinations.
>

Same thing here.


> That is, with hierarchical routing, detailed information
> around destinations is actively hidden from sources.
>

  Same thing here.

That is why, as described, we use a label stack. The top label is responsible
for getting you to the egress PE. The service label sitting behind the top
label is responsible for getting you through to the customer site (with or
without an IP lookup at the egress PE).
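To make the hierarchy concrete, a toy sketch of what an ingress PE imposes for
an L3VPN packet: a transport label that gets the packet to the egress PE, and
a service (VPN) label underneath that the egress PE uses to pick the VRF/CE.
All values are invented:

# Toy L3VPN label imposition at an ingress PE. All values are invented.
# Transport label: learned via LDP/SR for the egress PE's loopback (BGP next hop).
# Service label:   learned via BGP for the VPN prefix / CE / VRF.

transport_fib = {"192.0.2.2": 16002}            # egress PE loopback -> transport label
vpn_table = {"10.1.0.0/24": ("192.0.2.2", 24)}  # customer prefix -> (next hop, VPN label)

def impose(prefix: str):
    next_hop, service_label = vpn_table[prefix]
    transport_label = transport_fib[next_hop]
    # Top of stack first; the P routers only ever look at the transport label.
    return [transport_label, service_label]

print(impose("10.1.0.0/24"))   # -> [16002, 24]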


> So, with hierarchical routing, routing protocols can
> carry only rough information around destinations, from
> which, source side can not construct detailed (often
> purposelessly nested) labels required for MPLS.
>

Usually sources have no idea of MPLS. MPLS to the host never took off.


> According to your theory to ignore routing traffic, we can be happy
> with global *host* routing table with 4G entries for IPv4 and a lot
> lot lot more than that for IPv6. CIDR should be unnecessary
> complication to the Internet
>

I do not think anyone is saying that here.


> With nested labels, you don't need so much labels at certain nesting
> level, which was the point of Yakov, which does not mean you don't
> need so much information to create entire nested labels at or near
> the sources.
>

The label stack has been here from day one. Each layer of the stack has a
completely different role. That is your hierarchy.

Kind regards,
R.


Re: why am i in this handbasket? (was Devil's Advocate - Segment Routing, Why?)

2020-06-21 Thread Robert Raszuk
It is destination-based flat routing, distributed 100% before any data
packet, within each layer - yes. But the layers are decoupled, so in a sense
this is what defines the hierarchy overall.

So transport is using MPLS LSPs; most often host IGP routes are matched
with LDP FECs and flooded everywhere, in spite of RFC 5283 at least allowing
the IGP to be aggregated.

Then say L2VPNs or L3VPNs with their own choice of routing protocols are in
turn distributing reachability for the customer sites. Those are service
routes linked to transport by BGP next hop(s).

Many thx,
R.


On Sun, Jun 21, 2020 at 1:11 PM Masataka Ohta <
mo...@necom830.hpcl.titech.ac.jp> wrote:

> Robert Raszuk wrote:
>
> > MPLS LDP or L3VPNs was NEVER flow driven.
> >
> > Since day one till today it was and still is purely destination based.
>
> If information to create labels at or near sources to all the
> possible destinations is distributed in advance, may be. But
> it is effectively flat routing, or, in extreme cases, flat host
> routing.
>
> Or, if information to create labels to all the active destinations
> is supplied on demand, it is flow driven.
>
> On day one, Yakov said MPLS had scaled because of nested labels
> corresponding to routing hierarchy.
>
> Masataka Ohta
>


Re: Devil's Advocate - Segment Routing, Why?

2020-06-21 Thread Baldur Norddahl
On Sun, Jun 21, 2020 at 1:30 PM Mark Tinka  wrote:

>
>
> On 21/Jun/20 12:45, Baldur Norddahl wrote:
>
>
> Yes I once made a plan to have one VRF per transit provider plus a peering
> VRF. That way our BGP customers could have a session with each of those
> VRFs to allow them full control of the route mix. I would of course also
> need an Internet VRF for our own needs.
>
> But the reality of that would be too many copies of the DFZ in the routing
> tables. Although not necessary in the FIB as each of the transit VRFs could
> just have a default route installed.
>
>
> We just opted for BGP communities :-).
>
>
Not really the same. Let's say the best path is through transit 1, but the
customer thinks transit 1 sucks balls and wants his egress traffic to go
through your transit 2. Only the VRF approach lets every BGP customer, even
single-homed ones, make his own choices about upstream traffic.

You would be more like a transit broker than a traditional ISP with a
routing mix. Your service is to buy in one place, but get the exact same
product as you would have if you bought from the top X transits in your area.
Delivered as X distinct BGP sessions to give you total freedom to send
traffic via any of the transit providers.

This is also the reason you do not actually need any routes in the FIB for
each of those transit VRFs. Just a default route because all traffic will
unconditionally go to said transit provider. The customer routes would
still be there of course.

Regards,

Baldur


Re: why am i in this handbasket? (was Devil's Advocate - Segment Routing, Why?)

2020-06-21 Thread Masataka Ohta

Mark Tinka wrote:


If information to create labels at or near sources to all the
possible destinations is distributed in advance, may be.


But this is what happens today.


That is a tragedy.


Whether you do it manually or use a label distribution protocol, FEC's
are pre-computed ahead of time.

What am I missing?


If all the link-wise (or, worse, host-wise) information of possible
destinations is distributed in advance to all the possible sources,
it is not hierarchical but flat (host) routing, which scales poorly.

Right?


But
it is effectively flat routing, or, in extreme cases, flat host
routing.


I still don't get it.


Why, do you think, flat routing does not but hierarchical
routing does scale?

It is because detailed information to reach destinations
below certain level is advertised not globally but only for
small part of the network around the destinations.

That is, with hierarchical routing, detailed information
around destinations is actively hidden from sources.

So, with hierarchical routing, routing protocols can
carry only rough information around destinations, from
which, source side can not construct detailed (often
purposelessly nested) labels required for MPLS.

> So why create labels on-demand if
> a box to handle the traffic is already in place and actively working,
> day-in, day-out?

According to your theory to ignore routing traffic, we can be happy
with global *host* routing table with 4G entries for IPv4 and a lot
lot lot more than that for IPv6. CIDR should be unnecessary
complication to the Internet

With nested labels, you don't need so much labels at certain nesting
level, which was the point of Yakov, which does not mean you don't
need so much information to create entire nested labels at or near
the sources.

The problem is that we can't afford traffic (and associated processing
by all the related routers or things like those) and storage (at or
near source) for routing (or MPLS, SR* or whatever) with such detailed
routing at the destinations.


Masataka Ohta


Re: why am i in this handbasket? (was Devil's Advocate - Segment Routing, Why?)

2020-06-21 Thread Mark Tinka



On 21/Jun/20 13:11, Masataka Ohta wrote:
 
>
> If information to create labels at or near sources to all the
> possible destinations is distributed in advance, may be.

But this is what happens today.

Whether you do it manually or use a label distribution protocol, FEC's
are pre-computed ahead of time.

What am I missing?


> But
> it is effectively flat routing, or, in extreme cases, flat host
> routing.

I still don't get it.


>
> Or, if information to create labels to all the active destinations
> is supplied on demand, it is flow driven.

What would the benefit of this be? Ingress and egress nodes don't come
and go. They are stuck in racks in data centres somewhere, and won't
disappear until a human wants them to. So why create labels on-demand if
a box to handle the traffic is already in place and actively working,
day-in, day-out?

Mark.


Re: Devil's Advocate - Segment Routing, Why?

2020-06-21 Thread Mark Tinka


On 21/Jun/20 12:45, Baldur Norddahl wrote:

>
> Yes I once made a plan to have one VRF per transit provider plus a
> peering VRF. That way our BGP customers could have a session with each
> of those VRFs to allow them full control of the route mix. I would of
> course also need an Internet VRF for our own needs.
>
> But the reality of that would be too many copies of the DFZ in the
> routing tables. Although not necessary in the FIB as each of the
> transit VRFs could just have a default route installed.

We just opted for BGP communities :-).

Mark.


Re: Devil's Advocate - Segment Routing, Why?

2020-06-21 Thread Mark Tinka



On 21/Jun/20 12:10, Masataka Ohta wrote:

>  
> It was implemented and some technology was used by commercial
> router from Furukawa (a Japanese vendor selling optical
> fiber now not selling routers).

I won't lie, never heard of it.


> GMPLS, you are using, is the mechanism to guarantee QoS by
> reserving wavelength resource. It is impossible for GMPLS
> not to offer QoS.

That is/was the idea.

In practice (at least in our Transport network), deploying capacity as
an offline exercise is significantly simpler. In such a case, we
wouldn't use GMPLS for capacity reservation, just path re-computation in
failure scenarios.

Our Transport network isn't overly meshed. It's just stretchy. Perhaps
if one was trying to build a DWDM backbone into, out of and through
every city in the U.S., capacity reservation in GMPLS may be a use-case.
But unless someone is willing to pipe up and confess to implementing it
in this way, I've not heard of it.


>
> Moreover, as some people says they offer QoS with MPLS, they
> should be using some prioritized queueing mechanisms, perhaps
> not poor WFQ.

It would be a combination - PQ and WFQ depending on the traffic type and
how much customers want to pay.

But carrying an MPLS EXP code point does not make MPLS unscalable. It's
no different to carrying a DSCP or IPP code point in plain IP. Or even
an 802.1p code point in Ethernet.


> They are different, of course. But, GMPLS is to reserve bandwidth
> resource.

In theory. What are people doing in practice? I just told you our story.


> MPLS, in general, is to reserve label values, at least.

MPLS is the forwarding paradigm. Label reservation/allocation can be
done manually or with a label distribution protocol. MPLS doesn't care
how labels are generated and learned. It will just push, swap and pop as
it needs to.


> I didn't say scaling problem caused by QoS.
>
> But, as you are avoiding to extensively use MPLS, I think you
> are aware that extensive use of MPLS needs management of a
> lot of labels, which does not scale.
>
> Or, do I misunderstand something?

I'm not avoiding extensive use of MPLS. I want extensive use of MPLS.

In IPv4, we forward in MPLS 100%. In IPv6, we forward in MPLS 80%. This
is due to vendor nonsense. Trying to fix.



> No. IntServ specifies format to carry QoS specification in RSVP
> packets without assuming any specific model of QoS.

Then I'm failing to understand your point, especially since it doesn't
sound like any operator is deploying such a model, or if so, publicly
suffering from it.



> No. As experimental switches are working years ago and making
> it work >10Tbps is not difficult (switching is easy, generating
> 10Tbps packets needs a lot of parallel equipment), there is little
> remaining for research.

We'll get there. This doesn't worry me so much :-). Either horizontally
or vertically. I can see a few models to scale IP/MPLS carriage.


>    
> SDN, maybe. Though I'm not saying SDN scale, it should be no
> worse than MPLS.

I still can't tell you what SDN is :-). I won't suffer it in this
decade, thankfully.


> I did some retrospective research.
>
>    https://en.wikipedia.org/wiki/Multiprotocol_Label_Switching
>    History
>    1994: Toshiba presented Cell Switch Router (CSR) ideas to IETF BOF
>    1996: Ipsilon, Cisco and IBM announced label switching plans
>    1997: Formation of the IETF MPLS working group
>    1999: First MPLS VPN (L3VPN) and TE deployments
>    2000: MPLS traffic engineering
>    2001: First MPLS Request for Comments (RFCs) released
>
> as I was a co-chair of 1994 BOF and my knowledge on MPLS is
> mostly on 1997 ID:
>
>    https://tools.ietf.org/html/draft-ietf-mpls-arch-00
>
> there seems to be a lot of terminology changes.

My comment to that was in reference to your text, below:

    "What if, an inner label becomes invalidated around the
    destination, which is hidden, for route scalability,
    from the equipments around the source?"

I've never heard of such an issue in 16 years.


>
> I'm saying that, if some failure occurs and IGP changes, a
> lot of LSPs must be recomputed, which does not scale
> if # of LSPs is large, especially in a large network
> where IGP needs hierarchy (such as OSPF area).

That happens every day, already. Links fail, the IGP re-converges, LDP keeps
humming. RSVP-TE too, albeit all that state does need some consideration
especially if code is buggy.

Particularly, where you have LFA/IP-FRR both in the IGP and LDP, I've
not come across any issue where IGP re-convergence caused LSP's to fail.

In practice, IGP hierarchy (OSPF Areas or IS-IS Levels) doesn't help
much if you are running MPLS. FEC's are forged against /32 and /128
addresses. Yes, as with everything else, it's a trade-off.

Mark.



Re: why am i in this handbasket? (was Devil's Advocate - Segment Routing, Why?)

2020-06-21 Thread Masataka Ohta

Robert Raszuk wrote:


MPLS LDP or L3VPNs was NEVER flow driven.

Since day one till today it was and still is purely destination based.


If information to create labels at or near sources to all the
possible destinations is distributed in advance, may be. But
it is effectively flat routing, or, in extreme cases, flat host
routing.

Or, if information to create labels to all the active destinations
is supplied on demand, it is flow driven.

On day one, Yakov said MPLS had scaled because of nested labels
corresponding to routing hierarchy.

Masataka Ohta


Re: Devil's Advocate - Segment Routing, Why?

2020-06-21 Thread Baldur Norddahl
On Sun, Jun 21, 2020 at 9:56 AM Mark Tinka  wrote:

>
>
> On 20/Jun/20 22:00, Baldur Norddahl wrote:
>
>
> I can't speak for the year 2000 as I was not doing networking at this
> level at that time. But when I check the specs for the base mx204 it says
> something like 32 VRFs, 2 million routes in FIB and 6 million routes in
> RIB. Clearly those numbers are the total of routes across all VRFs
> otherwise you arrive at silly numbers (64 million FIB if you multiply, 128k
> FIB if you divide by 32). My conclusion is that scale wise you are ok as
> long you do not try to have more than one VRF with a complete copy of the
> DFZ.
>
>
> I recall a number of networks holding multiple VRF's, including at least
> 2x Internet VRF's, for numerous use-cases. I don't know if they still do
> that today, but one can get creative real quick :-).
>
>
Yes I once made a plan to have one VRF per transit provider plus a peering
VRF. That way our BGP customers could have a session with each of those
VRFs to allow them full control of the route mix. I would of course also
need an Internet VRF for our own needs.

But the reality of that would be too many copies of the DFZ in the routing
tables. Although not necessary in the FIB as each of the transit VRFs could
just have a default route installed.
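A rough back-of-envelope of why that plan eats RIB space quickly, using
approximate 2020 DFZ sizes and the RIB figure quoted earlier in the thread;
all numbers are ballpark assumptions:

# Ballpark RIB cost of carrying a copy of the DFZ per transit VRF.
# DFZ sizes are rough 2020 figures; RIB capacity is the number quoted above.
DFZ_V4, DFZ_V6 = 800_000, 85_000
RIB_CAPACITY = 6_000_000

for transit_vrfs in (1, 2, 4, 6):
    rib_routes = (transit_vrfs + 1) * (DFZ_V4 + DFZ_V6)   # +1 for the Internet VRF
    print(f"{transit_vrfs} transit VRFs: ~{rib_routes:,} RIB routes "
          f"({100 * rib_routes / RIB_CAPACITY:.0f}% of a 6M RIB); "
          f"FIB stays small if each transit VRF only installs a default route")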

Regards,

Baldur


Re: Devil's Advocate - Segment Routing, Why?

2020-06-21 Thread Masataka Ohta

Mark Tinka wrote:


There are many. So, our research group tried to improve RSVP.


I'm a lot younger than the Internet, but I read a fair bit about its
history. I can't remember ever coming across an implementation of RSVP
between a host and the network in a commercial setting.


No, of course, because, as we agreed, RSVP has a lot of problems.


Was "S-RSVP" ever implemented, and deployed?


It was implemented, and some of the technology was used by a commercial
router from Furukawa (a Japanese vendor now selling optical
fiber, not routers).


However, perhaps most people think the show stopper for RSVP is the lack
of scalability of weighted fair queueing, though it is not a
problem specific to RSVP, and MPLS shares the same problem.


QoS has nothing to do with MPLS. You can do QoS with or without MPLS.


GMPLS, which you are using, is a mechanism to guarantee QoS by
reserving wavelength resources. It is impossible for GMPLS
not to offer QoS.

Moreover, as some people say they offer QoS with MPLS, they
should be using some prioritized queueing mechanisms, perhaps
not poor WFQ.


I should probably point out, also, that RSVP (or RSVP-TE) is not MPLS.


They are different, of course. But GMPLS is there to reserve bandwidth
resources. MPLS, in general, is to reserve label values, at least.


All MPLS can do is convey IPP or DSCP values as an EXP code point in the
core. I'm not sure how that creates a scaling problem within MPLS itself.


I didn't say the scaling problem is caused by QoS.

But, as you are avoiding extensive use of MPLS, I think you
are aware that extensive use of MPLS needs management of a
lot of labels, which does not scale.

Or do I misunderstand something?


If I understand this correctly, would this be the IntServ QoS model?


No. IntServ specifies the format to carry a QoS specification in RSVP
packets without assuming any specific model of QoS.


I didn't attempt to standardize our result in IETF, partly
because optical packet switching was a lot more interesting.


Still is, even today :-)?


No. As experimental switches were working years ago, and making
them work at >10Tbps is not difficult (switching is easy; generating
10Tbps of packets needs a lot of parallel equipment), there is little
remaining for research.

https://www.osapublishing.org/abstract.cfm?URI=OFC-2010-OWM4


Assuming a central controller (and its collocated or distributed
back up controllers), we don't need complicated protocols in
the network to maintain integrity of the entire network.


Well, that's a point of view, I suppose.

I still can't walk into a shop and "buy a controller". I don't know what
this controller thing is, 10 years on.


SDN, maybe. Though I'm not saying SDN scales, it should be no
worse than MPLS.


I can't say I've ever come across that scenario running MPLS since 2004.


I did some retrospective research.

   https://en.wikipedia.org/wiki/Multiprotocol_Label_Switching
   History
   1994: Toshiba presented Cell Switch Router (CSR) ideas to IETF BOF
   1996: Ipsilon, Cisco and IBM announced label switching plans
   1997: Formation of the IETF MPLS working group
   1999: First MPLS VPN (L3VPN) and TE deployments
   2000: MPLS traffic engineering
   2001: First MPLS Request for Comments (RFCs) released

as I was a co-chair of 1994 BOF and my knowledge on MPLS is
mostly on 1997 ID:

   https://tools.ietf.org/html/draft-ietf-mpls-arch-00

there seems to be a lot of terminology changes.

I'm saying that, if some failure occurs and IGP changes, a
lot of LSPs must be recomputed, which does not scale
if # of LSPs is large, especially in a large network
where IGP needs hierarchy (such as OSPF area).

Masataka Ohta



Re: why am i in this handbasket? (was Devil's Advocate - Segment Routing, Why?)

2020-06-21 Thread Mark Tinka


On 20/Jun/20 17:12, Robert Raszuk wrote:

>
> MPLS is not flow driven. I sent some mail about it but perhaps it
> bounced. 
>
> MPLS LDP or L3VPNs was NEVER flow driven. 
>
> Since day one till today it was and still is purely destination based. 
>
> Transport is using LSP to egress PE (dst IP). 
>
> L3VPNs are using either per dst prefix, or per CE or per VRF labels.
> No implementation does anything upon "flow detection" - to prepare any
> nested labels. Even in FIBs all information is preprogrammed in
> hierarchical fashion well before any flow packet arrives.

If you really don't like LDP or RSVP-TE, you can statically assign
labels and manually configure FEC's across your entire backbone. If
trading state for administration is your thing, of course :-).

Mark.


Re: Hurricane Electric has reached 0 RPKI INVALIDs in our routing table

2020-06-21 Thread Radu-Adrian Feurdean
Hi,

On Thu, Jun 18, 2020, at 04:01, Jon Lewis wrote:
> 
> Just like I said, if you create an ROA for an aggregate, forgetting that 
> you have customers using subnets of that aggregate (or didn't create ROAs 
> for customer subnets with the right origin ASNs), you're literally telling 
> those using RPKI to verify routes "don't accept our customers' routes." 
> That might not be bad for "your network", but it's probably bad for 
> someone's.

That makes you a bad upstream operator, one that does things without 
understanding the consequences. This may still be the unfortunate norm, but 
it's by no means something to be considered an acceptable state.

Put otherwise: if you have downstream customers that you allow to announce
part of your address space in the GRT, make sure you can still provide the
service after making changes (like RPKI signing).
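To illustrate the failure mode Jon describes, a minimal sketch of RFC 6811
origin validation: once an ROA covers the aggregate, a customer more-specific
originated from a different ASN (with no ROA of its own) goes from NotFound to
Invalid. The prefixes and ASNs below are documentation/example values:

import ipaddress

# ROAs published by the address holder: (prefix, max_length, origin_asn)
roas = [("203.0.113.0/24", 24, 65001)]   # aggregate only, maxLength == 24

def validate(prefix: str, origin_asn: int) -> str:
    """Simplified RFC 6811 route origin validation."""
    net = ipaddress.ip_network(prefix)
    covered = False
    for roa_prefix, max_len, roa_asn in roas:
        roa_net = ipaddress.ip_network(roa_prefix)
        if net.version == roa_net.version and net.subnet_of(roa_net):
            covered = True
            if origin_asn == roa_asn and net.prefixlen <= max_len:
                return "Valid"
    return "Invalid" if covered else "NotFound"

# The holder's own aggregate is fine...
print(validate("203.0.113.0/24", 65001))   # Valid
# ...but a customer announcing a /26 from their own ASN is now Invalid,
# because a covering ROA exists and nothing matches their origin/length.
print(validate("203.0.113.64/26", 65002))  # Invalid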

Put yet another way: if you lease IP space (with or without
connectivity), make sure all the additional services are included in one way or
another. Those services should include RPKI signing and reverse DNS, and the
strict minimum (only slightly better than not doing it at all) should be via
"open a service ticket"; the more automated, the better.


Re: why am i in this handbasket? (was Devil's Advocate - Segment Routing, Why?)

2020-06-21 Thread Mark Tinka



On 20/Jun/20 17:08, Robert Raszuk wrote:

>  
>
> But with that let's not forget that aggregation here is still not
> spec-ed out well and to the best of my knowledge it is also not
> shipping yet. I recently proposed an idea how to aggregate SRGBs ..
> one vendor is analyzing it.

Hence why I think SR still needs time to grow up.

There are some things I can be maverick about. I don't think SR is it,
today.

Mark.


Re: why am i in this handbasket? (was Devil's Advocate - Segment Routing, Why?)

2020-06-21 Thread Mark Tinka



On 20/Jun/20 15:39, Masataka Ohta wrote:

> Ipsilon was hopeless because, as Yakov correctly pointed out, flow
> driven approach to automatically detect flows does not scale.
>
> The problem of MPLS, however, is that, it must also be flow driven,
> because detailed route information at the destination is necessary
> to prepare nested labels at the source, which costs a lot and should
> be attempted only for detected flows.

Again, I think you are talking about what RSVP should have been.

RSVP != MPLS.


> Routing table at IPv4 backbone today needs at most 16M entries to be
> looked up by simple SRAM, which is as fast as MPLS look up, which is
> one of a reason why we should obsolete IPv6.

I'm not sure I should ask this in fear of taking this discussion way off
tangent... aaah, what the heck:

So if we can't assign hosts IPv4 anymore because it has run out, should
we obsolete IPv6 in favour of CGN? I know this works.


>
> Though resource reserved flows need their own routing table entries,
> they should be charged proportional to duration of the reservation,
> which can scale to afford the cost to have the entries.

RSVP failed to take off when it was designed.

Outside of capturing Netflow data (or tracking firewall state), nobody
really cares about handling flows at scale (no, I'm not talking about
ECMP).

Why would we want to do that in 2020 if we didn't in 2000?

Mark.


Re: why am i in this handbasket? (was Devil's Advocate - Segment Routing, Why?)

2020-06-21 Thread Mark Tinka



On 20/Jun/20 01:32, Randy Bush wrote:


> there is saku's point of distributing labels in IGP TLVs/LSAs.  i
> suspect he is correct, but good luck getting that anywhere in the
> internet vendor task force.  and that tells us a lot about whether we
> can actually effect useful simplification and change.

This is shipping today with SR-MPLS.

Besides still being brand new and not yet fully field tested by the
community, my other concern is unless you are running a Juniper and have
the energy to pull a "Vijay Gill" and move your entire backbone to
IS-IS, you'll get either no SR-ISISv6 support, no SR-OSPFv3 support, or
both, with all the vendors.

Which brings me back to the same piss-poor attention LDPv6 is getting,
which is, really, poor attention to IPv6.

Kind of hard for operators to take IPv6 seriously at this level if the
vendors, themselves, aren't.


> is a significant part of the perception that there is a forwarding
> problem the result of the vendors, 25 years later, still not
> designing for v4/v6 parity?

I think the forwarding is fine, if you're carrying the payload in MPLS.

The problem is the control plane. It's not insurmountable; the vendors
just want to do less work.

The issue is IPv4 is gone, and trying to keep it around will only lead
to the creation of more hacks, which will further complicate the control
and data plane.


>
> there is the argument that switching MPLS is faster than IP; when the
> pressure points i see are more at routing (BGP/LDP/RSVP/whatever),
> recovery, and convergence.

Either way, the MPLS or IP problem already has an existing solution. If
you like IP, you can keep it. If you like MPLS, you can keep it.

So I'd be spending less time on the forwarding (of course, if there are
ways to improve that and someone has the time, why not), and as you say,
work on fixing the control plane and the signaling for efficiency and scale.


>
> did we really learn so little from IP routing that we need to
> recreate analogous complexity and fragility in the MPLS control
> plane?  ( sound of steam eminating from saku's ears :)

The path to SR-MPLS's inherent signaling carried in the IGP is an
optimum solution, that even I have been wanting since inception.

But, it's still too fresh, global deployment is terrible, and there is
still much to be learned about how it behaves outside of the lab.

For me, a graceful approach toward SR via LDPv6 makes sense. But, as
always, YMMV.


> and then there is buffering; which seems more serious than simple
> forwarding rate.  get it there faster so it can wait in a queue?  my
> principal impression of the Stanford/Google workshops was the parable
> of the blind men and the elephant.  though maybe Matt had the main
> point: given scaling 4x, Moore's law can not save us and it will all
> become paced protocols.  will we now have a decade+ of BBR evolution
> and tuning?  if so, how do we engineer our networks for that?

This deserves a lot more attention than it's receiving. The problem is
it doesn't sound sexy enough to compile into a PPT that you can project
to suits whom you need to part with cash.

It doesn't have that 5G or SRv6 or Controller or IoT ring to it :-).

It's been a while since vendors that control a large portion of the
market paid real attention to their geeky side. The buffer problem, for
me, would fall into that category. Maybe a smaller, more agile, more
geeky start-up, can take the lead with this one.


> and up 10,000m, we watch vendor software engineers hand crafting in
> an assembler language with if/then/case/for, and running a chain of
> checking software to look for horrors in their assembler programs.
> it's the bleeping 21st century.  why are the protocol specs not
> formal and verified, and the code formally generated and verified?
> and don't give me too slow given that the hardware folk seem to be
> able to do 10x in the time it takes to run valgrind a few dozen
> times.

And for today's episode of Jeopardy:

    "What used to be the IETF?"


> we're extracting ore with hammers and chisels, and then hammering it
> into shiny objects rather than safe and securable network design and
> construction tools.

Rush it out the factory, fast, even though it's not ready. Get all their
money before they board the ship and sail for Mars.

Mark.



Re: Devil's Advocate - Segment Routing, Why?

2020-06-21 Thread Mark Tinka


On 20/Jun/20 22:00, Baldur Norddahl wrote:

>
> I can't speak for the year 2000 as I was not doing networking at this
> level at that time. But when I check the specs for the base mx204 it
> says something like 32 VRFs, 2 million routes in FIB and 6 million
> routes in RIB. Clearly those numbers are the total of routes across
> all VRFs otherwise you arrive at silly numbers (64 million FIB if you
> multiply, 128k FIB if you divide by 32). My conclusion is that scale
> wise you are ok as long you do not try to have more than one VRF with
> a complete copy of the DFZ.

I recall a number of networks holding multiple VRF's, including at least
2x Internet VRF's, for numerous use-cases. I don't know if they still do
that today, but one can get creative real quick :-).


>
> More worrying is that 2 million routes will soon not be enough to
> install all routes with a backup route, invalidating BGP FRR.

I have a niggling feeling this will be solved before we get there.

Now, whether we can afford it is a whole other matter.

Mark.


Re: Devil's Advocate - Segment Routing, Why?

2020-06-21 Thread Mark Tinka



On 21/Jun/20 00:54, Sabri Berisha wrote:

> That will be very advantageous in a datacenter environment, or any other
> environment dealing with a lot of ECMP paths. 
>
> I can't tell you how often during my eBay time I've been troubleshooting
> end-to-end packetloss between hosts in two datacenters where there were at 
> least
> 10 or more layers of up to 16 way ECMP between them. Having a record of which
> path is being taken by a packet is very helpful to determine the one with a 
> crappy
> transceiver.
>
> That work is already underway, albeit not specifically for MPLS. For example,
> I've worked with an experimental version of In-Band Network Telemetry (INT)
> as described in this draft: 
> https://tools.ietf.org/html/draft-kumar-ippm-ifa-02
>
> I even demonstrated a very basic implementation during SuperCompute 19 in 
> Denver
> last year. Most people who were interested in the demo were academics however,
> probably because it wasn't a real networking event.
>
> Note that there are several caveats that come with this draft and previous
> versions, and that it is still very much work in progress. But the potential 
> is
> huge, at least in the DC.

Alright, we'll wait and see, then.



> That's a different story, but not entirely impossible. A probe packet can
> be sent across AS borders, and as long as the two NOCs are cooperating, the
> entire path can be reconstructed.

Yes, for once-off troubleshooting, I suppose that would work.

My concern is if it's for normal day-to-day operations. But who knows,
maybe someone will propose that too :-).

Mark.


Re: 60 ms cross-continent

2020-06-21 Thread Saku Ytti
On Sat, 20 Jun 2020 at 23:14, Bryan Fields  wrote:

> I think he might be referring to the newer modulation types (QAM) on long haul
> transport.  There's quite a bit of time in µs that the encoding into QAM
> and adding FEC takes.  You typically won't see this at the pluggable level between
> switches and stuff.

FEC is low tens of meters (i.e. low tens of nanoseconds), QAM is less.
Won't impact the pipeline or NPU scenarios meaningfully, will impact
the low latency scenario.

-- 
  ++ytti