NTT engineer in the wings?
If there is someone listening from NTT engineering, would you kindly write back? The IP NOC is unable to locate anyone because it's Sunday, so I thought I might try here. Thanks! J~
Re: Linux BNG
On 07/15/2018 10:56 AM, Denys Fedoryshchenko wrote:
> On 2018-07-15 19:00, Raymond Burkholder wrote:
>> On 07/15/2018 09:03 AM, Denys Fedoryshchenko wrote:
>>> On 2018-07-14 22:05, Baldur Norddahl wrote:
>
> About OVS, I didn't look much at it, as I thought it is not suitable for BNG purposes, like terminating tens of thousands of users; I thought it is more about high-speed switching for tens of VMs.

I would call it more of a generic, all-purpose tool for customized L2/L3/L4/L5 packet forwarding. It works well for datacenter as well as ISP-related scenarios, due to the wide variety of rule matching, the encapsulations supported, and the ability to attach a customized controller for specialized packet handling.

>> On edge-based translations, is hardware-based forwarding actually necessary, since there are so many software functions being performed anyway?
>
> IMO at the current moment, 20-40G on a single box is the boundary point where packet forwarding is preferable (but still not necessary) to do in hardware, as passing packets through the whole Linux stack is really not the best option. But it works. I'm trying to find an alternative solution, bypassing the full stack using XDP, so I can go beyond 40G.

Tied to XDP is eBPF (which is what makes tcpdump fast). Another tool is P4, which provides tools to build customized SW/HW forwarders. But I'm not sure how applicable it is to BNG.

--
Raymond Burkholder
r...@oneunified.net
https://blog.raymond.burkholder.net
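[Editor's note: since XDP comes up repeatedly in this thread, here is a minimal, hedged sketch of what an XDP program looks like, assuming a clang/libbpf toolchain. The program name, the interface it would attach to, and the "subscriber lookup" placeholder are illustrative only, not anyone's production BNG code.]

===
/* Minimal XDP sketch (illustrative only): drop truncated frames, pass the
 * rest, before they ever reach the regular kernel networking stack.
 * A real BNG dataplane would do a per-subscriber lookup in a BPF map here. */
#include <linux/bpf.h>
#include <linux/if_ether.h>
#include <linux/ip.h>
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_endian.h>

SEC("xdp")
int xdp_bng_demo(struct xdp_md *ctx)
{
    void *data     = (void *)(long)ctx->data;
    void *data_end = (void *)(long)ctx->data_end;

    struct ethhdr *eth = data;
    if ((void *)(eth + 1) > data_end)
        return XDP_DROP;                     /* frame too short for Ethernet */

    if (eth->h_proto != bpf_htons(ETH_P_IP))
        return XDP_PASS;                     /* hand non-IPv4 to the kernel */

    struct iphdr *iph = (void *)(eth + 1);
    if ((void *)(iph + 1) > data_end)
        return XDP_DROP;                     /* truncated IPv4 header */

    /* Subscriber/session lookup and forwarding decision would go here. */
    return XDP_PASS;
}

char _license[] SEC("license") = "GPL";
===

[Under those assumptions the object would be built with clang -O2 -target bpf and attached with something like "ip link set dev eth0 xdp obj bng_demo.o sec xdp". The point is simply that the verdict is taken before the packet enters the normal kernel path, which is what Denys is relying on to go beyond 40G.]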
Re: Linux BNG
On Sun, 15 Jul 2018 at 18:57, Denys Fedoryshchenko wrote:
> Openflow IMO by nature is built to do complex matching, and for example for a typical 12-tuple it is 750-4000 entries max in switches, but if you go to L2-only matching, which was possible at the moment I tested (in my experience only on the PF5820), you can do L2-entries-only matching and then it can go to 80k flows.
> But again, sticking to a specific vendor is not recommended.

It would be possible to implement a general forward-to-controller policy and then upload matching on MAC address only as an offload strategy. You would have a different device doing the layer 3 stuff. The OpenFlow switch just adds and removes VLAN tagging based on MAC matching.

Regards
Re: Linux BNG
On 2018-07-15 19:00, Raymond Burkholder wrote:
> On 07/15/2018 09:03 AM, Denys Fedoryshchenko wrote:
>> On 2018-07-14 22:05, Baldur Norddahl wrote:
>>> I have considered OpenFlow and might do that. We have OpenFlow-capable switches and I may be able to offload the work to the switch hardware. But I also consider this solution harder to get right than the idea of using Linux with tap devices. Also, it appears that Open vSwitch implements a different flavour of OpenFlow than the hardware switch (the hardware is limited to some fixed tables that Broadcom made up), so I might not be able to start with the software and then move on to hardware.
>>
>> AFAIK OpenFlow is suitable for datacenters, but doesn't scale well for user-termination purposes. You will run out of TCAM much sooner than you expect.
>
> Denys, could you expand on this? In a Linux-based solution (say with OVS), TCAM is memory/software based, and in following their dev threads, they have been optimizing flow caches continuously for various types of flows: megaflow, tiny flows, flow quantity and variety, caching, ... When you mate OVS with something like a Mellanox Spectrum switch (via SwitchDev) for hardware-based forwarding, I could see certain hardware limitations applying, but I don't have first-hand experience with that. But I suppose you will see these TCAM issues on hardware-only specialized OpenFlow switches.

Yes, definitely only on hardware switches, and the biggest issue is that it is vendor- and hardware-dependent. This means that if I find the "right" switch and make your solution depend on it, and the vendor decides to issue a new revision, or even new firmware, there is no guarantee the "unusual" setup will keep working. That is what makes many people afraid to use it.

OpenFlow IMO by nature is built to do complex matching, and for example for a typical 12-tuple it is 750-4000 entries max in switches, but if you go to L2-only matching, which was possible at the moment I tested (in my experience only on the PF5820), you can do L2-entries-only matching and then it can go to 80k flows. But again, sticking to a specific vendor is not recommended.

About OVS, I didn't look much at it, as I thought it is not suitable for BNG purposes, like terminating tens of thousands of users; I thought it is more about high-speed switching for tens of VMs.

> On edge-based translations, is hardware-based forwarding actually necessary, since there are so many software functions being performed anyway?

IMO at the current moment, 20-40G on a single box is the boundary point where packet forwarding is preferable (but still not necessary) to do in hardware, as passing packets through the whole Linux stack is really not the best option. But it works. I'm trying to find an alternative solution, bypassing the full stack using XDP, so I can go beyond 40G.

> But then, it may be conceivable that buying a number of servers and load-spreading across them will provide some resiliency and will come in at a lower cost than putting in 'big iron' anyway. Because then there are some additional benefits: you can run Network Function Virtualization at the edge and provide additional services to customers.

+1. For IPoE/PPPoE, servers scale very well, while on "hardware" you will eventually hit a limit on how many line cards you can put in a chassis, and then you need to buy a new chassis. Not to mention that some chassis have countless unobvious limitations you might hit inside the chassis (in the pretty old Cisco 6500/7600, which is not EOL, it is a nightmare).

If an ISP has a big enough chassis, he needs to remember that he needs a second one at the same place, preferably with the same number of line cards, while with servers you are more resilient even with N+M redundancy (where M is, for example, N/4). Also, when premium customers ask me for some unusual things, it is much easier to move them to separate nodes with extended options for termination, where I can implement their demands on a custom vCPE.
Re: Linux BNG
On 15/07/2018 at 18.00, Raymond Burkholder wrote:
> But I think a clarification of Baldur's speed requirements is needed. He indicates that there are a bunch of locations: does each of the locations require 10G throughput, or was the throughput defined for all sites in aggregate? If the sites individually have smaller throughput, the software-based boxes might do, but if that is at each site, then software-only boxes may not handle the throughput.

We have considerably more than 10G of total traffic. We are currently transporting it all to one of two locations before doing the BNG function. We then have VRRP to enable failover to the other location. Transport is by MPLS and L2VPN.

I set the goal post at 10G per server. To handle more traffic we will have multiple servers. Load balancing does not need to be dynamic. We would just distribute the customers so each customer is always handled by the same server. 10G per server translates to approximately 5000 customers per server (in 2018; this number is expected to drop over time).

I am wondering if we could make an open source system (it does not strictly have to be Linux) that could do the BNG function at 10G per server, with a server in the price range of 1k - 2k USD. For many sizes of ISP this would be far, far cheaper than any of the solutions from Cisco, Juniper et al. Even if you had to get 10 servers to handle 100G you would likely still come out ahead of the big iron solution. And for a startup (like us) it is great to be able to start out with little investment and then let the solution grow with the business.

Regards,

Baldur
Re: Linux BNG
On 07/15/2018 09:03 AM, Denys Fedoryshchenko wrote:
> On 2018-07-14 22:05, Baldur Norddahl wrote:
>> I have considered OpenFlow and might do that. We have OpenFlow-capable switches and I may be able to offload the work to the switch hardware. But I also consider this solution harder to get right than the idea of using Linux with tap devices. Also, it appears that Open vSwitch implements a different flavour of OpenFlow than the hardware switch (the hardware is limited to some fixed tables that Broadcom made up), so I might not be able to start with the software and then move on to hardware.
>
> AFAIK OpenFlow is suitable for datacenters, but doesn't scale well for user-termination purposes. You will run out of TCAM much sooner than you expect.

Denys, could you expand on this? In a Linux-based solution (say with OVS), TCAM is memory/software based, and in following their dev threads, they have been optimizing flow caches continuously for various types of flows: megaflow, tiny flows, flow quantity and variety, caching, ... When you mate OVS with something like a Mellanox Spectrum switch (via SwitchDev) for hardware-based forwarding, I could see certain hardware limitations applying, but I don't have first-hand experience with that. But I suppose you will see these TCAM issues on hardware-only specialized OpenFlow switches.

On edge-based translations, is hardware-based forwarding actually necessary, since there are so many software functions being performed anyway?

But I think a clarification of Baldur's speed requirements is needed. He indicates that there are a bunch of locations: does each of the locations require 10G throughput, or was the throughput defined for all sites in aggregate? If the sites individually have smaller throughput, the software-based boxes might do, but if that is at each site, then software-only boxes may not handle the throughput.

But then, it may be conceivable that buying a number of servers and load-spreading across them will provide some resiliency and will come in at a lower cost than putting in 'big iron' anyway. Because then there are some additional benefits: you can run Network Function Virtualization at the edge and provide additional services to customers.

I forgot to mention this in the earlier thread, but there are some companies out there which provide devices with many ports on them and provide compute at the same time. So software-based Linux switches are possible, without resorting to a combination of a physical switch and a separate compute box. In a Linux-based switch, by using IRQ affinity, traffic from ports can be balanced across CPUs. So by collapsing switch and compute, additional savings might be realized.

As a couple of side notes: 1) the DPDK people support a user-space dataplane version of OVS/OpenFlow, and 2) an eBPF version of the OVS dataplane is being worked on. In summary, OVS supports three current dataplanes with a fourth on the way: 1) native kernel, 2) hardware offload via TC (SwitchDev), 3) DPDK, 4) eBPF.

> Linux tap device has very high overhead, it suits no more than working as some hotspot gateway for 100s of users.

As does the 'veth' construct.

--
Raymond Burkholder
r...@oneunified.net
https://blog.raymond.burkholder.net
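[Editor's note: on the IRQ-affinity point above, the usual mechanism is writing a CPU mask into /proc/irq/<n>/smp_affinity. A small sketch in C follows; the IRQ numbers and masks are purely hypothetical, and the real ones come from /proc/interrupts.]

===
/* Sketch: pin each of four (hypothetical) NIC RX-queue IRQs to its own CPU
 * by writing a hex mask to /proc/irq/<irq>/smp_affinity. Look up the real
 * IRQ numbers for your NIC queues in /proc/interrupts before using this. */
#include <stdio.h>
#include <stdlib.h>

static int set_irq_affinity(int irq, unsigned long cpu_mask)
{
    char path[64];
    snprintf(path, sizeof(path), "/proc/irq/%d/smp_affinity", irq);

    FILE *f = fopen(path, "w");
    if (!f) {
        perror(path);
        return -1;
    }
    fprintf(f, "%lx\n", cpu_mask);   /* e.g. mask 0x4 = CPU 2 */
    fclose(f);
    return 0;
}

int main(void)
{
    /* Hypothetical IRQs 120..123 spread over CPUs 0..3 */
    for (int i = 0; i < 4; i++)
        if (set_irq_affinity(120 + i, 1UL << i) != 0)
            return EXIT_FAILURE;
    return 0;
}
===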
Re: Linux BNG
Hi Baldur,

Based on the information you provided, the CPE connects to the POI via a different service provider (access network provider / middle man) before it reaches your network/POP. With this construct, you are typically responsible for IP allocation and session authentication via DHCP (option 82) with AAA, or via RADIUS for PPPoE. You may also have to deal with the S-TAG and C-TAG at the BNG level.

Here are some options to consider:

Option 1. Use RADIUS for session authentication and IP/DNS allocation to the CPE. You can configure a BBA-GROUP on the BNG to overcome the 409x VLAN limitation as well as the S-TAG and C-TAG. A BBA-GROUP can handle multiple sessions and is a well-supported feature. Here is an example of the config for your BNG (Cisco router):

===
bba-group pppoe NAME-1
 virtual-template 1
 sessions per-mac limit 2
!
bba-group pppoe NAME-2
 virtual-template 2
 sessions per-mac limit 2
!
interface GigabitEthernet1/3.100
 encapsulation dot1Q 100 second-dot1q 500-4094
 no ip redirects
 no ip unreachables
 no ip proxy-arp
 ip flow ingress
 ip flow egress
 ip multicast boundary 30
 pppoe enable group NAME-1
 no cdp enable
!
interface GigabitEthernet1/3.200
 encapsulation dot1Q 200 second-dot1q 200-300
 no ip redirects
 no ip unreachables
 no ip proxy-arp
 ip flow ingress
 ip flow egress
 ip multicast boundary 30
 pppoe enable group NAME-2
 no cdp enable

Configure the virtual templates too.
===

Option 2. You can deploy a DHCP server using DHCP option 82 to handle all IP or IPoE sessions. DHCP option 82 provides you with additional flexibility that can scale as your customer base grows. You can perform authentication using a combination of Circuit-ID, Remote-ID, CPE MAC address, etc.

I hope this information helps.

Cheers,
Ahad

On Sat, Jul 14, 2018 at 10:13 PM, Baldur Norddahl wrote:
> Hello
>
> I am investigating Linux as a BNG. The BNG (Broadband Network Gateway) being the thing that acts as default gateway for our customers.
>
> The setup is one VLAN per customer. Because 4095 VLANs is not enough, we have QinQ with double VLAN tagging on the customers. The customers can use DHCP or static configuration. DHCP packets need to be option 82 tagged and forwarded to a DHCP server. Every customer has one or more static IP addresses.
>
> IPv4 subnets need to be shared among multiple customers to conserve address space. We are currently using /26 IPv4 subnets with 60 customers sharing the same default gateway and netmask. In Linux terms this means 60 VLAN interfaces per bridge interface.
>
> However Linux is not quite ready for the task. The primary problem being that the system does not scale to thousands of VLAN interfaces.
>
> We do not want customers to be able to send non-routed packets directly to each other (needs proxy arp). Also customers should not be able to steal another customer's IP address. We want to hard code the relation between IP address and VLAN tagging. This can be implemented using ebtables, but we are unsure that it could scale to thousands of customers.
>
> I am considering writing a small program or kernel module. This would create two TAP devices (tap0 and tap1). Traffic received on tap0 with VLAN tagging will be stripped of VLAN tagging and delivered on tap1. Traffic received on tap1 without VLAN tagging will be tagged according to a lookup table using the destination IP address and then delivered on tap0. ARP and DHCP would need some special handling.
>
> This would be completely stateless for the IPv4 implementation. The IPv6 implementation would be harder, because Link Local addressing needs to be supported and that can not be stateless. The customer CPE will make up its own Link Local address based on its MAC address and we do not know what that is in advance.
>
> The goal is to support traffic of a minimum of 10 Gbit/s per server. Ideally I would have a server with 4x 10 Gbit/s interfaces combined into two 20 Gbit/s channels using bonding (LACP). One channel each for upstream and downstream (customer facing). The upstream would be layer 3 untagged and routed traffic to our transit routers.
>
> I am looking for comments, ideas or alternatives. Right now I am considering what kind of CPU would be best for this. Unless I take steps to mitigate, the workload would probably go to one CPU core only and be limited by things like CPU cache and PCI bus bandwidth.
>
> Regards,
>
> Baldur

--
Regards,
Ahad
Swiftel Networks
"Where the best is good enough"
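[Editor's note: to make the tap0/tap1 idea quoted above concrete, here is a rough sketch of the tagged-to-untagged direction only, written against the Linux /dev/net/tun API. It is not the program Baldur describes, just a minimal illustration under assumptions: the interface names, a single 802.1Q tag rather than the QinQ pair he needs, and no ARP/DHCP special-casing or IP-to-VLAN lookup for the return path.]

===
/* Rough sketch of the tap0 -> tap1 direction only: read a frame from the
 * tagged TAP device, strip one 802.1Q header, write it to the untagged TAP
 * device. QinQ (S-TAG + C-TAG), ARP/DHCP handling, and the reverse
 * IP-to-VLAN lookup are deliberately left out; names are assumptions. */
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <net/if.h>
#include <linux/if_tun.h>
#include <arpa/inet.h>

static int open_tap(const char *name)
{
    struct ifreq ifr;
    int fd = open("/dev/net/tun", O_RDWR);
    if (fd < 0) { perror("/dev/net/tun"); return -1; }

    memset(&ifr, 0, sizeof(ifr));
    ifr.ifr_flags = IFF_TAP | IFF_NO_PI;          /* raw Ethernet frames */
    strncpy(ifr.ifr_name, name, IFNAMSIZ - 1);
    if (ioctl(fd, TUNSETIFF, &ifr) < 0) { perror("TUNSETIFF"); close(fd); return -1; }
    return fd;
}

int main(void)
{
    unsigned char frame[2048];
    int tap0 = open_tap("tap0");                  /* tagged, customer-facing side */
    int tap1 = open_tap("tap1");                  /* untagged side */
    if (tap0 < 0 || tap1 < 0) return 1;

    for (;;) {
        ssize_t n = read(tap0, frame, sizeof(frame));
        if (n < 18) continue;                     /* too short to carry a VLAN tag */

        unsigned short tpid;
        memcpy(&tpid, frame + 12, 2);             /* 802.1Q TPID sits at offset 12 */
        if (ntohs(tpid) == 0x8100) {
            /* With QinQ the outer S-TAG is followed by a second tag; a real
             * implementation would strip (and later re-add) both. */
            memmove(frame + 12, frame + 16, n - 16);
            n -= 4;
        }
        if (write(tap1, frame, n) < 0) perror("write tap1");

        /* The tap1 -> tap0 direction would look up the destination IP in an
         * IP -> (S-TAG, C-TAG) table and insert the tags before writing. */
    }
}
===

[Every frame here crosses the kernel boundary on each read and write, which is the tap overhead Denys refers to elsewhere in the thread.]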
Re: Linux BNG
On 2018-07-14 22:05, Baldur Norddahl wrote:
> I have considered OpenFlow and might do that. We have OpenFlow-capable switches and I may be able to offload the work to the switch hardware. But I also consider this solution harder to get right than the idea of using Linux with tap devices. Also, it appears that Open vSwitch implements a different flavour of OpenFlow than the hardware switch (the hardware is limited to some fixed tables that Broadcom made up), so I might not be able to start with the software and then move on to hardware.
>
> Regards, Baldur

AFAIK OpenFlow is suitable for datacenters, but doesn't scale well for user-termination purposes. You will run out of TCAM much sooner than you expect.

A Linux tap device has very high overhead; it suits no more than working as a hotspot gateway for hundreds of users.
Re: Linux BNG
On 2018-07-15 06:09, Jérôme Nicolle wrote:
> Hi Baldur,
>
> On 14/07/2018 at 14:13, Baldur Norddahl wrote:
>> I am investigating Linux as a BNG
>
> As we say in France, it's like you're trying to buttfuck flies (a local saying standing for "reinventing the wheel for no practical reason").

You can say that about the whole open-source ecosystem: why bother, if *proprietary solution name* exists? It is an endless flamewar topic.

> Linux's kernel networking stack is not made for this kind of job. 6WIND or fd.io may be right on the spot, but it's still a lot of dark magic for something that has been done over and over for the past 20 years by most vendors. And it just works.

Linux developers are working continuously to improve this; for example the latest feature, XDP, is able to process several Mpps on a <$1000 server. Ask yourself why Cloudflare "buttfucks flies" instead of buying from some proprietary vendor who has done filtering in hardware for 20 years:
https://blog.cloudflare.com/how-to-drop-10-million-packets/

I am doing experiments with XDP as well, to terminate PPPoE, and it is doing that quite well over XDP.

> DHCP (implying straight L2 from the CPE to the BNG) may be an option but most codebases are still young. PPP, on the other hand, is field-tested for extremely large scale deployments with most vendors.

DHCP has been here at least since RFC 2131 came out in March 1997. Quite old, isn't it?

When you stick to PPPoE, you tie yourself to extra layers of encapsulation/decapsulation, and this seriously degrades performance, at the _user_ level at least. With some experience developing firmware for routers, I can tell that hardware offloading of IPv4 routing (DHCP) is obviously much easier and cheaper than offloading PPPoE encap/decap plus IPv4 routing. Also, vendors keep screwing up their routers with PPP; for example, one of them failed to process PADO properly in its newest firmware revision. Another problem: with PPPoE you subscribe to a headache called reduced MTU, which will also give ISP support a lot of unpleasant hours.

> If I were in your shoes, and I don't say I'd want to be (my BNGs are scaled to less than a few thousand subscribers, with 1-4 concurrent sessions each), I'd stick to the plain old bitstream (PPP) model, with a decent subscriber framework on my BNGs (I mostly use Juniper MXs, but I also like Nokia's and Cisco's for some features).

I am consulting for operators from a few hundred to hundreds of thousands of subscribers. It is very rare that a Linux BNG doesn't suit them.

> But let's say we would want to go forward and ditch legacy / proprietary code to surf on the NFV bullshit-wave. What would you actually need?
>
> Linux does soft-recirculation at every encapsulation level by memory copy. You can't scale anything with that. You need to streamline decapsulation with 6WIND's turborouter or fd.io frameworks. It'll cost you a few thousand man-hours to implement your first prototype.

6WIND/fd.io are great solutions, but not suitable for the mentioned task. They are mostly created for very tailor-made tasks, or even as the core of some vendor solution. Implementing your BNG based on such frameworks, or DPDK, really is reinventing the wheel, unless you will sell it or can save millions of US$ by doing so.

> Let's say you got a working framework to treat subsequent headers on the fly (because decapsulation is not really needed; what you want is just to forward the payload, right?)… Well, you'd need to address provisioning protocols on the same layers. Who would want to rebase a DHCP server with alien packet forms incoming? I guess no one.

accel-ppp does all that, exactly for IPoE termination, and there is no black magic there.

> Well, I could hold forth on the topic for hours, because I've already spent months addressing such design issues in scalable ISP networks, and the conclusion is:
>
> - PPPoE is simple and proven. Its rigid structure alleviates most of the dual-stack issues. It is well supported and largely deployed.

PPPoE has VERY serious flaws:

1) The security of PPPoE sucks big time. Anybody who runs a rogue PPPoE server in your network will create a significant headache for you, while with DHCP you at least have "DHCP snooping". DHCP snooping is supported in very many vendors' switches, while for PPPoE most of them have nothing, except... sticking each user in his own VLAN. Why PPPoX them then?

2) DHCP can send circuit information in Option 82; this is very useful for billing and very cost-efficient at the last stage of access switches.

3) Modern FTTx (GPON) solutions are built with QinQ in mind, so IPoE fits there flawlessly.

> - DHCP requires hacks (in the form of undocumented options from several vendors) to seemingly work on IPv4, but the multicast boundaries for NDP are a PITA to handle, so no one has implemented that properly yet. So it is to be avoided for now.

While you can do multicast (mostly for IPTV; yes, it is not easy, and it needs some vendor magic) on the "native" layer (DHCP), with PPP you can forget about multicast entirely.
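[Editor's note: on the Option 82 point, here is a hedged sketch of what a BNG or provisioning backend does with the Relay Agent Information option, assuming the DHCP options buffer (after the magic cookie) has already been extracted from the packet; the sample Circuit-ID/Remote-ID values are made up.]

===
/* Sketch: walk a DHCP options buffer and print the Option 82 (Relay Agent
 * Information) Circuit-ID and Remote-ID sub-options, which an IPoE setup
 * would typically map to a subscriber for billing/RADIUS purposes.
 * Values here are printed as text; on real gear they may be binary. */
#include <stdio.h>
#include <stddef.h>

static void parse_option82(const unsigned char *opts, size_t len)
{
    size_t i = 0;
    while (i < len && opts[i] != 255) {            /* 255 = end option */
        if (opts[i] == 0) { i++; continue; }       /* 0 = pad, no length byte */
        if (i + 2 > len) break;
        unsigned char code   = opts[i];
        unsigned char optlen = opts[i + 1];
        if (i + 2 + optlen > len) break;           /* malformed */

        if (code == 82) {                          /* Relay Agent Information */
            const unsigned char *sub = opts + i + 2;
            for (size_t j = 0; j + 2 <= optlen; ) {
                unsigned char subcode = sub[j];
                unsigned char sublen  = sub[j + 1];
                if (j + 2 + sublen > optlen) break;
                if (subcode == 1)                  /* Circuit-ID */
                    printf("circuit-id: %.*s\n", sublen, (const char *)(sub + j + 2));
                else if (subcode == 2)             /* Remote-ID */
                    printf("remote-id:  %.*s\n", sublen, (const char *)(sub + j + 2));
                j += 2 + sublen;
            }
        }
        i += 2 + optlen;
    }
}

int main(void)
{
    /* Hypothetical options blob: Option 82 carrying circuit-id "eth0/1"
     * and remote-id "cpe1", followed by the end option. */
    const unsigned char opts[] = {
        82, 14,
          1, 6, 'e', 't', 'h', '0', '/', '1',
          2, 4, 'c', 'p', 'e', '1',
        255
    };
    parse_option82(opts, sizeof(opts));
    return 0;
}
===

[The access switch or OLT inserts these sub-options on the customer port, so the BNG can tie a lease to a physical circuit without trusting anything the CPE itself sends.]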
Re: Linux BNG
Hi Baldur,

These guys made a PPPoE client for VPP - you could probably extend that into a PPP server:

https://lists.fd.io/g/vpp-dev/message/9181
https://github.com/raydonetworks/vpp-pppoeclient

Although, I would agree that deploying PPP now is a bit of a step backwards and IPoE is the way to be doing this in 2018.

If you want subscribers with an S-TAG/C-TAG landing in unique virtual interfaces with a shared gateway etc., IPv4 + IPv6 (DHCP/v6), and you were deploying this on "real service provider networking kit" [1], then the way to do this is with pseudowire headend termination (PWHE/PWHT). However, you're going to struggle to implement something like PWHT on the native Linux networking stack. Many of the features you want exist in Linux, like DHCP/v6, IPv4/6, MPLS, LDP, pseudowires etc., but not all together as a combined service offering.

My two pence would be to buy kit from someone like Cisco or Juniper, as I don't think the open source world is quite there yet. Alternatively, if it *must* be Linux, look at adding the code to https://wiki.fd.io/view/VPP/Features as it has all the constituent parts (DHCP, IP, MPLS, bridges etc.) but not glued together. VPP is an order of magnitude faster than the native kernel networking stack. I'd be shocked if you did all that you wanted to do at 10Gbps line rate with one CPU core.

Cheers,
James.

[1] Which means the expensive stuff big-name vendors like Cisco and Juniper sell
Re: (perhaps off topic, but) Microwave Towers
I was going to say... in my experience (I've been to a lot of the Arizona electronics sites, having grown up around broadcasting), most of the microwave equipment in use was for Bell. That was by far the most heavily populated tower on any mountain top. The broadcasters don't send their signals anywhere but either from downtown to the transmitter or, in some cases, from the big town to a small town to feed a local low-power transmitter (like 5 kW VHF as opposed to the normal 100 kW). Anything else was satellite.

I know the railroad did some wireless (Sprint's towers were also quite densely packed with directional horns), but a lot of their communication for rail signaling was hardwired as far as I was aware.

-Wayne

On Sat, Jul 14, 2018 at 12:20:34PM -0500, frnk...@iname.com wrote:
> Is it possibly AT&T's old network?
> https://99percentinvisible.org/article/vintage-skynet-atts-abandoned-long-lines-microwave-tower-network/
> http://long-lines.net/places-routes/
>
> This network runs through our service territory, too. The horns are distinctive.
>
> Frank
>
> -----Original Message-----
> From: NANOG On Behalf Of Miles Fidelman
> Sent: Saturday, July 14, 2018 9:54 AM
> To: nanog@nanog.org
> Subject: (perhaps off topic, but) Microwave Towers
>
> Hi Folks,
>
> I find myself driving down Route 66. On our way through Arizona, I was surprised by what look like a lot of old-style microwave links. They pretty much follow the East-West rail line, where I'd expect there's a lot of fiber buried.
>
> Struck me as somewhat interesting.
>
> It also struck me that folks here might have some comments.
>
> Miles Fidelman
>
> --
> In theory, there is no difference between theory and practice.
> In practice, there is. Yogi Berra

---
Wayne Bouchard
w...@typo.org
Network Dude
http://www.typo.org/~web/