Re: Getting a handle on all these new NIC features
On Fri, Jan 20, 2017 at 8:36 AM, Martin Habets wrote:
> Hi Tom,
>
> On 17/01/17 22:05, Tom Herbert wrote:
>> There was some discussion about the problems of dealing with the
>> explosion of NIC features in the mlx directory restructuring proposal,
>> but I think there is a deeper issue here that should be discussed.
>>
>> It's hard not to notice that there has been quite a proliferation of
>> NIC features in several drivers. This trend has resulted in very
>> complex driver code that may or may not segment individual features.
>> One visible manifestation of this is the number of ndo functions, which
>> is somewhere around seventy-five now.
>>
>> I suspect the vast majority of these advanced NIC features (e.g.
>> bridging, UDP offloads, tc offload, etc.) are only relevant to some of
>> the people some of the time. The problem we have, in this case those
>> of us that are attempting to deploy and maintain NICs at scale, is
>> when we have to deal with the ramifications of these features being
>> intertwined with core driver functionality that is relevant to
>> everyone. This becomes very obvious when we need to backport drivers
>> from later versions of the kernel.
>>
>> I realize that backporting a driver is not a specific concern of the
>> Linux kernel, but nevertheless this is a real problem and a fact of
>> life for many users. Rebasing the full kernel is still a major effort
>> and it seems the best we could ever do is one rebase per year. In the
>> interim we need to occasionally backport drivers. Backporting drivers
>> is difficult precisely because of new features or API changes to
>> existing ones. These sorts of changes tend to have a spiderweb of
>> dependencies in other parts of the stack so that the number of patches
>> we need to cherry-pick goes way beyond those that touch the driver we
>> are interested in.
>
> For the sfc driver (Solarflare Adapters) we currently do backports internally
> for:
> - RedHat Enterprise Linux 5.10, 5.11
> - RedHat Enterprise Linux 6.5, 6.6, 6.7, 6.8
> - Redhat Messaging Realtime and Grid 2.5
> - RedHat Enterprise Linux 7.0, 7.1, 7.2
> - RedHat Enterprise Linux for Realtime 7.1, 7.2
> - SuSE Linux Enterprise Server 11 sp3, sp4
> - SuSE Linux Enterprise RealTime Extension 11
> - SuSE Linux Enterprise Server 12 base release, sp1
> - Canonical Ubuntu Server LTS 14.04, 16.04
> - Canonical Ubuntu Server-
> - Debian 7 "Wheezy" 7.X
> - Debian 8 "Jessie" 8.X
> - Linux 2.6.18 to 4.9-rc1
>
> We update this list as needed, and always try to support the latest kernel.
> I do not know if that would cover the kernel version you are using.
>
That really doesn't help us. We don't base which kernels we run in
datacenters on what distros are doing -- they don't seem to move as
fast in rebasing. Our general request is that vendors always do their
development upstream; if we need to do a backport in our kernel then we
take responsibility for that. As I mentioned, the churn and lack of
modularization seem to be making this process more and more difficult.

Tom

> Best regards,
> Martin
>
>> Currently we (FB) need to backport two NIC drivers. I've already given
>> details of backporting mlx5 on the thread to restructure the driver
>> directories. The other driver being backported seems to suffer from
>> the same type of feature complexity.
>>
>> In short, I would like to ask driver maintainers to start to
>> modularize driver features. If something being added is obviously a
>> narrow feature that only a subset of users will need, can we allow
>> config options to #ifdef those out somehow? Furthermore, can the file
>> and directory structure of drivers reflect that? Our lives would be
>> _so_ much simpler maintaining drivers in production if we have such
>> modularity and the ability to build drivers with the features of our
>> choosing.
>>
>> Thanks,
>> Tom
Re: Getting a handle on all these new NIC features
Hi Tom,

On 17/01/17 22:05, Tom Herbert wrote:
> There was some discussion about the problems of dealing with the
> explosion of NIC features in the mlx directory restructuring proposal,
> but I think there is a deeper issue here that should be discussed.
>
> It's hard not to notice that there has been quite a proliferation of
> NIC features in several drivers. This trend has resulted in very
> complex driver code that may or may not segment individual features.
> One visible manifestation of this is the number of ndo functions, which
> is somewhere around seventy-five now.
>
> I suspect the vast majority of these advanced NIC features (e.g.
> bridging, UDP offloads, tc offload, etc.) are only relevant to some of
> the people some of the time. The problem we have, in this case those
> of us that are attempting to deploy and maintain NICs at scale, is
> when we have to deal with the ramifications of these features being
> intertwined with core driver functionality that is relevant to
> everyone. This becomes very obvious when we need to backport drivers
> from later versions of the kernel.
>
> I realize that backporting a driver is not a specific concern of the
> Linux kernel, but nevertheless this is a real problem and a fact of
> life for many users. Rebasing the full kernel is still a major effort
> and it seems the best we could ever do is one rebase per year. In the
> interim we need to occasionally backport drivers. Backporting drivers
> is difficult precisely because of new features or API changes to
> existing ones. These sorts of changes tend to have a spiderweb of
> dependencies in other parts of the stack so that the number of patches
> we need to cherry-pick goes way beyond those that touch the driver we
> are interested in.

For the sfc driver (Solarflare Adapters) we currently do backports internally
for:
- RedHat Enterprise Linux 5.10, 5.11
- RedHat Enterprise Linux 6.5, 6.6, 6.7, 6.8
- Redhat Messaging Realtime and Grid 2.5
- RedHat Enterprise Linux 7.0, 7.1, 7.2
- RedHat Enterprise Linux for Realtime 7.1, 7.2
- SuSE Linux Enterprise Server 11 sp3, sp4
- SuSE Linux Enterprise RealTime Extension 11
- SuSE Linux Enterprise Server 12 base release, sp1
- Canonical Ubuntu Server LTS 14.04, 16.04
- Canonical Ubuntu Server-
- Debian 7 "Wheezy" 7.X
- Debian 8 "Jessie" 8.X
- Linux 2.6.18 to 4.9-rc1

We update this list as needed, and always try to support the latest kernel.
I do not know if that would cover the kernel version you are using.

Best regards,
Martin

> Currently we (FB) need to backport two NIC drivers. I've already given
> details of backporting mlx5 on the thread to restructure the driver
> directories. The other driver being backported seems to suffer from
> the same type of feature complexity.
>
> In short, I would like to ask driver maintainers to start to
> modularize driver features. If something being added is obviously a
> narrow feature that only a subset of users will need, can we allow
> config options to #ifdef those out somehow? Furthermore, can the file
> and directory structure of drivers reflect that? Our lives would be
> _so_ much simpler maintaining drivers in production if we have such
> modularity and the ability to build drivers with the features of our
> choosing.
>
> Thanks,
> Tom
Re: Getting a handle on all these new NIC features
On 01/17/2017 02:05 PM, Tom Herbert wrote:
> I realize that backporting a driver is not a specific concern of the
> Linux kernel, but nevertheless this is a real problem and a fact of
> life for many users. Rebasing the full kernel is still a major effort
> and it seems the best we could ever do is one rebase per year. In the
> interim we need to occasionally backport drivers. Backporting drivers
> is difficult precisely because of new features or API changes to
> existing ones. These sorts of changes tend to have a spiderweb of
> dependencies in other parts of the stack so that the number of patches
> we need to cherry-pick goes way beyond those that touch the driver we
> are interested in.

backports (formerly known as compat-wireless) dealt with that problem by
pulling in all dependencies from the networking stack (and beyond); this
allowed people with a need to stay on a particular kernel version to get
the newest and latest networking bits and drivers with minor disruption
to other parts of the kernel. The project now seems to be largely dead,
but could be revived I presume:

https://backports.wiki.kernel.org/index.php/Main_Page

>
> In short, I would like to ask driver maintainers to start to
> modularize driver features. If something being added is obviously a
> narrow feature that only a subset of users will need, can we allow
> config options to #ifdef those out somehow?

Multiplying the number of #ifdefs means that every config option is going
to be turned on by Linux distributions, and most likely just a subset
will be turned on by specific kernel configurations (like yours). All in
all, this multiplies the number of build combinations to a point where it
may not be manageable for an upstream driver, and some combinations won't
be tested properly except by whoever diverges from the distribution
defaults.

I understand the concern of modularizing and having clean independent
features/modules, but I am unsure that more configuration options are
necessarily the right approach.
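To put a rough number on the build-combination concern: if a driver gained n independent on/off feature options, the number of distinct configurations would be 2^n. The helper below is only an illustrative sketch; `build_combinations` is a made-up name, and it assumes every option is an independent boolean, which real Kconfig dependencies would not fully satisfy.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Number of distinct build configurations for n_options feature
 * switches, ASSUMING each switch is an independent boolean.
 * Every additional switch doubles the count, giving 2^n_options.
 */
uint64_t build_combinations(unsigned int n_options)
{
    assert(n_options < 64); /* keep the shift well-defined for uint64_t */
    return (uint64_t)1 << n_options;
}
```

Under that assumption, `build_combinations(10)` is 1024, so ten per-feature options would already mean over a thousand possible builds of a single driver.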
Slightly tangential: once a series of patches lands in a given
maintainer's tree, it is very hard to match a given commit with its
original submission and, say, locate the 11 other patches out of the
12-patch series adding feature XYZ of interest. David does a great job of
putting submissions in a branch, which helps a lot, but in general there
is not enough information in git to associate a given patch with its
companion patches within a series, hence making backporting harder IMHO.
--
Florian
Re: Getting a handle on all these new NIC features
On Wed, Jan 18, 2017 at 12:05 AM, Tom Herbert wrote:
> There was some discussion about the problems of dealing with the
> explosion of NIC features in the mlx directory restructuring proposal,
> but I think there is a deeper issue here that should be discussed.
>
> It's hard not to notice that there has been quite a proliferation of
> NIC features in several drivers. This trend has resulted in very
> complex driver code that may or may not segment individual features.
> One visible manifestation of this is the number of ndo functions, which
> is somewhere around seventy-five now.
>
> I suspect the vast majority of these advanced NIC features (e.g.
> bridging, UDP offloads, tc offload, etc.) are only relevant to some of
> the people some of the time. The problem we have, in this case those
> of us that are attempting to deploy and maintain NICs at scale, is
> when we have to deal with the ramifications of these features being
> intertwined with core driver functionality that is relevant to
> everyone. This becomes very obvious when we need to backport drivers
> from later versions of the kernel.
>
> I realize that backporting a driver is not a specific concern of the
> Linux kernel, but nevertheless this is a real problem and a fact of
> life for many users. Rebasing the full kernel is still a major effort
> and it seems the best we could ever do is one rebase per year. In the
> interim we need to occasionally backport drivers. Backporting drivers
> is difficult precisely because of new features or API changes to
> existing ones. These sorts of changes tend to have a spiderweb of
> dependencies in other parts of the stack so that the number of patches
> we need to cherry-pick goes way beyond those that touch the driver we
> are interested in.
>

I think backporting is not the only concern here. The other main issue
is purely one of software design and cannot just be ignored: device
drivers are getting smarter and are doing lots of offloads and logic;
they are not as thin as they used to be. That is also a justification
for why we should take a second (stop coding for a while :-) ) and give
this issue some attention.

> Currently we (FB) need to backport two NIC drivers. I've already given
> details of backporting mlx5 on the thread to restructure the driver
> directories. The other driver being backported seems to suffer from
> the same type of feature complexity.
>

Can you share some more about the most complex stuff you faced while
backporting? What would have made it simpler if we had designed the
driver differently?

> In short, I would like to ask driver maintainers to start to
> modularize driver features. If something being added is obviously a
> narrow feature that only a subset of users will need, can we allow
> config options to #ifdef those out somehow? Furthermore, can the file
> and directory structure of drivers reflect that? Our lives would be
> _so_ much simpler maintaining drivers in production if we have such
> modularity and the ability to build drivers with the features of our
> choosing.
>

Before we do this or define the plan, there are some questions to be
asked:

1. Can we allow ourselves to have a kconfig or even an internal
   compilation flag per device driver feature?
2. What about previous features? I mean, in order to have a clean and
   clear way to isolate new features, some kind of restructuring or core
   reorganizing is required; it is ugly to have a driver with a hybrid
   structure.
3. In case we decide to do a restructuring phase as we suggested in the
   mlx5 patch, what is the plan for older kernels that still backport
   fixes to the previous structure?
4. What is the concrete plan? Is there a design reference or guidelines
   known to someone that everyone can follow?

Anyway, I would like to contribute some thoughts and design techniques to
achieve this modularization and feature isolation by design (at least
for new features):

Device initialization and netdev registration:
- Most of the device drivers have a main.c which handles driver
  initialization and netdev registration.
- But today this file provides much more than the above.
- I suggest keeping it as thin as possible and dedicated to what it
  should do.
- Keep the HAL (Hardware Abstraction Layer) separated from main.c; main
  should call entry points exposed by the HAL layer.
- Basic netdev features (RX/TX) and the most basic ndos for basic
  Ethernet functionality can still be in main.c.
- Advanced features (eswitch, TC offloads, vxlan and tunneling offloads,
  XDP, etc.) can go to separate file(s) with full logic implementation
  and clear code locality, wrapped by an #ifdef compilation or kconfig
  flag. This gives easy control over them and gives the
  reviewer/developer a chance to logically understand the code and
  distinguish between the different features by looking at the Makefile
  or the C file including those features (just keep the feature logic
  out of main.c).

I've been partially followi
Getting a handle on all these new NIC features
There was some discussion about the problems of dealing with the
explosion of NIC features in the mlx directory restructuring proposal,
but I think there is a deeper issue here that should be discussed.

It's hard not to notice that there has been quite a proliferation of
NIC features in several drivers. This trend has resulted in very
complex driver code that may or may not segment individual features.
One visible manifestation of this is the number of ndo functions, which
is somewhere around seventy-five now.

I suspect the vast majority of these advanced NIC features (e.g.
bridging, UDP offloads, tc offload, etc.) are only relevant to some of
the people some of the time. The problem we have, in this case those
of us that are attempting to deploy and maintain NICs at scale, is
when we have to deal with the ramifications of these features being
intertwined with core driver functionality that is relevant to
everyone. This becomes very obvious when we need to backport drivers
from later versions of the kernel.

I realize that backporting a driver is not a specific concern of the
Linux kernel, but nevertheless this is a real problem and a fact of
life for many users. Rebasing the full kernel is still a major effort
and it seems the best we could ever do is one rebase per year. In the
interim we need to occasionally backport drivers. Backporting drivers
is difficult precisely because of new features or API changes to
existing ones. These sorts of changes tend to have a spiderweb of
dependencies in other parts of the stack so that the number of patches
we need to cherry-pick goes way beyond those that touch the driver we
are interested in.

Currently we (FB) need to backport two NIC drivers. I've already given
details of backporting mlx5 on the thread to restructure the driver
directories. The other driver being backported seems to suffer from
the same type of feature complexity.

In short, I would like to ask driver maintainers to start to
modularize driver features. If something being added is obviously a
narrow feature that only a subset of users will need, can we allow
config options to #ifdef those out somehow? Furthermore, can the file
and directory structure of drivers reflect that? Our lives would be
_so_ much simpler maintaining drivers in production if we have such
modularity and the ability to build drivers with the features of our
choosing.

Thanks,
Tom
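As an illustration of the kind of compile-time modularity being asked for, here is a minimal user-space sketch. All names (`DEMO_NIC_TC_OFFLOAD`, `demo_nic`, `demo_tc_offload_init`, `demo_nic_probe`) are hypothetical and not taken from any real driver. In the kernel the switch would be a Kconfig symbol and the feature code would sit in its own file selected from the Makefile. The stub in the #else branch lets the core path call the feature hook unconditionally, so the core code itself carries no #ifdef.

```c
#include <string.h>

/* Hypothetical build-time switch; a real driver would use a Kconfig
 * symbol instead of a hand-written macro. Defaults to "feature off". */
#ifndef DEMO_NIC_TC_OFFLOAD
#define DEMO_NIC_TC_OFFLOAD 0
#endif

/* Hypothetical device state: just a list of enabled features. */
struct demo_nic {
    char features[64];
};

#if DEMO_NIC_TC_OFFLOAD
/* Optional feature: in a real tree this would live in its own file. */
static int demo_tc_offload_init(struct demo_nic *nic)
{
    strncat(nic->features, " tc-offload",
            sizeof(nic->features) - strlen(nic->features) - 1);
    return 0;
}
#else
/* Feature compiled out: a no-op stub keeps the call site unchanged. */
static inline int demo_tc_offload_init(struct demo_nic *nic)
{
    (void)nic;
    return 0;
}
#endif

/* Core path, relevant to everyone: contains no feature #ifdefs. */
int demo_nic_probe(struct demo_nic *nic)
{
    strcpy(nic->features, "rx tx");
    return demo_tc_offload_init(nic);
}
```

Built as-is, `demo_nic_probe()` leaves only the core "rx tx" features enabled; building with `-DDEMO_NIC_TC_OFFLOAD=1` compiles the optional hook in without touching the core function.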