Re: Getting a handle on all these new NIC features

2017-01-20 Thread Tom Herbert
On Fri, Jan 20, 2017 at 8:36 AM, Martin Habets  wrote:
> Hi Tom,
>
> On 17/01/17 22:05, Tom Herbert wrote:
>> There was some discussion about the problems of dealing with the
>> explosion of NIC features in the mlx directory restructuring proposal,
>> but I think there is a deeper issue here that should be discussed.
>>
>> It's hard not to notice that there has been quite a proliferation of
>> NIC features in several drivers. This trend has resulted in very
>> complex driver code that may or may not segment individual features.
>> One visible manifestation of this is the number of ndo functions which is
>> somewhere around seventy-five now.
>>
>> I suspect the vast majority of these advanced NIC features (e.g.
>> bridging, UDP offloads, tc offload, etc.) are only relevant to some of
>> the people some of the time. The problem we have, in this case those
>> of us that are attempting to deploy and maintain NICs at scale, is
>> when we have to deal with the ramifications of these features being
>> intertwined with core driver functionality that is relevant to
>> everyone. This becomes very obvious when we need to backport drivers
>> from later versions of kernel.
>>
>> I realize that backporting a driver is not a specific concern of the
>> Linux kernel, but nevertheless this is a real problem and a fact of
>> life for many users. Rebasing the full kernel is still a major effort
>> and it seems the best we could ever do is one rebase per year. In the
>> interim we need to occasionally backport drivers. Backporting drivers
>> is difficult precisely because of new features or API changes to
>> existing ones. These sorts of changes tend to have a spiderweb of
>> dependencies in other parts of the stack so that the number of patches
>> we need to cherry-pick goes way beyond those that touch the driver we
>> are interested in.
>
> For the sfc driver (Solarflare Adapters) we currently do backports internally 
> for:
>  - RedHat Enterprise Linux                       5.10, 5.11
>  - RedHat Enterprise Linux                       6.5, 6.6, 6.7, 6.8
>    - RedHat Messaging Realtime and Grid          2.5
>  - RedHat Enterprise Linux                       7.0, 7.1, 7.2
>    - RedHat Enterprise Linux for Realtime        7.1, 7.2
>  - SuSE Linux Enterprise Server 11               sp3, sp4
>    - SuSE Linux Enterprise RealTime Extension 11
>  - SuSE Linux Enterprise Server 12               base release, sp1
>  - Canonical Ubuntu Server LTS                   14.04, 16.04
>  - Canonical Ubuntu Server                       -
>  - Debian 7 "Wheezy"                             7.X
>  - Debian 8 "Jessie"                             8.X
>  - Linux                                         2.6.18 to 4.9-rc1
>
> We update this list as needed, and always try to support the latest kernel.
> I do not know if that would cover the kernel version you are using.
>
That really doesn't help us. We don't base which kernels we run in
datacenters on what distros are doing; they don't seem to move as
fast in rebasing. Our general request is that vendors always do their
development upstream. If we need to do a backport to our kernel, then
we take responsibility for that. As I mentioned, the churn and lack of
modularization seem to be making this process more and more difficult.

Tom

> Best regards,
> Martin
>
>> Currently we (FB) need to backport two NIC drivers. I've already given
>> details of backporting mlx5 on the thread to restructure the driver
>> directories. The other driver being backported seems to suffer from
>> the same type of feature complexity.
>>
>> In short, I would like to ask driver maintainers to start to
>> modularize driver features. If something being added is obviously a
>> narrow feature that only a subset of users will need can we allow
>> config options to #ifdef those out somehow? Furthermore can the file
>> and directory structure of drivers reflect that; our lives would be
>> _so_ much simpler to maintain drivers in production if we have such
>> modularity and the ability to build drivers with the features of our
>> choosing.
>>
>> Thanks,
>> Tom


Re: Getting a handle on all these new NIC features

2017-01-20 Thread Martin Habets
Hi Tom,

On 17/01/17 22:05, Tom Herbert wrote:
> There was some discussion about the problems of dealing with the
> explosion of NIC features in the mlx directory restructuring proposal,
> but I think there is a deeper issue here that should be discussed.
> 
> It's hard not to notice that there has been quite a proliferation of
> NIC features in several drivers. This trend has resulted in very
> complex driver code that may or may not segment individual features.
> One visible manifestation of this is the number of ndo functions which is
> somewhere around seventy-five now.
> 
> I suspect the vast majority of these advanced NIC features (e.g.
> bridging, UDP offloads, tc offload, etc.) are only relevant to some of
> the people some of the time. The problem we have, in this case those
> of us that are attempting to deploy and maintain NICs at scale, is
> when we have to deal with the ramifications of these features being
> intertwined with core driver functionality that is relevant to
> everyone. This becomes very obvious when we need to backport drivers
> from later versions of kernel.
> 
> I realize that backporting a driver is not a specific concern of the
> Linux kernel, but nevertheless this is a real problem and a fact of
> life for many users. Rebasing the full kernel is still a major effort
> and it seems the best we could ever do is one rebase per year. In the
> interim we need to occasionally backport drivers. Backporting drivers
> is difficult precisely because of new features or API changes to
> existing ones. These sorts of changes tend to have a spiderweb of
> dependencies in other parts of the stack so that the number of patches
> we need to cherry-pick goes way beyond those that touch the driver we
> are interested in.

For the sfc driver (Solarflare Adapters) we currently do backports internally 
for:
 - RedHat Enterprise Linux                       5.10, 5.11
 - RedHat Enterprise Linux                       6.5, 6.6, 6.7, 6.8
   - RedHat Messaging Realtime and Grid          2.5
 - RedHat Enterprise Linux                       7.0, 7.1, 7.2
   - RedHat Enterprise Linux for Realtime        7.1, 7.2
 - SuSE Linux Enterprise Server 11               sp3, sp4
   - SuSE Linux Enterprise RealTime Extension 11
 - SuSE Linux Enterprise Server 12               base release, sp1
 - Canonical Ubuntu Server LTS                   14.04, 16.04
 - Canonical Ubuntu Server                       -
 - Debian 7 "Wheezy"                             7.X
 - Debian 8 "Jessie"                             8.X
 - Linux                                         2.6.18 to 4.9-rc1

We update this list as needed, and always try to support the latest kernel.
I do not know if that would cover the kernel version you are using.

Best regards,
Martin

> Currently we (FB) need to backport two NIC drivers. I've already given
> details of backporting mlx5 on the thread to restructure the driver
> directories. The other driver being backported seems to suffer from
> the same type of feature complexity.
> 
> In short, I would like to ask driver maintainers to start to
> modularize driver features. If something being added is obviously a
> narrow feature that only a subset of users will need can we allow
> config options to #ifdef those out somehow? Furthermore can the file
> and directory structure of drivers reflect that; our lives would be
> _so_ much simpler to maintain drivers in production if we have such
> modularity and the ability to build drivers with the features of our
> choosing.
> 
> Thanks,
> Tom


Re: Getting a handle on all these new NIC features

2017-01-17 Thread Florian Fainelli
On 01/17/2017 02:05 PM, Tom Herbert wrote:
> I realize that backporting a driver is not a specific concern of the
> Linux kernel, but nevertheless this is a real problem and a fact of
> life for many users. Rebasing the full kernel is still a major effort
> and it seems the best we could ever do is one rebase per year. In the
> interim we need to occasionally backport drivers. Backporting drivers
> is difficult precisely because of new features or API changes to
> existing ones. These sorts of changes tend to have a spiderweb of
> dependencies in other parts of the stack so that the number of patches
> we need to cherry-pick goes way beyond those that touch the driver we
> are interested in.

backports (formerly known as compat-wireless) dealt with that problem by
pulling in all dependencies from the networking stack (and beyond).
This allowed people with a need to stay on a particular kernel version
to get the newest and latest networking bits and drivers with minor
disruption to other parts of the kernel. The project now seems to be
largely dead, but could be revived I presume:

https://backports.wiki.kernel.org/index.php/Main_Page

> 
> In short, I would like to ask driver maintainers to start to
> modularize driver features. If something being added is obviously a
> narrow feature that only a subset of users will need can we allow
> config options to #ifdef those out somehow? 

Multiplying the number of #ifdefs means that every config option is going
to be turned on by Linux distributions, and most likely just a subset
will be turned on by specific kernel configurations (like yours). All
in all, this multiplies the number of build combinations to a point
where this may not be manageable for an upstream driver, and some
combinations won't be tested properly except by whoever diverges from
these defaults. I understand the concern of modularizing and having
clean independent features/modules, but I am unsure that more
configuration options are necessarily the right approach.
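
To put a rough number on the build-combination concern: if a driver gains
n on/off options that are fully independent of each other (an assumption;
real Kconfig "depends on" relations would prune some of these), there are
2^n distinct configurations that could in principle be built and tested.
A minimal sketch of that arithmetic follows; build_configs() is a made-up
helper for illustration, not a kernel API.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper, not a kernel API: number of distinct build
 * configurations for a driver with n_options independent boolean
 * (on/off) config options, i.e. 2^n_options.
 */
static inline uint64_t build_configs(unsigned int n_options)
{
	assert(n_options < 64);	/* keep the shift well-defined */
	return (uint64_t)1 << n_options;
}
```

With 3 such options that is 8 builds; with 10 it is already 1024, which
is the scaling behind the worry that some combinations never get tested.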

Slightly tangential: once a series of patches lands in a given
maintainer's tree, it is very hard to match a given commit with its
original submission and, say, locate the 11 other patches out of the
12-patch series adding feature XYZ of interest. David does a great job of
putting submissions in a branch, which helps a lot, but in general,
there is not enough information in git to associate a given patch with
its companion patches within a series, hence making backporting harder IMHO.
-- 
Florian


Re: Getting a handle on all these new NIC features

2017-01-17 Thread Saeed Mahameed
On Wed, Jan 18, 2017 at 12:05 AM, Tom Herbert  wrote:
> There was some discussion about the problems of dealing with the
> explosion of NIC features in the mlx directory restructuring proposal,
> but I think there is a deeper issue here that should be discussed.
>
> It's hard not to notice that there has been quite a proliferation of
> NIC features in several drivers. This trend has resulted in very
> complex driver code that may or may not segment individual features.
> One visible manifestation of this is the number of ndo functions which is
> somewhere around seventy-five now.
>
> I suspect the vast majority of these advanced NIC features (e.g.
> bridging, UDP offloads, tc offload, etc.) are only relevant to some of
> the people some of the time. The problem we have, in this case those
> of us that are attempting to deploy and maintain NICs at scale, is
> when we have to deal with the ramifications of these features being
> intertwined with core driver functionality that is relevant to
> everyone. This becomes very obvious when we need to backport drivers
> from later versions of kernel.
>
> I realize that backporting a driver is not a specific concern of the
> Linux kernel, but nevertheless this is a real problem and a fact of
> life for many users. Rebasing the full kernel is still a major effort
> and it seems the best we could ever do is one rebase per year. In the
> interim we need to occasionally backport drivers. Backporting drivers
> is difficult precisely because of new features or API changes to
> existing ones. These sorts of changes tend to have a spiderweb of
> dependencies in other parts of the stack so that the number of patches
> we need to cherry-pick goes way beyond those that touch the driver we
> are interested in.
>

I think backporting is not the only concern here. The other main issue
is purely one of software design and cannot just be ignored: device
drivers are getting smarter and are doing lots of offloads and logic;
they are not as thin as they used to be. That is also a justification
for why we should take a second (stop coding for a while :-) ) and give
this issue some attention.

> Currently we (FB) need to backport two NIC drivers. I've already given
> details of backporting mlx5 on the thread to restructure the driver
> directories. The other driver being backported seems to suffer from
> the same type of feature complexity.
>

Can you share some more about the most complex stuff you faced while
backporting? What would have made it simpler if we had designed the
driver differently?

> In short, I would like to ask driver maintainers to start to
> modularize driver features. If something being added is obviously a
> narrow feature that only a subset of users will need can we allow
> config options to #ifdef those out somehow? Furthermore can the file
> and directory structure of drivers reflect that; our lives would be
> _so_ much simpler to maintain drivers in production if we have such
> modularity and the ability to build drivers with the features of our
> choosing.
>

Before we do this or define the plan, there are some questions to be asked:
1. Can we allow ourselves to have a kconfig or even an internal
compilation flag per device driver feature?
2. What about previous features? I mean, in order to have a clean and
clear way to isolate new features, some kind of restructuring or core
reorganizing is required; it is ugly to have a driver with a hybrid
structure.
3. In case we decide to do a restructuring phase as we suggested in
the mlx5 patch, what is the plan for older kernels which still backport
fixes to the previous structure?
4. What is the concrete plan? Is there a design reference or
guidelines known to someone that everyone can follow?

Anyway, I would like to contribute some thoughts and design techniques
to achieve this modularization and feature isolation by design (at
least for new features):

Device initialization and netdev registration:
 - Most device drivers have a main.c which handles driver
   initialization and netdev registration.
 - But today this file provides much more than the above.
 - I suggest keeping it as thin as possible and dedicated to what
   it should do.
 - Keep the HAL (Hardware Abstraction Layer) separate from main.c;
   main.c should call entry points exposed by the HAL.
 - Basic netdev features (RX/TX) and the most basic ndos for basic
   Ethernet functionality can still be in main.c.
 - Advanced features (eswitch, TC offloads, vxlan and tunneling
   offloads, XDP, etc.) can go into separate file(s) with the full
   logic implementation and clear code locality, wrapped by an #ifdef
   compilation or kconfig flag. That gives easy control over them, and
   gives the reviewer/developer a chance to logically understand the
   code and distinguish between the different features by looking at
   the Makefile or the C file including those features (just keep the
   feature logic out of main.c).
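
One common way to get the "feature in its own file, behind a flag" layout
from the last bullet is to give every optional feature a small set of
entry points, with no-op stubs when the feature is compiled out, so the
core path carries no #ifdef of its own. The following is only a
userspace-compilable sketch of that idea, not code from any real driver:
DRV_FEATURE_TC_OFFLOAD stands in for a Kconfig-derived macro, and
struct drv_priv and the drv_* functions are hypothetical names.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical per-device state; tc_rules is owned by the optional feature. */
struct drv_priv {
	int tc_rules;
};

#ifdef DRV_FEATURE_TC_OFFLOAD
/* Would live in its own file, built only when the option is set. */
int drv_tc_init(struct drv_priv *priv)
{
	priv->tc_rules = 0;
	return 0;
}

int drv_tc_add_rule(struct drv_priv *priv)
{
	priv->tc_rules++;
	return 0;
}
#else
/* Stubs: the core builds unchanged when the feature is compiled out. */
static inline int drv_tc_init(struct drv_priv *priv)
{
	(void)priv;
	return 0;
}

static inline int drv_tc_add_rule(struct drv_priv *priv)
{
	(void)priv;
	return -EOPNOTSUPP;
}
#endif

/* Core path (the "thin main.c"): calls the entry point, no #ifdef here. */
static inline int drv_probe(struct drv_priv *priv)
{
	assert(priv != NULL);
	return drv_tc_init(priv);
}
```

Built without the macro, drv_probe() still succeeds and drv_tc_add_rule()
reports -EOPNOTSUPP; defining the macro swaps in the real implementation
without touching the core path. Whether a per-feature flag is acceptable
upstream is exactly question 1 above; this only shows the code shape.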

I've 

Getting a handle on all these new NIC features

2017-01-17 Thread Tom Herbert
There was some discussion about the problems of dealing with the
explosion of NIC features in the mlx directory restructuring proposal,
but I think there is a deeper issue here that should be discussed.

It's hard not to notice that there has been quite a proliferation of
NIC features in several drivers. This trend has resulted in very
complex driver code that may or may not segment individual features.
One visible manifestation of this is the number of ndo functions which is
somewhere around seventy-five now.

I suspect the vast majority of these advanced NIC features (e.g.
bridging, UDP offloads, tc offload, etc.) are only relevant to some of
the people some of the time. The problem we have, in this case those
of us that are attempting to deploy and maintain NICs at scale, is
when we have to deal with the ramifications of these features being
intertwined with core driver functionality that is relevant to
everyone. This becomes very obvious when we need to backport drivers
from later versions of kernel.

I realize that backporting a driver is not a specific concern of the
Linux kernel, but nevertheless this is a real problem and a fact of
life for many users. Rebasing the full kernel is still a major effort
and it seems the best we could ever do is one rebase per year. In the
interim we need to occasionally backport drivers. Backporting drivers
is difficult precisely because of new features or API changes to
existing ones. These sorts of changes tend to have a spiderweb of
dependencies in other parts of the stack so that the number of patches
we need to cherry-pick goes way beyond those that touch the driver we
are interested in.

Currently we (FB) need to backport two NIC drivers. I've already given
details of backporting mlx5 on the thread to restructure the driver
directories. The other driver being backported seems to suffer from
the same type of feature complexity.

In short, I would like to ask driver maintainers to start to
modularize driver features. If something being added is obviously a
narrow feature that only a subset of users will need, can we allow
config options to #ifdef those out somehow? Furthermore, can the file
and directory structure of drivers reflect that? Maintaining drivers
in production would be _so_ much simpler if we had such modularity
and the ability to build drivers with the features of our choosing.

Thanks,
Tom