from:"Gal Pressman"

[PATCH ethtool 0/2] Documentation fixes and additions for advertise bitmask

2017-09-28 Thread Gal Pressman

Hi John,

The next two patches will fix bad formatting in the advertise modes bitmask and
document four missing 56000 modes.

Regards,
Gal.

Gal Pressman (2):
  ethtool.8: Fix formatting of advertise bitmask
  ethtool.8: Document 56000 advertise link modes

 ethtool.8.in | 10 +++---
 1 file changed, 7 insertions(+), 3 deletions(-)

-- 
2.7.4

[PATCH ethtool 1/2] ethtool.8: Fix formatting of advertise bitmask

2017-09-28 Thread Gal Pressman

Fields should be separated with a tab instead of spaces.
Also, remove an accidental ' character from 2500baseX Full mode.

Fixes: 64dfc5e2f046 ("ethtool: Add support for 2500baseT/5000baseT link modes")
Signed-off-by: Gal Pressman 
Reviewed-by: Tariq Toukan 
---
 ethtool.8.in | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/ethtool.8.in b/ethtool.8.in
index 7224b04..de2abc8 100644
--- a/ethtool.8.in
+++ b/ethtool.8.in
@@ -590,9 +590,9 @@ lB  l   lB.
 0x020  1000baseT Full
 0x21000baseKX Full
 0x200  1000baseX Full
-0x8000  2500baseT Full
-0x8000 2500baseX Full  (not supported by IEEE standards)'
-0x1  5000baseT Full
+0x8000 2500baseT Full
+0x8000 2500baseX Full  (not supported by IEEE standards)
+0x15000baseT Full
 0x1000 1baseT Full
 0x41baseKX4 Full
 0x81baseKR Full
-- 
2.7.4

[PATCH ethtool 2/2] ethtool.8: Document 56000 advertise link modes

2017-09-28 Thread Gal Pressman

Add the following advertise modes to the manual:
56000baseKR4 Full
56000baseCR4 Full
56000baseSR4 Full
56000baseLR4 Full

Signed-off-by: Gal Pressman 
Reviewed-by: Tariq Toukan 
---
 ethtool.8.in | 4 
 1 file changed, 4 insertions(+)

diff --git a/ethtool.8.in b/ethtool.8.in
index de2abc8..1654c73 100644
--- a/ethtool.8.in
+++ b/ethtool.8.in
@@ -613,6 +613,10 @@ lB l   lB.
 0x45baseCR2 Full
 0x85baseKR2 Full
 0x100  5baseSR2 Full
+0x800  56000baseKR4 Full
+0x1000 56000baseCR4 Full
+0x2000 56000baseSR4 Full
+0x4000 56000baseLR4 Full
 0x10   10baseKR4 Full
 0x20   10baseSR4 Full
 0x40   10baseCR4 Full
-- 
2.7.4

Re: Ethtool question

2017-10-17 Thread Gal Pressman

On 16-Oct-17 21:10, Ben Greear wrote:
> On 10/12/2017 03:00 PM, Roopa Prabhu wrote:
>> On Thu, Oct 12, 2017 at 2:45 PM, Ben Greear  wrote:
>>> On 10/11/2017 01:49 PM, David Miller wrote:

 From: "John W. Linville" 
 Date: Wed, 11 Oct 2017 16:44:07 -0400

> On Wed, Oct 11, 2017 at 09:51:56AM -0700, Ben Greear wrote:
>>
>> I noticed today that setting some ethtool settings to the same value
>> returns an error code.  I would think this should silently return
>> success instead?  Makes it easier to call it from scripts this way:
>>
>> [root@lf0313-6477 lanforge]# ethtool -L eth3 combined 1
>> combined unmodified, ignoring
>> no channel parameters changed, aborting
>> current values: tx 0 rx 0 other 1 combined 1
>> [root@lf0313-6477 lanforge]# echo $?
>> 1
>
>
> I just had this discussion a couple of months ago with someone. My
> initial feeling was like you, a no-op is not a failure. But someone
> convinced me otherwise...I will now endeavour to remember who that
> was and how they convinced me...
>
> Anyone else have input here?


 I guess this usually happens when drivers don't support changing the
 settings at all.  So they just make their ethtool operation for the
 'set' always return an error.

 We could have a generic ethtool helper that does "get" and then if the
 "set" request is identical just return zero.

 But from another perspective, the error returned from the "set" in this
 situation also indicates to the user that the driver does not support
 the "set" operation which has value and meaning in and of itself.  And
 we'd lose that with the given suggestion.
>>>
>>>
>>> In my case, the driver (igb) does support the set, my program just made the
>>> same
>>> ethtool call several times and it fails after the initial change (that
>>> actually
>>> changes something), as best as I can figure.
>>
>>
>> This error is returned by ethtool user-space. It does a get, check and
>> then set if user has requested changes.
>>
>
> So, should we fix ethtool to return 0 in this case instead of an error code?
>
> I think so.  If the driver itself returns an error, then probably return the
> error code and/or fix the driver as seems appropriate.
>
> Thanks,
> Ben
>

FWIW, seems like the code isn't consistent in this matter.
Setting pause/channels/coalescing with the same values will return an error 
code, but setting EEE will return success.

Re: [PATCH] ethtool: Add ETHTOOL_RESET support via --reset command

2017-12-03 Thread Gal Pressman


On 30-Nov-17 21:24, Scott Branden wrote:
> Add ETHTOOL_RESET support via --reset command.
>
> ie.  ethtool --reset DEVNAME 
>
> flagnames currently match the ETH_RESET_xxx names:
> mgmt,irq,dma,filter,offload,mac,phy,ram,dedicated,all
>
> Alternatively, you can specific component bitfield directly using
> ethtool --reset DEVNAME flags %x
>
> Signed-off-by: Scott Branden 
> ---
>  ethtool.8.in | 55 -
>  ethtool.c| 73 
> 
>  2 files changed, 127 insertions(+), 1 deletion(-)
>
> diff --git a/ethtool.8.in b/ethtool.8.in
> index 6ad3065..925cfe3 100644
> --- a/ethtool.8.in
> +++ b/ethtool.8.in
> @@ -355,6 +355,20 @@ ethtool \- query or control network driver and hardware 
> settings
>  .B ethtool \-\-get\-phy\-tunable
>  .I devname
>  .RB [ downshift ]
> +.HP
> +.B ethtool \-\-reset
> +.I devname
> +.BN flags
> +.B [mgmt]
> +.B [irq]
> +.B [dma]
> +.B [filter]
> +.B [offload]
> +.B [mac]
> +.B [phy]
> +.B [ram]
> +.B [dedicated]
> +.B [all]
>  .
Nit:
Usually, the brackets formatting is different than the keyword inside them.

Gal

Re: [PATCH] ethtool: Add ETHTOOL_RESET support via --reset command

2017-12-05 Thread Gal Pressman

On 04-Dec-17 22:16, Scott Branden wrote:
> Hi Gal,
>
> I do not understand you're comment - questions inline
>
>
> On 17-12-03 12:07 AM, Gal Pressman wrote:
>> On 30-Nov-17 21:24, Scott Branden wrote:
>>> Add ETHTOOL_RESET support via --reset command.
>>>
>>> ie.  ethtool --reset DEVNAME 
>>>
>>> flagnames currently match the ETH_RESET_xxx names:
>>> mgmt,irq,dma,filter,offload,mac,phy,ram,dedicated,all
>>>
>>> Alternatively, you can specific component bitfield directly using
>>> ethtool --reset DEVNAME flags %x
>>>
>>> Signed-off-by: Scott Branden 
>>> ---
>>>   ethtool.8.in | 55 -
>>>   ethtool.c    | 73 
>>> 
>>>   2 files changed, 127 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/ethtool.8.in b/ethtool.8.in
>>> index 6ad3065..925cfe3 100644
>>> --- a/ethtool.8.in
>>> +++ b/ethtool.8.in
>>> @@ -355,6 +355,20 @@ ethtool \- query or control network driver and 
>>> hardware settings
>>>   .B ethtool \-\-get\-phy\-tunable
>>>   .I devname
>>>   .RB [ downshift ]
>>> +.HP
>>> +.B ethtool \-\-reset
>>> +.I devname
>>> +.BN flags
>>> +.B [mgmt]
>>> +.B [irq]
>>> +.B [dma]
>>> +.B [filter]
>>> +.B [offload]
>>> +.B [mac]
>>> +.B [phy]
>>> +.B [ram]
>>> +.B [dedicated]
>>> +.B [all]
>>>   .
>> Nit:
>> Usually, the brackets formatting is different than the keyword inside them.
> Could you please explain what the formatting should be then?  Any of the 
> options listed are possible components that can be selected.  At least one 
> must be chosen.  Should the brackets be removed then?
The brackets are fine, IMO .B [mgmt] should be replaced with .RB [ mgmt ].
Looks better, and keeps us consistent with other flags (--get-phy-tunable for 
example).

Re: [RFC PATCH net-next 1/3] ethtool: Add link down reason callback

2017-06-22 Thread Gal Pressman


>> +enum {
>> +ETHTOOL_LINK_VENDOR_SPECIFIC = -1, /* Vendor specific issue provided in 
>> vendor_reason */
>> +ETHTOOL_LINK_NO_ISSUE, /* No issue observed with link */
>> +ETHTOOL_LINK_REASON_UNKNOWN, /* Unknown reason */
> I think OTHER would be better that UNKNOWN. 

Fine with me.
>> +ETHTOOL_LINK_NETDEV_CARRIER_DOWN, /* Netdev carrier is down */
>> +ETHTOOL_LINK_ADMIN_DOWN, /* Admin down */
> These two are interesting. We have that information already. Why do we
> want it again?

My goal is to gather all link issue reasons in one place.
I don't want the user to gather pieces of other reports just to get a sense of 
what's wrong.
>> +ETHTOOL_LINK_AN_FAILED, /* Auto negotiation failed */
>> +ETHTOOL_LINK_TRAINING_FAILED, /* Link training failed */
>> +ETHTOOL_LINK_RMT_FAULT, /* Remote fault indication */
>> +ETHTOOL_LINK_BAD_SIGNAL_INTEGRITY, /* Bad signal integrity */
>> +ETHTOOL_LINK_CABLE_MISMATCH, /* Cable protocol mismatch */
>> +ETHTOOL_LINK_INTERNAL_ERR, /* Internal error */
> How does an internet error differ from an UNKNOWN?

Internal error means I know what happened but I can't tell you, your vendor 
support might be
able to provide a better analysis.
Unknown means that the driver doesn't have any useful information that can help 
in this case.
>> +ETHTOOL_LINK_CABLE_UNPLUGGED, /* Cable unplugged */
>> +ETHTOOL_LINK_UNSUPP_MODULE, /* Unsupported module */
>> +ETHTOOL_LINK_I2C_BUS_ERR, /* I2C bus error */
> Which I2C bus? What about MDIO BUS errors?

The I2C bus that's connected to this module (interface).
We can add another reason for MDIO BUS errors or merge to one BUS error reason.
>> +ETHTOOL_LINK_UNSUPP_EEPROM, /* Unsupported EEPROM */
> Which EEPROM?

Module EEPROM.
>> +ETHTOOL_LINK_OVERTEMP, /* Over temperature */
>> +ETHTOOL_LINK_PWR_BUDGET_EXC, /* Power budget exceeded */
>> +ETHTOOL_LINK_MODULE_ADMIN_DOWN, /* Module admin down */
> It seems like these last 6 are all SFP issues? How about putting SFP
> into the name?

Might be a QSFP issue for example, we can put module in the name though.
>
>  Andrew

Re: [RFC PATCH net-next 3/3] net/mlx5e: Expose link down reason to ethtool

2017-06-22 Thread Gal Pressman


>> +if (!netif_running(netdev)) {
>> +ldr->reason = ETHTOOL_LINK_NETDEV_CARRIER_DOWN;
>> +return 0;
>> +}
> This is generic, will work for any interface. The same is true for
> ADMIN_DOWN. Either it is not required at all, since the information is
> available via other means, or it should be in the core.

I agree, will move it to the core.

>
> Andrew

Re: [RFC PATCH net-next 3/3] net/mlx5e: Expose link down reason to ethtool

2017-06-22 Thread Gal Pressman


> Is my reading correct that in case the reason is not in the
> pddr2ethtool_table opaque binary data will be passed from the firmware
> to user space?  Is there any particular reason to allow for this?  If
> it's just for the rare scenario where a new error code needs to be
> added perhaps it would be enough to dump the FW-provided message to the
> logs?

No binary data is passed in this patch, the monitor_opcode is simply a vendor 
specific
16 bit id that is used when the reason  is not generic enough to have it's own 
enum.

Re: [RFC PATCH net-next 1/3] ethtool: Add link down reason callback

2017-06-22 Thread Gal Pressman


>> +ETHTOOL_LINK_PWR_BUDGET_EXC, /* Power budget exceeded */
>> +ETHTOOL_LINK_MODULE_ADMIN_DOWN, /* Module admin down */
>> +
>> +ETHTOOL_LINK_REASONS_COUNT
>> +};
> Any enumerated list is going to get changed too often.
> Could the API just return a string?

The motivation for the enumerated list is to make this API as generic as 
possible and compatible
with all ethernet drivers.

Returning a string is a good idea, maybe change the vendor specific opcode to a 
string?

Re: [RFC PATCH net-next 0/3] ethtool: Add link down reason reporting

2017-06-22 Thread Gal Pressman

> Ethtool doesn't work well as a monitoring/event interface.
> Maybe this would be better as extended information with the  netlink UP/DOW 
> event.

This API is not meant for monitoring events.
The main use case here is a new interface that's down and won't go up for some 
reason.
Nothing will magically change until someone fixes the issue, logging the reason 
as an event on
netdev creation seems less convenient to me.

Re: [RFC PATCH net-next 0/3] ethtool: Add link down reason reporting

2017-06-22 Thread Gal Pressman


> Any particular reason for implementing this ABI in ethtool rather than
> via some netlink-based interface?  Devlink naturally comes to mind,
> given that cabling problems are not really related to the L2 and netdev
> shouldn't be required for diagnostics..

ethtool is already used for reporting and handling of link related stuff 
through get/set_link_ksettings.
How is reporting the link modes/port type/speed/etc different from this?
This feature is only an extension of the already existent "link detected" field.

Implementing this ABI in devlink is a good idea, but it shouldn't be instead of 
ethtool but in addition to ethtool.
Many users already use ethtool for this kind of info, we can tell them to use 
devlink to check why their link is down,
but I think it's nicer to have it all in one place.

Re: [RFC PATCH net-next 1/3] ethtool: Add link down reason callback

2017-06-23 Thread Gal Pressman

>> The I2C bus that's connected to this module (interface).
>> We can add another reason for MDIO BUS errors or merge to one BUS error 
>> reason.
 +  ETHTOOL_LINK_UNSUPP_EEPROM, /* Unsupported EEPROM */
>>> Which EEPROM?
>> Module EEPROM.
> Which module? This is all very vague. Some of the Marvell 10G PHYs
> have an EEPROM to boot from, for example. Would that count? Or are you
> talking about the SFP 'EEPROM', which is not actually an EEPROM, in
> that it is not Electrically Erasable, not is it a ROM, since things
> like temperature changes with time.

I am referring to the optical/electrical module EEPROM which is exposed through 
standard
interface such as SFF 8472. Might not be an actual EEPROM but that's how the 
SFF committee decided
to refer to it :).

>
 +  ETHTOOL_LINK_OVERTEMP, /* Over temperature */
 +  ETHTOOL_LINK_PWR_BUDGET_EXC, /* Power budget exceeded */
 +  ETHTOOL_LINK_MODULE_ADMIN_DOWN, /* Module admin down */
>>> It seems like these last 6 are all SFP issues? How about putting SFP
>>> into the name?
>> Might be a QSFP issue for example, we can put module in the name though.
> What is the generic name of SFP, SFP+ QSFP, SFF?

AFAIK, the name is module.
>
>  Andrew

Re: [RFC PATCH net-next 3/3] net/mlx5e: Expose link down reason to ethtool

2017-06-23 Thread Gal Pressman


> On Thu, 22 Jun 2017 11:33:39 +0300, Gal Pressman wrote:
>>> Is my reading correct that in case the reason is not in the
>>> pddr2ethtool_table opaque binary data will be passed from the firmware
>>> to user space?  Is there any particular reason to allow for this?  If
>>> it's just for the rare scenario where a new error code needs to be
>>> added perhaps it would be enough to dump the FW-provided message to the
>>> logs?  
>> No binary data is passed in this patch, the monitor_opcode is simply a 
>> vendor specific
>> 16 bit id that is used when the reason  is not generic enough to have it's 
>> own enum.
> Sorry if I'm wrong, I thought this would potentially copy
> ETH_GSTRING_LEN bytes to userspace:
>
> + if (status_message)
> + memcpy(status_message,
> +MLX5_ADDR_OF(pddr_reg, out, 
> page_data.troubleshooting_info_page.status_message),
> +ETH_GSTRING_LEN);
>
> I'm also still not sure why a reason would not be generic enough for
> the enum, if it fits in the 16bit vendor enum... 
Some reasons might be very specific, and only relevant to my hardware for 
example.
In that case I'd like to avoid an enum entry that is not wide enough to be used 
by anyone else but me,
and still have a way to pass it to the user.

The userspace patches will allow for a vendor parser (similar to ethtool 
regdump parsers) that can
convert this 16bit id to a string.

Re: [RFC PATCH net-next 1/3] ethtool: Add link down reason callback

2017-06-25 Thread Gal Pressman

> On Thu, Jun 22, 2017 at 11:09:04AM +0300, Gal Pressman wrote:
>>>> +enum {
>>>> +  ETHTOOL_LINK_VENDOR_SPECIFIC = -1, /* Vendor specific issue provided in 
>>>> vendor_reason */
>>>> +  ETHTOOL_LINK_NO_ISSUE, /* No issue observed with link */
>>>> +  ETHTOOL_LINK_REASON_UNKNOWN, /* Unknown reason */
>>> I think OTHER would be better that UNKNOWN. 
>> Fine with me.
>>>> +  ETHTOOL_LINK_NETDEV_CARRIER_DOWN, /* Netdev carrier is down */
>>>> +  ETHTOOL_LINK_ADMIN_DOWN, /* Admin down */
>>> These two are interesting. We have that information already. Why do we
>>> want it again?
>> My goal is to gather all link issue reasons in one place.
> I'm actually wondering if this is a user space problem. Nearly
> everything you list is already available. Some you get from ip link,
> others from ethtool or ethtool --module-info, including I2C bus
> error, since you would expect EIO or ETIMEOUT.
>
> If you were to write a user space tool using the information what is
> currently available, what would be missing?
>
> Andrew

I think most of the reasons in this list would be missing.
Auto negotiation failure, link training failure, remote fault indication, bad 
signal integrity, cable protocol mismatch, cable unplugged,
over temperature, power budget exceeded..

Keep in mind that this is just an initial list, not to mention the vendor 
reasons which are not part of any existing API.
I don't see how a user space tool that expects ETIMEOUT/EIO is better than this 
suggestion.

Re: [PATCH net-next 1/3] net: ethtool: add support for forward error correction modes

2017-06-25 Thread Gal Pressman


> ...
>
> SHOW FEC option:
> root@tor: ethtool --show-fec  swp1
> FEC parameters for swp1:
> Active FEC encodings: RS
> Configured FEC encodings:  RS | BaseR
>
> ETHTOOL DEVNAME output modification:
>
> ethtool devname output:
> root@tor:~# ethtool swp1
> Settings for swp1:
> root@hpe-7712-03:~# ethtool swp18
> Settings for swp18:
> Supported ports: [ FIBRE ]
> Supported link modes:   4baseCR4/Full
> 4baseSR4/Full
> 4baseLR4/Full
> 10baseSR4/Full
> 10baseCR4/Full
> 10baseLR4_ER4/Full
> Supported pause frame use: No
> Supports auto-negotiation: Yes
> Supported FEC modes: [RS | BaseR | None | Not reported]
> Advertised link modes:  Not reported
> Advertised pause frame use: No
> Advertised auto-negotiation: No
> Advertised FEC modes: [RS | BaseR | None | Not reported]
>  One or more FEC modes
> Speed: 10Mb/s
> Duplex: Full
> Port: FIBRE
> PHYAD: 106
> Transceiver: internal
> Auto-negotiation: off
> Link detected: yes
What is the difference between the information in ethtool DEVNAME and ethtool 
--show-fec DEVNAME?
I can't find a usage of LINK_MODE_FEC_* bits in downstream patches.

>
> This patch includes following changes
> a) New ETHTOOL_SFECPARAM/SFECPARAM API, handled by
>   the new get_fecparam/set_fecparam callbacks, provides support
>   for configuration of forward error correction modes.
> b) Link mode bits for FEC modes i.e. None (No FEC mode), RS, BaseR/FC
>   are defined so that users can configure these fec modes for supported
>   and advertising fields as part of link autonegotiation.
>
> Signed-off-by: Vidya Sagar Ravipati 
> ---
>  include/linux/ethtool.h  |  4 
>  include/uapi/linux/ethtool.h | 48 
> +++-
>  net/core/ethtool.c   | 34 +++
>  3 files changed, 85 insertions(+), 1 deletion(-)
>
> diff --git a/include/linux/ethtool.h b/include/linux/ethtool.h
> index 83cc986..afdbb70 100644
> --- a/include/linux/ethtool.h
> +++ b/include/linux/ethtool.h
> @@ -374,5 +374,9 @@ struct ethtool_ops {
> struct ethtool_link_ksettings *);
>   int (*set_link_ksettings)(struct net_device *,
> const struct ethtool_link_ksettings *);
> + int (*get_fecparam)(struct net_device *,
> +   struct ethtool_fecparam *);
> + int (*set_fecparam)(struct net_device *,
> +   struct ethtool_fecparam *);
>  };
>  #endif /* _LINUX_ETHTOOL_H */
> diff --git a/include/uapi/linux/ethtool.h b/include/uapi/linux/ethtool.h
> index 7d4a594..9c041da 100644
> --- a/include/uapi/linux/ethtool.h
> +++ b/include/uapi/linux/ethtool.h
> @@ -1238,6 +1238,47 @@ struct ethtool_per_queue_op {
>   chardata[];
>  };
>  
> +/**
> + * struct ethtool_fecparam - Ethernet forward error correction(fec) 
> parameters
> + * @cmd: Command number = %ETHTOOL_GFECPARAM or %ETHTOOL_SFECPARAM
> + * @active_fec: FEC mode which is active on porte
port.

> + * @fec: Bitmask of supported/configured FEC modes
> + * @rsvd: Reserved for future extensions. i.e FEC bypass feature.
> + *
> + * Drivers should reject a non-zero setting of @autoneg when
> + * autoneogotiation is disabled (or not supported) for the link.
Which @autoneg?

> + *
> + */
> +struct ethtool_fecparam {
> + __u32   cmd;
> + /* bitmask of FEC modes */
> + __u32   active_fec;
> + __u32   fec;
> + __u32   reserved;
> +};
> +
> +/**
> + * enum ethtool_fec_config_bits - flags definition of 
> ethtool_fec_configuration
> + * @ETHTOOL_FEC_NONE: FEC mode configuration is not supported
> + * @ETHTOOL_FEC_AUTO: Default/Best FEC mode provided by driver
> + * @ETHTOOL_FEC_OFF: No FEC Mode
> + * @ETHTOOL_FEC_RS: Reed-Solomon Forward Error Detection mode
> + * @ETHTOOL_FEC_BASER: Base-R/Reed-Solomon Forward Error Detection mode
> + */
> +enum ethtool_fec_config_bits {
> + ETHTOOL_FEC_NONE_BIT,
> + ETHTOOL_FEC_AUTO_BIT,
> + ETHTOOL_FEC_OFF_BIT,
> + ETHTOOL_FEC_RS_BIT,
> + ETHTOOL_FEC_BASER_BIT,
> +};
> +
> +#define ETHTOOL_FEC_NONE (1 << ETHTOOL_FEC_NONE_BIT)
> +#define ETHTOOL_FEC_AUTO (1 << ETHTOOL_FEC_AUTO_BIT)
> +#define ETHTOOL_FEC_OFF  (1 << ETHTOOL_FEC_OFF_BIT)
> +#define ETHTOOL_FEC_RS   (1 << ETHTOOL_FEC_RS_BIT)
> +#define ETHTOOL_FEC_BASER(1 << ETHTOOL_FEC_BASER_BIT)
> +
>  /* CMDs currently supported */
>  #define ETHTOOL_GSET 0x0001 /* DEPRECATED, Get settings.
>   * Please use ETHTOOL_GLINKSETTINGS
> @@ -1330,6 +1371,8 @@ struct ethtool_per_queue_op {
>  #define ETHTOOL_SLINKSETTINGS0x004d /* Set ethtool_link_settings 
> */
>  #define ETHTOOL_

Re: [RFC PATCH net-next 1/3] ethtool: Add link down reason callback

2017-06-26 Thread Gal Pressman


> On Sun, Jun 25, 2017 at 02:59:24PM +0300, Gal Pressman wrote:
>>> On Thu, Jun 22, 2017 at 11:09:04AM +0300, Gal Pressman wrote:
>>>>>> +enum {
>>>>>> +ETHTOOL_LINK_VENDOR_SPECIFIC = -1, /* Vendor specific issue 
>>>>>> provided in vendor_reason */
>>>>>> +ETHTOOL_LINK_NO_ISSUE, /* No issue observed with link */
>>>>>> +ETHTOOL_LINK_REASON_UNKNOWN, /* Unknown reason */
>>>>> I think OTHER would be better that UNKNOWN. 
>>>> Fine with me.
>>>>>> +ETHTOOL_LINK_NETDEV_CARRIER_DOWN, /* Netdev carrier is down */
>>>>>> +ETHTOOL_LINK_ADMIN_DOWN, /* Admin down */
>>>>> These two are interesting. We have that information already. Why do we
>>>>> want it again?
>>>> My goal is to gather all link issue reasons in one place.
>>> I'm actually wondering if this is a user space problem. Nearly
>>> everything you list is already available. Some you get from ip link,
>>> others from ethtool or ethtool --module-info, including I2C bus
>>> error, since you would expect EIO or ETIMEOUT.
>>>
>>> If you were to write a user space tool using the information what is
>>> currently available, what would be missing?
>>>
>>>   Andrew
>> I think most of the reasons in this list would be missing.
>> Auto negotiation failure,
> You can probably get that from the PHY layer. You get both the local
> and remote AN capabilities.
>
>> unplugged, over temperature, power budget exceeded..
> Don't you get over temperature from the SFF data? Also power budget?
You are right, but it depends on other resources that might fail such as BUS 
failure, invalid EEPROM, etc..

> And what does cable unplugged actually mean? Do you have a micro
> switch inside the socket? So that is maybe a gpio-key?
No, some hardware devices can sense this state.
We would like to expose this information when it's available.

> Another thing to remember is that your device is the exception to the
> rule. You have some firmware doing a lot of the work bringing this all
> together. But nearly every other Ethernet interface has a discrete MAC
> and PHY, I2C bus driver, EEPROM driver, generic SFF decoder, HWMON
> temperature sensor, etc. How does your call work in this normal
> situation? How do you make calls into all these subsystems to get the
> information? You want a generic solution which can be made to work for
> everybody.
The driver has a good intimate information of his device implementation, and 
hence an analysis done by the device vendor is favorable.
The driver provider can perform the analysis inside the device (firmware) or in 
the driver according to his preferences.
We believe that since devices are becoming smarter, more analysis will be done 
by the device itself, which has more
information and faster access.
Smart NICs/SoCs are very popular this days and this API takes into account the 
different architectures.

Since this callback is optional, a user space analysis tool can be added in the 
future providing more generic analysis approach for
unsupported devices.

>   Andrew
>
>> Keep in mind that this is just an initial list, not to mention the vendor 
>> reasons which are not part of any existing API.
>> I don't see how a user space tool that expects ETIMEOUT/EIO is better than 
>> this suggestion.

Re: [RFC PATCH net-next 1/3] ethtool: Add link down reason callback

2017-06-26 Thread Gal Pressman


> On Wed, Jun 21, 2017 at 04:04:44PM +0300, Gal Pressman wrote:
>> Currently, drivers can only tell whether the link is up/down through
>> ETHTOOL_GLINK callback, but no additional information is given in case
>> of link down.
>> This patch provides an infrastructure that allows drivers to hint
>> the user with the reason why the link is down, in order to ease the
>> debug process.
>>
>> Reasons are separated to two types, generic and vendor specific.
>> Drivers can reply with a generic reason using the enums provided in this
>> patch (and the ones that will be added in the future), which will be
>> translated to strings by the userspace ethtool.
>> In case of a vendor specific reason (not suitable for most vendors),
>> drivers can reply with ETHTOOL_VENDOR_SPECIFIC reason, in this case the
>> vendor_reason field should be filled with a vendor specific status code
>> which will be parsed by the vendor specific userspace parser if one is
>> available.
>>
>> This kind of information can save system administrators precious time
>> which will not be wasted trying to understand why the link won't go
>> up.
>>
>> For example, when the cable is unplugged:
>> $ ethtool ethXX
>> ...
>> Link detected: no (Cable unplugged)
>>
>> Signed-off-by: Gal Pressman 
>> Signed-off-by: Saeed Mahameed 
>> ---
>>  include/linux/ethtool.h  |  2 ++
>>  include/uapi/linux/ethtool.h | 33 +
>>  net/core/ethtool.c   | 24 
>>  3 files changed, 59 insertions(+)
>>
>> diff --git a/include/linux/ethtool.h b/include/linux/ethtool.h
>> index 83cc986..d472047 100644
>> --- a/include/linux/ethtool.h
>> +++ b/include/linux/ethtool.h
>> @@ -374,5 +374,7 @@ struct ethtool_ops {
>>struct ethtool_link_ksettings *);
>>  int (*set_link_ksettings)(struct net_device *,
>>const struct ethtool_link_ksettings *);
>> +int (*get_link_down_reason)(struct net_device *,
>> +struct ethtool_link_down_reason *);
>>  };
>>  #endif /* _LINUX_ETHTOOL_H */
>> diff --git a/include/uapi/linux/ethtool.h b/include/uapi/linux/ethtool.h
>> index 7d4a594..8cf9d2c 100644
>> --- a/include/uapi/linux/ethtool.h
>> +++ b/include/uapi/linux/ethtool.h
>> @@ -550,6 +550,13 @@ struct ethtool_pauseparam {
>>  
>>  #define ETH_GSTRING_LEN 32
>>  
>> +struct ethtool_link_down_reason {
>> +__u32   cmd;
>> +__u32   reason;
>> +__u32   vendor_reason;
>> +__u32   reserved[4];
>> +};
> Shouldn't this be a list? The device is over its power budget,
> overheating and does not have a cable plugged in. There can be
> multiple reasons it is down.
Current implementation will report one issue at a time, fix it to reveal the 
next one :).

Reporting more than one issue is possible, for example:
Add another callback to return the number of issues and allocate a buffer 
accordingly, sounds like an overkill to me.
Alternatively, use a bitmap instead of the enum, which will probably get too 
big very quickly.
I don't think it's worth it in this particular case since in most cases one 
reason should suffice.

>
>Andrew

Re: [PATCH net-next 1/3] net: ethtool: add support for forward error correction modes

2017-06-29 Thread Gal Pressman

> Hi Gal,
>
> ...
>> What is the difference between the information in ethtool DEVNAME and 
>> ethtool --show-fec DEVNAME?
> I think there are two questions there.  First, how does the FEC-related
> information from glinksettings differ from what's retrieved via
> gfecparam.  Second, how is that expressed through the ethtool UI.
>
> Regarding the uapi (as we imagined it), glinksettings returns FEC
> information through three fields:
>
> @supported: the complete set of FEC modes the hardware supports, at any
> speed, medium, or autoneg combination.
>
> @advertising: the set of modes advertised to the link partner through
> the relevant autoneg mechanism.
>
> @lp_advertising: the set of modes the link partner is advertising
> through autoneg.
>
>
> gfecparam is used to fetch a couple more important facts about the FEC
> configuration:
>
> 1) What FEC mode is currently active, either as a result of the autoneg
> process, or a previous call to sfecparam.  This is returned in
> sfecparam->active_fec
>
> 2) If autoneg is off, what is the currently configured FEC mode.  This
> is a bitmask returned in gfecparam->fec.  I imagine it's typically a
> single mode, but a mask makes it easier to implement a "don't care" policy,
> or otherwise allow the NIC/driver to pick between a set of modes.
>
>
> Regarding the UI.  ethtool DEVNAME gets most of its info from
> glinksettings and it's easy to represent the FEC parameters affected by
> autoneg there.  ethtool --show-fec simply reports the output of
> gfecparam.  I agree the difference is subtle, perhaps it makes sense to
> combine all the FEC information into ethtool DEVNAME?
IMHO it does, but the current UI is fine too.
Thanks for the explanation.

[PATCH ethtool] ethtool: Fix coding style warnings and errors reported by checkpatch

2017-10-26 Thread Gal Pressman

Checkpatch had a lot to say about ethtool.c, this will handle a lot of
the reported issues.

Signed-off-by: Gal Pressman 
Reviewed-by: Tariq Toukan 
---
 ethtool.c | 189 +-
 1 file changed, 89 insertions(+), 100 deletions(-)

diff --git a/ethtool.c b/ethtool.c
index ad18704..2d53c29 100644
--- a/ethtool.c
+++ b/ethtool.c
@@ -235,7 +235,7 @@ get_uint_range(char *str, int base, unsigned long long max)
exit_bad_args();
errno = 0;
v = strtoull(str, &endp, base);
-   if ( errno || *endp || v > max)
+   if (errno || *endp || v > max)
exit_bad_args();
return v;
 }
@@ -261,9 +261,8 @@ static void get_mac_addr(char *src, unsigned char *dest)
if (count != ETH_ALEN)
exit_bad_args();
 
-   for (i = 0; i < count; i++) {
+   for (i = 0; i < count; i++)
dest[i] = buf[i];
-   }
 }
 
 static int parse_hex_u32_bitmap(const char *s,
@@ -408,7 +407,7 @@ static void parse_generic_cmdline(struct cmd_context *ctx,
break;
}
}
-   if( !found)
+   if (!found)
exit_bad_args();
}
 }
@@ -683,10 +682,10 @@ static void dump_link_caps(const char *prefix, const char 
*an_prefix,
  "1baseLRM/Full" },
{ 0, ETHTOOL_LINK_MODE_1baseER_Full_BIT,
  "1baseER/Full" },
-{ 0, ETHTOOL_LINK_MODE_2500baseT_Full_BIT,
-  "2500baseT/Full" },
-{ 0, ETHTOOL_LINK_MODE_5000baseT_Full_BIT,
-  "5000baseT/Full" },
+   { 0, ETHTOOL_LINK_MODE_2500baseT_Full_BIT,
+ "2500baseT/Full" },
+   { 0, ETHTOOL_LINK_MODE_5000baseT_Full_BIT,
+ "5000baseT/Full" },
};
int indent;
int did1, new_line_pend, i;
@@ -715,7 +714,7 @@ static void dump_link_caps(const char *prefix, const char 
*an_prefix,
}
}
if (did1 == 0)
-fprintf(stdout, "Not reported");
+   fprintf(stdout, "Not reported");
fprintf(stdout, "\n");
 
if (!link_mode_only) {
@@ -887,32 +886,32 @@ static int parse_wolopts(char *optstr, u32 *data)
*data = 0;
while (*optstr) {
switch (*optstr) {
-   case 'p':
-   *data |= WAKE_PHY;
-   break;
-   case 'u':
-   *data |= WAKE_UCAST;
-   break;
-   case 'm':
-   *data |= WAKE_MCAST;
-   break;
-   case 'b':
-   *data |= WAKE_BCAST;
-   break;
-   case 'a':
-   *data |= WAKE_ARP;
-   break;
-   case 'g':
-   *data |= WAKE_MAGIC;
-   break;
-   case 's':
-   *data |= WAKE_MAGICSECURE;
-   break;
-   case 'd':
-   *data = 0;
-   break;
-   default:
-   return -1;
+   case 'p':
+   *data |= WAKE_PHY;
+   break;
+   case 'u':
+   *data |= WAKE_UCAST;
+   break;
+   case 'm':
+   *data |= WAKE_MCAST;
+   break;
+   case 'b':
+   *data |= WAKE_BCAST;
+   break;
+   case 'a':
+   *data |= WAKE_ARP;
+   break;
+   case 'g':
+   *data |= WAKE_MAGIC;
+   break;
+   case 's':
+   *data |= WAKE_MAGICSECURE;
+   break;
+   case 'd':
+   *data = 0;
+   break;
+   default:
+   return -1;
}
optstr++;
}
@@ -957,10 +956,11 @@ static int dump_wol(struct ethtool_wolinfo *wol)
if (wol->supported & WAKE_MAGICSECURE) {
int i;
int delim = 0;
+
fprintf(stdout, "SecureOn password: ");
for (i = 0; i < SOPASS_MAX; i++)

[PATCH ethtool] ethtool: Add extended compliance codes parsing to sfp modules

2017-11-19 Thread Gal Pressman

Update parsing according to SFP28 spec with extended compliance codes.
SFF-8472, SFF-8024 specify the description of module capability present
in the 36th byte.

Signed-off-by: Gal Pressman 
Reviewed-by: Eran Ben Elisha 
---
 sfpid.c | 25 +++--
 1 file changed, 23 insertions(+), 2 deletions(-)

diff --git a/sfpid.c b/sfpid.c
index 1732e5e..a1753d3 100644
--- a/sfpid.c
+++ b/sfpid.c
@@ -40,10 +40,10 @@ static void sff8079_show_transceiver(const __u8 *id)
"\tTransceiver type  :";
 
printf("\t%-41s : 0x%02x 0x%02x 0x%02x " \
-  "0x%02x 0x%02x 0x%02x 0x%02x 0x%02x\n",
+  "0x%02x 0x%02x 0x%02x 0x%02x 0x%02x 0x%02x\n",
   "Transceiver codes",
   id[3], id[4], id[5], id[6],
-  id[7], id[8], id[9], id[10]);
+  id[7], id[8], id[9], id[10], id[36]);
/* 10G Ethernet Compliance Codes */
if (id[3] & (1 << 7))
printf("%s 10G Ethernet: 10G Base-ER" \
@@ -168,6 +168,27 @@ static void sff8079_show_transceiver(const __u8 *id)
printf("%s FC: 200 MBytes/sec\n", pfx);
if (id[10] & (1 << 0))
printf("%s FC: 100 MBytes/sec\n", pfx);
+   /* Extended Specification Compliance Codes from SFF-8024 */
+   if (id[36] == 0x1)
+   printf("%s Extended: 100G AOC or 25GAUI C2M AOC with worst BER 
of 5x10^(-5)\n", pfx);
+   if (id[36] == 0x2)
+   printf("%s Extended: 100G Base-SR4 or 25GBase-SR\n", pfx);
+   if (id[36] == 0x3)
+   printf("%s Extended: 100G Base-LR4 or 25GBase-LR\n", pfx);
+   if (id[36] == 0x4)
+   printf("%s Extended: 100G Base-ER4 or 25GBase-ER\n", pfx);
+   if (id[36] == 0x8)
+   printf("%s Extended: 100G ACC or 25GAUI C2M ACC with worst BER 
of 5x10^(-5)\n", pfx);
+   if (id[36] == 0xb)
+   printf("%s Extended: 100G Base-CR4 or 25G Base-CR CA-L\n", pfx);
+   if (id[36] == 0xc)
+   printf("%s Extended: 25G Base-CR CA-S\n", pfx);
+   if (id[36] == 0xd)
+   printf("%s Extended: 25G Base-CR CA-N\n", pfx);
+   if (id[36] == 0x18)
+   printf("%s Extended: 100G AOC or 25GAUI C2M AOC with worst BER 
of 10^(-12)\n", pfx);
+   if (id[36] == 0x19)
+   printf("%s Extended: 100G ACC or 25GAUI C2M ACC with worst BER 
of 10^(-12)\n", pfx);
 }
 
 static void sff8079_show_encoding(const __u8 *id)
-- 
2.7.4

Re: [PATCH net-next] net: Make RX-FCS and LRO mutually exclusive

2018-03-05 Thread Gal Pressman

On 05-Mar-18 17:27, David Miller wrote:
> From: Gal Pressman 
> Date: Sun,  4 Mar 2018 14:12:04 +0200
> 
>> LRO and RX-FCS offloads cannot be enabled at the same time since it is
>> not clear what should happen to the FCS of each coalesced packet.
>> The FCS is not really part of the TCP payload, hence cannot be merged
>> into one big packet. On the other hand, providing one big LRO packet
>> with one FCS contradicts the RX-FCS feature goal.
>>
>> Use the fix features mechanism in order to prevent intersection of the
>> features and drop LRO in case RX-FCS is requested.
>>
>> Enabling RX-FCS while LRO is enabled will result in:
>> $ ethtool -K ens6 rx-fcs on
>> Actual changes:
>> large-receive-offload: off [requested on]
>> rx-fcs: on
>>
>> Signed-off-by: Gal Pressman 
>> Reviewed-by: Tariq Toukan 
> 
> Agreed, having these two options enabled at the same time doesn't
> make any sense.
> 
> Applied.
> 
> Probably need to add the same restriction for HW GRO.
> 

I agree, I'll submit another patch to restrict it too.

[PATCH net-next] net: Make RX-FCS and HW GRO mutually exclusive

2018-03-12 Thread Gal Pressman

Same as LRO, hardware GRO cannot be enabled with RX-FCS.
When both are requested, hardware GRO will be dropped.

Suggested-by: David Miller 
Signed-off-by: Gal Pressman 
---
 net/core/dev.c | 15 +++
 1 file changed, 11 insertions(+), 4 deletions(-)

diff --git a/net/core/dev.c b/net/core/dev.c
index 1bc3792..fd87c2c 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -7542,10 +7542,17 @@ static netdev_features_t netdev_fix_features(struct 
net_device *dev,
}
}
 
-   /* LRO feature cannot be combined with RX-FCS */
-   if ((features & NETIF_F_LRO) && (features & NETIF_F_RXFCS)) {
-   netdev_dbg(dev, "Dropping LRO feature since RX-FCS is 
requested.\n");
-   features &= ~NETIF_F_LRO;
+   /* LRO/HW-GRO features cannot be combined with RX-FCS */
+   if (features & NETIF_F_RXFCS) {
+   if (features & NETIF_F_LRO) {
+   netdev_dbg(dev, "Dropping LRO feature since RX-FCS is 
requested.\n");
+   features &= ~NETIF_F_LRO;
+   }
+
+   if (features & NETIF_F_GRO_HW) {
+   netdev_dbg(dev, "Dropping HW-GRO feature since RX-FCS 
is requested.\n");
+   features &= ~NETIF_F_GRO_HW;
+   }
}
 
return features;
-- 
2.7.4

[PATCH net-next] net: Fix spelling mistake "greater then" -> "greater than"

2018-02-28 Thread Gal Pressman

Fix trivial spelling mistake "greater then" -> "greater than".

Signed-off-by: Gal Pressman 
---
 include/net/sch_generic.h | 2 +-
 net/core/dev.c| 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
index e2ab136..d4907b5 100644
--- a/include/net/sch_generic.h
+++ b/include/net/sch_generic.h
@@ -540,7 +540,7 @@ static inline bool skb_skip_tc_classify(struct sk_buff *skb)
return false;
 }
 
-/* Reset all TX qdiscs greater then index of a device.  */
+/* Reset all TX qdiscs greater than index of a device.  */
 static inline void qdisc_reset_all_tx_gt(struct net_device *dev, unsigned int 
i)
 {
struct Qdisc *qdisc;
diff --git a/net/core/dev.c b/net/core/dev.c
index 1bc3792..ff1d0ef 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -2378,7 +2378,7 @@ EXPORT_SYMBOL(netdev_set_num_tc);
 
 /*
  * Routine to help set real_num_tx_queues. To avoid skbs mapped to queues
- * greater then real_num_tx_queues stale skbs on the qdisc must be flushed.
+ * greater than real_num_tx_queues stale skbs on the qdisc must be flushed.
  */
 int netif_set_real_num_tx_queues(struct net_device *dev, unsigned int txq)
 {
-- 
2.7.4

[PATCH net-next] net: Make RX-FCS and LRO mutually exclusive

2018-03-04 Thread Gal Pressman

LRO and RX-FCS offloads cannot be enabled at the same time since it is
not clear what should happen to the FCS of each coalesced packet.
The FCS is not really part of the TCP payload, hence cannot be merged
into one big packet. On the other hand, providing one big LRO packet
with one FCS contradicts the RX-FCS feature goal.

Use the fix features mechanism in order to prevent intersection of the
features and drop LRO in case RX-FCS is requested.

Enabling RX-FCS while LRO is enabled will result in:
$ ethtool -K ens6 rx-fcs on
Actual changes:
large-receive-offload: off [requested on]
rx-fcs: on

Signed-off-by: Gal Pressman 
Reviewed-by: Tariq Toukan 
---
 net/core/dev.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/net/core/dev.c b/net/core/dev.c
index c9d3058..1bc3792 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -7542,6 +7542,12 @@ static netdev_features_t netdev_fix_features(struct 
net_device *dev,
}
}
 
+   /* LRO feature cannot be combined with RX-FCS */
+   if ((features & NETIF_F_LRO) && (features & NETIF_F_RXFCS)) {
+   netdev_dbg(dev, "Dropping LRO feature since RX-FCS is 
requested.\n");
+   features &= ~NETIF_F_LRO;
+   }
+
return features;
 }
 
-- 
2.7.4

Re: [net-next 03/15] net/mlx5e: PFC stall prevention support

2018-03-25 Thread Gal Pressman

On 24-Mar-18 18:07, Andrew Lunn wrote:
> On Fri, Mar 23, 2018 at 03:39:13PM -0700, Saeed Mahameed wrote:
>> From: Inbar Karmy 
>>
>> Implement set/get functions to configure PFC stall prevention
>> timeout by tunables api through ethtool.
>> By default the stall prevention timeout is configured to 8 sec.
>> Timeout range is: 80-8000 msec.
>> Enabling stall prevention without a specific timeout will set
>> the timeout to 100 msec.
> 
> Hi Inbar
> 
> I think this last sentence should be talking about auto, not without a
> specific timeout.

Hi Andrew,
Good catch, we will fix that.

> 
>>
>> Signed-off-by: Inbar Karmy 
>> Signed-off-by: Saeed Mahameed 
>> ---
>>  .../net/ethernet/mellanox/mlx5/core/en_ethtool.c   | 57 +++
>>  drivers/net/ethernet/mellanox/mlx5/core/port.c | 64 
>> +++---
>>  include/linux/mlx5/mlx5_ifc.h  | 17 --
>>  include/linux/mlx5/port.h  |  6 ++
>>  4 files changed, 132 insertions(+), 12 deletions(-)
>>
>> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c 
>> b/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c
>> index cc8048f68f11..62061fd23143 100644
>> --- a/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c
>> +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c
>> @@ -1066,6 +1066,57 @@ static int mlx5e_get_rxnfc(struct net_device *netdev,
>>  return err;
>>  }
>>  
>> +#define MLX5E_PFC_PREVEN_AUTO_TOUT_MSEC 100
>> +#define MLX5E_PFC_PREVEN_TOUT_MAX_MSEC  8000
>> +#define MLX5E_PFC_PREVEN_MINOR_PRECENT  85
>> +#define MLX5E_PFC_PREVEN_TOUT_MIN_MSEC  80
>> +#define MLX5E_DEVICE_STALL_MINOR_WATERMARK(critical_tout) \
>> +max_t(u16, MLX5E_PFC_PREVEN_TOUT_MIN_MSEC, \
>> +  (critical_tout * MLX5E_PFC_PREVEN_MINOR_PRECENT) / 100)
>> +
>> +static int mlx5e_get_pfc_prevention_tout(struct net_device *netdev,
>> + u16 *pfc_prevention_tout)
>> +{
>> +struct mlx5e_priv *priv= netdev_priv(netdev);
>> +struct mlx5_core_dev *mdev = priv->mdev;
>> +
>> +if (!MLX5_CAP_PCAM_FEATURE((priv)->mdev, pfcc_mask) ||
>> +!MLX5_CAP_DEBUG((priv)->mdev, stall_detect))
>> +return -EOPNOTSUPP;
>> +
>> +return mlx5_query_port_stall_watermark(mdev, pfc_prevention_tout, NULL);
> 
> Shouldn't you map a value of MLX5E_PFC_PREVEN_AUTO_TOUT_MSEC back to 
> PFC_STORM_PREVENTION_AUTO?

We discussed this point internally, mapping MLX5E_PFC_PREVEN_AUTO_TOUT_MSEC 
(100) to
PFC_STORM_PREVENTION_AUTO might cause confusion when the user explicitly asks 
for 100msec timeout
and gets auto in his following query.
Also, this way the "auto" timeout is visible to the user, which might help him 
get an initial
clue of which values are recommended.

> 
>   Andrew
>

Re: [net-next 02/15] ethtool: Add support for configuring PFC stall prevention in ethtool

2018-03-25 Thread Gal Pressman

On 24-Mar-18 17:57, Andrew Lunn wrote:
> On Fri, Mar 23, 2018 at 03:39:12PM -0700, Saeed Mahameed wrote:
>> From: Inbar Karmy 
>>
>> In the event where the device unexpectedly becomes unresponsive
>> for a long period of time, flow control mechanism may propagate
>> pause frames which will cause congestion spreading to the entire
>> network.
>> To prevent this scenario, when the device is stalled for a period
>> longer than a pre-configured timeout, flow control mechanisms are
>> automatically disabled.
>>
>> This patch adds support for the ETHTOOL_PFC_STALL_PREVENTION
>> as a tunable.
>> This API provides support for configuring flow control storm prevention
>> timeout (msec).
>>
>> Signed-off-by: Inbar Karmy 
>> Cc: Michal Kubecek 
>> Cc: Andrew Lunn 
>> Signed-off-by: Saeed Mahameed 
>> ---
>>  include/uapi/linux/ethtool.h | 4 
>>  net/core/ethtool.c   | 6 ++
>>  2 files changed, 10 insertions(+)
>>
>> diff --git a/include/uapi/linux/ethtool.h b/include/uapi/linux/ethtool.h
>> index 20da156aaf64..9dc63a14a747 100644
>> --- a/include/uapi/linux/ethtool.h
>> +++ b/include/uapi/linux/ethtool.h
>> @@ -217,10 +217,14 @@ struct ethtool_value {
>>  __u32   data;
>>  };
>>  
>> +#define PFC_STORM_PREVENTION_AUTO   0x
>> +#define PFC_STORM_PREVENTION_DISABLE0
>> +
>>  enum tunable_id {
>>  ETHTOOL_ID_UNSPEC,
>>  ETHTOOL_RX_COPYBREAK,
>>  ETHTOOL_TX_COPYBREAK,
>> +ETHTOOL_PFC_PREVENTION_TOUT,
> 
> Hi Inbar
> 
> Please could you add a comment here about the units. Ideally we want
> this file to be self documenting.

Thank you for the review, we will fix that.

> 
>  Andrew
>

Re: [net-next 03/15] net/mlx5e: PFC stall prevention support

2018-03-27 Thread Gal Pressman

On 25-Mar-18 19:18, Andrew Lunn wrote:
>>> Shouldn't you map a value of MLX5E_PFC_PREVEN_AUTO_TOUT_MSEC back to 
>>> PFC_STORM_PREVENTION_AUTO?
>>
>> We discussed this point internally, mapping MLX5E_PFC_PREVEN_AUTO_TOUT_MSEC 
>> (100) to
>> PFC_STORM_PREVENTION_AUTO might cause confusion when the user explicitly 
>> asks for 100msec timeout
>> and gets auto in his following query.
>> Also, this way the "auto" timeout is visible to the user, which might help 
>> him get an initial
>> clue of which values are recommended.
> 
> Yes, this is a fair point, which is why i asked the question. Either
> way, it can cause confusion. 'I configured it to auto, but it always
> returns 100, not auto.'
> 
> Whatever is decided, it should be consistent across drivers. So please
> add some documentation to the ethtool header file about what is
> expected.

We didn't want to limit other drivers implementation, but I agree that
consistency across drivers is important in this case.
We will find a proper place to document it.

> 
>   Andrew
>

[PATCH net-next] net: Call add/kill vid ndo on vlan filter feature toggling

2018-03-28 Thread Gal Pressman

NETIF_F_HW_VLAN_[CS]TAG_FILTER features require more than just a bit
flip in dev->features in order to keep the driver in a consistent state.
These features notify the driver of each added/removed vlan, but toggling
of vlan-filter does not notify the driver accordingly for each of the
existing vlans.

This patch implements a similar solution to NETIF_F_RX_UDP_TUNNEL_PORT
behavior (which notifies the driver about UDP ports in the same manner
that vids are reported).

Each toggling of the features propagates to the 8021q module, which
iterates over the vlans and call add/kill ndo accordingly.

Signed-off-by: Gal Pressman 
Reviewed-by: Tariq Toukan 
---
 include/linux/if_vlan.h   |  24 +++
 include/linux/netdevice.h |   4 ++
 net/8021q/vlan.c  |  21 ++
 net/8021q/vlan.h  |   3 ++
 net/8021q/vlan_core.c | 101 ++
 net/core/dev.c|  20 +
 6 files changed, 148 insertions(+), 25 deletions(-)

diff --git a/include/linux/if_vlan.h b/include/linux/if_vlan.h
index c4a1cff..24d1976 100644
--- a/include/linux/if_vlan.h
+++ b/include/linux/if_vlan.h
@@ -83,6 +83,30 @@ static inline bool is_vlan_dev(const struct net_device *dev)
 #define skb_vlan_tag_get_id(__skb) ((__skb)->vlan_tci & VLAN_VID_MASK)
 #define skb_vlan_tag_get_prio(__skb)   ((__skb)->vlan_tci & VLAN_PRIO_MASK)
 
+static inline int vlan_get_rx_ctag_filter_info(struct net_device *dev)
+{
+   ASSERT_RTNL();
+   return 
notifier_to_errno(call_netdevice_notifiers(NETDEV_CVLAN_FILTER_PUSH_INFO, dev));
+}
+
+static inline void vlan_drop_rx_ctag_filter_info(struct net_device *dev)
+{
+   ASSERT_RTNL();
+   call_netdevice_notifiers(NETDEV_CVLAN_FILTER_DROP_INFO, dev);
+}
+
+static inline int vlan_get_rx_stag_filter_info(struct net_device *dev)
+{
+   ASSERT_RTNL();
+   return 
notifier_to_errno(call_netdevice_notifiers(NETDEV_SVLAN_FILTER_PUSH_INFO, dev));
+}
+
+static inline void vlan_drop_rx_stag_filter_info(struct net_device *dev)
+{
+   ASSERT_RTNL();
+   call_netdevice_notifiers(NETDEV_SVLAN_FILTER_DROP_INFO, dev);
+}
+
 /**
  * struct vlan_pcpu_stats - VLAN percpu rx/tx stats
  * @rx_packets: number of received packets
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 2a2d9cf..da44dab 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -2349,6 +2349,10 @@ enum netdev_cmd {
NETDEV_UDP_TUNNEL_PUSH_INFO,
NETDEV_UDP_TUNNEL_DROP_INFO,
NETDEV_CHANGE_TX_QUEUE_LEN,
+   NETDEV_CVLAN_FILTER_PUSH_INFO,
+   NETDEV_CVLAN_FILTER_DROP_INFO,
+   NETDEV_SVLAN_FILTER_PUSH_INFO,
+   NETDEV_SVLAN_FILTER_DROP_INFO,
 };
 const char *netdev_cmd_to_name(enum netdev_cmd cmd);
 
diff --git a/net/8021q/vlan.c b/net/8021q/vlan.c
index bd0ed39..4b33728 100644
--- a/net/8021q/vlan.c
+++ b/net/8021q/vlan.c
@@ -360,6 +360,7 @@ static int vlan_device_event(struct notifier_block *unused, 
unsigned long event,
struct vlan_dev_priv *vlan;
bool last = false;
LIST_HEAD(list);
+   int err;
 
if (is_vlan_dev(dev)) {
int err = __vlan_device_event(dev, event);
@@ -489,6 +490,26 @@ static int vlan_device_event(struct notifier_block 
*unused, unsigned long event,
vlan_group_for_each_dev(grp, i, vlandev)
call_netdevice_notifiers(event, vlandev);
break;
+
+   case NETDEV_CVLAN_FILTER_PUSH_INFO:
+   err = vlan_filter_push_vids(vlan_info, htons(ETH_P_8021Q));
+   if (err)
+   return notifier_from_errno(err);
+   break;
+
+   case NETDEV_CVLAN_FILTER_DROP_INFO:
+   vlan_filter_drop_vids(vlan_info, htons(ETH_P_8021Q));
+   break;
+
+   case NETDEV_SVLAN_FILTER_PUSH_INFO:
+   err = vlan_filter_push_vids(vlan_info, htons(ETH_P_8021AD));
+   if (err)
+   return notifier_from_errno(err);
+   break;
+
+   case NETDEV_SVLAN_FILTER_DROP_INFO:
+   vlan_filter_drop_vids(vlan_info, htons(ETH_P_8021AD));
+   break;
}
 
 out:
diff --git a/net/8021q/vlan.h b/net/8021q/vlan.h
index a8ba510..e23aac3 100644
--- a/net/8021q/vlan.h
+++ b/net/8021q/vlan.h
@@ -97,6 +97,9 @@ static inline struct net_device *vlan_find_dev(struct 
net_device *real_dev,
if (((dev) = __vlan_group_get_device((grp), (i) / VLAN_N_VID, \
(i) % VLAN_N_VID)))
 
+int vlan_filter_push_vids(struct vlan_info *vlan_info, __be16 proto);
+void vlan_filter_drop_vids(struct vlan_info *vlan_info, __be16 proto);
+
 /* found in vlan_dev.c */
 void vlan_dev_set_ingress_priority(const struct net_device *dev,
   u32 skb_prio, u16 vlan_prio);
diff --git a/net/8021q/vlan_core.c b/net/8021q/vlan_core.c
index 45c9bf5..c8d7abd 10064

[PATCH net-next 0/6] Replace WARN_ONCE usages with netdev_WARN_ONCE

2018-01-04 Thread Gal Pressman

Hi,
This series will fix an issue in netdev_WARN_ONCE, improve its formatting and
replace drivers' usage of WARN_ONCE to netdev_WARN_ONCE.

Driver specific patches were compilation tested, in addition, functional tested
on Mellanox NIC.

Gal Pressman (6):
  net: Fix netdev_WARN_ONCE macro
  net: No line break on netdev_WARN* formatting
  net/mlx5e: Replace WARN_ONCE with netdev_WARN_ONCE
  e1000: Replace WARN_ONCE with netdev_WARN_ONCE
  bnx2x: Replace WARN_ONCE with netdev_WARN_ONCE
  8139cp: Replace WARN_ONCE with netdev_WARN_ONCE

 drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c  | 5 +++--
 drivers/net/ethernet/intel/e1000/e1000_ethtool.c | 4 ++--
 drivers/net/ethernet/mellanox/mlx5/core/en_rx.c  | 9 -
 drivers/net/ethernet/realtek/8139cp.c| 4 ++--
 include/linux/netdevice.h| 6 +++---
 5 files changed, 14 insertions(+), 14 deletions(-)

-- 
2.7.4

[PATCH net-next 1/6] net: Fix netdev_WARN_ONCE macro

2018-01-04 Thread Gal Pressman

netdev_WARN_ONCE is broken (whoops..), fix it.

Fixes: 375ef2b1f0d0 ("net: Introduce netdev_*_once functions")
Signed-off-by: Gal Pressman 
Reviewed-by: Saeed Mahameed 
---
 include/linux/netdevice.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 352066e..5ff1ef9 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -4407,8 +4407,8 @@ do {  
\
WARN(1, "netdevice: %s%s\n" format, netdev_name(dev),   \
 netdev_reg_state(dev), ##args)
 
-#define netdev_WARN_ONCE(dev, condition, format, arg...)   \
-   WARN_ONCE(1, "netdevice: %s%s\n" format, netdev_name(dev)   \
+#define netdev_WARN_ONCE(dev, format, args...) \
+   WARN_ONCE(1, "netdevice: %s%s\n" format, netdev_name(dev),  \
  netdev_reg_state(dev), ##args)
 
 /* netif printk helpers, similar to netdev_printk */
-- 
2.7.4

[PATCH net-next 2/6] net: No line break on netdev_WARN* formatting

2018-01-04 Thread Gal Pressman

Remove the unnecessary line break between the netdev name and reg state
to the actual message that should be printed.

For example, this:
[86730.307236] [ cut here ]
[86730.313496] netdevice: enp27s0f0
Message from the driver
[...]

Will be replaced with:
[86770.259289] [ cut here ]
[86770.265191] netdevice: enp27s0f0: Message from the driver
[...]

Signed-off-by: Gal Pressman 
Reviewed-by: Saeed Mahameed 
---
 include/linux/netdevice.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 5ff1ef9..87211c4 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -4404,11 +4404,11 @@ do {
\
  * file/line information and a backtrace.
  */
 #define netdev_WARN(dev, format, args...)  \
-   WARN(1, "netdevice: %s%s\n" format, netdev_name(dev),   \
+   WARN(1, "netdevice: %s%s: " format, netdev_name(dev),   \
 netdev_reg_state(dev), ##args)
 
 #define netdev_WARN_ONCE(dev, format, args...) \
-   WARN_ONCE(1, "netdevice: %s%s\n" format, netdev_name(dev),  \
+   WARN_ONCE(1, "netdevice: %s%s: " format, netdev_name(dev),  \
  netdev_reg_state(dev), ##args)
 
 /* netif printk helpers, similar to netdev_printk */
-- 
2.7.4

[PATCH net-next 3/6] net/mlx5e: Replace WARN_ONCE with netdev_WARN_ONCE

2018-01-04 Thread Gal Pressman

Use the more appropriate netdev_WARN_ONCE instead of WARN_ONCE macro.

Signed-off-by: Gal Pressman 
Reviewed-by: Saeed Mahameed 
---
 drivers/net/ethernet/mellanox/mlx5/core/en_rx.c | 9 -
 1 file changed, 4 insertions(+), 5 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c 
b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
index 5334b2d..38803e3 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
@@ -495,8 +495,8 @@ static inline void mlx5e_poll_ico_single_cqe(struct 
mlx5e_cq *cq,
mlx5_cqwq_pop(&cq->wq);
 
if (unlikely((cqe->op_own >> 4) != MLX5_CQE_REQ)) {
-   WARN_ONCE(true, "mlx5e: Bad OP in ICOSQ CQE: 0x%x\n",
- cqe->op_own);
+   netdev_WARN_ONCE(cq->channel->netdev,
+"Bad OP in ICOSQ CQE: 0x%x\n", cqe->op_own);
return;
}
 
@@ -506,9 +506,8 @@ static inline void mlx5e_poll_ico_single_cqe(struct 
mlx5e_cq *cq,
}
 
if (unlikely(icowi->opcode != MLX5_OPCODE_NOP))
-   WARN_ONCE(true,
- "mlx5e: Bad OPCODE in ICOSQ WQE info: 0x%x\n",
- icowi->opcode);
+   netdev_WARN_ONCE(cq->channel->netdev,
+"Bad OPCODE in ICOSQ WQE info: 0x%x\n", 
icowi->opcode);
 }
 
 static void mlx5e_poll_ico_cq(struct mlx5e_cq *cq, struct mlx5e_rq *rq)
-- 
2.7.4

[PATCH net-next 4/6] e1000: Replace WARN_ONCE with netdev_WARN_ONCE

2018-01-04 Thread Gal Pressman

Use the more appropriate netdev_WARN_ONCE instead of WARN_ONCE macro.

Signed-off-by: Gal Pressman 
Reviewed-by: Saeed Mahameed 
Cc: Jeff Kirsher 
---
 drivers/net/ethernet/intel/e1000/e1000_ethtool.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/intel/e1000/e1000_ethtool.c 
b/drivers/net/ethernet/intel/e1000/e1000_ethtool.c
index 3b3983a..dc71e87 100644
--- a/drivers/net/ethernet/intel/e1000/e1000_ethtool.c
+++ b/drivers/net/ethernet/intel/e1000/e1000_ethtool.c
@@ -1838,8 +1838,8 @@ static void e1000_get_ethtool_stats(struct net_device 
*netdev,
p = (char *)adapter + stat->stat_offset;
break;
default:
-   WARN_ONCE(1, "Invalid E1000 stat type: %u index %d\n",
- stat->type, i);
+   netdev_WARN_ONCE(netdev, "Invalid E1000 stat type: %u 
index %d\n",
+stat->type, i);
continue;
}
 
-- 
2.7.4

[PATCH net-next 6/6] 8139cp: Replace WARN_ONCE with netdev_WARN_ONCE

2018-01-04 Thread Gal Pressman

Use the more appropriate netdev_WARN_ONCE instead of WARN_ONCE macro.

Signed-off-by: Gal Pressman 
Reviewed-by: Saeed Mahameed 
Cc: Realtek linux nic maintainers 
---
 drivers/net/ethernet/realtek/8139cp.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/realtek/8139cp.c 
b/drivers/net/ethernet/realtek/8139cp.c
index e7ab23e..81045df 100644
--- a/drivers/net/ethernet/realtek/8139cp.c
+++ b/drivers/net/ethernet/realtek/8139cp.c
@@ -748,8 +748,8 @@ static netdev_tx_t cp_start_xmit (struct sk_buff *skb,
mss = skb_shinfo(skb)->gso_size;
 
if (mss > MSSMask) {
-   WARN_ONCE(1, "Net bug: GSO size %d too large for 8139CP\n",
- mss);
+   netdev_WARN_ONCE(dev, "Net bug: GSO size %d too large for 
8139CP\n",
+mss);
goto out_dma_error;
}
 
-- 
2.7.4

[PATCH net-next 5/6] bnx2x: Replace WARN_ONCE with netdev_WARN_ONCE

2018-01-04 Thread Gal Pressman

Use the more appropriate netdev_WARN_ONCE instead of WARN_ONCE macro.

Signed-off-by: Gal Pressman 
Reviewed-by: Saeed Mahameed 
Cc: Ariel Elior 
---
 drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c 
b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c
index 57eb26d..d7c98e8 100644
--- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c
+++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c
@@ -738,8 +738,9 @@ static void bnx2x_gro_receive(struct bnx2x *bp, struct 
bnx2x_fastpath *fp,
bnx2x_gro_csum(bp, skb, bnx2x_gro_ipv6_csum);
break;
default:
-   WARN_ONCE(1, "Error: FW GRO supports only IPv4/IPv6, 
not 0x%04x\n",
- be16_to_cpu(skb->protocol));
+   netdev_WARN_ONCE(bp->dev,
+"Error: FW GRO supports only 
IPv4/IPv6, not 0x%04x\n",
+be16_to_cpu(skb->protocol));
}
}
 #endif
-- 
2.7.4

Re: [PATCH net-next 1/6] net: Fix netdev_WARN_ONCE macro

2018-01-07 Thread Gal Pressman

On 05-Jan-18 20:35, David Miller wrote:
> From: Gal Pressman 
> Date: Thu,  4 Jan 2018 16:55:58 +0200
> 
>> netdev_WARN_ONCE is broken (whoops..), fix it.
> 
> You don't say how it is broken and how you are fixing that
> brokenness, and you must do this.
> 
> Yes, if I stare at your patch for a few minutes I can see
> the missing "," but you could have saved everyone reviewing
> this patch (now and in the future) by writing a proper and
> complete commit message.
> 

Thank you for reviewing,
I will fix the commit message and submit v2.

[PATCH net-next v2 4/6] e1000: Replace WARN_ONCE with netdev_WARN_ONCE

2018-01-07 Thread Gal Pressman

Use the more appropriate netdev_WARN_ONCE instead of WARN_ONCE macro.

Signed-off-by: Gal Pressman 
Reviewed-by: Saeed Mahameed 
Cc: Jeff Kirsher 
---
 drivers/net/ethernet/intel/e1000/e1000_ethtool.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/intel/e1000/e1000_ethtool.c 
b/drivers/net/ethernet/intel/e1000/e1000_ethtool.c
index 3b3983a..dc71e87 100644
--- a/drivers/net/ethernet/intel/e1000/e1000_ethtool.c
+++ b/drivers/net/ethernet/intel/e1000/e1000_ethtool.c
@@ -1838,8 +1838,8 @@ static void e1000_get_ethtool_stats(struct net_device 
*netdev,
p = (char *)adapter + stat->stat_offset;
break;
default:
-   WARN_ONCE(1, "Invalid E1000 stat type: %u index %d\n",
- stat->type, i);
+   netdev_WARN_ONCE(netdev, "Invalid E1000 stat type: %u 
index %d\n",
+stat->type, i);
continue;
}
 
-- 
2.7.4

[PATCH net-next v2 0/6] Replace WARN_ONCE usages with netdev_WARN_ONCE

2018-01-07 Thread Gal Pressman

Hi,
This series will fix an issue in netdev_WARN_ONCE, improve its formatting and
replace drivers' usage of WARN_ONCE to netdev_WARN_ONCE.

Driver specific patches were compilation tested, in addition, functional tested
on Mellanox NIC.

v1->v2:
- Addressed commit message comments in patch #1

Gal Pressman (6):
  net: Fix netdev_WARN_ONCE macro
  net: No line break on netdev_WARN* formatting
  net/mlx5e: Replace WARN_ONCE with netdev_WARN_ONCE
  e1000: Replace WARN_ONCE with netdev_WARN_ONCE
  bnx2x: Replace WARN_ONCE with netdev_WARN_ONCE
  8139cp: Replace WARN_ONCE with netdev_WARN_ONCE

 drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c  | 5 +++--
 drivers/net/ethernet/intel/e1000/e1000_ethtool.c | 4 ++--
 drivers/net/ethernet/mellanox/mlx5/core/en_rx.c  | 9 -
 drivers/net/ethernet/realtek/8139cp.c| 4 ++--
 include/linux/netdevice.h| 6 +++---
 5 files changed, 14 insertions(+), 14 deletions(-)

-- 
2.7.4

[PATCH net-next v2 6/6] 8139cp: Replace WARN_ONCE with netdev_WARN_ONCE

2018-01-07 Thread Gal Pressman

Use the more appropriate netdev_WARN_ONCE instead of WARN_ONCE macro.

Signed-off-by: Gal Pressman 
Reviewed-by: Saeed Mahameed 
Cc: Realtek linux nic maintainers 
---
 drivers/net/ethernet/realtek/8139cp.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/realtek/8139cp.c 
b/drivers/net/ethernet/realtek/8139cp.c
index e7ab23e..81045df 100644
--- a/drivers/net/ethernet/realtek/8139cp.c
+++ b/drivers/net/ethernet/realtek/8139cp.c
@@ -748,8 +748,8 @@ static netdev_tx_t cp_start_xmit (struct sk_buff *skb,
mss = skb_shinfo(skb)->gso_size;
 
if (mss > MSSMask) {
-   WARN_ONCE(1, "Net bug: GSO size %d too large for 8139CP\n",
- mss);
+   netdev_WARN_ONCE(dev, "Net bug: GSO size %d too large for 
8139CP\n",
+mss);
goto out_dma_error;
}
 
-- 
2.7.4

[PATCH net-next v2 3/6] net/mlx5e: Replace WARN_ONCE with netdev_WARN_ONCE

2018-01-07 Thread Gal Pressman

Use the more appropriate netdev_WARN_ONCE instead of WARN_ONCE macro.

Signed-off-by: Gal Pressman 
Reviewed-by: Saeed Mahameed 
---
 drivers/net/ethernet/mellanox/mlx5/core/en_rx.c | 9 -
 1 file changed, 4 insertions(+), 5 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c 
b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
index 5334b2d..38803e3 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
@@ -495,8 +495,8 @@ static inline void mlx5e_poll_ico_single_cqe(struct 
mlx5e_cq *cq,
mlx5_cqwq_pop(&cq->wq);
 
if (unlikely((cqe->op_own >> 4) != MLX5_CQE_REQ)) {
-   WARN_ONCE(true, "mlx5e: Bad OP in ICOSQ CQE: 0x%x\n",
- cqe->op_own);
+   netdev_WARN_ONCE(cq->channel->netdev,
+"Bad OP in ICOSQ CQE: 0x%x\n", cqe->op_own);
return;
}
 
@@ -506,9 +506,8 @@ static inline void mlx5e_poll_ico_single_cqe(struct 
mlx5e_cq *cq,
}
 
if (unlikely(icowi->opcode != MLX5_OPCODE_NOP))
-   WARN_ONCE(true,
- "mlx5e: Bad OPCODE in ICOSQ WQE info: 0x%x\n",
- icowi->opcode);
+   netdev_WARN_ONCE(cq->channel->netdev,
+"Bad OPCODE in ICOSQ WQE info: 0x%x\n", 
icowi->opcode);
 }
 
 static void mlx5e_poll_ico_cq(struct mlx5e_cq *cq, struct mlx5e_rq *rq)
-- 
2.7.4

[PATCH net-next v2 5/6] bnx2x: Replace WARN_ONCE with netdev_WARN_ONCE

2018-01-07 Thread Gal Pressman

Use the more appropriate netdev_WARN_ONCE instead of WARN_ONCE macro.

Signed-off-by: Gal Pressman 
Reviewed-by: Saeed Mahameed 
Cc: Ariel Elior 
---
 drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c 
b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c
index 57eb26d..d7c98e8 100644
--- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c
+++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c
@@ -738,8 +738,9 @@ static void bnx2x_gro_receive(struct bnx2x *bp, struct 
bnx2x_fastpath *fp,
bnx2x_gro_csum(bp, skb, bnx2x_gro_ipv6_csum);
break;
default:
-   WARN_ONCE(1, "Error: FW GRO supports only IPv4/IPv6, 
not 0x%04x\n",
- be16_to_cpu(skb->protocol));
+   netdev_WARN_ONCE(bp->dev,
+"Error: FW GRO supports only 
IPv4/IPv6, not 0x%04x\n",
+be16_to_cpu(skb->protocol));
}
}
 #endif
-- 
2.7.4

[PATCH net-next v2 2/6] net: No line break on netdev_WARN* formatting

2018-01-07 Thread Gal Pressman

Remove the unnecessary line break between the netdev name and reg state
to the actual message that should be printed.

For example, this:
[86730.307236] [ cut here ]
[86730.313496] netdevice: enp27s0f0
Message from the driver
[...]

Will be replaced with:
[86770.259289] [ cut here ]
[86770.265191] netdevice: enp27s0f0: Message from the driver
[...]

Signed-off-by: Gal Pressman 
Reviewed-by: Saeed Mahameed 
---
 include/linux/netdevice.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 5ff1ef9..87211c4 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -4404,11 +4404,11 @@ do {
\
  * file/line information and a backtrace.
  */
 #define netdev_WARN(dev, format, args...)  \
-   WARN(1, "netdevice: %s%s\n" format, netdev_name(dev),   \
+   WARN(1, "netdevice: %s%s: " format, netdev_name(dev),   \
 netdev_reg_state(dev), ##args)
 
 #define netdev_WARN_ONCE(dev, format, args...) \
-   WARN_ONCE(1, "netdevice: %s%s\n" format, netdev_name(dev),  \
+   WARN_ONCE(1, "netdevice: %s%s: " format, netdev_name(dev),  \
  netdev_reg_state(dev), ##args)
 
 /* netif printk helpers, similar to netdev_printk */
-- 
2.7.4

[PATCH net-next v2 1/6] net: Fix netdev_WARN_ONCE macro

2018-01-07 Thread Gal Pressman

netdev_WARN_ONCE is broken (whoops..), this fix will remove the
unnecessary "condition" parameter, add the missing comma and change
"arg" to "args".

Fixes: 375ef2b1f0d0 ("net: Introduce netdev_*_once functions")
Signed-off-by: Gal Pressman 
Reviewed-by: Saeed Mahameed 
---
 include/linux/netdevice.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 352066e..5ff1ef9 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -4407,8 +4407,8 @@ do {  
\
WARN(1, "netdevice: %s%s\n" format, netdev_name(dev),   \
 netdev_reg_state(dev), ##args)
 
-#define netdev_WARN_ONCE(dev, condition, format, arg...)   \
-   WARN_ONCE(1, "netdevice: %s%s\n" format, netdev_name(dev)   \
+#define netdev_WARN_ONCE(dev, format, args...) \
+   WARN_ONCE(1, "netdevice: %s%s\n" format, netdev_name(dev),  \
  netdev_reg_state(dev), ##args)
 
 /* netif printk helpers, similar to netdev_printk */
-- 
2.7.4

Re: [PATCH net-next v2 1/6] net: Fix netdev_WARN_ONCE macro

2018-01-09 Thread Gal Pressman

On 09-Jan-18 05:05, David Miller wrote:
> From: Joe Perches 
> Date: Mon, 08 Jan 2018 18:42:01 -0800
> 
>> On Sun, 2018-01-07 at 12:08 +0200, Gal Pressman wrote:
>>> netdev_WARN_ONCE is broken (whoops..), this fix will remove the
>>> unnecessary "condition" parameter, add the missing comma and change
>>> "arg" to "args".
>>>
>>> Fixes: 375ef2b1f0d0 ("net: Introduce netdev_*_once functions")
>>> Signed-off-by: Gal Pressman 
>>> Reviewed-by: Saeed Mahameed 
>>> ---
>>>  include/linux/netdevice.h | 4 ++--
>>>  1 file changed, 2 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
>>> index 352066e..5ff1ef9 100644
>>> --- a/include/linux/netdevice.h
>>> +++ b/include/linux/netdevice.h
>>> @@ -4407,8 +4407,8 @@ do {  
>>> \
>>> WARN(1, "netdevice: %s%s\n" format, netdev_name(dev),   \
>>>  netdev_reg_state(dev), ##args)
>>>  
>>> -#define netdev_WARN_ONCE(dev, condition, format, arg...)   \
>>> -   WARN_ONCE(1, "netdevice: %s%s\n" format, netdev_name(dev)   \
>>> +#define netdev_WARN_ONCE(dev, format, args...) 
>>> \
>>> +   WARN_ONCE(1, "netdevice: %s%s\n" format, netdev_name(dev),  \
>>
>> You sure you want the newline before the format?
> 
> Hmmm, Gal please send me a relative fix for this.
> 

The newline is removed in the next patch, which handles the formatting of both 
macros.
Do you want to remove it as part of this bug fix?

gso_type field for LRO packets

2017-12-12 Thread Gal Pressman

Hi all,
I've been exploring hardware LRO related code in the stack and came across an 
inconsistency I don't quite understand.
When receiving a hardware LRO packet, most of the drivers set 
skb_shinfo(skb)->gso_type (For example [1]) in order to prevent macvtap from 
breaking.
According to include/linux/skbuff.h +/skb_warn_if_lro, LRO should set gso_size 
but not gso_type, which contradicts the drivers implementation and makes the 
check redundant.

Can anyone advise what is the correct driver implementation? should we change 
skb_warn_if_lro() implementation?

Regards,
Gal

[1] commit bd69ba798e21 ("qlcnic: set gso_type")

[PATCH iproute2 1/3] iplink: Validate minimum tx rate is less than maximum tx rate

2018-01-16 Thread Gal Pressman

According to the documentation (man ip-link), the minimum TXRATE should
be always <= Maximum TXRATE, but commit f89a2a05ffa9 ("Add support to
configure SR-IOV VF minimum and maximum Tx rate through ip tool") didn't
enforce it.

Fixes: f89a2a05ffa9 ("Add support to configure SR-IOV VF minimum and maximum Tx 
rate through ip tool")
Cc: Sucheta Chakraborty 
Signed-off-by: Gal Pressman 
Reviewed-by: Eran Ben Elisha 
Reviewed-by: Leon Romanovsky 
---
 ip/iplink.c | 8 
 1 file changed, 8 insertions(+)

diff --git a/ip/iplink.c b/ip/iplink.c
index 4c96711..22c9a29 100644
--- a/ip/iplink.c
+++ b/ip/iplink.c
@@ -539,6 +539,14 @@ static int iplink_parse_vf(int vf, int *argcp, char 
***argvp,
if (tivt.max_tx_rate == -1)
tivt.max_tx_rate = tmax;
}
+
+   if (tivt.max_tx_rate && tivt.min_tx_rate > tivt.max_tx_rate) {
+   fprintf(stderr,
+   "Invalid min_tx_rate %d - must be <= 
max_tx_rate %d\n",
+   tivt.min_tx_rate, tivt.max_tx_rate);
+   return -1;
+   }
+
addattr_l(&req->n, sizeof(*req), IFLA_VF_RATE, &tivt,
  sizeof(tivt));
}
-- 
2.7.4

[PATCH iproute2 0/3] Fixes for minimum/maximum VF rate API

2018-01-16 Thread Gal Pressman

Hi all,
The following patches will fix some issues with the "new" VF rate API, and
add some clarifications to the documentation.

Thanks,
Gal

Gal Pressman (3):
  iplink: Validate minimum tx rate is less than maximum tx rate
  ipaddress: Make sure VF min/max rate API is supported before using it
  man: Document the meaning of zero in min/max_tx_rate parameters

 ip/ipaddress.c| 6 ++
 ip/iplink.c   | 8 
 man/man8/ip-link.8.in | 2 ++
 3 files changed, 16 insertions(+)

-- 
2.7.4

[PATCH iproute2 3/3] man: Document the meaning of zero in min/max_tx_rate parameters

2018-01-16 Thread Gal Pressman

Zero value in min/max_tx_rate has a special meaning of no rate limit,
document it.

Signed-off-by: Gal Pressman 
Reviewed-by: Eran Ben Elisha 
Reviewed-by: Leon Romanovsky 
---
 man/man8/ip-link.8.in | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/man/man8/ip-link.8.in b/man/man8/ip-link.8.in
index ff6bb9a..5e49850 100644
--- a/man/man8/ip-link.8.in
+++ b/man/man8/ip-link.8.in
@@ -1595,6 +1595,7 @@ option instead.
 .sp
 .BI max_tx_rate " TXRATE"
 - change the allowed maximum transmit bandwidth, in Mbps, for the specified VF.
+Setting this parameter to 0 disables rate limiting.
 .B vf
 parameter must be specified.
 
@@ -1602,6 +1603,7 @@ parameter must be specified.
 .BI min_tx_rate " TXRATE"
 - change the allowed minimum transmit bandwidth, in Mbps, for the specified VF.
 Minimum TXRATE should be always <= Maximum TXRATE.
+Setting this parameter to 0 disables rate limiting.
 .B vf
 parameter must be specified.
 
-- 
2.7.4

[PATCH iproute2 2/3] ipaddress: Make sure VF min/max rate API is supported before using it

2018-01-16 Thread Gal Pressman

When using the new minimum rate API and providing only one parameter
(minimum rate/maximum rate), we query the VF min and max rate regardless
of kernel support.
This resulted in segmentation fault in ipaddr_loop_each_vf, which tries
to access NULL pointer.

This patch identifies such cases by testing the VF table for NULL
pointer in IFLA_VF_RATE, and aborts the operation.
Aborting on the first VF is valid since if the kernel does not support
the new API for the first VF, it will not support it for the other VFs
as well.

Fixes: f89a2a05ffa9 ("Add support to configure SR-IOV VF minimum and maximum Tx 
rate through ip tool")
Cc: Sucheta Chakraborty 
Signed-off-by: Gal Pressman 
Reviewed-by: Eran Ben Elisha 
Reviewed-by: Leon Romanovsky 
---
 ip/ipaddress.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/ip/ipaddress.c b/ip/ipaddress.c
index f150d91..953d673 100644
--- a/ip/ipaddress.c
+++ b/ip/ipaddress.c
@@ -2251,6 +2251,12 @@ ipaddr_loop_each_vf(struct rtattr *tb[], int vfnum, int 
*min, int *max)
 
for (i = RTA_DATA(vflist); RTA_OK(i, rem); i = RTA_NEXT(i, rem)) {
parse_rtattr_nested(vf, IFLA_VF_MAX, i);
+
+   if (!vf[IFLA_VF_RATE]) {
+   fprintf(stderr, "VF min/max rate API not supported\n");
+   exit(1);
+   }
+
vf_rate = RTA_DATA(vf[IFLA_VF_RATE]);
if (vf_rate->vf == vfnum) {
*min = vf_rate->min_tx_rate;
-- 
2.7.4

Re: after adding > 200vlans to mlx nic no traffic

2018-01-30 Thread Gal Pressman

On 30-Jan-18 02:29, Paweł Staszewski wrote:
> Weird thing with mellanox mlx5 (connectx-4) kernel 4.15-rc9 - from net-next 
> davem tree
> 
> 
> 
> after:
> 
> ip link add link enp175s0f1 name vlan1538 type vlan id 1538
> 
> ip link set up dev vlan1538
> 
> 
> traffic on vlan is working
> 
> 
> But after
> 
> VID="1160 1450 1451 1452 1453 1454 1455 1456 1457 1458 1459 1460 1461 1462 
> 1463 1464 1465 1466 1467 1468 1469 1470 1471 1472 1473 1474 1475 1476 1477 
> 1478 1479 1480 1481 1482 1483 1484 1485 1486 1487 1488 1489 1490 1491 1492 
> 1493 1494 1495 1496 1497 1498 1499 150
> 0 1501 1502 1503 1504 1505 1506 1507 1508 1509 1510 1511 1512 1513 1514 1515 
> 1516 1517 1518 1519 1520 1521 1522 1523 1524 1525 1526 1527 1528 1529 1530 
> 1531 1532 1534 1535 1394 1393 1550 1500 1526 1536 1537 1538 1539 1540 1542 
> 1541 1543 1544 1801 1546 1547 1548 1
> 549 1735 3132 3143 3104 3125 3103 3115 3134 3105 3113 3141 4009 3144 3130 
> 1803 3146 3148 3109 1551 1552 1553 1554 1555 1556 1558 1559 1560 1561 1562 
> 1563 1564 1565 1567 1568 1569 1570 1571 1572 1573 1574 1575 1576 1577 1578 
> 1579 1580 1581 1582 1583 1584 1585 1586
>  1587 1588 1589 1591 1592 1593 1594 1595 1596 1597 1598 1599 1557 1545 2001 
> 250 4043 1806 1600 1602 1603 1604 1605 1606 1607 1608 1609 1610 1611 1612 
> 1613 1614 1615 1616 1617 1618 1619 1620 1621 1625 1626 1627 1628 1629 1630 
> 1631 1632 1634 1635 1636 1640 1641 164
> 2 1643 1644 1645 1646 1647 1648 1649 1650 1651 1652 1653 1654 1655 1656 1657 
> 1658 1659 1660 1661 1662 1663 1664 1665 1601 1666 1667 1668 1669 1670 1671 
> 1672 1673 1674 1676 1677 1678 1680 1681 1682 1683 1684 1685 1686 1687 1688 
> 1689 1690 1691 1692 1693 1694 1696 1
> 697 1698 1712 1817 1869 1810 1814 1818 1855 1856 1857 1858 1859 1860 1861 
> 1862 1863 1864 1865 1866 1867 1868 1870 1871 1872 1873 1874 1875 1876 1877 
> 1878 1879 1880 1885 1890 1891 1892 1893 1894 1895 1898 1881 2190 2191 2192 
> 2193 2194 2195 2196 2197 2198 2199 2541
>  2542 2543 2544 2545 2546 2547 2548 2549 2550 2290"
> for i in $VID
> do
>     ip link add link enp175s0f1 name vlan$i type vlan id $i
> done
> 
> 
> And setting vlan 1538 up - there is no received traffic on this vlan.
> 
> 
> 
> So searching for broken things (last time same problem was with ixgbe)
> 
> ethtool -K enp175s0f1 rx-vlan-filter off
> 
> 
> And all vlans attached to this device start working
> 
> 
> 
Hi Pawel,
I tried to reproduce the issue in our local setups without success.
Can you please provide more information? are there any errors in dmesg? did you 
configure anything else that might be relevant to this issue?
Do you know if this is a new degradation to 4.15-rc9?

Try to send traffic over the vlans and sample the ethtool counters (ethtool -S 
enp175s0f1) of the receiver mlx5 interface over time,
this might help us trace where the packets drop.

Thank you for reporting this,
Gal

Re: after adding > 200vlans to mlx nic no traffic

2018-01-31 Thread Gal Pressman

On 30-Jan-18 17:57, Paweł Staszewski wrote:
> 
> 
> W dniu 30.01.2018 o 15:57, Gal Pressman pisze:
>> On 30-Jan-18 02:29, Paweł Staszewski wrote:
>>> Weird thing with mellanox mlx5 (connectx-4) kernel 4.15-rc9 - from net-next 
>>> davem tree
>>>
>>>
>>>
>>> after:
>>>
>>> ip link add link enp175s0f1 name vlan1538 type vlan id 1538
>>>
>>> ip link set up dev vlan1538
>>>
>>>
>>> traffic on vlan is working
>>>
>>>
>>> But after
>>>
>>> VID="1160 1450 1451 1452 1453 1454 1455 1456 1457 1458 1459 1460 1461 1462 
>>> 1463 1464 1465 1466 1467 1468 1469 1470 1471 1472 1473 1474 1475 1476 1477 
>>> 1478 1479 1480 1481 1482 1483 1484 1485 1486 1487 1488 1489 1490 1491 1492 
>>> 1493 1494 1495 1496 1497 1498 1499 150
>>> 0 1501 1502 1503 1504 1505 1506 1507 1508 1509 1510 1511 1512 1513 1514 
>>> 1515 1516 1517 1518 1519 1520 1521 1522 1523 1524 1525 1526 1527 1528 1529 
>>> 1530 1531 1532 1534 1535 1394 1393 1550 1500 1526 1536 1537 1538 1539 1540 
>>> 1542 1541 1543 1544 1801 1546 1547 1548 1
>>> 549 1735 3132 3143 3104 3125 3103 3115 3134 3105 3113 3141 4009 3144 3130 
>>> 1803 3146 3148 3109 1551 1552 1553 1554 1555 1556 1558 1559 1560 1561 1562 
>>> 1563 1564 1565 1567 1568 1569 1570 1571 1572 1573 1574 1575 1576 1577 1578 
>>> 1579 1580 1581 1582 1583 1584 1585 1586
>>>   1587 1588 1589 1591 1592 1593 1594 1595 1596 1597 1598 1599 1557 1545 
>>> 2001 250 4043 1806 1600 1602 1603 1604 1605 1606 1607 1608 1609 1610 1611 
>>> 1612 1613 1614 1615 1616 1617 1618 1619 1620 1621 1625 1626 1627 1628 1629 
>>> 1630 1631 1632 1634 1635 1636 1640 1641 164
>>> 2 1643 1644 1645 1646 1647 1648 1649 1650 1651 1652 1653 1654 1655 1656 
>>> 1657 1658 1659 1660 1661 1662 1663 1664 1665 1601 1666 1667 1668 1669 1670 
>>> 1671 1672 1673 1674 1676 1677 1678 1680 1681 1682 1683 1684 1685 1686 1687 
>>> 1688 1689 1690 1691 1692 1693 1694 1696 1
>>> 697 1698 1712 1817 1869 1810 1814 1818 1855 1856 1857 1858 1859 1860 1861 
>>> 1862 1863 1864 1865 1866 1867 1868 1870 1871 1872 1873 1874 1875 1876 1877 
>>> 1878 1879 1880 1885 1890 1891 1892 1893 1894 1895 1898 1881 2190 2191 2192 
>>> 2193 2194 2195 2196 2197 2198 2199 2541
>>>   2542 2543 2544 2545 2546 2547 2548 2549 2550 2290"
>>> for i in $VID
>>> do
>>>  ip link add link enp175s0f1 name vlan$i type vlan id $i
>>> done
>>>
>>>
>>> And setting vlan 1538 up - there is no received traffic on this vlan.
>>>
>>>
>>>
>>> So searching for broken things (last time same problem was with ixgbe)
>>>
>>> ethtool -K enp175s0f1 rx-vlan-filter off
>>>
>>>
>>> And all vlans attached to this device start working
>>>
>>>
>>>
>> Hi Pawel,
>> I tried to reproduce the issue in our local setups without success.
>> Can you please provide more information? are there any errors in dmesg? did 
>> you configure anything else that might be relevant to this issue?
>> Do you know if this is a new degradation to 4.15-rc9?
> previous kernel used was 4.13.2 - without this problem.
> 
> current kernel is net-next 4.15.0-rc9+
> https://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git
> 
>> Try to send traffic over the vlans and sample the ethtool counters (ethtool 
>> -S enp175s0f1) of the receiver mlx5 interface over time,
>> this might help us trace where the packets drop.
> Yes traffic is going out from interface - bot there is nothing on RX - 
> tcpdump shows no packets arriving to interface
> 
I am running 4.15.0-rc9+ from Dave's tree, currently on commit 91e6dd828425 
("ipmr: Fix ptrdiff_t print formatting").
Tested with the commands you provided and same configuration, the issue does 
not reproduce on our setups.

Did you see any errors in dmesg? anything coming from mlx5 driver?
Which firmware version are you using?
Please provide your .config file, perhaps it is making the difference.

Attaching the output of mlxdump would help a lot in this case, please also 
attach the output of the following command (replace :00:08.0 with the 
device pci):
mlxdump -d :00:08.0 fsdump --type FT --no_zero=true

In case mlxdump is not installed, you can get it from 
http://www.mellanox.com/page/management_tools.

Thanks,
Gal

Re: after adding > 200vlans to mlx nic no traffic

2018-02-04 Thread Gal Pressman

On 01-Feb-18 10:25, Paweł Staszewski wrote:
> 
> 
> W dniu 31.01.2018 o 13:19, Gal Pressman pisze:
>> On 30-Jan-18 17:57, Paweł Staszewski wrote:
>>>
>>> W dniu 30.01.2018 o 15:57, Gal Pressman pisze:
>>>> On 30-Jan-18 02:29, Paweł Staszewski wrote:
>>>>> Weird thing with mellanox mlx5 (connectx-4) kernel 4.15-rc9 - from 
>>>>> net-next davem tree
>>>>>
>>>>>
>>>>>
>>>>> after:
>>>>>
>>>>> ip link add link enp175s0f1 name vlan1538 type vlan id 1538
>>>>>
>>>>> ip link set up dev vlan1538
>>>>>
>>>>>
>>>>> traffic on vlan is working
>>>>>
>>>>>
>>>>> But after
>>>>>
>>>>> VID="1160 1450 1451 1452 1453 1454 1455 1456 1457 1458 1459 1460 1461 
>>>>> 1462 1463 1464 1465 1466 1467 1468 1469 1470 1471 1472 1473 1474 1475 
>>>>> 1476 1477 1478 1479 1480 1481 1482 1483 1484 1485 1486 1487 1488 1489 
>>>>> 1490 1491 1492 1493 1494 1495 1496 1497 1498 1499 150
>>>>> 0 1501 1502 1503 1504 1505 1506 1507 1508 1509 1510 1511 1512 1513 1514 
>>>>> 1515 1516 1517 1518 1519 1520 1521 1522 1523 1524 1525 1526 1527 1528 
>>>>> 1529 1530 1531 1532 1534 1535 1394 1393 1550 1500 1526 1536 1537 1538 
>>>>> 1539 1540 1542 1541 1543 1544 1801 1546 1547 1548 1
>>>>> 549 1735 3132 3143 3104 3125 3103 3115 3134 3105 3113 3141 4009 3144 3130 
>>>>> 1803 3146 3148 3109 1551 1552 1553 1554 1555 1556 1558 1559 1560 1561 
>>>>> 1562 1563 1564 1565 1567 1568 1569 1570 1571 1572 1573 1574 1575 1576 
>>>>> 1577 1578 1579 1580 1581 1582 1583 1584 1585 1586
>>>>>    1587 1588 1589 1591 1592 1593 1594 1595 1596 1597 1598 1599 1557 1545 
>>>>> 2001 250 4043 1806 1600 1602 1603 1604 1605 1606 1607 1608 1609 1610 1611 
>>>>> 1612 1613 1614 1615 1616 1617 1618 1619 1620 1621 1625 1626 1627 1628 
>>>>> 1629 1630 1631 1632 1634 1635 1636 1640 1641 164
>>>>> 2 1643 1644 1645 1646 1647 1648 1649 1650 1651 1652 1653 1654 1655 1656 
>>>>> 1657 1658 1659 1660 1661 1662 1663 1664 1665 1601 1666 1667 1668 1669 
>>>>> 1670 1671 1672 1673 1674 1676 1677 1678 1680 1681 1682 1683 1684 1685 
>>>>> 1686 1687 1688 1689 1690 1691 1692 1693 1694 1696 1
>>>>> 697 1698 1712 1817 1869 1810 1814 1818 1855 1856 1857 1858 1859 1860 1861 
>>>>> 1862 1863 1864 1865 1866 1867 1868 1870 1871 1872 1873 1874 1875 1876 
>>>>> 1877 1878 1879 1880 1885 1890 1891 1892 1893 1894 1895 1898 1881 2190 
>>>>> 2191 2192 2193 2194 2195 2196 2197 2198 2199 2541
>>>>>    2542 2543 2544 2545 2546 2547 2548 2549 2550 2290"
>>>>> for i in $VID
>>>>> do
>>>>>   ip link add link enp175s0f1 name vlan$i type vlan id $i
>>>>> done
>>>>>
>>>>>
>>>>> And setting vlan 1538 up - there is no received traffic on this vlan.
>>>>>
>>>>>
>>>>>
>>>>> So searching for broken things (last time same problem was with ixgbe)
>>>>>
>>>>> ethtool -K enp175s0f1 rx-vlan-filter off
>>>>>
>>>>>
>>>>> And all vlans attached to this device start working
>>>>>
>>>>>
>>>>>
>>>> Hi Pawel,
>>>> I tried to reproduce the issue in our local setups without success.
>>>> Can you please provide more information? are there any errors in dmesg? 
>>>> did you configure anything else that might be relevant to this issue?
>>>> Do you know if this is a new degradation to 4.15-rc9?
>>> previous kernel used was 4.13.2 - without this problem.
>>>
>>> current kernel is net-next 4.15.0-rc9+
>>> https://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git
>>>
>>>> Try to send traffic over the vlans and sample the ethtool counters 
>>>> (ethtool -S enp175s0f1) of the receiver mlx5 interface over time,
>>>> this might help us trace where the packets drop.
>>> Yes traffic is going out from interface - bot there is nothing on RX - 
>>> tcpdump shows no packets arriving to interface
>>>
>> I am running 4.15.0-rc9+ from Dave's tree, currently on commit 91e6dd828425 
>> ("ipmr: Fix ptrdiff_t print formatting").
>> Tested with the commands you provided and same configuration, the iss

[PATCH RFC 1/2] ethtool: Add get actual port speed

2016-11-02 Thread Gal Pressman

Add an additional actual speed field when using ethtool DEVNAME.
Actual speed will show the actual bandwidth exposed for the machine,
which may be different from the HCA operating speed.

Signed-off-by: Gal Pressman 
---
 include/linux/ethtool.h  |  1 +
 include/uapi/linux/ethtool.h |  2 ++
 net/core/ethtool.c   | 20 
 3 files changed, 23 insertions(+)

diff --git a/include/linux/ethtool.h b/include/linux/ethtool.h
index 9ded8c6..215baa1 100644
--- a/include/linux/ethtool.h
+++ b/include/linux/ethtool.h
@@ -311,6 +311,7 @@ struct ethtool_ops {
void(*set_msglevel)(struct net_device *, u32);
int (*nway_reset)(struct net_device *);
u32 (*get_link)(struct net_device *);
+   int (*get_actual_speed)(struct net_device *);
int (*get_eeprom_len)(struct net_device *);
int (*get_eeprom)(struct net_device *,
  struct ethtool_eeprom *, u8 *);
diff --git a/include/uapi/linux/ethtool.h b/include/uapi/linux/ethtool.h
index 099a420..63057a2 100644
--- a/include/uapi/linux/ethtool.h
+++ b/include/uapi/linux/ethtool.h
@@ -1315,6 +1315,8 @@ struct ethtool_per_queue_op {
 #define ETHTOOL_GLINKSETTINGS  0x004c /* Get ethtool_link_settings */
 #define ETHTOOL_SLINKSETTINGS  0x004d /* Set ethtool_link_settings */
 
+#define ETHTOOL_GASPD  0x004e /* Get port actual speed */
+
 
 /* compatibility with older code */
 #define SPARC_ETH_GSET ETHTOOL_GSET
diff --git a/net/core/ethtool.c b/net/core/ethtool.c
index 9774898..a1921fd 100644
--- a/net/core/ethtool.c
+++ b/net/core/ethtool.c
@@ -1516,6 +1516,22 @@ static int ethtool_get_link(struct net_device *dev, char 
__user *useraddr)
return 0;
 }
 
+static int ethtool_get_actual_speed(struct net_device *dev,
+   char __user *useraddr)
+{
+   struct ethtool_value edata = { .cmd = ETHTOOL_GASPD };
+
+   if (!dev->ethtool_ops->get_actual_speed)
+   return -EOPNOTSUPP;
+
+   edata.data = dev->ethtool_ops->get_actual_speed(dev);
+
+   if (copy_to_user(useraddr, &edata, sizeof(edata)))
+   return -EFAULT;
+
+   return 0;
+}
+
 static int ethtool_get_any_eeprom(struct net_device *dev, void __user 
*useraddr,
  int (*getter)(struct net_device *,
struct ethtool_eeprom *, u8 *),
@@ -2450,6 +2466,7 @@ int dev_ethtool(struct net *net, struct ifreq *ifr)
case ETHTOOL_GDRVINFO:
case ETHTOOL_GMSGLVL:
case ETHTOOL_GLINK:
+   case ETHTOOL_GASPD:
case ETHTOOL_GCOALESCE:
case ETHTOOL_GRINGPARAM:
case ETHTOOL_GPAUSEPARAM:
@@ -2531,6 +2548,9 @@ int dev_ethtool(struct net *net, struct ifreq *ifr)
case ETHTOOL_GLINK:
rc = ethtool_get_link(dev, useraddr);
break;
+   case ETHTOOL_GASPD:
+   rc = ethtool_get_actual_speed(dev, useraddr);
+   break;
case ETHTOOL_GEEPROM:
rc = ethtool_get_eeprom(dev, useraddr);
break;
-- 
2.7.4

[PATCH RFC 2/2] net/mlx5e: Add support for ethtool get actual speed callback

2016-11-02 Thread Gal Pressman

ethtool DEVNAME will now show actual port speed in addition
to physical port speed.

Signed-off-by: Gal Pressman 
---
 drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c 
b/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c
index 27ff401..933b687 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c
@@ -1504,9 +1504,16 @@ static int mlx5e_set_rxnfc(struct net_device *dev, 
struct ethtool_rxnfc *cmd)
return err;
 }
 
+static int mlx5e_get_actual_speed(struct net_device *netdev)
+{
+   /* TODO: FW command to query the actual speed */
+   return SPEED_25000;
+}
+
 const struct ethtool_ops mlx5e_ethtool_ops = {
.get_drvinfo   = mlx5e_get_drvinfo,
.get_link  = ethtool_op_get_link,
+   .get_actual_speed  = mlx5e_get_actual_speed,
.get_strings   = mlx5e_get_strings,
.get_sset_count= mlx5e_get_sset_count,
.get_ethtool_stats = mlx5e_get_ethtool_stats,
-- 
2.7.4

[PATCH RFC 0/2] ethtool: Add actual port speed reporting

2016-11-02 Thread Gal Pressman

Sending RFC to get feedback for the following ethtool proposal:

In some cases such as virtual machines and multi functions (SR-IOV), the actual
bandwidth exposed for each machine is not accurately shown in ethtool.
Currently ethtool shows only physical port link speed.
In our case we would like to show the virtual port operational link speed
which in some cases is less than the physical port speed.

This will give users better visibility for the actual speed running on their 
device.

$ ethtool ens6
...
Speed: 5Mb/s
Actual speed: 25000Mb/s

Gal Pressman (2):
  ethtool: Add get actual port speed support
  net/mlx5e: Add support for ethtool get actual speed callback

 drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c |  7 +++
 include/linux/ethtool.h  |  1 +
 include/uapi/linux/ethtool.h |  2 ++
 net/core/ethtool.c   | 20 
 4 files changed, 30 insertions(+)

-- 
2.7.4

Re: [RFC PATCH net-next] net: ethtool: add support for forward error correction modes

2016-11-03 Thread Gal Pressman



On 25/10/2016 05:50, Vidya Sagar Ravipati wrote:
> SET FEC option:
> root@tor: ethtool --set-fec  swp1 encoding [off | RS | BaseR | auto] autoneg 
> [off | on]
> 
> Encoding: Types of encoding
> Off:  Turning off any encoding
> RS :  enforcing RS-FEC encoding on supported speeds
> BaseR  :  enforcing Base R encoding on supported speeds
> Auto   :  Default FEC settings  for  divers , and would represent

divers? :)

>   asking the hardware to essentially go into a best effort mode.
> 
> Here are a few examples of what we would expect if encoding=auto:
> - if autoneg is on, we are  expecting FEC to be negotiated as on or off
>   as long as protocol supports it
> - if the hardware is capable of detecting the FEC encoding on it's
>   receiver it will reconfigure its encoder to match
> - in absence of the above, the configuration would be set to IEEE
>   defaults.

Not sure I follow, why do we need an autoneg option if encoding type can be set 
to auto?

Re: [PATCH RFC 0/2] ethtool: Add actual port speed reporting

2016-11-03 Thread Gal Pressman



On 02/11/2016 17:50, Mintz, Yuval wrote:
>> Sending RFC to get feedback for the following ethtool proposal:
>>
>> In some cases such as virtual machines and multi functions (SR-IOV), the 
>> actual
>> bandwidth exposed for each machine is not accurately shown in ethtool.
>> Currently ethtool shows only physical port link speed.
>> In our case we would like to show the virtual port operational link speed 
>> which
>> in some cases is less than the physical port speed.
>>
>> This will give users better visibility for the actual speed running on their 
>> device.
>>
>> $ ethtool ens6
>> ...
>> Speed: 5Mb/s
>> Actual speed: 25000Mb/s
> 
> Not saying this is a bad thing, but where exactly is it listed that ethtool 
> has
> to show the physical port speed?
> E.g., bnx2x shows the logical speed instead, and has been doing that for 
> years.
> [Perhaps that's a past wrongness, but that's how it goes].
> 
> And besides, one can argue that in the SR-IOV scenario the VF has no business
> knowing the physical port speed.
> 

Good point, but there are more use-cases we should consider.
For example, when using Multi-Host/Flex-10/Multi-PF each PF should
be able to query both physical port speed and actual speed.

Re: [net-next 11/15] net/mlx5e: Reduce number of heap allocated buffers for update stats

2017-06-21 Thread Gal Pressman


>> +u32 in[MLX5_ST_SZ_DW(ppcnt_reg)] = {0};
> How big is that?
>
> Allocating large on-stack buffers is even worse.
Both PPCNT and MPCNT registers are 252 bytes.
I don't see a problem allocating them on the stack.

[RFC PATCH net-next 2/3] net/mlx5: Add PDDR register infrastructure

2017-06-21 Thread Gal Pressman

PDDR (Port Diagnostics Database Register) is used to read the physical
layer debug database, which contains helpful troubleshooting information
regarding the state of the link.

PDDR register can only be queried when PCAM register reports it as
supported in its register mask. A new helper macro was added to
the MLX5_CAP_* infrastructure in order to access this mask.

Expose query functions for PDDR register which will be used in the
following patch.

Signed-off-by: Gal Pressman 
Signed-off-by: Saeed Mahameed 
---
 .../net/ethernet/mellanox/mlx5/core/mlx5_core.h| 29 
 drivers/net/ethernet/mellanox/mlx5/core/port.c | 14 ++
 include/linux/mlx5/device.h|  3 ++
 include/linux/mlx5/driver.h|  1 +
 include/linux/mlx5/mlx5_ifc.h  | 51 ++
 5 files changed, 98 insertions(+)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.h 
b/drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.h
index 5ccdf43..cbb6a0e 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.h
@@ -79,6 +79,35 @@ enum {
MLX5_DRIVER_SYND = 0xbadd00de,
 };
 
+enum mlx5_pddr_page_select {
+   MLX5_PDDR_OPERATIONAL_INFO_PAGE= 0x0,
+   MLX5_PDDR_TROUBLESHOOTING_INFO_PAGE= 0x1,
+   MLX5_PDDR_MODULE_INFO_PAGE = 0x3,
+};
+
+enum mlx5_pddr_monitor_opcodes {
+   MLX5_LINK_NO_ISSUE_OBSERVED= 0x0,
+   MLX5_LINK_PORT_CLOSED  = 0x1,
+   MLX5_LINK_AN_FAILURE   = 0x2,
+   MLX5_LINK_TRAINING_FAILURE = 0x5,
+   MLX5_LINK_LOGICAL_MISMATCH = 0x9,
+   MLX5_LINK_REMOTE_FAULT_INDICATION  = 0xe,
+   MLX5_LINK_BAD_SIGNAL_INTEGRITY = 0xf,
+   MLX5_LINK_CABLE_COMPLIANCE_CODE_MISMATCH   = 0x10,
+   MLX5_LINK_INTERNAL_ERR = 0x17,
+   MLX5_LINK_INFO_NOT_AVAIL   = 0x3ff,
+   MLX5_LINK_CABLE_UNPLUGGED  = 0x400,
+   MLX5_LINK_LONG_RANGE_FOR_NON_MLX_CABLE = 0x401,
+   MLX5_LINK_BUS_STUCK= 0x402,
+   MLX5_LINK_UNSUPP_EEPROM= 0x403,
+   MLX5_LINK_PART_NUM_LIST= 0x404,
+   MLX5_LINK_UNSUPP_CABLE = 0x405,
+   MLX5_LINK_MODULE_TEMP_SHUTDOWN = 0x406,
+   MLX5_LINK_SHORTED_CABLE= 0x407,
+   MLX5_LINK_POWER_BUDGET_EXCEEDED= 0x408,
+   MLX5_LINK_MNG_FORCED_DOWN  = 0x409,
+};
+
 int mlx5_query_hca_caps(struct mlx5_core_dev *dev);
 int mlx5_query_board_id(struct mlx5_core_dev *dev);
 int mlx5_cmd_init_hca(struct mlx5_core_dev *dev);
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/port.c 
b/drivers/net/ethernet/mellanox/mlx5/core/port.c
index 1975d43..38e97b2 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/port.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/port.c
@@ -789,6 +789,20 @@ int mlx5_query_port_wol(struct mlx5_core_dev *mdev, u8 
*wol_mode)
 }
 EXPORT_SYMBOL_GPL(mlx5_query_port_wol);
 
+static int mlx5_query_pddr(struct mlx5_core_dev *mdev,
+  int page_select, u32 *out, int outlen)
+{
+   u32 in[MLX5_ST_SZ_DW(pddr_reg)] = {0};
+
+   if (!MLX5_CAP_PCAM_REG(mdev, pddr))
+   return -EOPNOTSUPP;
+
+   MLX5_SET(pddr_reg, in, local_port, 1);
+   MLX5_SET(pddr_reg, in, page_select, page_select);
+
+   return mlx5_core_access_reg(mdev, in, sizeof(in), out, outlen, 
MLX5_REG_PDDR, 0, 0);
+}
+
 static int mlx5_query_ports_check(struct mlx5_core_dev *mdev, u32 *out,
  int outlen)
 {
diff --git a/include/linux/mlx5/device.h b/include/linux/mlx5/device.h
index b26a478..b70b283 100644
--- a/include/linux/mlx5/device.h
+++ b/include/linux/mlx5/device.h
@@ -1094,6 +1094,9 @@ enum mlx5_mcam_feature_groups {
 #define MLX5_CAP_PCAM_FEATURE(mdev, fld) \
MLX5_GET(pcam_reg, (mdev)->caps.pcam, 
feature_cap_mask.enhanced_features.fld)
 
+#define MLX5_CAP_PCAM_REG(mdev, reg) \
+   MLX5_GET(pcam_reg, (mdev)->caps.pcam, 
port_access_reg_cap_mask.regs_5000_to_507f.reg)
+
 #define MLX5_CAP_MCAM_FEATURE(mdev, fld) \
MLX5_GET(mcam_reg, (mdev)->caps.mcam, 
mng_feature_cap_mask.enhanced_features.fld)
 
diff --git a/include/linux/mlx5/driver.h b/include/linux/mlx5/driver.h
index bf15e87..876056f 100644
--- a/include/linux/mlx5/driver.h
+++ b/include/linux/mlx5/driver.h
@@ -121,6 +121,7 @@ enum {
MLX5_REG_PMPE= 0x5010,
MLX5_REG_PELC= 0x500e,
MLX5_REG_PVLC= 0x500f,
+   MLX5_REG_PDDR= 0x5031,
MLX5_REG_PCMR= 0x5041,
MLX5_REG_PMLP= 0x5002,
MLX5_REG_PCAM= 0x507f,
diff --git a/include/linux/mlx5/mlx5_ifc.h b/include/linux/mlx5/mlx5_ifc.h
inde

[RFC PATCH net-next 0/3] ethtool: Add link down reason reporting

2017-06-21 Thread Gal Pressman

Hi All,

Currently, drivers can only tell whether the link is up/down through
ETHTOOL_GLINK callback, but no additional information is given in case
of link down/failure.

This series provides an infrastructure to ethtool that allows 
netdevice drivers to hint the user with the reason why the link is down,
in order to ease the debug process.

In addition two more mlx5 patches to demonstrate the usage.

Reasons are separated to two types, generic and vendor specific.
Drivers can reply with a generic reason using the predefined enums 
(and the ones that will be added in the future), which will be
translated to strings by the userspace ethtool.
In case of a vendor specific reason, drivers can reply with
ETHTOOL_VENDOR_SPECIFIC reason, in this case the vendor_reason field
should be filled with a vendor specific status code which will be parsed
by the vendor specific userspace parser if one is available.

This kind of information can save system administrators precious time,
which will not be wasted trying to understand why the link won't go up.

For example, when the cable is unplugged:
$ ethtool ethXX
...
Link detected: no (Cable unplugged)

Thanks,
Gal.

Gal Pressman (3):
  ethtool: Add link down reason callback
  net/mlx5: Add PDDR register infrastructure
  net/mlx5e: Expose link down reason to ethtool

 .../net/ethernet/mellanox/mlx5/core/en_ethtool.c   | 107 +
 .../net/ethernet/mellanox/mlx5/core/mlx5_core.h|  33 +++
 drivers/net/ethernet/mellanox/mlx5/core/port.c |  39 
 include/linux/ethtool.h|   2 +
 include/linux/mlx5/device.h|   3 +
 include/linux/mlx5/driver.h|   1 +
 include/linux/mlx5/mlx5_ifc.h  |  51 ++
 include/uapi/linux/ethtool.h   |  33 +++
 net/core/ethtool.c |  24 +
 9 files changed, 293 insertions(+)

-- 
2.7.4

[RFC PATCH net-next 1/3] ethtool: Add link down reason callback

2017-06-21 Thread Gal Pressman

Currently, drivers can only tell whether the link is up/down through
ETHTOOL_GLINK callback, but no additional information is given in case
of link down.
This patch provides an infrastructure that allows drivers to hint
the user with the reason why the link is down, in order to ease the
debug process.

Reasons are separated to two types, generic and vendor specific.
Drivers can reply with a generic reason using the enums provided in this
patch (and the ones that will be added in the future), which will be
translated to strings by the userspace ethtool.
In case of a vendor specific reason (not suitable for most vendors),
drivers can reply with ETHTOOL_VENDOR_SPECIFIC reason, in this case the
vendor_reason field should be filled with a vendor specific status code
which will be parsed by the vendor specific userspace parser if one is
available.

This kind of information can save system administrators precious time
which will not be wasted trying to understand why the link won't go
up.

For example, when the cable is unplugged:
$ ethtool ethXX
...
Link detected: no (Cable unplugged)

Signed-off-by: Gal Pressman 
Signed-off-by: Saeed Mahameed 
---
 include/linux/ethtool.h  |  2 ++
 include/uapi/linux/ethtool.h | 33 +
 net/core/ethtool.c   | 24 
 3 files changed, 59 insertions(+)

diff --git a/include/linux/ethtool.h b/include/linux/ethtool.h
index 83cc986..d472047 100644
--- a/include/linux/ethtool.h
+++ b/include/linux/ethtool.h
@@ -374,5 +374,7 @@ struct ethtool_ops {
  struct ethtool_link_ksettings *);
int (*set_link_ksettings)(struct net_device *,
  const struct ethtool_link_ksettings *);
+   int (*get_link_down_reason)(struct net_device *,
+   struct ethtool_link_down_reason *);
 };
 #endif /* _LINUX_ETHTOOL_H */
diff --git a/include/uapi/linux/ethtool.h b/include/uapi/linux/ethtool.h
index 7d4a594..8cf9d2c 100644
--- a/include/uapi/linux/ethtool.h
+++ b/include/uapi/linux/ethtool.h
@@ -550,6 +550,13 @@ struct ethtool_pauseparam {
 
 #define ETH_GSTRING_LEN32
 
+struct ethtool_link_down_reason {
+   __u32   cmd;
+   __u32   reason;
+   __u32   vendor_reason;
+   __u32   reserved[4];
+};
+
 /**
  * enum ethtool_stringset - string set ID
  * @ETH_SS_TEST: Self-test result names, for use with %ETHTOOL_TEST
@@ -1331,6 +1338,8 @@ struct ethtool_per_queue_op {
 #define ETHTOOL_PHY_GTUNABLE   0x004e /* Get PHY tunable configuration */
 #define ETHTOOL_PHY_STUNABLE   0x004f /* Set PHY tunable configuration */
 
+#define ETHTOOL_GLINK_DOWN_RSN 0x0050 /* Get link down reason */
+
 /* compatibility with older code */
 #define SPARC_ETH_GSET ETHTOOL_GSET
 #define SPARC_ETH_SSET ETHTOOL_SSET
@@ -1766,4 +1775,28 @@ struct ethtool_link_settings {
 * __u32 map_lp_advertising[link_mode_masks_nwords];
 */
 };
+
+enum {
+   ETHTOOL_LINK_VENDOR_SPECIFIC = -1, /* Vendor specific issue provided in 
vendor_reason */
+   ETHTOOL_LINK_NO_ISSUE, /* No issue observed with link */
+   ETHTOOL_LINK_REASON_UNKNOWN, /* Unknown reason */
+   ETHTOOL_LINK_NETDEV_CARRIER_DOWN, /* Netdev carrier is down */
+   ETHTOOL_LINK_ADMIN_DOWN, /* Admin down */
+   ETHTOOL_LINK_AN_FAILED, /* Auto negotiation failed */
+   ETHTOOL_LINK_TRAINING_FAILED, /* Link training failed */
+   ETHTOOL_LINK_RMT_FAULT, /* Remote fault indication */
+   ETHTOOL_LINK_BAD_SIGNAL_INTEGRITY, /* Bad signal integrity */
+   ETHTOOL_LINK_CABLE_MISMATCH, /* Cable protocol mismatch */
+   ETHTOOL_LINK_INTERNAL_ERR, /* Internal error */
+   ETHTOOL_LINK_CABLE_UNPLUGGED, /* Cable unplugged */
+   ETHTOOL_LINK_UNSUPP_MODULE, /* Unsupported module */
+   ETHTOOL_LINK_I2C_BUS_ERR, /* I2C bus error */
+   ETHTOOL_LINK_UNSUPP_EEPROM, /* Unsupported EEPROM */
+   ETHTOOL_LINK_OVERTEMP, /* Over temperature */
+   ETHTOOL_LINK_PWR_BUDGET_EXC, /* Power budget exceeded */
+   ETHTOOL_LINK_MODULE_ADMIN_DOWN, /* Module admin down */
+
+   ETHTOOL_LINK_REASONS_COUNT
+};
+
 #endif /* _UAPI_LINUX_ETHTOOL_H */
diff --git a/net/core/ethtool.c b/net/core/ethtool.c
index 03111a2..b818ad4 100644
--- a/net/core/ethtool.c
+++ b/net/core/ethtool.c
@@ -2523,6 +2523,26 @@ static int set_phy_tunable(struct net_device *dev, void 
__user *useraddr)
return ret;
 }
 
+static int get_link_down_reason(struct net_device *dev, void __user *useraddr)
+{
+   struct ethtool_link_down_reason ldr;
+   int ret;
+
+   if (!dev->ethtool_ops->get_link_down_reason)
+   return -EOPNOTSUPP;
+
+   memset(&ldr, 0, sizeof(ldr));
+   ldr.cmd = ETHTOOL_GLINK_DOWN_RSN;
+   ret = dev->ethtool_ops->get_link_down_reason(dev, &ldr);
+   if (ret)
+   return ret;
+
+   if (copy_to_user(u

[RFC PATCH net-next 3/3] net/mlx5e: Expose link down reason to ethtool

2017-06-21 Thread Gal Pressman

Use the new ethtool link down reason api, and expose troubleshooting
info regarding the link status.

Signed-off-by: Gal Pressman 
Signed-off-by: Saeed Mahameed 
---
 .../net/ethernet/mellanox/mlx5/core/en_ethtool.c   | 107 +
 .../net/ethernet/mellanox/mlx5/core/mlx5_core.h|   4 +
 drivers/net/ethernet/mellanox/mlx5/core/port.c |  25 +
 3 files changed, 136 insertions(+)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c 
b/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c
index ab46061..a5e9f1f 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c
@@ -135,6 +135,88 @@ void mlx5e_build_ptys2ethtool_map(void)
   ETHTOOL_LINK_MODE_5baseKR2_Full_BIT);
 }
 
+static const struct {
+   u16 pddr;
+   u32 ethtool;
+} pddr2ethtool_table[] = {
+   {
+   .pddr= MLX5_LINK_NO_ISSUE_OBSERVED,
+   .ethtool = ETHTOOL_LINK_NO_ISSUE,
+   },
+   {
+   .pddr= MLX5_LINK_PORT_CLOSED,
+   .ethtool = ETHTOOL_LINK_ADMIN_DOWN,
+   },
+   {
+   .pddr= MLX5_LINK_AN_FAILURE,
+   .ethtool = ETHTOOL_LINK_AN_FAILED,
+   },
+   {
+   .pddr= MLX5_LINK_TRAINING_FAILURE,
+   .ethtool = ETHTOOL_LINK_TRAINING_FAILED,
+   },
+   {
+   .pddr= MLX5_LINK_REMOTE_FAULT_INDICATION,
+   .ethtool = ETHTOOL_LINK_RMT_FAULT,
+   },
+   {
+   .pddr= MLX5_LINK_BAD_SIGNAL_INTEGRITY,
+   .ethtool = ETHTOOL_LINK_BAD_SIGNAL_INTEGRITY,
+   },
+   {
+   .pddr= MLX5_LINK_CABLE_COMPLIANCE_CODE_MISMATCH,
+   .ethtool = ETHTOOL_LINK_CABLE_MISMATCH,
+   },
+   {
+   .pddr= MLX5_LINK_INTERNAL_ERR,
+   .ethtool = ETHTOOL_LINK_INTERNAL_ERR,
+   },
+   {
+   .pddr= MLX5_LINK_INFO_NOT_AVAIL,
+   .ethtool = ETHTOOL_LINK_REASON_UNKNOWN,
+   },
+   {
+   .pddr= MLX5_LINK_CABLE_UNPLUGGED,
+   .ethtool = ETHTOOL_LINK_CABLE_UNPLUGGED,
+   },
+   {
+   .pddr= MLX5_LINK_LONG_RANGE_FOR_NON_MLX_CABLE,
+   .ethtool = ETHTOOL_LINK_UNSUPP_MODULE,
+   },
+   {
+   .pddr= MLX5_LINK_BUS_STUCK,
+   .ethtool = ETHTOOL_LINK_I2C_BUS_ERR,
+   },
+   {
+   .pddr= MLX5_LINK_UNSUPP_EEPROM,
+   .ethtool = ETHTOOL_LINK_UNSUPP_EEPROM,
+   },
+   {
+   .pddr= MLX5_LINK_MODULE_TEMP_SHUTDOWN,
+   .ethtool = ETHTOOL_LINK_OVERTEMP,
+   },
+   {
+   .pddr= MLX5_LINK_POWER_BUDGET_EXCEEDED,
+   .ethtool = ETHTOOL_LINK_PWR_BUDGET_EXC,
+   },
+   {
+   .pddr= MLX5_LINK_MNG_FORCED_DOWN,
+   .ethtool = ETHTOOL_LINK_MODULE_ADMIN_DOWN,
+   },
+};
+
+static u32 mlx5e_pddr2ethtool(u16 pddr)
+{
+   int i;
+
+   for (i = 0; i < ARRAY_SIZE(pddr2ethtool_table); i++) {
+   if (pddr2ethtool_table[i].pddr == pddr)
+   return pddr2ethtool_table[i].ethtool;
+   }
+
+   return ETHTOOL_LINK_VENDOR_SPECIFIC;
+}
+
 static unsigned long mlx5e_query_pfc_combined(struct mlx5e_priv *priv)
 {
struct mlx5_core_dev *mdev = priv->mdev;
@@ -1795,6 +1877,30 @@ static int mlx5e_set_rxnfc(struct net_device *dev, 
struct ethtool_rxnfc *cmd)
return err;
 }
 
+static int mlx5e_get_link_down_reason(struct net_device *netdev,
+ struct ethtool_link_down_reason *ldr)
+{
+   struct mlx5e_priv *priv = netdev_priv(netdev);
+   u16 monitor_opcode;
+   int err;
+
+   if (!netif_running(netdev)) {
+   ldr->reason = ETHTOOL_LINK_NETDEV_CARRIER_DOWN;
+   return 0;
+   }
+
+   err = mlx5_query_pddr_troubleshooting_info(priv->mdev,
+  &monitor_opcode, NULL);
+   if (err)
+   return err;
+
+   ldr->reason = mlx5e_pddr2ethtool(monitor_opcode);
+   if (ldr->reason == ETHTOOL_LINK_VENDOR_SPECIFIC)
+   ldr->vendor_reason = monitor_opcode;
+
+   return 0;
+}
+
 const struct ethtool_ops mlx5e_ethtool_ops = {
.get_drvinfo   = mlx5e_get_drvinfo,
.get_link  = ethtool_op_get_link,
@@ -1828,4 +1934,5 @@ const struct ethtool_ops mlx5e_ethtool_ops = {
.get_priv_flags= mlx5e_get_priv_flags,
.set_priv_flags= mlx5e_set_priv_flags,
.self_test = mlx5e_self_test,
+   .get_link_down_reason = mlx5e_get_link_down_reason,
 };
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.h 
b/drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.h
index cbb6a0e..b4cbaa0 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/

[PATCH ethtool] ethtool: Fix SFF 8079 cable technology bit parsing

2017-03-19 Thread Gal Pressman

According to the transceiver compliance code definition in the spec, bits
2 & 3 in the 8th byte are indication of active/passive cable, and not
specifically related to FC/copper.

Fixes: 2edf56749abe ("ethtool: Addition of -m option to dump module eeprom")
Signed-off-by: Gal Pressman 
---
 sfpid.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/sfpid.c b/sfpid.c
index fd6415c..1732e5e 100644
--- a/sfpid.c
+++ b/sfpid.c
@@ -137,9 +137,9 @@ static void sff8079_show_transceiver(const __u8 *id)
if (id[8] & (1 << 4))
printf("%s FC: Longwave laser (LL)\n", pfx);
if (id[8] & (1 << 3))
-   printf("%s FC: Copper Active\n", pfx);
+   printf("%s Active Cable\n", pfx);
if (id[8] & (1 << 2))
-   printf("%s FC: Copper Passive\n", pfx);
+   printf("%s Passive Cable\n", pfx);
if (id[8] & (1 << 1))
printf("%s FC: Copper FC-BaseT\n", pfx);
/* Fibre Channel transmission media */
-- 
2.7.4

Re: [PATCH ethtool] ethtool: Support for configurable RSS hash function

2017-03-26 Thread Gal Pressman

On 26/03/2017 00:50, Jakub Kicinski wrote:
> On Wed,  8 Mar 2017 16:03:51 +0200, Gal Pressman wrote:
>> This ethtool patch adds support to set and get the current RSS hash
>> function for the device through the new hfunc mask field in the
>> ethtool_rxfh struct. Kernel supported hash function names are queried
>> with ETHTOOL_GSTRINGS - each string is corresponding with a bit in hfunc
>> mask according to its index in the string-set.
>>
>> Signed-off-by: Eyal Perry 
>> Signed-off-by: Gal Pressman 
>> Reviewed-by: Saeed Mahameed 
> Hi John,
>
> It seems you have applied both my earlier patch with get support and
> this:
>
> adbaa18b9bc1 ("ethtool: Support for configurable RSS hash function")
> b932835d2302 ("ethtool: print hash function with ethtool 
> -x|--show-rxfh-indir")
>
> Now we print the RSS function twice:
>
> RX flow hash indirection table for em4 with 4 RX ring(s):
> 0:  [...]
> RSS hash function: toeplitz  <--- from my adbaa18b9bc1
> RSS hash key:
> Operation not supported
> RSS hash function:   <--- from this patch
> toeplitz: on
> xor: off
> crc32: off
>
> Reverting my patch is probably the easiest way forward, although I find
> it more concise and easier to parse in test scripts :)
Hi Jakub,
Nice catch!

IMHO we should keep the output from my patch, as it is more informative.
This way the user can query all hash functions supported by the kernel using -x 
flag, before
trying to change them.
I think it's nicer than "guessing" what's supported and what's not :).

Re: [PATCH ethtool] ethtool: Support for configurable RSS hash function

2017-03-29 Thread Gal Pressman

On 27/03/2017 21:02, John W. Linville wrote:
> On Sat, Mar 25, 2017 at 02:50:47PM -0700, Jakub Kicinski wrote:
>> On Wed,  8 Mar 2017 16:03:51 +0200, Gal Pressman wrote:
>>> This ethtool patch adds support to set and get the current RSS hash
>>> function for the device through the new hfunc mask field in the
>>> ethtool_rxfh struct. Kernel supported hash function names are queried
>>> with ETHTOOL_GSTRINGS - each string is corresponding with a bit in hfunc
>>> mask according to its index in the string-set.
>>>
>>> Signed-off-by: Eyal Perry 
>>> Signed-off-by: Gal Pressman 
>>> Reviewed-by: Saeed Mahameed 
>> Hi John,
>>
>> It seems you have applied both my earlier patch with get support and
>> this:
>>
>> adbaa18b9bc1 ("ethtool: Support for configurable RSS hash function")
>> b932835d2302 ("ethtool: print hash function with ethtool 
>> -x|--show-rxfh-indir")
>>
>> Now we print the RSS function twice:
>>
>> RX flow hash indirection table for em4 with 4 RX ring(s):
>> 0:  [...]
>> RSS hash function: toeplitz  <--- from my adbaa18b9bc1
>> RSS hash key:
>> Operation not supported
>> RSS hash function:   <--- from this patch
>> toeplitz: on
>> xor: off
>> crc32: off
>>
>> Reverting my patch is probably the easiest way forward, although I find
>> it more concise and easier to parse in test scripts :)
> Thanks for pointing-out this issue! I apologize for my own confusion.
>
> As you suggest, I will be reverting your patch.
>
> Thanks,
>
> John
126464e4da18 ('Revert "ethtool: Support for configurable RSS hash function"')

Seems like you ended up reverting my patch instead of Jakub's?
We lost the set hfunc functionality.

[PATCH ethtool] ethtool: Support for configurable RSS hash function

2017-03-08 Thread Gal Pressman

This ethtool patch adds support to set and get the current RSS hash
function for the device through the new hfunc mask field in the
ethtool_rxfh struct. Kernel supported hash function names are queried
with ETHTOOL_GSTRINGS - each string is corresponding with a bit in hfunc
mask according to its index in the string-set.

Signed-off-by: Eyal Perry 
Signed-off-by: Gal Pressman 
Reviewed-by: Saeed Mahameed 
---
 ethtool.8.in |  6 ++
 ethtool.c| 66 +---
 2 files changed, 69 insertions(+), 3 deletions(-)

diff --git a/ethtool.8.in b/ethtool.8.in
index 5c36c06385f6..2cbe3f158b3e 100644
--- a/ethtool.8.in
+++ b/ethtool.8.in
@@ -301,6 +301,8 @@ ethtool \- query or control network driver and hardware 
settings
 .BI weight\  W0
 .IR W1
 .RB ...\ | \ default \ ]
+.RB [ hfunc
+.IR FUNC ]
 .HP
 .B ethtool \-f|\-\-flash
 .I devname file
@@ -865,6 +867,10 @@ Sets RSS hash key of the specified network device. RSS 
hash key should be of dev
 Hash key format must be in xx:yy:zz:aa:bb:cc format meaning both the nibbles 
of a byte should be mentioned
 even if a nibble is zero.
 .TP
+.BI hfunc
+Sets RSS hash function of the specified network device.
+List of RSS hash functions which kernel supports is shown as a part of the 
--show-rxfh command output.
+.TP
 .BI equal\  N
 Sets the receive flow hash indirection table to spread flows evenly
 between the first \fIN\fR receive queues.
diff --git a/ethtool.c b/ethtool.c
index 7af039e26b50..61d5714285f9 100644
--- a/ethtool.c
+++ b/ethtool.c
@@ -3640,6 +3640,7 @@ static int do_grxfhindir(struct cmd_context *ctx,
 
 static int do_grxfh(struct cmd_context *ctx)
 {
+   struct ethtool_gstrings *hfuncs = NULL;
struct ethtool_rxfh rss_head = {0};
struct ethtool_rxnfc ring_count;
struct ethtool_rxfh *rss;
@@ -3697,6 +3698,26 @@ static int do_grxfh(struct cmd_context *ctx)
printf("%02x:", (u8) hkey[i]);
}
 
+   printf("RSS hash function:\n");
+   if (!rss->hfunc) {
+   printf("Operation not supported\n");
+   goto out;
+   }
+
+   hfuncs = get_stringset(ctx, ETH_SS_RSS_HASH_FUNCS, 0, 1);
+   if (!hfuncs) {
+   perror("Cannot get hash functions names");
+   free(rss);
+   return 1;
+   }
+
+   for (i = 0; i < hfuncs->len; i++)
+   printf("%s: %s\n",
+  (const char *)hfuncs->data + i * ETH_GSTRING_LEN,
+  (rss->hfunc & (1 << i)) ? "on" : "off");
+
+out:
+   free(hfuncs);
free(rss);
return 0;
 }
@@ -3800,11 +3821,16 @@ static int do_srxfh(struct cmd_context *ctx)
struct ethtool_rxfh *rss;
struct ethtool_rxnfc ring_count;
int rxfhindir_equal = 0, rxfhindir_default = 0;
+   struct ethtool_gstrings *hfuncs = NULL;
char **rxfhindir_weight = NULL;
char *rxfhindir_key = NULL;
+   char *req_hfunc_name = NULL;
+   char *hfunc_name = NULL;
char *hkey = NULL;
int err = 0;
+   int i;
u32 arg_num = 0, indir_bytes = 0;
+   u32 req_hfunc = 0;
u32 entry_size = sizeof(rss_head.rss_config[0]);
u32 num_weights = 0;
 
@@ -3836,6 +3862,12 @@ static int do_srxfh(struct cmd_context *ctx)
} else if (!strcmp(ctx->argp[arg_num], "default")) {
++arg_num;
rxfhindir_default = 1;
+   } else if (!strcmp(ctx->argp[arg_num], "hfunc")) {
+   ++arg_num;
+   req_hfunc_name = ctx->argp[arg_num];
+   if (!req_hfunc_name)
+   exit_bad_args();
+   ++arg_num;
} else {
exit_bad_args();
}
@@ -3868,7 +3900,8 @@ static int do_srxfh(struct cmd_context *ctx)
 
rss_head.cmd = ETHTOOL_GRSSH;
err = send_ioctl(ctx, &rss_head);
-   if (err < 0 && errno == EOPNOTSUPP && !rxfhindir_key) {
+   if (err < 0 && errno == EOPNOTSUPP && !rxfhindir_key &&
+   !req_hfunc_name) {
return do_srxfhindir(ctx, rxfhindir_default, rxfhindir_equal,
 rxfhindir_weight, num_weights);
} else if (err < 0) {
@@ -3886,14 +3919,39 @@ static int do_srxfh(struct cmd_context *ctx)
if (rxfhindir_equal || rxfhindir_weight)
indir_bytes = rss_head.indir_size * entry_size;
 
+   if (rss_head.hfunc && req_hfunc_name) {
+   hfuncs = get_stringset(ctx, ETH_SS_RSS_HASH_FUNCS, 0, 1);
+   if (!hfuncs) {
+   perror("Cannot get hash functions names");
+   return 1;
+   }

Re: [PATCH ethtool 5/6] ethtool: correctly free hkey when get_stringset() fails

2018-06-08 Thread Gal Pressman

On 08-Jun-18 12:20, Ivan Vecera wrote:
> Memory allocated for 'hkey' is not freed when
> get_stringset(..., ETH_SS_RSS_HASH_FUNCS...) fails.
> 
> Fixes: b888f35 ("ethtool: Support for configurable RSS hash function")

Thanks for fixing this!
Please use the first 12 characters of the sha1 in the Fixes line.

> Cc: Gal Pressman 
> Signed-off-by: Ivan Vecera 
> ---
>  ethtool.c | 13 ++---
>  1 file changed, 6 insertions(+), 7 deletions(-)
> 
> diff --git a/ethtool.c b/ethtool.c
> index 2b90984..fb93ae8 100644
> --- a/ethtool.c
> +++ b/ethtool.c
> @@ -3910,7 +3910,7 @@ static int do_srxfhindir(struct cmd_context *ctx, int 
> rxfhindir_default,
>  static int do_srxfh(struct cmd_context *ctx)
>  {
>   struct ethtool_rxfh rss_head = {0};
> - struct ethtool_rxfh *rss;
> + struct ethtool_rxfh *rss = NULL;
>   struct ethtool_rxnfc ring_count;
>   int rxfhindir_equal = 0, rxfhindir_default = 0;
>   struct ethtool_gstrings *hfuncs = NULL;
> @@ -4064,7 +4064,8 @@ static int do_srxfh(struct cmd_context *ctx)
>   hfuncs = get_stringset(ctx, ETH_SS_RSS_HASH_FUNCS, 0, 1);
>   if (!hfuncs) {
>   perror("Cannot get hash functions names");
> - return 1;
> + err = 1;
> + goto free;
>   }
>  
>   for (i = 0; i < hfuncs->len && !req_hfunc ; i++) {
> @@ -4078,8 +4079,8 @@ static int do_srxfh(struct cmd_context *ctx)
>   if (!req_hfunc) {
>   fprintf(stderr,
>   "Unknown hash function: %s\n", req_hfunc_name);
> - free(hfuncs);
> - return 1;
> + err = 1;
> + goto free;
>   }
>   }
>  
> @@ -4120,9 +4121,7 @@ static int do_srxfh(struct cmd_context *ctx)
>   }
>  
>  free:
> - if (hkey)
> - free(hkey);
> -
> + free(hkey);
>   free(rss);
>   free(hfuncs);
>   return err;
>

Re: [PATCH v2] net/mlx5: Delete unneeded function argument

2018-08-15 Thread Gal Pressman

On 15-Aug-18 18:08, Yuval Shaia wrote:
> priv argument is not used by the function, delete it.
> 
> Fixes: a89842811ea98 ("net/mlx5e: Merge per priority stats groups")
> Signed-off-by: Yuval Shaia 

nit: prefix should be net/mlx5e.

> ---
> v1 -> v2:
>   * Remove blank line as pointed by Leon.
> ---
>  drivers/net/ethernet/mellanox/mlx5/core/en_stats.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_stats.c 
> b/drivers/net/ethernet/mellanox/mlx5/core/en_stats.c
> index 1646859974ce..4c4d779dafa8 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/en_stats.c
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_stats.c
> @@ -813,7 +813,7 @@ static const struct counter_desc 
> pport_per_prio_traffic_stats_desc[] = {
>  
>  #define NUM_PPORT_PER_PRIO_TRAFFIC_COUNTERS  
> ARRAY_SIZE(pport_per_prio_traffic_stats_desc)
>  
> -static int mlx5e_grp_per_prio_traffic_get_num_stats(struct mlx5e_priv *priv)
> +static int mlx5e_grp_per_prio_traffic_get_num_stats(void)
>  {
>   return NUM_PPORT_PER_PRIO_TRAFFIC_COUNTERS * NUM_PPORT_PRIO;
>  }
> @@ -971,7 +971,7 @@ static int mlx5e_grp_per_prio_pfc_fill_stats(struct 
> mlx5e_priv *priv,
>  
>  static int mlx5e_grp_per_prio_get_num_stats(struct mlx5e_priv *priv)
>  {
> - return mlx5e_grp_per_prio_traffic_get_num_stats(priv) +
> + return mlx5e_grp_per_prio_traffic_get_num_stats() +
>   mlx5e_grp_per_prio_pfc_get_num_stats(priv);
>  }
>  
>

Re: [PATCH v2] net/mlx5: Delete unneeded function argument

2018-08-16 Thread Gal Pressman

On 16-Aug-18 12:00, Yuval Shaia wrote:
> On Thu, Aug 16, 2018 at 09:41:34AM +0300, Gal Pressman wrote:
>> On 15-Aug-18 18:08, Yuval Shaia wrote:
>>> priv argument is not used by the function, delete it.
>>>
>>> Fixes: a89842811ea98 ("net/mlx5e: Merge per priority stats groups")
>>> Signed-off-by: Yuval Shaia 
>>
>> nit: prefix should be net/mlx5e.
> 
> 1. Will do.
> 2. Why? (asking since dir name is mlx5)

Core related changes use net/mlx5 prefix, Ethernet related changes (en_* files) 
use net/mlx5e prefix.
Same thought goes into function names prefix (mlx5_*/mlx5e_*).

> 
>>
>>> ---
>>> v1 -> v2:
>>> * Remove blank line as pointed by Leon.
>>> ---
>>>  drivers/net/ethernet/mellanox/mlx5/core/en_stats.c | 4 ++--
>>>  1 file changed, 2 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_stats.c 
>>> b/drivers/net/ethernet/mellanox/mlx5/core/en_stats.c
>>> index 1646859974ce..4c4d779dafa8 100644
>>> --- a/drivers/net/ethernet/mellanox/mlx5/core/en_stats.c
>>> +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_stats.c
>>> @@ -813,7 +813,7 @@ static const struct counter_desc 
>>> pport_per_prio_traffic_stats_desc[] = {
>>>  
>>>  #define NUM_PPORT_PER_PRIO_TRAFFIC_COUNTERS
>>> ARRAY_SIZE(pport_per_prio_traffic_stats_desc)
>>>  
>>> -static int mlx5e_grp_per_prio_traffic_get_num_stats(struct mlx5e_priv 
>>> *priv)
>>> +static int mlx5e_grp_per_prio_traffic_get_num_stats(void)
>>>  {
>>> return NUM_PPORT_PER_PRIO_TRAFFIC_COUNTERS * NUM_PPORT_PRIO;
>>>  }
>>> @@ -971,7 +971,7 @@ static int mlx5e_grp_per_prio_pfc_fill_stats(struct 
>>> mlx5e_priv *priv,
>>>  
>>>  static int mlx5e_grp_per_prio_get_num_stats(struct mlx5e_priv *priv)
>>>  {
>>> -   return mlx5e_grp_per_prio_traffic_get_num_stats(priv) +
>>> +   return mlx5e_grp_per_prio_traffic_get_num_stats() +
>>> mlx5e_grp_per_prio_pfc_get_num_stats(priv);
>>>  }
>>>  
>>>
>>

Re: [PATCH v3 00/14] Adding GAUDI NIC code to habanalabs driver

2020-09-18 Thread Gal Pressman

On 17/09/2020 20:18, Jason Gunthorpe wrote:
> On Tue, Sep 15, 2020 at 11:46:58PM +0300, Oded Gabbay wrote:
>> infrastructure for communication between multiple accelerators. Same
>> as Nvidia uses NVlink, we use RDMA that we have inside our ASIC.
>> The RDMA implementation we did does NOT support some basic RDMA
>> IBverbs (such as MR and PD) and therefore, we can't use the rdma-core
>> library or to connect to the rdma infrastructure in the kernel. 
> 
> You can't create a parallel RDMA subsystem in netdev, or in misc, and
> you can't add random device offloads as IOCTL to nedevs.
> 
> RDMA is the proper home for all the networking offloads that don't fit
> into netdev.
> 
> EFA was able to fit into rdma-core/etc and it isn't even RoCE at
> all. I'm sure this can too.

Well, EFA wasn't welcomed to the RDMA subsystem with open arms ;), initially it
was suggested to go through the vfio subsystem instead.

I think this comes back to the discussion we had when EFA was upstreamed, which
is what's the bar to get accepted to the RDMA subsystem.
IIRC, what we eventually agreed on is having a userspace rdma-core provider and
ibv_{ud,rc}_pingpong working (or just supporting one of the IB spec's QP 
types?).

Does GAUDI fit these requirements? If not, should it be in a different subsystem
or should we open the "what qualifies as an RDMA device" question again?

Re: [PATCH v3 00/14] Adding GAUDI NIC code to habanalabs driver

2020-09-21 Thread Gal Pressman

On 18/09/2020 18:28, Jason Gunthorpe wrote:
> On Fri, Sep 18, 2020 at 06:15:52PM +0300, Oded Gabbay wrote:
> 
>> I'm sorry, but you won't be able to convince me here that I need to
>> "enslave" my entire code to RDMA, just because my ASIC "also" has some
>> RDMA ports.
> 
> You can't recreate common shared subsystems in a driver just because
> you don't want to work with the subsystem.
> 
> I don't care what else the ASIC has. In Linux the netdev part is
> exposed through netdev, the RDMA part through RDMA, the
> totally-not-a-GPU part through drivers/misc.
> 
> It is always been this way. Chelsio didn't get to rebuild the SCSI
> stack in their driver just because "storage is a small part of their
> device"
> 
> Drivers are not allowed to re-implement I2C/SPI/etc without re-using
> the comon code for that just because "I2C is a small part of their
> device"
> 
> Exposing to userspace the creation of RoCE QPs and their related
> objects are unambiguously a RDMA subsystem task. I don't even know how
> you think you can argue it is not. It is your company proudly claiming
> the device has 100G RoCE ports in all the marketing literature, after
> all.
> 
> It is too bad the device has a non-standards compliant implementation
> of RoCE so this will be a bit hard for you. Oh well.

What is considered a RoCE port in this case if it's not compliant with RoCE?
Sounds like it's an implementation of RDMA over ethernet, not RoCE.
Does GAUDI support UD/RC/.. QPs? Is it using a proprietary wire protocol?
(BTW, Oded claims it's similar to nvlink, how is nvlink's implementation
exposed? Or is it closed source?)

Jason, how do you imagine GAUDI in the RDMA subsystem? Userspace control path
verbs (used by hl-thunk?) and all data path verbs exposed as kverbs (used by
habanalabs driver)?
So neither any userspace verbs apps could use it nor kernel ULPs?

Re: [PATCH v3 00/14] Adding GAUDI NIC code to habanalabs driver

2020-09-22 Thread Gal Pressman

On 22/09/2020 14:41, Jason Gunthorpe wrote:
> On Mon, Sep 21, 2020 at 02:22:02PM +0300, Gal Pressman wrote:
> 
>> What is considered a RoCE port in this case if it's not compliant with RoCE?
>> Sounds like it's an implementation of RDMA over ethernet, not RoCE.
>> Does GAUDI support UD/RC/.. QPs? Is it using a proprietary wire protocol?
>> (BTW, Oded claims it's similar to nvlink, how is nvlink's implementation
>> exposed? Or is it closed source?)
> 
> I think Oded was drawing a parallel to how nvlink is integral with the
> compute element. From Oded's descriptions I don't think it is much
> like nvlink at all.
> 
>> Jason, how do you imagine GAUDI in the RDMA subsystem? Userspace control path
>> verbs (used by hl-thunk?) and all data path verbs exposed as kverbs (used by
>> habanalabs driver)?
>> So neither any userspace verbs apps could use it nor kernel ULPs?
> 
> Based on what Oded described it seems like a reasonable RDMA device
> with some limitations around MR IOVA.
> 
> Looks like the desire is to create a RDMA WR and CQ ring in userspace,
> and then co-mingle that with the compute side of the device.
> 
> So instead of doing the special IOCTL and mmap against the compute FD
> it would create a RDMA QP and RDMA CQ, use dv to access the raw
> internals, and the propritary stack would have exactly the same stuff
> it would have had with the misc ioctl.
> 
> But, completely separately, they'd also have to implement some of
> verbs which serves as the open source userspace showing how this HW
> works. What that is depends largely on what their HW can do, and if
> they want to connect to UCX/mpi/libfabric/etc
> 
> A bunch of ioctl stubs or a few tests is far below our standard in
> RDMA.
> 
> There may have been some argument that the compute side of this device
> has no industry standards so should be a drivers/misc, but HPC
> networking *does* have extensive standards and extensive open source
> software stacks. It is very hard for me to see how a device in this
> market could be competitive without integrating with that stuff.

I agree, that makes sense.
But assuming Oded actually goes and implements all the needed verbs to get a
basic functional libibverbs provider (assuming their HW can do it somehow), is
it really useful if no one is going to use it?
It doesn't sound like habanalabs want people to use GAUDI as an RDMA adapter,
and I'm assuming the only real world use case is going to be using the hl stack,
which means we're left with a lot of dead code that's not used/tested by anyone.

Genuine question, wouldn't it be better if they only implement what's actually
going to be used and tested by their customers?

Re: [PATCH v3 00/14] Adding GAUDI NIC code to habanalabs driver

2020-09-22 Thread Gal Pressman

On 22/09/2020 19:14, Jason Gunthorpe wrote:
> On Tue, Sep 22, 2020 at 03:46:29PM +0300, Gal Pressman wrote:
> 
>> I agree, that makes sense.
>> But assuming Oded actually goes and implements all the needed verbs to get a
>> basic functional libibverbs provider (assuming their HW can do it somehow), 
>> is
>> it really useful if no one is going to use it?
>> It doesn't sound like habanalabs want people to use GAUDI as an RDMA adapter,
>> and I'm assuming the only real world use case is going to be using the hl 
>> stack,
>> which means we're left with a lot of dead code that's not used/tested by 
>> anyone.
>>
>> Genuine question, wouldn't it be better if they only implement what's 
>> actually
>> going to be used and tested by their customers?
> 
> The general standard for this 'accel' hardware, both in DRM and RDMA
> is to present an open source userspace. Companies are encouraged to
> use that as their main interface but I suppose are free to carry the
> cost of dual APIs, and the community's wrath if they want.

I didn't mean they should maintain two interfaces.
The question is whether they should implement libibverbs support that covers the
cases used by their stack, or should they implement all "mandatory" verbs so
they could be able to run libibverbs' examples/perftest/pyverbs as well, even
though these will likely be the only apps covering these verbs.

Re: [PATCH iproute2-rc 8/8] rdma: Document counter statistic

2019-07-16 Thread Gal Pressman

On 10/07/2019 10:24, Leon Romanovsky wrote:
> +.SH "EXAMPLES"
> +.PP
> +rdma statistic show
> +.RS 4
> +Shows the state of the default counter of all RDMA devices on the system.
> +.RE
> +.PP
> +rdma statistic show link mlx5_2/1
> +.RS 4
> +Shows the state of the default counter of specified RDMA port
> +.RE
> +.PP
> +rdma statistic qp show
> +.RS 4
> +Shows the state of all qp counters of all RDMA devices on the system.
> +.RE
> +.PP
> +rdma statistic qp show link mlx5_2/1
> +.RS 4
> +Shows the state of all qp counters of specified RDMA port.
> +.RE
> +.PP
> +rdma statistic qp show link mlx5_2 pid 30489
> +.RS 4
> +Shows the state of all qp counters of specified RDMA port and belonging to 
> pid 30489
> +.RE
> +.PP
> +rdma statistic qp mode
> +.RS 4
> +List current counter mode on all deivces

"deivces" -> "devices".

Re: [PATCH v6 rdma-next 1/6] RDMA/core: Create mmap database and cookie helper functions

2019-07-28 Thread Gal Pressman

On 26/07/2019 16:23, Jason Gunthorpe wrote:
> On Fri, Jul 26, 2019 at 08:42:07AM +, Michal Kalderon wrote:
> 
 But we don't free entires from the xa_array ( only when ucontext is
 destroyed) so how will There be an empty element after we wrap ?
>>>
>>> Oh!
>>>
>>> That should be fixed up too, in the general case if a user is
>>> creating/destroying driver objects in loop we don't want memory usage to
>>> be unbounded.
>>>
>>> The rdma_user_mmap stuff has VMA ops that can refcount the xa entry and
>>> now that this is core code it is easy enough to harmonize the two things and
>>> track the xa side from the struct rdma_umap_priv
>>>
>>> The question is, does EFA or qedr have a use model for this that allows a
>>> userspace verb to create/destroy in a loop? ie do we need to fix this right
>>> now?
> 
>> The mapping occurs for every qp and cq creation. So yes.
>>
>> So do you mean add a ref-cnt to the xarray entry and from umap
>> decrease the refcnt and free?
> 
> Yes, free the entry (release the HW resource) and release the xa_array
> ID.

This is a bit tricky for EFA.
The UAR BAR resources (LLQ for example) aren't cleaned up until the UAR is
deallocated, so many of the entries won't really be freed when the refcount
reaches zero (i.e the HW considers these entries as refcounted as long as the
UAR exists). The best we can do is free the DMA buffers for appropriate entries.

Re: [PATCH v6 rdma-next 1/6] RDMA/core: Create mmap database and cookie helper functions

2019-07-29 Thread Gal Pressman

On 29/07/2019 15:58, Michal Kalderon wrote:
>> From: linux-rdma-ow...@vger.kernel.org > ow...@vger.kernel.org> On Behalf Of Jason Gunthorpe
>>
>>> +   xa_lock(&ucontext->mmap_xa);
>>> +   if (check_add_overflow(ucontext->mmap_xa_page,
>>> +  (u32)(length >> PAGE_SHIFT),
>>> +  &next_mmap_page))
>>> +   goto err_unlock;
>>
>> I still don't like that this algorithm latches into a permanent failure when 
>> the
>> xa_page wraps.
>>
>> It seems worth spending a bit more time here to tidy this.. Keep using the
>> mmap_xa_page scheme, but instead do something like
>>
>> alloc_cyclic_range():
>>
>> while () {
>>// Find first empty element in a cyclic way
>>xa_page_first = mmap_xa_page;
>>xa_find(xa, &xa_page_first, U32_MAX, XA_FREE_MARK)
>>
>>// Is there a enough room to have the range?
>>if (check_add_overflow(xa_page_first, npages, &xa_page_end)) {
>>   mmap_xa_page = 0;
>>   continue;
>>}
>>
>>// See if the element before intersects
>>elm = xa_find(xa, &zero, xa_page_end, 0);
>>if (elm && intersects(xa_page_first, xa_page_last, elm->first, 
>> elm->last)) {
>>   mmap_xa_page = elm->last + 1;
>>   continue
>>}
>>
>>// xa_page_first -> xa_page_end should now be free
>>xa_insert(xa, xa_page_start, entry);
>>mmap_xa_page = xa_page_end + 1;
>>return xa_page_start;
>> }
>>
>> Approximately, please check it.
> Gal & Jason, 
> 
> Coming back to the mmap_xa_page algorithm. I couldn't find some background on 
> this. 
> Why do you need the length to be represented in the mmap_xa_page ?  
> Why not simply use xa_alloc_cyclic ( like in siw ) 
> This is simply a key to a mmap object... 

The intention was that the entry would "occupy" number of xarray elements
according to its size (in pages). It wasn't initially like this, but IIRC this
was preferred by Jason.

Re: [PATCH rdma-next 2/3] IB/mlx5: Expose ODP for DC capabilities to user

2019-08-01 Thread Gal Pressman

On 01/08/2019 15:21, Leon Romanovsky wrote:
>  enum mlx5_user_cmds_supp_uhw {
> @@ -147,6 +148,7 @@ struct mlx5_ib_alloc_ucontext_resp {
>   __u32   num_uars_per_page;
>   __u32   num_dyn_bfregs;
>   __u32   dump_fill_mkey;
> + __u32   dc_odp_caps;

This should be padded to 64 bits.

>  };
>  
>  struct mlx5_ib_alloc_pd_resp {
>

[PATCH iproute2-next] rdma: Add driver QP type string

2019-08-04 Thread Gal Pressman

RDMA resource tracker now tracks driver QPs as well, add driver QP type
string to qp_types_to_str function.

Signed-off-by: Gal Pressman 
---
 rdma/res.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/rdma/res.c b/rdma/res.c
index ef863f142eca..97a7b9640185 100644
--- a/rdma/res.c
+++ b/rdma/res.c
@@ -148,9 +148,11 @@ const char *qp_types_to_str(uint8_t idx)
 "UC", "UD", "RAW_IPV6",
 "RAW_ETHERTYPE",
 "UNKNOWN", "RAW_PACKET",
-"XRC_INI", "XRC_TGT" };
+"XRC_INI", "XRC_TGT",
+[0xFF] = "DRIVER",
+   };
 
-   if (idx < ARRAY_SIZE(qp_types_str))
+   if (idx < ARRAY_SIZE(qp_types_str) && qp_types_str[idx])
return qp_types_str[idx];
return "UNKNOWN";
 }
-- 
2.22.0

Re: [PATCH iproute2-next] rdma: Add driver QP type string

2019-08-05 Thread Gal Pressman

On 05/08/2019 22:08, David Ahern wrote:
> On 8/4/19 2:07 AM, Gal Pressman wrote:
>> RDMA resource tracker now tracks driver QPs as well, add driver QP type
>> string to qp_types_to_str function.
> 
> "now" means which kernel release? Leon: should this be in master or -next?

Now means the patch is merged to RDMA's for-rc branch (5.3).

[PATCH iproute2] rdma: Update node type strings

2019-05-14 Thread Gal Pressman

Fix typo in usnic_udp node type and add a string for the unspecified
node type.

Signed-off-by: Gal Pressman 
---
 rdma/dev.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/rdma/dev.c b/rdma/dev.c
index 339625202200..904836221c1b 100644
--- a/rdma/dev.c
+++ b/rdma/dev.c
@@ -170,7 +170,8 @@ static const char *node_type_to_str(uint8_t node_type)
static const char * const node_type_str[] = { "unknown", "ca",
  "switch", "router",
  "rnic", "usnic",
- "usnic_dp" };
+ "usnic_udp",
+ "unspecified" };
if (node_type < ARRAY_SIZE(node_type_str))
return node_type_str[node_type];
return "unknown";
-- 
2.7.4

Re: [PATCH iproute2] rdma: Update node type strings

2019-05-19 Thread Gal Pressman

On 15/05/2019 9:58, Gal Pressman wrote:
> Fix typo in usnic_udp node type and add a string for the unspecified
> node type.
> 
> Signed-off-by: Gal Pressman 
> ---
>  rdma/dev.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/rdma/dev.c b/rdma/dev.c
> index 339625202200..904836221c1b 100644
> --- a/rdma/dev.c
> +++ b/rdma/dev.c
> @@ -170,7 +170,8 @@ static const char *node_type_to_str(uint8_t node_type)
>   static const char * const node_type_str[] = { "unknown", "ca",
> "switch", "router",
> "rnic", "usnic",
> -   "usnic_dp" };
> +   "usnic_udp",
> +   "unspecified" };
>   if (node_type < ARRAY_SIZE(node_type_str))
>   return node_type_str[node_type];
>   return "unknown";
> 

Sorry, forgot to add Stephen.

Re: [RFC v1 10/19] RDMA/irdma: Add connection manager

2019-02-24 Thread Gal Pressman

On 15-Feb-19 19:10, Shiraz Saleem wrote:
> +/**
> + * irdma_cm_teardown_connections - teardown QPs
> + * @iwdev: device pointer
> + * @ipaddr: Pointer to IPv4 or IPv6 address
> + * @ipv4: flag indicating IPv4 when true

There is no ipv4 parameter.

> + * @disconnect_all: flag indicating disconnect all QPs
> + * teardown QPs where source or destination addr matches ip addr
> + */
> +void irdma_cm_teardown_connections(struct irdma_device *iwdev, u32 *ipaddr,
> +struct irdma_cm_info *nfo,
> +bool disconnect_all)
> +{
> + struct irdma_cm_core *cm_core = &iwdev->cm_core;
> + struct list_head *list_core_temp;
> + struct list_head *list_node;
> + struct irdma_cm_node *cm_node;
> + struct list_head teardown_list;
> + struct ib_qp_attr attr;
> + struct irdma_sc_vsi *vsi = &iwdev->vsi;
> + struct irdma_sc_qp *sc_qp;
> + struct irdma_qp *qp;
> + int i;
> + unsigned long flags;
> +
> + INIT_LIST_HEAD(&teardown_list);
> +
> + spin_lock_irqsave(&cm_core->ht_lock, flags);
> + list_for_each_safe(list_node, list_core_temp,
> +&cm_core->accelerated_list) {
> + cm_node = container_of(list_node, struct irdma_cm_node, list);
> + if (disconnect_all ||
> + (nfo->vlan_id == cm_node->vlan_id &&
> + !memcmp(cm_node->loc_addr, ipaddr, nfo->ipv4 ? 4 : 16))) {
> + atomic_inc(&cm_node->ref_count);
> + list_add(&cm_node->teardown_entry, &teardown_list);
> + }
> + }
> + list_for_each_safe(list_node, list_core_temp,
> +&cm_core->non_accelerated_list) {
> + cm_node = container_of(list_node, struct irdma_cm_node, list);
> + if (disconnect_all ||
> + (nfo->vlan_id == cm_node->vlan_id &&
> + !memcmp(cm_node->loc_addr, ipaddr, nfo->ipv4 ? 4 : 16))) {
> + atomic_inc(&cm_node->ref_count);
> + list_add(&cm_node->teardown_entry, &teardown_list);
> + }
> + }
> + spin_unlock_irqrestore(&cm_core->ht_lock, flags);
> +
> + list_for_each_safe(list_node, list_core_temp, &teardown_list) {
> + cm_node = container_of(list_node, struct irdma_cm_node,
> +teardown_entry);
> + attr.qp_state = IB_QPS_ERR;
> + irdma_modify_qp(&cm_node->iwqp->ibqp, &attr, IB_QP_STATE, NULL);
> + if (iwdev->reset)
> + irdma_cm_disconn(cm_node->iwqp);
> + irdma_rem_ref_cm_node(cm_node);
> + }
> + if (!iwdev->roce_mode)
> + return;
> +
> + INIT_LIST_HEAD(&teardown_list);
> + for (i = 0; i < IRDMA_MAX_USER_PRIORITY; i++) {
> + spin_lock_irqsave(&vsi->qos[i].lock, flags);
> + list_for_each_safe(list_node, list_core_temp, 
> &vsi->qos[i].qplist) {
> + u32 qp_ip[4];
> +
> + sc_qp = container_of(list_node, struct irdma_sc_qp, 
> list);
> + if (sc_qp->qp_type != IRDMA_QP_TYPE_ROCE_RC)
> + continue;
> +
> + qp = sc_qp->back_qp;
> + if (!disconnect_all) {
> + if (nfo->ipv4)
> + qp_ip[0] = qp->udp_info.local_ipaddr3;
> + else
> + memcpy(qp_ip,
> +&qp->udp_info.local_ipaddr0,
> +sizeof(qp_ip));
> + }
> +
> + if (disconnect_all ||
> + (nfo->vlan_id == qp->udp_info.vlan_tag &&
> + !memcmp(qp_ip, ipaddr, nfo->ipv4 ? 4 : 16))) {
> + spin_lock_irqsave(&iwdev->rf->qptable_lock, 
> flags);

You should use different 'flags' here.

> + if (iwdev->rf->qp_table[sc_qp->qp_uk.qp_id]) {
> + irdma_add_ref(&qp->ibqp);
> + list_add(&qp->teardown_entry, 
> &teardown_list);
> + }
> + 
> spin_unlock_irqrestore(&iwdev->rf->qptable_lock, flags);
> + }
> + }
> + spin_unlock_irqrestore(&vsi->qos[i].lock, flags);
> + }
> +
> + list_for_each_safe(list_node, list_core_temp, &teardown_list) {
> + qp = container_of(list_node, struct irdma_qp, teardown_entry);
> + attr.qp_state = IB_QPS_ERR;
> + irdma_modify_qp_roce(&qp->ibqp, &attr, IB_QP_STATE, NULL);
> + irdma_rem_ref(&qp->ibqp);
> + }
> +}

Re: [RFC v1 08/19] RDMA/irdma: Add privileged UDA queue implementation

2019-02-24 Thread Gal Pressman

On 15-Feb-19 19:10, Shiraz Saleem wrote:
> +/**
> + * irdma_puda_poll_info - poll cq for completion
> + * @cq: cq for poll
> + * @info: info return for successful completion
> + */
> +static enum irdma_status_code
> +irdma_puda_poll_info(struct irdma_sc_cq *cq, struct irdma_puda_cmpl_info 
> *info)
> +{
> + struct irdma_cq_uk *cq_uk = &cq->cq_uk;
> + u64 qword0, qword2, qword3, qword6;
> + __le64 *cqe;
> + __le64 *ext_cqe = NULL;
> + u64 qword7 = 0;
> + u64 comp_ctx;
> + bool valid_bit;
> + bool ext_valid = 0;
> + u32 major_err, minor_err;
> + u32 peek_head;
> + bool error;
> + u8 polarity;
> +
> + cqe = IRDMA_GET_CURRENT_CQ_ELEM(&cq->cq_uk);
> + get_64bit_val(cqe, 24, &qword3);
> + valid_bit = (bool)RS_64(qword3, IRDMA_CQ_VALID);
> + if (valid_bit != cq_uk->polarity)
> + return IRDMA_ERR_Q_EMPTY;
> +
> + if (cq->dev->hw_attrs.hw_rev > IRDMA_GEN_1)
> + ext_valid = (bool)RS_64(qword3, IRDMA_CQ_EXTCQE);
> +
> + if (ext_valid) {
> + enum irdma_status_code ret = 0;
> +
> + peek_head = (cq_uk->cq_ring.head + 1) % cq_uk->cq_ring.size;
> + ext_cqe = cq_uk->cq_base[peek_head].buf;
> + get_64bit_val(ext_cqe, 24, &qword7);
> + polarity = (u8)RS_64(qword7, IRDMA_CQ_VALID);
> + if (!peek_head)
> + polarity ^= 1;
> + if (polarity != cq_uk->polarity)
> + return IRDMA_ERR_Q_EMPTY;
> +
> + IRDMA_RING_MOVE_HEAD(cq_uk->cq_ring, ret);

Should you check for 'ret' value here? It was initialized to 0 and overriden
here but I can't see any actual use for it.

> + if (IRDMA_RING_CURRENT_HEAD(cq_uk->cq_ring) == 0)
> + cq_uk->polarity = !cq_uk->polarity;
> + /* update cq tail in cq shadow memory also */
> + IRDMA_RING_MOVE_TAIL(cq_uk->cq_ring);
> + }
> +
> + irdma_debug_buf(cq->dev, IRDMA_DEBUG_PUDA, "PUDA CQE", cqe, 32);
> + if (ext_valid)
> + irdma_debug_buf(cq->dev, IRDMA_DEBUG_PUDA, "PUDA EXT-CQE",
> + ext_cqe, 32);
> +
> + error = (bool)RS_64(qword3, IRDMA_CQ_ERROR);
> + if (error) {
> + irdma_debug(cq->dev, IRDMA_DEBUG_PUDA, "receive error\n");
> + major_err = (u32)(RS_64(qword3, IRDMA_CQ_MAJERR));
> + minor_err = (u32)(RS_64(qword3, IRDMA_CQ_MINERR));
> + info->compl_error = major_err << 16 | minor_err;
> + return IRDMA_ERR_CQ_COMPL_ERROR;
> + }
> +
> + get_64bit_val(cqe, 0, &qword0);
> + get_64bit_val(cqe, 16, &qword2);
> +
> + info->q_type = (u8)RS_64(qword3, IRDMA_CQ_SQ);
> + info->qp_id = (u32)RS_64(qword2, IRDMACQ_QPID);
> + if (cq->dev->hw_attrs.hw_rev > IRDMA_GEN_1)
> + info->ipv4 = (bool)RS_64(qword3, IRDMACQ_IPV4);
> +
> + get_64bit_val(cqe, 8, &comp_ctx);
> + info->qp = (struct irdma_qp_uk *)(unsigned long)comp_ctx;
> + info->wqe_idx = (u32)RS_64(qword3, IRDMA_CQ_WQEIDX);
> +
> + if (info->q_type == IRDMA_CQE_QTYPE_RQ) {
> + if (ext_valid) {
> + info->vlan_valid = (bool)RS_64(qword7, 
> IRDMA_CQ_UDVLANVALID);
> + if (info->vlan_valid) {
> + get_64bit_val(ext_cqe, 16, &qword6);
> + info->vlan = (u16)RS_64(qword6, 
> IRDMA_CQ_UDVLAN);
> + }
> + info->smac_valid = (bool)RS_64(qword7, 
> IRDMA_CQ_UDSMACVALID);
> + if (info->smac_valid) {
> + get_64bit_val(ext_cqe, 16, &qword6);
> + info->smac[0] = (u8)((qword6 >> 40) & 0xFF);
> + info->smac[1] = (u8)((qword6 >> 32) & 0xFF);
> + info->smac[2] = (u8)((qword6 >> 24) & 0xFF);
> + info->smac[3] = (u8)((qword6 >> 16) & 0xFF);
> + info->smac[4] = (u8)((qword6 >> 8) & 0xFF);
> + info->smac[5] = (u8)(qword6 & 0xFF);
> + }
> + }
> +
> + if (cq->dev->hw_attrs.hw_rev == IRDMA_GEN_1) {
> + info->vlan_valid = (bool)RS_64(qword3, 
> IRDMA_VLAN_TAG_VALID);
> + info->l4proto = (u8)RS_64(qword2, IRDMA_UDA_L4PROTO);
> + info->l3proto = (u8)RS_64(qword2, IRDMA_UDA_L3PROTO);
> + }
> +
> + info->payload_len = (u32)RS_64(qword0, IRDMACQ_PAYLDLEN);
> + }
> +
> + return 0;
> +}

Re: [RFC v1 12/19] RDMA/irdma: Implement device supported verb APIs

2019-02-24 Thread Gal Pressman

On 15-Feb-19 19:10, Shiraz Saleem wrote:
> /**
>  * irdma_dealloc_ucontext - deallocate the user context data structure
>  * @context: user context created during alloc
>  */
> static int irdma_dealloc_ucontext(struct ib_ucontext *context)
> {
>   struct irdma_ucontext *ucontext = to_ucontext(context);
>   unsigned long flags;
> 
>   spin_lock_irqsave(&ucontext->cq_reg_mem_list_lock, flags);
>   if (!list_empty(&ucontext->cq_reg_mem_list)) {
>   spin_unlock_irqrestore(&ucontext->cq_reg_mem_list_lock, flags);
>   return -EBUSY;
>   }
>   spin_unlock_irqrestore(&ucontext->cq_reg_mem_list_lock, flags);
> 
>   spin_lock_irqsave(&ucontext->qp_reg_mem_list_lock, flags);
>   if (!list_empty(&ucontext->qp_reg_mem_list)) {
>   spin_unlock_irqrestore(&ucontext->qp_reg_mem_list_lock, flags);
>   return -EBUSY;

Drivers are not permitted to fail dealloc_ucontext.

>   }
>   spin_unlock_irqrestore(&ucontext->qp_reg_mem_list_lock, flags);
>   kfree(ucontext);
> 
>   return 0;
> }

> +/**> + * irdma_disassociate_ucontext - Disassociate user context> + * 
> @context: ib user context> + */> +static void 
> irdma_disassociate_ucontext(struct ib_ucontext *context)
> +{
> +}

What's the motivation for a nop callback (over not implementing the
function)?

> +/**
> + * irdma_alloc_pd - allocate protection domain
> + * @pd: PD pointer
> + * @context: user context created during alloc
> + * @udata: user data
> + */
> +static int irdma_alloc_pd(struct ib_pd *pd,
> +   struct ib_ucontext *context,
> +   struct ib_udata *udata)
> +{
> + struct irdma_pd *iwpd = to_iwpd(pd);
> + struct irdma_device *iwdev = to_iwdev(pd->device);
> + struct irdma_sc_dev *dev = &iwdev->rf->sc_dev;
> + struct irdma_pci_f *rf = iwdev->rf;
> + struct irdma_alloc_pd_resp uresp = {};
> + struct irdma_sc_pd *sc_pd;
> + struct irdma_ucontext *ucontext;
> + u32 pd_id = 0;
> + int err;
> +
> + if (iwdev->closing)
> + return -ENODEV;
> +
> + err = irdma_alloc_rsrc(rf, rf->allocated_pds, rf->max_pd, &pd_id,
> +&rf->next_pd);
> + if (err)
> + return err;
> +
> + sc_pd = &iwpd->sc_pd;
> + if (context) {

I think this should be 'if (udata)', this applies to many other places in this 
driver.

> + ucontext = to_ucontext(context);
> + dev->iw_pd_ops->pd_init(dev, sc_pd, pd_id, ucontext->abi_ver);
> + uresp.pd_id = pd_id;
> + if (ib_copy_to_udata(udata, &uresp, sizeof(uresp))) {
> + err = -EFAULT;
> + goto error;
> + }
> + } else {
> + dev->iw_pd_ops->pd_init(dev, sc_pd, pd_id, -1);
> + }
> +
> + irdma_add_pdusecount(iwpd);
> +
> + return 0;
> +error:
> + irdma_free_rsrc(rf, rf->allocated_pds, pd_id);
> +
> + return err;
> +}
> +/**
> + * irdma_create_qp - create qp
> + * @ibpd: ptr of pd
> + * @init_attr: attributes for qp
> + * @udata: user data for create qp
> + */
> +static struct ib_qp *irdma_create_qp(struct ib_pd *ibpd,
> +  struct ib_qp_init_attr *init_attr,
> +  struct ib_udata *udata)
> +{
> + struct irdma_pd *iwpd = to_iwpd(ibpd);
> + struct irdma_device *iwdev = to_iwdev(ibpd->device);
> + struct irdma_pci_f *rf = iwdev->rf;
> + struct irdma_cqp *iwcqp = &rf->cqp;
> + struct irdma_qp *iwqp;
> + struct irdma_ucontext *ucontext;
> + struct irdma_create_qp_req req;
> + struct irdma_create_qp_resp uresp = {};
> + struct i40iw_create_qp_resp uresp_gen1 = {};
> + u32 qp_num = 0;
> + void *mem;
> + enum irdma_status_code ret;
> + int err_code = 0;
> + int sq_size;
> + int rq_size;
> + struct irdma_sc_qp *qp;
> + struct irdma_sc_dev *dev = &rf->sc_dev;
> + struct irdma_qp_init_info init_info = {};
> + struct irdma_create_qp_info *qp_info;
> + struct irdma_cqp_request *cqp_request;
> + struct cqp_cmds_info *cqp_info;
> + struct irdma_qp_host_ctx_info *ctx_info;
> + struct irdma_iwarp_offload_info *iwarp_info;
> + struct irdma_roce_offload_info *roce_info;
> + struct irdma_udp_offload_info *udp_info;
> + unsigned long flags;
> +
> + if (iwdev->closing)
> + return ERR_PTR(-ENODEV);
> +
> + if (init_attr->create_flags)
> + return ERR_PTR(-EINVAL);
> +
> + if (init_attr->cap.max_inline_data > dev->hw_attrs.max_hw_inline)
> + init_attr->cap.max_inline_data = dev->hw_attrs.max_hw_inline;
> +
> + if (init_attr->cap.max_send_sge > dev->hw_attrs.max_hw_wq_frags)
> + init_attr->cap.max_send_sge = dev->hw_attrs.max_hw_wq_frags;
> +
> + if (init_attr->cap.max_recv_sge > dev->hw_attrs.max_hw_wq_frags)
> + init_attr->cap.max_recv_sge = dev->hw_attrs.max_hw_wq_frags

Re: [RFC v1 04/19] RDMA/irdma: Add driver framework definitions

2019-02-24 Thread Gal Pressman

On 15-Feb-19 19:10, Shiraz Saleem wrote:
> +/* client interface functions */
> +static const struct i40e_client_ops i40e_ops = {
> + .open = i40iw_open,
> + .close = i40iw_close,
> + .l2_param_change = i40iw_l2param_change,
> + .virtchnl_receive = NULL,
> + .vf_reset = NULL,
> + .vf_enable = NULL,
> + .vf_capable = NULL

NULL assignments are redundant.

> +};
> +
> diff --git a/drivers/infiniband/hw/irdma/irdma_if.c 
> b/drivers/infiniband/hw/irdma/irdma_if.c
> new file mode 100644
> index 000..f7b89e9
> --- /dev/null
> +++ b/drivers/infiniband/hw/irdma/irdma_if.c
> @@ -0,0 +1,430 @@
> +// SPDX-License-Identifier: GPL-2.0 or Linux-OpenIB
> +/* Copyright (c) 2019, Intel Corporation. */
> +
> +#include 
> +#include 
> +#include 
> +#include "main.h"
> +#include "ws.h"
> +#include "icrdma_hw.h"
> +
> +void irdma_add_dev_ref(struct irdma_sc_dev *dev)
> +{
> + try_module_get(THIS_MODULE);
> +}
> +
> +void irdma_put_dev_ref(struct irdma_sc_dev *dev)
> +{
> + module_put(THIS_MODULE);
> +}

What are these used for?

> +
> +/**
> + * irdma_find_iwdev - find a vsi device given a name
> + * @name: name of iwdev
> + */

Can't find uses of this function as well.

> +struct irdma_device *irdma_find_iwdev(const char *name)
> +{
> + struct irdma_handler *hdl;
> + struct list_head *pos;
> + struct list_head *tmp;
> + struct irdma_device *iwdev;
> + unsigned long flags;
> +
> + spin_lock_irqsave(&irdma_handler_lock, flags);
> + list_for_each_entry(hdl, &irdma_handlers, list) {
> + list_for_each_safe(pos, tmp, &hdl->rf.vsi_dev_list) {
> + iwdev = container_of(pos, struct irdma_device, list);
> + if (!strcmp(name, iwdev->iwibdev->ibdev.name)) {
> + spin_unlock_irqrestore(&irdma_handler_lock,
> +flags);
> + return iwdev;
> + }
> + }
> + }
> + spin_unlock_irqrestore(&irdma_handler_lock, flags);
> +
> + return NULL;
> +}
> +

Re: [RFC v1 12/19] RDMA/irdma: Implement device supported verb APIs

2019-02-26 Thread Gal Pressman

 char *buf)
>>> +{
>>> +   return sprintf(buf, "%.*s\n", 32, "IRDMA Board ID");
>>
>> That doesn't add much information.
> 
> Will fix.
> 
>>
>>> +}
>>> +
>>> +static DEVICE_ATTR_RO(hw_rev);
>>> +static DEVICE_ATTR_RO(hca_type);
>>> +static DEVICE_ATTR_RO(board_id);
>>> +
>>> +static struct attribute *irdma_dev_attributes[] = {
>>> +   &dev_attr_hw_rev.attr,
>>> +   &dev_attr_hca_type.attr,
>>> +   &dev_attr_board_id.attr,
>>> +   NULL
>>> +};
>>> +
>>> +static const struct attribute_group irdma_attr_group = {
>>> +   .attrs = irdma_dev_attributes,
>>> +};
>>> +
>>> +/**
>>> + * irdma_modify_port  Modify port properties
>>> + * @ibdev: device pointer from stack
>>> + * @port: port number
>>> + * @port_modify_mask: mask for port modifications
>>> + * @props: port properties
>>> + */
>>> +static int irdma_modify_port(struct ib_device *ibdev,
>>> +u8 port,
>>> +int port_modify_mask,
>>> +        struct ib_port_modify *props) {
>>> +   return 0;
>>> +}
>>
>> Same question as disacossiate_ucontext.
> 
> This was likely added during early dev. and can be removed.
> 
>>
>>> +
>>> +/**
>>> + * irdma_query_gid_roce - Query port GID for Roce
>>> + * @ibdev: device pointer from stack
>>> + * @port: port number
>>> + * @index: Entry index
>>> + * @gid: Global ID
>>> + */
>>> +static int irdma_query_gid_roce(struct ib_device *ibdev,
>>> +   u8 port,
>>> +   int index,
>>> +   union ib_gid *gid)
>>> +{
>>> +   int ret;
>>> +
>>> +   ret = rdma_query_gid(ibdev, port, index, gid);
>>> +   if (ret == -EAGAIN) {
>>
>> I can't see a path where rdma_query_gid returns -EAGAIN.
> 
> This function can be removed now. It's only applicable to non-Roce providers.
> 
>>
>>> +   memcpy(gid, &zgid, sizeof(*gid));
>>> +   return 0;
>>> +   }
>>> +
>>> +   return ret;
>>> +}
>>> +
>>
>>> +/**
>>> + * irdma_create_ah - create address handle
>>> + * @ibpd: ptr to protection domain
>>> + * @ah_attr: address handle attributes
>>
>> 'ah_attr' -> 'attr', missing flags and udata.
> 
> Will fix all these hits in the driver.
> 
> [..]
>>> + */
>>> +static int irdma_destroy_ah(struct ib_ah *ibah, u32 flags) {
>>> +   struct irdma_device *iwdev = to_iwdev(ibah->device);
>>> +   struct irdma_ah *ah = to_iwah(ibah);
>>> +   int err;
>>> +
>>> +   if (!ah->sc_ah.ah_info.ah_valid)
>>> +   return -EINVAL;
>>> +
>>> +   err = irdma_ah_cqp_op(iwdev->rf, &ah->sc_ah,
>> IRDMA_OP_AH_DESTROY,
>>> + flags & RDMA_DESTROY_AH_SLEEPABLE,
>>> + irdma_destroy_ah_cb, ah);
>>> +   if (!err)
>>> +   return 0;
>>
>> Why are the rest of the cleanups only in case of error?
> 
> On success, the cleanup is done in the callback, irdma_destroy_ah_cb.
> 
> [...]
> 
> 
>>> +static __be64 irdma_mac_to_guid(struct net_device *ndev) {
>>> +   unsigned char *mac = ndev->dev_addr;
>>> +   __be64 guid;
>>> +   unsigned char *dst = (unsigned char *)&guid;
>>> +
>>> +   dst[0] = mac[0] ^ 2;
>>> +   dst[1] = mac[1];
>>> +   dst[2] = mac[2];
>>> +   dst[3] = 0xff;
>>> +   dst[4] = 0xfe;
>>> +   dst[5] = mac[3];
>>> +   dst[6] = mac[4];
>>> +   dst[7] = mac[5];
>>> +
>>> +   return guid;
>>> +}
>>
>> There's a variant of this function in irdma, bnxt_re, ocrdma and qedr.
>> Maybe it's time to provide it in common code?
> 
> Agreed.
> 

Other than that:
Reviewed-by: Gal Pressman

Re: [PATCH rdma-next 1/8] RDMA/core: Check if client supports IB device or not

2021-04-04 Thread Gal Pressman

On 05/04/2021 8:49, Leon Romanovsky wrote:
> From: Parav Pandit 
> 
> RDMA devices are of different transport(iWarp, IB, RoCE) and have
> different attributes.
> Not all clients are interested in all type of devices.
> 
> Implement a generic callback that each IB client can implement to decide
> if client add() or remove() should be done by the IB core or not for a
> given IB device, client combination.
> 
> Signed-off-by: Parav Pandit 
> Signed-off-by: Leon Romanovsky 
> ---
>  drivers/infiniband/core/device.c | 3 +++
>  include/rdma/ib_verbs.h  | 9 +
>  2 files changed, 12 insertions(+)
> 
> diff --git a/drivers/infiniband/core/device.c 
> b/drivers/infiniband/core/device.c
> index c660cef66ac6..c9af2deba8c1 100644
> --- a/drivers/infiniband/core/device.c
> +++ b/drivers/infiniband/core/device.c
> @@ -691,6 +691,9 @@ static int add_client_context(struct ib_device *device,
>   if (!device->kverbs_provider && !client->no_kverbs_req)
>   return 0;
>  
> + if (client->is_supported && !client->is_supported(device))
> + return 0;

Isn't it better to remove the kverbs_provider flag (from previous if statement)
and unify it with this generic support check?

Re: [net-next 06/14] net/mlx5e: XDP, Inline small packets into the TX MPWQE in XDP xmit flow

2019-04-23 Thread Gal Pressman

On 23-Apr-19 01:32, Saeed Mahameed wrote:
>  static inline void
> -mlx5e_xdp_mpwqe_add_dseg(struct mlx5e_xdpsq *sq, dma_addr_t dma_addr, u16 
> dma_len)
> +mlx5e_xdp_mpwqe_add_dseg(struct mlx5e_xdpsq *sq, struct mlx5e_xdp_info *xdpi,
> +  struct mlx5e_xdpsq_stats *stats)

Passing stats as a function parameter is weird, why not remove and use 
sq->stats?

>  {
>   struct mlx5e_xdp_mpwqe *session = &sq->mpwqe;
> + dma_addr_t dma_addr= xdpi->dma_addr;
> + struct xdp_frame *xdpf = xdpi->xdpf;
>   struct mlx5_wqe_data_seg *dseg =
> - (struct mlx5_wqe_data_seg *)session->wqe + session->ds_count++;
> + (struct mlx5_wqe_data_seg *)session->wqe + session->ds_count;
> + u16 dma_len = xdpf->len;
>  
> + session->pkt_count++;
> +
> +#define MLX5E_XDP_INLINE_WQE_SZ_THRSD (256 - sizeof(struct 
> mlx5_wqe_inline_seg))
> +
> + if (session->inline_on && dma_len <= MLX5E_XDP_INLINE_WQE_SZ_THRSD) {
> + struct mlx5_wqe_inline_seg *inline_dseg =
> + (struct mlx5_wqe_inline_seg *)dseg;
> + u16 ds_len = sizeof(*inline_dseg) + dma_len;
> + u16 ds_cnt = DIV_ROUND_UP(ds_len, MLX5_SEND_WQE_DS);
> +
> + if (unlikely(session->ds_count + ds_cnt > 
> session->max_ds_count)) {
> + /* Not enough space for inline wqe, send with memory 
> pointer */
> + session->complete = true;
> + goto no_inline;
> + }
> +
> + inline_dseg->byte_count = cpu_to_be32(dma_len | 
> MLX5_INLINE_SEG);
> + memcpy(inline_dseg->data, xdpf->data, dma_len);
> +
> + session->ds_count += ds_cnt;
> + stats->inlnw++;
> + return;
> + }
> +
> +no_inline:
>   dseg->addr   = cpu_to_be64(dma_addr);
>   dseg->byte_count = cpu_to_be32(dma_len);
>   dseg->lkey   = sq->mkey_be;
> + session->ds_count++;
>  }

Re: [net-next 13/15] net/mlx5: A write memory barrier is sufficient in EQ ci update

2019-04-04 Thread Gal Pressman

On 03-Apr-19 02:03, Saeed Mahameed wrote:
> From: Tariq Toukan 
> 
> Soften the memory barrier call of mb() by a sufficient wmb() in the
> consumer index update of the event queues.
> 
> Signed-off-by: Tariq Toukan 
> Signed-off-by: Saeed Mahameed 
> ---
>  drivers/net/ethernet/mellanox/mlx5/core/eq.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eq.c 
> b/drivers/net/ethernet/mellanox/mlx5/core/eq.c
> index 46a747f7c162..e9837aeb7088 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/eq.c
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/eq.c
> @@ -707,7 +707,7 @@ void mlx5_eq_update_ci(struct mlx5_eq *eq, u32 cc, bool 
> arm)
>  
>   __raw_writel((__force u32)cpu_to_be32(val), addr);
>   /* We still want ordering, just not swabbing, so add a barrier */
> - mb();
> + wmb();

Shouldn't this barrier be placed prior to __raw_writel()?

>  }
>  EXPORT_SYMBOL(mlx5_eq_update_ci);
>  
>

Re: [PATCH v5 rdma-next 1/6] RDMA/core: Create mmap database and cookie helper functions

2019-07-08 Thread Gal Pressman

On 08/07/2019 12:14, Michal Kalderon wrote:
> diff --git a/drivers/infiniband/core/device.c 
> b/drivers/infiniband/core/device.c
> index 8a6ccb936dfe..a830c2c5d691 100644
> --- a/drivers/infiniband/core/device.c
> +++ b/drivers/infiniband/core/device.c
> @@ -2521,6 +2521,7 @@ void ib_set_device_ops(struct ib_device *dev, const 
> struct ib_device_ops *ops)
>   SET_DEVICE_OP(dev_ops, map_mr_sg_pi);
>   SET_DEVICE_OP(dev_ops, map_phys_fmr);
>   SET_DEVICE_OP(dev_ops, mmap);
> + SET_DEVICE_OP(dev_ops, mmap_free);
>   SET_DEVICE_OP(dev_ops, modify_ah);
>   SET_DEVICE_OP(dev_ops, modify_cq);
>   SET_DEVICE_OP(dev_ops, modify_device);
> diff --git a/drivers/infiniband/core/rdma_core.c 
> b/drivers/infiniband/core/rdma_core.c
> index ccf4d069c25c..7166741834c8 100644
> --- a/drivers/infiniband/core/rdma_core.c
> +++ b/drivers/infiniband/core/rdma_core.c
> @@ -817,6 +817,7 @@ static void ufile_destroy_ucontext(struct ib_uverbs_file 
> *ufile,
>   rdma_restrack_del(&ucontext->res);
>  
>   ib_dev->ops.dealloc_ucontext(ucontext);
> + rdma_user_mmap_entries_remove_free(ucontext);

This should happen before dealloc_ucontext.

> +struct rdma_user_mmap_entry *
> +rdma_user_mmap_entry_get(struct ib_ucontext *ucontext, u64 key, u64 len)
> +{
> + struct rdma_user_mmap_entry *entry;
> + u64 mmap_page;
> +
> + mmap_page = key >> PAGE_SHIFT;
> + if (mmap_page > U32_MAX)
> + return NULL;
> +
> + entry = xa_load(&ucontext->mmap_xa, mmap_page);
> + if (!entry || rdma_user_mmap_get_key(entry) != key ||

I wonder if the 'rdma_user_mmap_get_key(entry) != key' check is still needed.

> +/*
> + * This is only called when the ucontext is destroyed and there can be no
> + * concurrent query via mmap or allocate on the xarray, thus we can be sure 
> no
> + * other thread is using the entry pointer. We also know that all the BAR
> + * pages have either been zap'd or munmaped at this point.  Normal pages are
> + * refcounted and will be freed at the proper time.
> + */
> +void rdma_user_mmap_entries_remove_free(struct ib_ucontext *ucontext)
> +{
> + struct rdma_user_mmap_entry *entry;
> + unsigned long mmap_page;
> +
> + xa_for_each(&ucontext->mmap_xa, mmap_page, entry) {
> + xa_erase(&ucontext->mmap_xa, mmap_page);
> +
> + ibdev_dbg(ucontext->device,
> +   "mmap: obj[0x%p] key[%#llx] addr[%#llx] len[%#llx] 
> removed\n",
> +   entry->obj, rdma_user_mmap_get_key(entry),
> +   entry->address, entry->length);
> + if (ucontext->device->ops.mmap_free)
> + ucontext->device->ops.mmap_free(entry->address,
> + entry->length,
> + entry->mmap_flag);

Pass entry instead?

> + kfree(entry);
> + }
> +}
> +
>  void uverbs_user_mmap_disassociate(struct ib_uverbs_file *ufile)
>  {
>   struct rdma_umap_priv *priv, *next_priv;
> diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
> index 26e9c2594913..54ce3fdae180 100644
> --- a/include/rdma/ib_verbs.h
> +++ b/include/rdma/ib_verbs.h
> @@ -1425,6 +1425,8 @@ struct ib_ucontext {
>* Implementation details of the RDMA core, don't use in drivers:
>*/
>   struct rdma_restrack_entry res;
> + struct xarray mmap_xa;
> + u32 mmap_xa_page;
>  };
>  
>  struct ib_uobject {
> @@ -2311,6 +2313,7 @@ struct ib_device_ops {
> struct ib_udata *udata);
>   void (*dealloc_ucontext)(struct ib_ucontext *context);
>   int (*mmap)(struct ib_ucontext *context, struct vm_area_struct *vma);
> + void (*mmap_free)(u64 address, u64 length, u8 mmap_flag);

I feel like this callback needs some documentation.

>   void (*disassociate_ucontext)(struct ib_ucontext *ibcontext);
>   int (*alloc_pd)(struct ib_pd *pd, struct ib_udata *udata);
>   void (*dealloc_pd)(struct ib_pd *pd, struct ib_udata *udata);
> @@ -2706,9 +2709,23 @@ void  ib_set_client_data(struct ib_device *device, 
> struct ib_client *client,
>  void ib_set_device_ops(struct ib_device *device,
>  const struct ib_device_ops *ops);
>  
> +#define RDMA_USER_MMAP_INVALID U64_MAX
> +struct rdma_user_mmap_entry {
> + void  *obj;

I know EFA is the culprit here, but please remove the extra space :).

> + u64 address;
> + u64 length;
> + u32 mmap_page;
> + u8 mmap_flag;
> +};
> +

Re: [PATCH v5 rdma-next 2/6] RDMA/efa: Use the common mmap_xa helpers

2019-07-09 Thread Gal Pressman

On 08/07/2019 12:14, Michal Kalderon wrote:

Hi, a few nits:

> Remove the functions related to managing the mmap_xa database.
> This code was copied to the ib_core. Use the common API's instead.
> 
> Signed-off-by: Michal Kalderon 
> ---
>  drivers/infiniband/hw/efa/efa.h   |   3 +-
>  drivers/infiniband/hw/efa/efa_main.c  |   1 +
>  drivers/infiniband/hw/efa/efa_verbs.c | 183 
> --
>  3 files changed, 42 insertions(+), 145 deletions(-)
> diff --git a/drivers/infiniband/hw/efa/efa_verbs.c 
> b/drivers/infiniband/hw/efa/efa_verbs.c
> index df77bc312a25..5dff892da161 100644
> --- a/drivers/infiniband/hw/efa/efa_verbs.c
> +++ b/drivers/infiniband/hw/efa/efa_verbs.c
> @@ -13,34 +13,15 @@
>  
>  #include "efa.h"
>  
> -#define EFA_MMAP_FLAG_SHIFT 56
> -#define EFA_MMAP_PAGE_MASK GENMASK(EFA_MMAP_FLAG_SHIFT - 1, 0)
> -#define EFA_MMAP_INVALID U64_MAX
> -

Don't delete the blank line please.

>  enum {
>   EFA_MMAP_DMA_PAGE = 0,
>   EFA_MMAP_IO_WC,
>   EFA_MMAP_IO_NC,
>  };
> -
>  #define EFA_AENQ_ENABLED_GROUPS \
>   (BIT(EFA_ADMIN_FATAL_ERROR) | BIT(EFA_ADMIN_WARNING) | \
>BIT(EFA_ADMIN_NOTIFICATION) | BIT(EFA_ADMIN_KEEP_ALIVE))
>  
> -struct efa_mmap_entry {
> - void  *obj;
> - u64 address;
> - u64 length;
> - u32 mmap_page;
> - u8 mmap_flag;
> -};
> -
> -static inline u64 get_mmap_key(const struct efa_mmap_entry *efa)
> -{
> - return ((u64)efa->mmap_flag << EFA_MMAP_FLAG_SHIFT) |
> -((u64)efa->mmap_page << PAGE_SHIFT);
> -}
> -
>  #define EFA_CHUNK_PAYLOAD_SHIFT   12
>  #define EFA_CHUNK_PAYLOAD_SIZEBIT(EFA_CHUNK_PAYLOAD_SHIFT)
>  #define EFA_CHUNK_PAYLOAD_PTR_SIZE8
> @@ -145,105 +126,7 @@ static void *efa_zalloc_mapped(struct efa_dev *dev, 
> dma_addr_t *dma_addr,
>   return addr;
>  }
>  
> -/*
> - * This is only called when the ucontext is destroyed and there can be no
> - * concurrent query via mmap or allocate on the xarray, thus we can be sure 
> no
> - * other thread is using the entry pointer. We also know that all the BAR
> - * pages have either been zap'd or munmaped at this point.  Normal pages are
> - * refcounted and will be freed at the proper time.
> - */
> -static void mmap_entries_remove_free(struct efa_dev *dev,
> -  struct efa_ucontext *ucontext)
> -{
> - struct efa_mmap_entry *entry;
> - unsigned long mmap_page;
>  
> - xa_for_each(&ucontext->mmap_xa, mmap_page, entry) {
> - xa_erase(&ucontext->mmap_xa, mmap_page);
> -
> - ibdev_dbg(
> - &dev->ibdev,
> - "mmap: obj[0x%p] key[%#llx] addr[%#llx] len[%#llx] 
> removed\n",
> - entry->obj, get_mmap_key(entry), entry->address,
> - entry->length);
> - if (entry->mmap_flag == EFA_MMAP_DMA_PAGE)
> - /* DMA mapping is already gone, now free the pages */
> - free_pages_exact(phys_to_virt(entry->address),
> -  entry->length);
> - kfree(entry);
> - }
> -}
> -
> -static struct efa_mmap_entry *mmap_entry_get(struct efa_dev *dev,
> -  struct efa_ucontext *ucontext,
> -  u64 key, u64 len)
> -{
> - struct efa_mmap_entry *entry;
> - u64 mmap_page;
> -
> - mmap_page = (key & EFA_MMAP_PAGE_MASK) >> PAGE_SHIFT;
> - if (mmap_page > U32_MAX)
> - return NULL;
> -
> - entry = xa_load(&ucontext->mmap_xa, mmap_page);
> - if (!entry || get_mmap_key(entry) != key || entry->length != len)
> - return NULL;
> -
> - ibdev_dbg(&dev->ibdev,
> -   "mmap: obj[0x%p] key[%#llx] addr[%#llx] len[%#llx] removed\n",
> -   entry->obj, key, entry->address, entry->length);
> -
> - return entry;
> -}
> -
> -/*
> - * Note this locking scheme cannot support removal of entries, except during
> - * ucontext destruction when the core code guarentees no concurrency.
> - */
> -static u64 mmap_entry_insert(struct efa_dev *dev, struct efa_ucontext 
> *ucontext,
> -  void *obj, u64 address, u64 length, u8 mmap_flag)
> -{
> - struct efa_mmap_entry *entry;
> - u32 next_mmap_page;
> - int err;
> -
> - entry = kmalloc(sizeof(*entry), GFP_KERNEL);
> - if (!entry)
> - return EFA_MMAP_INVALID;
> -
> - entry->obj = obj;
> - entry->address = address;
> - entry->length = length;
> - entry->mmap_flag = mmap_flag;
> -
> - xa_lock(&ucontext->mmap_xa);
> - if (check_add_overflow(ucontext->mmap_xa_page,
> -(u32)(length >> PAGE_SHIFT),
> -&next_mmap_page))
> - goto err_unlock;
> -
> - entry->mmap_page = ucontext->mmap_xa_page;
> - ucontext->mmap_xa_page = next_mmap_page;
> - err = __xa_insert(&ucontext->mmap_xa, entry->mmap_page, entry,

Re: [PATCH v6 rdma-next 0/6] RDMA/qedr: Use the doorbell overflow recovery mechanism for RDMA

2019-07-10 Thread Gal Pressman

On 09/07/2019 17:17, Michal Kalderon wrote:
> This patch series uses the doorbell overflow recovery mechanism
> introduced in
> commit 36907cd5cd72 ("qed: Add doorbell overflow recovery mechanism")
> for rdma ( RoCE and iWARP )
> 
> The first three patches modify the core code to contain helper
> functions for managing mmap_xa inserting, getting and freeing
> entries. The code was taken almost as is from the efa driver.
> There is still an open discussion on whether we should take
> this even further and make the entire mmap generic. Until a
> decision is made, I only created the database API and modified
> the efa and qedr driver to use it. The doorbell recovery code will be based
> on the common code.
> 
> Efa driver was compile tested only.

For the whole series:
Tested-by: Gal Pressman

Re: [PATCH v6 rdma-next 2/6] RDMA/efa: Use the common mmap_xa helpers

2019-07-10 Thread Gal Pressman

On 09/07/2019 17:17, Michal Kalderon wrote:
> Remove the functions related to managing the mmap_xa database.
> This code was copied to the ib_core. Use the common API's instead.
> 
> Signed-off-by: Ariel Elior 
> Signed-off-by: Michal Kalderon 

Thanks Michal,
Acked-by: Gal Pressman

Re: [PATCH v6 rdma-next 1/6] RDMA/core: Create mmap database and cookie helper functions

2019-07-10 Thread Gal Pressman

On 09/07/2019 17:17, Michal Kalderon wrote:
> Create some common API's for adding entries to a xa_mmap.
> Searching for an entry and freeing one.
> 
> The code was copied from the efa driver almost as is, just renamed
> function to be generic and not efa specific.
> 
> Signed-off-by: Ariel Elior 
> Signed-off-by: Michal Kalderon 

Reviewed-by: Gal Pressman

Re: [RDMA RFC v6 16/16] RDMA/irdma: Update MAINTAINERS file

2020-05-20 Thread Gal Pressman

On 20/05/2020 10:04, Jeff Kirsher wrote:
> From: Shiraz Saleem 
> 
> Add maintainer entry for irdma driver.
> 
> Signed-off-by: Mustafa Ismail 
> Signed-off-by: Shiraz Saleem 
> ---
>  MAINTAINERS | 8 
>  1 file changed, 8 insertions(+)
> 
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 598d0e1b3501..8b8e3e0064cf 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -8745,6 +8745,14 @@ L: linux...@vger.kernel.org
>  S:   Supported
>  F:   drivers/cpufreq/intel_pstate.c
>  
> +INTEL ETHERNET PROTOCL DRIVER FOR RDMA
> +M:   Mustafa Ismail 
> +M:   Shiraz Saleem 
> +L:   linux-r...@vger.kernel.org
> +S:   Supported
> +F:   drivers/infiniband/hw/irdma/
> +F:   include/uapi/rdma/irdma-abi.h

This list should be sorted alphabetically.

>  INTEL SPEED SELECT TECHNOLOGY
>  M:   Srinivas Pandruvada 
>  L:   platform-driver-...@vger.kernel.org
>

Re: [RDMA RFC v6 14/16] RDMA/irdma: Add ABI definitions

2020-05-20 Thread Gal Pressman

On 20/05/2020 10:04, Jeff Kirsher wrote:
> +struct i40iw_create_qp_resp {
> + __u32 qp_id;
> + __u32 actual_sq_size;
> + __u32 actual_rq_size;
> + __u32 i40iw_drv_opt;
> + __u16 push_idx;
> + __u8 lsmm;
> + __u8 rsvd;
> +};

This struct size should be 8 bytes aligned.

Re: [RDMA RFC v6 14/16] RDMA/irdma: Add ABI definitions

2020-05-20 Thread Gal Pressman

On 20/05/2020 11:52, Greg KH wrote:
> On Wed, May 20, 2020 at 10:54:25AM +0300, Gal Pressman wrote:
>> On 20/05/2020 10:04, Jeff Kirsher wrote:
>>> +struct i40iw_create_qp_resp {
>>> +   __u32 qp_id;
>>> +   __u32 actual_sq_size;
>>> +   __u32 actual_rq_size;
>>> +   __u32 i40iw_drv_opt;
>>> +   __u16 push_idx;
>>> +   __u8 lsmm;
>>> +   __u8 rsvd;
>>> +};
>>
>> This struct size should be 8 bytes aligned.
> 
> Aligned in what way?  Seems sane to me, what would you want it to look
> like instead?

The uverbs ABI structs sizes are assumed to be padded to 8 bytes alignment, I
would expect the reserved field to be an array of 5 bytes as done in other
structs in this file (irdma_modify_qp_req for example).
Jason could correct me if I'm wrong?

Re: [PATCH V6 mlx5-next 10/16] RDMA: Group create AH arguments in struct

2020-04-28 Thread Gal Pressman

On 26/04/2020 10:17, Maor Gottlieb wrote:
> Following patch adds additional argument to the create AH function,
> so it make sense to group ah_attr and flags arguments in struct.
> 
> Signed-off-by: Maor Gottlieb 

RDMA driver maintainers should probably be CC'd.

> diff --git a/drivers/infiniband/hw/efa/efa.h b/drivers/infiniband/hw/efa/efa.h
> index aa7396a1588a..45d519edb4c3 100644
> --- a/drivers/infiniband/hw/efa/efa.h
> +++ b/drivers/infiniband/hw/efa/efa.h
> @@ -153,8 +153,7 @@ int efa_mmap(struct ib_ucontext *ibucontext,
>struct vm_area_struct *vma);
>  void efa_mmap_free(struct rdma_user_mmap_entry *rdma_entry);
>  int efa_create_ah(struct ib_ah *ibah,
> -   struct rdma_ah_attr *ah_attr,
> -   u32 flags,
> +   struct rdma_ah_init_attr *init_attr,
> struct ib_udata *udata);
>  void efa_destroy_ah(struct ib_ah *ibah, u32 flags);
>  int efa_modify_qp(struct ib_qp *ibqp, struct ib_qp_attr *qp_attr,
> diff --git a/drivers/infiniband/hw/efa/efa_verbs.c 
> b/drivers/infiniband/hw/efa/efa_verbs.c
> index 5c57098a4aee..454b01b21e6a 100644
> --- a/drivers/infiniband/hw/efa/efa_verbs.c
> +++ b/drivers/infiniband/hw/efa/efa_verbs.c
> @@ -1639,10 +1639,10 @@ static int efa_ah_destroy(struct efa_dev *dev, struct 
> efa_ah *ah)
>  }
>  
>  int efa_create_ah(struct ib_ah *ibah,
> -   struct rdma_ah_attr *ah_attr,
> -   u32 flags,
> +   struct rdma_ah_init_attr *init_attr,
> struct ib_udata *udata)
>  {
> + struct rdma_ah_attr *ah_attr = init_attr->ah_attr;
>   struct efa_dev *dev = to_edev(ibah->device);
>   struct efa_com_create_ah_params params = {};
>   struct efa_ibv_create_ah_resp resp = {};
> @@ -1650,7 +1650,7 @@ int efa_create_ah(struct ib_ah *ibah,
>   struct efa_ah *ah = to_eah(ibah);
>   int err;
>  
> - if (!(flags & RDMA_CREATE_AH_SLEEPABLE)) {
> + if (!(init_attr->flags & RDMA_CREATE_AH_SLEEPABLE)) {
>   ibdev_dbg(&dev->ibdev,
>     "Create address handle is not supported in atomic 
> context\n");
>   err = -EOPNOTSUPP;

EFA part looks good,
Acked-by: Gal Pressman

Re: rfc: bool structure members (was Re: [PATCH V3] net/mlx4: Get rid of page operation after dma_alloc_coherent)

2018-12-23 Thread Gal Pressman

On 22-Dec-18 01:52, Jason Gunthorpe wrote:
> On Thu, Dec 20, 2018 at 09:12:43PM -0800, Joe Perches wrote:
>> Care to submit a coding_style.rst patch or
>> improve the one below this?
> 
> I took yours and revised it a little bit. I spent some time looking at
> assembly and decided to drop the performance note, the number of cases
> that run into overhead seems pretty small and probably already
> requires !! to be correct. There is also an equally unlikely gain, ie
> 'if (a & b)' optimizes a tiny bit better for bool types.
> 
> I also added a small intro on bool, as I know some people are
> unfamiliar with C11 _Bool and might think bool is just '#define bool
> u8'
> 
> (for those added to the cc) I'm looking at cases, like the patch that
> spawned this, where the struct has a single bool and no performance
> considerations. As CH said, seeing that get converted to int due to
> checkpatch is worse than having used bool. Using u8 won't make this
> struct smaller or faster.
> 
> Cheers,
> Jason
> 
> From c5e2c887f6326e1c4369876f39996417da5e90cc Mon Sep 17 00:00:00 2001
> From: Jason Gunthorpe 
> Date: Fri, 21 Dec 2018 16:29:22 -0700
> Subject: [PATCH] coding-style: Clarify the expectations around bool
> 
> There has been some confusion since checkpatch started warning about bool
> use in structures, and people have been avoiding using it.
> 
> Many people feel there is still a legitimate place for bool in structures,
> so provide some guidance on bool usage derived from the entire thread that
> spawned the checkpatch warning.

Since a "Using bool" section is added, perhaps it's worth documenting the bool
usage as a function parameter [1]?

[1] https://www.spinics.net/lists/linux-rdma/msg72336.html

> 
> Link: 
> https://lkml.kernel.org/r/ca+55afwvzk1ofb9t2v014ptakfhtvan_zj2dojncy3x6e4u...@mail.gmail.com
> Signed-off-by: Joe Perches 
> Signed-off-by: Jason Gunthorpe 
> ---
>  Documentation/process/coding-style.rst | 33 ++
>  1 file changed, 29 insertions(+), 4 deletions(-)
> 
> diff --git a/Documentation/process/coding-style.rst 
> b/Documentation/process/coding-style.rst
> index 4e7c0a1c427a9a..efdb1d32035fe1 100644
> --- a/Documentation/process/coding-style.rst
> +++ b/Documentation/process/coding-style.rst
> @@ -918,7 +918,32 @@ result.  Typical examples would be functions that return 
> pointers; they use
>  NULL or the ERR_PTR mechanism to report failure.
>  
>  
> -17) Don't re-invent the kernel macros
> +17) Using bool
> +--
> +
> +The Linux kernel uses the C11 standard for the bool type. bool values can 
> only
> +evaluate to 0 or 1, and implicit or explicit conversion to bool automatically
> +converts the value to true or false. When using bool types the !! 
> construction
> +is not needed, which eliminates a class of bugs.
> +
> +When working with bool values the true and false labels should be used 
> instead
> +of 0 and 1.
> +
> +bool function return types, arguments and stack variables are always fine to
> +use whenever appropriate. Use of bool is encouraged to improve readability 
> and
> +is often a better option than 'int' for storing boolean values.
> +
> +Do not use bool if cache line layout or size of the value matters, its size
> +and alignment varies based on the compiled architecture. Structures that are
> +optimized for alignment and size should not use bool.
> +
> +If a structure has many true/false values, consider consolidating them into a
> +bitfield with 1 bit members, or using an appropriate fixed width type, such 
> as
> +u8.
> +
> +Otherwise limited use of bool in structures does improve readability.
> +
> +18) Don't re-invent the kernel macros
>  -
>  
>  The header file include/linux/kernel.h contains a number of macros that
> @@ -941,7 +966,7 @@ need them.  Feel free to peruse that header file to see 
> what else is already
>  defined that you shouldn't reproduce in your code.
>  
>  
> -18) Editor modelines and other cruft
> +19) Editor modelines and other cruft
>  
>  
>  Some editors can interpret configuration information embedded in source 
> files,
> @@ -975,7 +1000,7 @@ own custom mode, or may have some other magic method for 
> making indentation
>  work correctly.
>  
>  
> -19) Inline assembly
> +20) Inline assembly
>  ---
>  
>  In architecture-specific code, you may need to use inline assembly to 
> interface
> @@ -1007,7 +1032,7 @@ the next instruction in the assembly output:
>: /* outputs */ : /* inputs */ : /* clobbers */);
>  
>  
> -20) Conditional Compilation
> +21) Conditional Compilation
>  ---
>  
>  Wherever possible, don't use preprocessor conditionals (#if, #ifdef) in .c
>

1 2 >

1 - 100 of 104 matches

Mail list logo