Re: [Intel-wired-lan] [PATCH net-next] e1000e: Mark e1000e_pm_prepare() as __maybe_unused

2021-03-21 Thread Neftin, Sasha

On 3/17/2021 16:52, 'w00385741 wrote:

From: Wei Yongjun 

The function e1000e_pm_prepare() may have no callers depending
on configuration, so it must be marked __maybe_unused to avoid
harmless warning:

drivers/net/ethernet/intel/e1000e/netdev.c:6926:12:
  warning: 'e1000e_pm_prepare' defined but not used [-Wunused-function]
  6926 | static int e1000e_pm_prepare(struct device *dev)
   |^

Fixes: ccf8b940e5fd ("e1000e: Leverage direct_complete to speed up s2ram")
Reported-by: Hulk Robot 
Signed-off-by: Wei Yongjun 
---
  drivers/net/ethernet/intel/e1000e/netdev.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)


Acked-by: Sasha Neftin 

diff --git a/drivers/net/ethernet/intel/e1000e/netdev.c 
b/drivers/net/ethernet/intel/e1000e/netdev.c
index f1c9debd9f3b..d2e4653536c5 100644
--- a/drivers/net/ethernet/intel/e1000e/netdev.c
+++ b/drivers/net/ethernet/intel/e1000e/netdev.c
@@ -6923,7 +6923,7 @@ static int __e1000_resume(struct pci_dev *pdev)
return 0;
  }
  
-static int e1000e_pm_prepare(struct device *dev)

+static __maybe_unused int e1000e_pm_prepare(struct device *dev)
  {
return pm_runtime_suspended(dev) &&
pm_suspend_via_firmware();

___
Intel-wired-lan mailing list
intel-wired-...@osuosl.org
https://lists.osuosl.org/mailman/listinfo/intel-wired-lan



Re: [Intel-wired-lan] [PATCH] e1000e: Fix error handling in e1000_set_d0_lplu_state_82571

2021-03-07 Thread Neftin, Sasha

On 2/28/2021 11:44, Dinghao Liu wrote:

There is one e1e_wphy() call in e1000_set_d0_lplu_state_82571
that we have caught its return value but lack further handling.
Check and terminate the execution flow just like other e1e_wphy()
in this function.

Signed-off-by: Dinghao Liu 
---
  drivers/net/ethernet/intel/e1000e/82571.c | 2 ++
  1 file changed, 2 insertions(+)

diff --git a/drivers/net/ethernet/intel/e1000e/82571.c 
b/drivers/net/ethernet/intel/e1000e/82571.c
index 88faf05e23ba..0b1e890dd583 100644
--- a/drivers/net/ethernet/intel/e1000e/82571.c
+++ b/drivers/net/ethernet/intel/e1000e/82571.c
@@ -899,6 +899,8 @@ static s32 e1000_set_d0_lplu_state_82571(struct e1000_hw 
*hw, bool active)
} else {
data &= ~IGP02E1000_PM_D0_LPLU;
ret_val = e1e_wphy(hw, IGP02E1000_PHY_POWER_MGMT, data);
+   if (ret_val)
+   return ret_val;
/* LPLU and SmartSpeed are mutually exclusive.  LPLU is used
 * during Dx states where the power conservation is most
 * important.  During driver activity we should enable


Good for me.
Acked-by: Sasha Neftin 


Re: [Intel-wired-lan] [PATCH net 1/2] e1000e: Fix duplicate include guard

2021-03-02 Thread Neftin, Sasha

On 2/22/2021 06:00, Tom Seewald wrote:

The include guard "_E1000_HW_H_" is used by header files in three
different drivers (e1000/e1000_hw.h, e1000e/hw.h, and igb/e1000_hw.h).
Using the same include guard macro in more than one header file may
cause unexpected behavior from the compiler. Fix the duplicate include
guard in the e1000e driver by renaming it.

Fixes: bc7f75fa9788 ("[E1000E]: New pci-express e1000 driver (currently for ICH9 
devices only)")
Signed-off-by: Tom Seewald 
---
  drivers/net/ethernet/intel/e1000e/hw.h | 6 +++---
  1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/intel/e1000e/hw.h 
b/drivers/net/ethernet/intel/e1000e/hw.h
index 69a2329ea463..db79c4e6413e 100644
--- a/drivers/net/ethernet/intel/e1000e/hw.h
+++ b/drivers/net/ethernet/intel/e1000e/hw.h
@@ -1,8 +1,8 @@
  /* SPDX-License-Identifier: GPL-2.0 */
  /* Copyright(c) 1999 - 2018 Intel Corporation. */
  
-#ifndef _E1000_HW_H_

-#define _E1000_HW_H_
+#ifndef _E1000E_HW_H_
+#define _E1000E_HW_H_
  
  #include "regs.h"

  #include "defines.h"
@@ -714,4 +714,4 @@ struct e1000_hw {
  #include "80003es2lan.h"
  #include "ich8lan.h"
  
-#endif

+#endif /* _E1000E_HW_H_ */


Acked-by: Sasha Neftin 


Re: [PATCH net 1/4] igc: Report speed and duplex as unknown when device is runtime suspended

2021-01-31 Thread Neftin, Sasha

On 1/30/2021 20:12, Jakub Kicinski wrote:

On Sat, 30 Jan 2021 16:00:06 +0200 Neftin, Sasha wrote:

On 1/30/2021 08:22, Jakub Kicinski wrote:

On Thu, 28 Jan 2021 13:38:48 -0800 Tony Nguyen wrote:

From: Kai-Heng Feng 

Similar to commit 165ae7a8feb5 ("igb: Report speed and duplex as unknown
when device is runtime suspended"), if we try to read speed and duplex
sysfs while the device is runtime suspended, igc will complain and
stops working:
   

The more generic approach will be wrap get_link_ksettings() with begin()
and complete() callbacks, and calls runtime resume and runtime suspend
routine respectively. However, igc is like igb, runtime resume routine
uses rtnl_lock() which upper ethtool layer also uses.

So to prevent a deadlock on rtnl, take a different approach, use
pm_runtime_suspended() to avoid reading register while device is runtime
suspended.


Is someone working on the full fix to how PM operates?

There is another rd32(IGC_STATUS) in this file which I don't think
is protected either.


What is another rd32(IGC_STATUS) you meant? in  igc_ethtool_get_regs?


Yes.


While the device in D3 state there is no configuration space registers
access.


That's to say similar stack trace will be generated to the one fixed
here, if someone runs ethtool -d, correct? I don't see anything
checking runtime there either.


yes.
This problem crosses many drivers. (not only igb, igc,...)

specific to this one (igc), can we check 'netif_running at begin of the 
_get_regs method:

if (!netif_running(netdev))
return;
what do you think? (only OS can put device to the D3)


To be clear I'm not asking for this to be addressed in this series.
Rather for a strong commitment that PM handling will be restructured.
It seems to me you should depend on refcounting / locking that the PM
subsystem does more rather than involving rtnl_lock.



Re: [PATCH net 1/4] igc: Report speed and duplex as unknown when device is runtime suspended

2021-01-30 Thread Neftin, Sasha

On 1/30/2021 08:22, Jakub Kicinski wrote:

On Thu, 28 Jan 2021 13:38:48 -0800 Tony Nguyen wrote:

From: Kai-Heng Feng 

Similar to commit 165ae7a8feb5 ("igb: Report speed and duplex as unknown
when device is runtime suspended"), if we try to read speed and duplex
sysfs while the device is runtime suspended, igc will complain and
stops working:



The more generic approach will be wrap get_link_ksettings() with begin()
and complete() callbacks, and calls runtime resume and runtime suspend
routine respectively. However, igc is like igb, runtime resume routine
uses rtnl_lock() which upper ethtool layer also uses.

So to prevent a deadlock on rtnl, take a different approach, use
pm_runtime_suspended() to avoid reading register while device is runtime
suspended.


Is someone working on the full fix to how PM operates?

There is another rd32(IGC_STATUS) in this file which I don't think
is protected either.

Hello Jakub,
What is another rd32(IGC_STATUS) you meant? in  igc_ethtool_get_regs? 
While the device in D3 state there is no configuration space registers 
access.

sasha






Re: Fw: [External] Re: [PATCH v4 0/4] Improve s0ix flows for systems i219LM

2020-12-15 Thread Neftin, Sasha

On 12/15/2020 19:20, Limonciello, Mario wrote:



Absolutely - I'll ask them to look into this again.


we need to explain why on Windows systems required 1s and on Linux
systems up to 2.5s - otherwise it is not reliable approach - you will
encounter others buggy system.
(ME not POR on the Linux systems - is only one possible answer)


Sasha: In your opinion does this information need to block the series?
or can we follow up with more changes later on as more information becomes
available?


I do not think this should block the patches series.

For now v5 of the series extends the timeout but at least makes a mention
that there appears to be a firmware bug when more than 1 second is taken.





Re: Fw: [External] Re: [PATCH v4 0/4] Improve s0ix flows for systems i219LM

2020-12-15 Thread Neftin, Sasha

On 12/14/2020 20:40, Mark Pearson wrote:

Thanks Hans

On 14/12/2020 13:31, Mark Pearson wrote:




*From:* Hans de Goede 
*Sent:* December 14, 2020 13:24
*To:* Mario Limonciello ; Jeff Kirsher
; Tony Nguyen ;
intel-wired-...@lists.osuosl.org ;
David Miller ; Aaron Ma ;
Mark Pearson 
*Cc:* linux-ker...@vger.kernel.org ;
Netdev ; Alexander Duyck
; Jakub Kicinski ; Sasha
Netfin ; Aaron Brown ;
Stefan Assmann ; darc...@redhat.com
; yijun.s...@dell.com ;
perry.y...@dell.com ; anthony.w...@canonical.com

*Subject:* [External] Re: [PATCH v4 0/4] Improve s0ix flows for systems
i219LM
  
Hi All,






###

I've added Mark Pearson from Lenovo to the Cc so that Lenovo
can investigate this issue further.

Mark, this thread is about an issue with enabling S0ix support for
e1000e (i219lm) controllers. This was enabled in the kernel a
while ago, but then got disabled again on vPro / AMT enabled
systems because on some systems (Lenovo X1C7 and now also X1C8)
this lead to suspend/resume issues.

When AMT is active then there is a handover handshake for the
OS to get access to the ethernet controller from the ME. The
Intel folks have checked and the Windows driver is using a timeout
of 1 second for this handshake, yet on Lenovo systems this is
taking 2 seconds. This likely has something to do with the
ME firmware on these Lenovo models, can you get the firmware
team at Lenovo to investigate this further ?

Absolutely - I'll ask them to look into this again.

we need to explain why on Windows systems required 1s and on Linux 
systems up to 2.5s - otherwise it is not reliable approach - you will 
encounter others buggy system.

(ME not POR on the Linux systems - is only one possible answer)

We did try to make progress with this previously - but it got a bit
stuck and hence the need for these patchesbut I believe things may
have changed a bit so it's worth trying again

Mark


Sasha


Re: [PATCH v3 0/7] Improve s0ix flows for systems i219LM

2020-12-13 Thread Neftin, Sasha

On 12/10/2020 07:28, Neftin, Sasha wrote:

On 12/10/2020 04:24, Alexander Duyck wrote:

On Wed, Dec 9, 2020 at 6:44 AM Hans de Goede  wrote:


Hi,

On 12/8/20 5:14 PM, Alexander Duyck wrote:
On Tue, Dec 8, 2020 at 1:30 AM Hans de Goede  
wrote:


Hi,

On 12/8/20 6:08 AM, Neftin, Sasha wrote:

On 12/7/2020 17:41, Limonciello, Mario wrote:

First of all thank you for working on this.

I must say though that I don't like the approach taken here very
much.

This is not so much a criticism of this series as it is a criticism
of the earlier decision to simply disable s0ix on all devices
with the i219-LM + and active ME.


I was not happy with that decision either as it did cause 
regressions

on all of the "named" Comet Lake laptops that were in the market at
the time.  The "unnamed" ones are not yet released, and I don't feel
it's fair to call it a regression on "unreleased" hardware.



AFAIK there was a perfectly acceptable patch to workaround those
broken devices, which increased a timeout:
https://patchwork.ozlabs.org/project/intel-wired-
lan/patch/20200323191639.48826-1-aaron...@canonical.com/

That patch was nacked because it increased the resume time
*on broken devices*.

Officially CSME/ME not POR for Linux and we haven't interface to 
the ME. Nobody can tell how long (and why) ME will hold PHY access 
semaphore ant just increasing the resuming time (ULP configure) 
won't be solve the problem. This is not reliable approach.

I would agree users can add ME system on their responsibilities.


It is not clear to me what you are trying to say here.


Based on the earlier thread you had referenced and his comment here it
sounds like while adding time will work for most cases, it doesn't
solve it for all cases.


AFAIK there are 0 documented cases where the suspend/resume issue
continues to be a problem after the timeout has been increased.

If you know of actual documented cases (rather then this just being
a theoretical problem), then please provide links to those cases.


If there are such notes I wouldn't have access to them. Do we know if
any sort of errata document has been posted for this issue by Intel?
That would be where an explanation of the problems and the reasoning
behind the workaround would be defined. Without that I am just
speculating based off of what has been said here and in the other
thread.


The problem is as a vendor you are usually
stuck looking for a solution that will work for all cases which can
lead to things like having to drop features because they can be
problematic for a few cases.


I disagree, there will/might always be some broken corner case
laptop-model / hw-design out there on which a feature breaks. Simply
disabling all features which might cause problems in "a few cases"
would mean that we pretty much have to disable over half the features
in the kernel.

Take for example SATA NCQ (command queing) this is know to not work
on some devices, up to the point of where with some buggy firmwares
it may cause full systems hangs and/or data-corruption. So this is
a much bigger problem then the "system won't suspend" issue we
are talking about here. Still the ATA subsys maintainers have enabled
this by default because it is an important feature to have and they
are using a deny-list to avoid enabling this on known broken hardware;
and yes every know and then we need to add a new model to the deny-list.

And the same for SATA ALPM support (a power-management feature like 
s0ix)

that is enabled by default too, combined with a deny-list.
I'm very familiar with the ALPM case since I pushed of it being
enabled by default and I've done most of the maintenance work
of the deny-list since it was enabled by default.

The kernel is full of this pattern, we don't disable an important
feature (and power-management is important) just because of this
causing issues in "a few cases". And again you say "a few cases"
but I know of 0 documented cases where this issue is still a problem
after bumping the timeout.


It all comes down to who owns the maintenance in those cases. That is
the heart of the issue.

Last I knew Intel was maintaining the e1000e driver. So the decision
to support this or not is up to them unless Dave or Jakub want to
override. Basically the maintenance cost has to be assumed by whoever
decides what route to go. I guess Intel for now is opting to require
an allow-list rather than a deny-list for that reason. That way
whoever adds a new device is on the hook to verify it works, rather
than them having to fix things after something breaks.

Are you saying that you insist on keeping the e1000e_check_me check 
and

thus needlessly penalizing 100s of laptops models with higher
power-consumption unless these 100s of laptops are added manually
to an allow list for this?

I'm sorry but that is simply unacceptable, the maintenance burden
of that is jus

Re: [PATCH v3 0/7] Improve s0ix flows for systems i219LM

2020-12-09 Thread Neftin, Sasha

On 12/10/2020 04:24, Alexander Duyck wrote:

On Wed, Dec 9, 2020 at 6:44 AM Hans de Goede  wrote:


Hi,

On 12/8/20 5:14 PM, Alexander Duyck wrote:

On Tue, Dec 8, 2020 at 1:30 AM Hans de Goede  wrote:


Hi,

On 12/8/20 6:08 AM, Neftin, Sasha wrote:

On 12/7/2020 17:41, Limonciello, Mario wrote:

First of all thank you for working on this.

I must say though that I don't like the approach taken here very
much.

This is not so much a criticism of this series as it is a criticism
of the earlier decision to simply disable s0ix on all devices
with the i219-LM + and active ME.


I was not happy with that decision either as it did cause regressions
on all of the "named" Comet Lake laptops that were in the market at
the time.  The "unnamed" ones are not yet released, and I don't feel
it's fair to call it a regression on "unreleased" hardware.



AFAIK there was a perfectly acceptable patch to workaround those
broken devices, which increased a timeout:
https://patchwork.ozlabs.org/project/intel-wired-
lan/patch/20200323191639.48826-1-aaron...@canonical.com/

That patch was nacked because it increased the resume time
*on broken devices*.


Officially CSME/ME not POR for Linux and we haven't interface to the ME. Nobody 
can tell how long (and why) ME will hold PHY access semaphore ant just 
increasing the resuming time (ULP configure) won't be solve the problem. This 
is not reliable approach.
I would agree users can add ME system on their responsibilities.


It is not clear to me what you are trying to say here.


Based on the earlier thread you had referenced and his comment here it
sounds like while adding time will work for most cases, it doesn't
solve it for all cases.


AFAIK there are 0 documented cases where the suspend/resume issue
continues to be a problem after the timeout has been increased.

If you know of actual documented cases (rather then this just being
a theoretical problem), then please provide links to those cases.


If there are such notes I wouldn't have access to them. Do we know if
any sort of errata document has been posted for this issue by Intel?
That would be where an explanation of the problems and the reasoning
behind the workaround would be defined. Without that I am just
speculating based off of what has been said here and in the other
thread.


The problem is as a vendor you are usually
stuck looking for a solution that will work for all cases which can
lead to things like having to drop features because they can be
problematic for a few cases.


I disagree, there will/might always be some broken corner case
laptop-model / hw-design out there on which a feature breaks. Simply
disabling all features which might cause problems in "a few cases"
would mean that we pretty much have to disable over half the features
in the kernel.

Take for example SATA NCQ (command queing) this is know to not work
on some devices, up to the point of where with some buggy firmwares
it may cause full systems hangs and/or data-corruption. So this is
a much bigger problem then the "system won't suspend" issue we
are talking about here. Still the ATA subsys maintainers have enabled
this by default because it is an important feature to have and they
are using a deny-list to avoid enabling this on known broken hardware;
and yes every know and then we need to add a new model to the deny-list.

And the same for SATA ALPM support (a power-management feature like s0ix)
that is enabled by default too, combined with a deny-list.
I'm very familiar with the ALPM case since I pushed of it being
enabled by default and I've done most of the maintenance work
of the deny-list since it was enabled by default.

The kernel is full of this pattern, we don't disable an important
feature (and power-management is important) just because of this
causing issues in "a few cases". And again you say "a few cases"
but I know of 0 documented cases where this issue is still a problem
after bumping the timeout.


It all comes down to who owns the maintenance in those cases. That is
the heart of the issue.

Last I knew Intel was maintaining the e1000e driver. So the decision
to support this or not is up to them unless Dave or Jakub want to
override. Basically the maintenance cost has to be assumed by whoever
decides what route to go. I guess Intel for now is opting to require
an allow-list rather than a deny-list for that reason. That way
whoever adds a new device is on the hook to verify it works, rather
than them having to fix things after something breaks.


Are you saying that you insist on keeping the e1000e_check_me check and
thus needlessly penalizing 100s of laptops models with higher
power-consumption unless these 100s of laptops are added manually
to an allow list for this?

I'm sorry but that is simply unacceptable, the maintenance burden
of that is just way too high.


Think about this the other way

Re: [PATCH v3 0/7] Improve s0ix flows for systems i219LM

2020-12-07 Thread Neftin, Sasha

On 12/7/2020 17:41, Limonciello, Mario wrote:

First of all thank you for working on this.

I must say though that I don't like the approach taken here very
much.

This is not so much a criticism of this series as it is a criticism
of the earlier decision to simply disable s0ix on all devices
with the i219-LM + and active ME.


I was not happy with that decision either as it did cause regressions
on all of the "named" Comet Lake laptops that were in the market at
the time.  The "unnamed" ones are not yet released, and I don't feel
it's fair to call it a regression on "unreleased" hardware.



AFAIK there was a perfectly acceptable patch to workaround those
broken devices, which increased a timeout:
https://patchwork.ozlabs.org/project/intel-wired-
lan/patch/20200323191639.48826-1-aaron...@canonical.com/

That patch was nacked because it increased the resume time
*on broken devices*.

Officially CSME/ME not POR for Linux and we haven't interfrace to the 
ME. Nobody can tell how long (and why) ME will hold PHY access semaphore 
ant just increasing the resuming time (ULP configure) won't be solve the 
problem. This is not reliable approach.

I would agree users can add ME system on their responsibilities.

So it seems to me that we have a simple choice here:

1. Longer resume time on devices with an improperly configured ME
2. Higher power-consumption on all non-buggy devices

Your patches 4-7 try to workaround 2. but IMHO those are just
bandaids for getting the initial priorities *very* wrong.


They were done based upon the discussion in that thread you linked and others.
If the owners of this driver feel it's possible/scalable to follow your proposal
I'm happy to resubmit a new v4 series with these sets of patches:

1) Fixup for the exception corner case referenced in this thread
2) Patch 1 from this series that fixes cable connected case
3) Increase the timeout (from your referenced link)
4) Revert the ME disallow list



Instead of penalizing non-buggy devices with a higher power-consumption,
we should default to penalizing the buggy devices with a higher
resume time. And if it is decided that the higher resume time is
a worse problem then the higher power-consumption, then there
should be a list of broken devices and s0ix can be disabled on those.


I'm perfectly happy either way, my primary goal is that Dell's notebooks and
desktops that meet the architectural and firmware guidelines for appropriate
low power consumption over s0ix are not penalized.



The current allow-list approach is simply never going to work well
leading to too high power-consumption on countless devices.
This is going to be an endless game of whack-a-mole and as
such really is a bad idea.


I envisioned that it would evolve over time.  For example if by the time Dell
finished shipping new CML models it was deemed that all the CML hardware was 
done
properly it could instead by an allow list of Dell + Comet Point.
If all of Tiger Lake are done properly 'maybe' by the time the ML ships maybe it
could be an allow list of Dell + CML or newer.

But even if the heuristic changed - this particular configuration needs to be 
tested
on every single new model.  All of the notebooks that have a Tested-By clause 
were
checked by Dell and Dell's partners.



A deny-list for broken devices is a much better approach, esp.
since missing devices on that list will still work fine, they
will just have a somewhat larger resume time.


I don't have configuration deemed buggy.  Since you were commenting in that 
other
thread with the patch from Aaaron It sounds like you do. Can you perhaps check 
if
that proposal actually works?



So what needs to happen IMHO is:

1. Merge your fix from patch 1 of this set
2. Merge "e1000e: bump up timeout to wait when ME un-configure ULP mode"
3. Drop the e1000e_check_me check.

Then we also do not need the new "s0ix-enabled" ethertool flag
because we do not need userspace to work-around us doing the
wrong thing by default.


If we collectively agree to keep either an allow list "or" disallow list at
all I think you need a way check whether enabling this feature works.

If we are making an assertion it will always work properly all the time, I agree
that there is no need for an ethtool flag.



Note a while ago I had access to one of the devices having suspend/resume
issues caused by the S0ix support (a Lenovo Thinkpad X1 Carbon gen 7)
and I can confirm that the "e1000e: bump up timeout to wait when ME
un-configure ULP mode" patch fixes the suspend/resume problem without
any noticeable negative side-effects.



Can you or someone else with this model please check with a current kernel
w/ reverting ME check and adding the patch from Vitaly (included as patch 1
in my series)?


Regards,

Hans











Changes from v2 to v3:
  - Correct some grammar and spelling issues caught by Bjorn H.
* s/s0ix/S0ix/ in all commit messages
* Fix a typo in commit message
* Fix capitalization of proper nouns
  - Add more

Re: [Intel-wired-lan] [PATCH] igc: Report speed and duplex as unknown when device is runtime suspended

2020-12-03 Thread Neftin, Sasha

On 12/2/2020 09:50, Kai-Heng Feng wrote:

Similar to commit 165ae7a8feb5 ("igb: Report speed and duplex as unknown
when device is runtime suspended"), if we try to read speed and duplex
sysfs while the device is runtime suspeneded, igc will complain and
stops working:

[  123.449883] igc :03:00.0 enp3s0: PCIe link lost, device now detached
[  123.450052] BUG: kernel NULL pointer dereference, address: 0008
[  123.450056] #PF: supervisor read access in kernel mode
[  123.450058] #PF: error_code(0x) - not-present page
[  123.450059] PGD 0 P4D 0
[  123.450064] Oops:  [#1] SMP NOPTI
[  123.450068] CPU: 0 PID: 2525 Comm: udevadm Tainted: G U  W  OE 
5.10.0-1002-oem #2+rkl2-Ubuntu
[  123.450078] RIP: 0010:igc_rd32+0x1c/0x90 [igc]
[  123.450080] Code: c0 5d c3 b8 fd ff ff ff c3 0f 1f 44 00 00 0f 1f 44 00 00 55 89 
f0 48 89 e5 41 56 41 55 41 54 49 89 c4 53 48 8b 57 08 48 01 d0 <44> 8b 28 41 83 
fd ff 74 0c 5b 44 89 e8 41 5c 41 5d 4

[  123.450083] RSP: 0018:b0d100d6fcc0 EFLAGS: 00010202
[  123.450085] RAX: 0008 RBX: b0d100d6fd30 RCX: 
[  123.450087] RDX:  RSI: 0008 RDI: 945a12716c10
[  123.450089] RBP: b0d100d6fce0 R08: 945a12716550 R09: 945a09874000
[  123.450090] R10:  R11:  R12: 0008
[  123.450092] R13: 945a12716000 R14: 945a037da280 R15: 945a037da290
[  123.450094] FS:  7f3b34c868c0() GS:945b8920() 
knlGS:
[  123.450096] CS:  0010 DS:  ES:  CR0: 80050033
[  123.450098] CR2: 0008 CR3: 0001144de006 CR4: 00770ef0
[  123.450100] PKRU: 5554
[  123.450101] Call Trace:
[  123.450111]  igc_ethtool_get_link_ksettings+0xd6/0x1b0 [igc]
[  123.450118]  __ethtool_get_link_ksettings+0x71/0xb0
[  123.450123]  duplex_show+0x74/0xc0
[  123.450129]  dev_attr_show+0x1d/0x40
[  123.450134]  sysfs_kf_seq_show+0xa1/0x100
[  123.450137]  kernfs_seq_show+0x27/0x30
[  123.450142]  seq_read+0xb7/0x400
[  123.450148]  ? common_file_perm+0x72/0x170
[  123.450151]  kernfs_fop_read+0x35/0x1b0
[  123.450155]  vfs_read+0xb5/0x1b0
[  123.450157]  ksys_read+0x67/0xe0
[  123.450160]  __x64_sys_read+0x1a/0x20
[  123.450164]  do_syscall_64+0x38/0x90
[  123.450168]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[  123.450170] RIP: 0033:0x7f3b351fe142
[  123.450173] Code: c0 e9 c2 fe ff ff 50 48 8d 3d 3a ca 0a 00 e8 f5 19 02 00 0f 1f 
44 00 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 0f 05 <48> 3d 00 f0 ff 
ff 77 56 c3 0f 1f 44 00 00 48 83 ec 28 48 89 54 24
[  123.450174] RSP: 002b:7fffef2ec138 EFLAGS: 0246 ORIG_RAX: 

[  123.450177] RAX: ffda RBX:  RCX: 7f3b351fe142
[  123.450179] RDX: 1001 RSI: 5644c047f070 RDI: 0003
[  123.450180] RBP: 7fffef2ec340 R08: 5644c047f070 R09: 7f3b352d9320
[  123.450182] R10: 5644c047c010 R11: 0246 R12: 5644c047cbf0
[  123.450184] R13: 5644c047e6d0 R14: 0003 R15: 7fffef2ec140
[  123.450189] Modules linked in: rfcomm ccm cmac algif_hash algif_skcipher 
af_alg bnep toshiba_acpi industrialio toshiba_haps hp_accel lis3lv02d btusb 
btrtl btbcm btintel bluetooth ecdh_generic ecc joydev input_leds nls_iso8859_1 
snd_sof_pci snd_sof_intel_byt snd_sof_intel_ipc snd_sof_intel_hda_common 
snd_soc_hdac_hda snd_hda_codec_hdmi snd_sof_xtensa_dsp snd_sof_intel_hda 
snd_sof snd_hda_ext_core snd_soc_acpi_intel_match snd_soc_acpi 
snd_hda_codec_realtek snd_hda_codec_generic ledtrig_audio snd_hda_intel 
snd_intel_dspcfg soundwire_intel soundwire_generic_allocation soundwire_cadence 
snd_hda_codec snd_hda_core ath10k_pci snd_hwdep intel_rapl_msr 
intel_rapl_common ath10k_core soundwire_bus snd_soc_core x86_pkg_temp_thermal 
ath intel_powerclamp snd_compress ac97_bus snd_pcm_dmaengine mac80211 snd_pcm 
coretemp snd_seq_midi snd_seq_midi_event snd_rawmidi kvm_intel cfg80211 snd_seq 
snd_seq_device snd_timer mei_hdcp kvm libarc4 snd crct10dif_pclmul 
ghash_clmulni_intel aesni_intel
  mei_me dell_wmi
[  123.450266]  dell_smbios soundcore sparse_keymap dcdbas crypto_simd cryptd 
mei dell_uart_backlight glue_helper ee1004 wmi_bmof intel_wmi_thunderbolt 
dell_wmi_descriptor mac_hid efi_pstore acpi_pad acpi_tad intel_cstate 
sch_fq_codel parport_pc ppdev lp parport ip_tables x_tables autofs4 btrfs 
blake2b_generic raid10 raid456 async_raid6_recov async_memcpy async_pq 
async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear 
dm_mirror dm_region_hash dm_log hid_generic usbhid hid i915 i2c_algo_bit 
drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops cec crc32_pclmul 
rc_core drm intel_lpss_pci i2c_i801 ahci igc intel_lpss i2c_smbus idma64 
xhci_pci libahci virt_dma xhci_pci_renesas wmi video pinctrl_tigerlake
[  123.450335] CR2: 0008
[  123.450338] ---[ end trace 9f731e38b53c35cc ]---

The more generic approach will be wrap get_link

Re: [Intel-wired-lan] [PATCH v4] e1000e: Increase polling timeout on MDIC ready bit

2020-09-29 Thread Neftin, Sasha

Hello Kai-Heng,
On 9/29/2020 16:31, Kai-Heng Feng wrote:

Hi Sasha,


On Sep 29, 2020, at 21:08, Neftin, Sasha  wrote:

On 9/28/2020 11:36, Kai-Heng Feng wrote:

We are seeing the following error after S3 resume:
[  704.746874] e1000e :00:1f.6 eno1: Setting page 0x6020
[  704.844232] e1000e :00:1f.6 eno1: MDI Write did not complete
[  704.902817] e1000e :00:1f.6 eno1: Setting page 0x6020
[  704.903075] e1000e :00:1f.6 eno1: reading PHY page 769 (or 0x6020 
shifted) reg 0x17
[  704.903281] e1000e :00:1f.6 eno1: Setting page 0x6020
[  704.903486] e1000e :00:1f.6 eno1: writing PHY page 769 (or 0x6020 
shifted) reg 0x17
[  704.943155] e1000e :00:1f.6 eno1: MDI Error
...
[  705.108161] e1000e :00:1f.6 eno1: Hardware Error
As Andrew Lunn pointed out, MDIO has nothing to do with phy, and indeed
increase polling iteration can resolve the issue.
This patch only papers over the symptom, as we don't really know the
root cause of the issue. The most possible culprit is Intel ME, which
may do its own things that conflict with software.
Signed-off-by: Kai-Heng Feng 
---
v4:
  - States that this patch just papers over the symptom.
v3:
  - Moving delay to end of loop doesn't save anytime, move it back.
  - Point out this is quitely likely caused by Intel ME.
v2:
  - Increase polling iteration instead of powering down the phy.
  drivers/net/ethernet/intel/e1000e/phy.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/intel/e1000e/phy.c 
b/drivers/net/ethernet/intel/e1000e/phy.c
index e11c877595fb..e6d4acd90937 100644
--- a/drivers/net/ethernet/intel/e1000e/phy.c
+++ b/drivers/net/ethernet/intel/e1000e/phy.c
@@ -203,7 +203,7 @@ s32 e1000e_write_phy_reg_mdic(struct e1000_hw *hw, u32 
offset, u16 data)
 * Increasing the time out as testing showed failures with
 * the lower time out
 */
-   for (i = 0; i < (E1000_GEN_POLL_TIMEOUT * 3); i++) {
+   for (i = 0; i < (E1000_GEN_POLL_TIMEOUT * 10); i++) {

As we discussed (many threads) - AMT/ME systems not supported on Linux as 
properly. I do not think increasing polling iteration will solve the problem. 
Rather mask it.


I am aware of the status quo of no proper support on Intel ME.


I prefer you check option to disable ME vi BIOS on your system.


We can't ask user to change the BIOS to accommodate Linux. So before a proper 
solution comes out, masking the problem is good enough for me.
Until then, I'll carry it as a downstream distro patch.
What will you do with system that even after increasing polling time 
will run into HW error?


Kai-Heng


udelay(50);
mdic = er32(MDIC);
if (mdic & E1000_MDIC_READY)

Thanks,
Sasha



Thanks,
Sasha


Re: [Intel-wired-lan] [PATCH v4] e1000e: Increase polling timeout on MDIC ready bit

2020-09-29 Thread Neftin, Sasha

On 9/28/2020 11:36, Kai-Heng Feng wrote:

We are seeing the following error after S3 resume:
[  704.746874] e1000e :00:1f.6 eno1: Setting page 0x6020
[  704.844232] e1000e :00:1f.6 eno1: MDI Write did not complete
[  704.902817] e1000e :00:1f.6 eno1: Setting page 0x6020
[  704.903075] e1000e :00:1f.6 eno1: reading PHY page 769 (or 0x6020 
shifted) reg 0x17
[  704.903281] e1000e :00:1f.6 eno1: Setting page 0x6020
[  704.903486] e1000e :00:1f.6 eno1: writing PHY page 769 (or 0x6020 
shifted) reg 0x17
[  704.943155] e1000e :00:1f.6 eno1: MDI Error
...
[  705.108161] e1000e :00:1f.6 eno1: Hardware Error

As Andrew Lunn pointed out, MDIO has nothing to do with phy, and indeed
increase polling iteration can resolve the issue.

This patch only papers over the symptom, as we don't really know the
root cause of the issue. The most possible culprit is Intel ME, which
may do its own things that conflict with software.

Signed-off-by: Kai-Heng Feng 
---
v4:
  - States that this patch just papers over the symptom.

v3:
  - Moving delay to end of loop doesn't save anytime, move it back.
  - Point out this is quitely likely caused by Intel ME.

v2:
  - Increase polling iteration instead of powering down the phy.

  drivers/net/ethernet/intel/e1000e/phy.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/intel/e1000e/phy.c 
b/drivers/net/ethernet/intel/e1000e/phy.c
index e11c877595fb..e6d4acd90937 100644
--- a/drivers/net/ethernet/intel/e1000e/phy.c
+++ b/drivers/net/ethernet/intel/e1000e/phy.c
@@ -203,7 +203,7 @@ s32 e1000e_write_phy_reg_mdic(struct e1000_hw *hw, u32 
offset, u16 data)
 * Increasing the time out as testing showed failures with
 * the lower time out
 */
-   for (i = 0; i < (E1000_GEN_POLL_TIMEOUT * 3); i++) {
+   for (i = 0; i < (E1000_GEN_POLL_TIMEOUT * 10); i++) {
As we discussed (many threads) - AMT/ME systems not supported on Linux 
as properly. I do not think increasing polling iteration will solve the 
problem. Rather mask it.

I prefer you check option to disable ME vi BIOS on your system.

udelay(50);
mdic = er32(MDIC);
if (mdic & E1000_MDIC_READY)


Thanks,
Sasha


Re: [Intel-wired-lan] [PATCH] igc: Do not use link uninitialized in igc_check_for_copper_link

2020-07-18 Thread Neftin, Sasha

On 7/17/2020 05:12, Nathan Chancellor wrote:

On Thu, Jul 16, 2020 at 07:29:03PM +0300, Neftin, Sasha wrote:

On 7/16/2020 07:49, Nathan Chancellor wrote:

Clang warns:


Hello Nathan,
Thanks for tracking our code.Please, look at 
https://patchwork.ozlabs.org/project/intel-wired-lan/patch/20200709073416.14126-1-sasha.nef...@intel.com/
- I hope this patch already address this Clang warns - please, let me know.


I have not explicitly tested it but it seems obvious that it will. Let's
go with that.

Good Nathan, let's go with my 
https://patchwork.ozlabs.org/project/intel-wired-lan/patch/20200709073416.14126-1-sasha.nef...@intel.com/ 
and let me know if warning still generated.

Thanks,
Sasha

Cheers,
Nathan


drivers/net/ethernet/intel/igc/igc_mac.c:374:6: warning: variable 'link'
is used uninitialized whenever 'if' condition is true
[-Wsometimes-uninitialized]
  if (!mac->get_link_status) {
  ^
drivers/net/ethernet/intel/igc/igc_mac.c:424:33: note: uninitialized use
occurs here
  ret_val = igc_set_ltr_i225(hw, link);
 ^~~~
drivers/net/ethernet/intel/igc/igc_mac.c:374:2: note: remove the 'if' if
its condition is always false
  if (!mac->get_link_status) {
  ^~~~
drivers/net/ethernet/intel/igc/igc_mac.c:367:11: note: initialize the
variable 'link' to silence this warning
  bool link;
   ^
= 0
1 warning generated.

It is not wrong, link is only uninitialized after this through
igc_phy_has_link. Presumably, if we skip the majority of this function
when get_link_status is false, we should skip calling igc_set_ltr_i225
as well. Just directly return 0 in this case, rather than bothering with
adding another label or initializing link in the if statement.

Fixes: 707abf069548 ("igc: Add initial LTR support")
Link: https://github.com/ClangBuiltLinux/linux/issues/1095
Signed-off-by: Nathan Chancellor 
---
   drivers/net/ethernet/intel/igc/igc_mac.c | 6 ++
   1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/intel/igc/igc_mac.c 
b/drivers/net/ethernet/intel/igc/igc_mac.c
index b47e7b0a6398..26e3c56a4a8b 100644
--- a/drivers/net/ethernet/intel/igc/igc_mac.c
+++ b/drivers/net/ethernet/intel/igc/igc_mac.c
@@ -371,10 +371,8 @@ s32 igc_check_for_copper_link(struct igc_hw *hw)
 * get_link_status flag is set upon receiving a Link Status
 * Change or Rx Sequence Error interrupt.
 */
-   if (!mac->get_link_status) {
-   ret_val = 0;
-   goto out;
-   }
+   if (!mac->get_link_status)
+   return 0;
/* First we want to see if the MII Status Register reports
 * link.  If so, then we want to get the current speed/duplex

base-commit: ca0e494af5edb59002665bf12871e94b4163a257


Thanks,
Sasha




Re: [Intel-wired-lan] [PATCH] igc: Do not use link uninitialized in igc_check_for_copper_link

2020-07-16 Thread Neftin, Sasha

On 7/16/2020 07:49, Nathan Chancellor wrote:

Clang warns:


Hello Nathan,
Thanks for tracking our code.Please, look at 
https://patchwork.ozlabs.org/project/intel-wired-lan/patch/20200709073416.14126-1-sasha.nef...@intel.com/ 
- I hope this patch already address this Clang warns - please, let me know.

drivers/net/ethernet/intel/igc/igc_mac.c:374:6: warning: variable 'link'
is used uninitialized whenever 'if' condition is true
[-Wsometimes-uninitialized]
 if (!mac->get_link_status) {
 ^
drivers/net/ethernet/intel/igc/igc_mac.c:424:33: note: uninitialized use
occurs here
 ret_val = igc_set_ltr_i225(hw, link);
^~~~
drivers/net/ethernet/intel/igc/igc_mac.c:374:2: note: remove the 'if' if
its condition is always false
 if (!mac->get_link_status) {
 ^~~~
drivers/net/ethernet/intel/igc/igc_mac.c:367:11: note: initialize the
variable 'link' to silence this warning
 bool link;
  ^
   = 0
1 warning generated.

It is not wrong, link is only uninitialized after this through
igc_phy_has_link. Presumably, if we skip the majority of this function
when get_link_status is false, we should skip calling igc_set_ltr_i225
as well. Just directly return 0 in this case, rather than bothering with
adding another label or initializing link in the if statement.

Fixes: 707abf069548 ("igc: Add initial LTR support")
Link: https://github.com/ClangBuiltLinux/linux/issues/1095
Signed-off-by: Nathan Chancellor 
---
  drivers/net/ethernet/intel/igc/igc_mac.c | 6 ++
  1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/intel/igc/igc_mac.c 
b/drivers/net/ethernet/intel/igc/igc_mac.c
index b47e7b0a6398..26e3c56a4a8b 100644
--- a/drivers/net/ethernet/intel/igc/igc_mac.c
+++ b/drivers/net/ethernet/intel/igc/igc_mac.c
@@ -371,10 +371,8 @@ s32 igc_check_for_copper_link(struct igc_hw *hw)
 * get_link_status flag is set upon receiving a Link Status
 * Change or Rx Sequence Error interrupt.
 */
-   if (!mac->get_link_status) {
-   ret_val = 0;
-   goto out;
-   }
+   if (!mac->get_link_status)
+   return 0;
  
  	/* First we want to see if the MII Status Register reports

 * link.  If so, then we want to get the current speed/duplex

base-commit: ca0e494af5edb59002665bf12871e94b4163a257


Thanks,
Sasha


Re: [Intel-wired-lan] [PATCH] e1000e: Squash an unused function warning

2020-06-10 Thread Neftin, Sasha

On 6/10/2020 04:49, Palmer Dabbelt wrote:

From: Palmer Dabbelt 

e1000e_check_me is only used under CONFIG_PM_SLEEP but exists
unconditionally, which triggers a warning.

Signed-off-by: Palmer Dabbelt 
---
  drivers/net/ethernet/intel/e1000e/netdev.c | 2 ++
  1 file changed, 2 insertions(+)

diff --git a/drivers/net/ethernet/intel/e1000e/netdev.c 
b/drivers/net/ethernet/intel/e1000e/netdev.c
index a279f4fa9962..f7148d1fcba2 100644
--- a/drivers/net/ethernet/intel/e1000e/netdev.c
+++ b/drivers/net/ethernet/intel/e1000e/netdev.c
@@ -134,6 +134,7 @@ static const struct e1000e_me_supported me_supported[] = {
{0}
  };
  
+#ifdef CONFIG_PM_SLEEP

Thanks Palmer for catching this warning,
can we use "__maybe_unused" declaration instead of wrapping? I think it 
is more convenient and consistent.

  static bool e1000e_check_me(u16 device_id)
  {
struct e1000e_me_supported *id;
@@ -145,6 +146,7 @@ static bool e1000e_check_me(u16 device_id)
  
  	return false;

  }
+#endif
  
  /**

   * __ew32_prepare - prepare to write to MAC CSR register on certain parts





Re: [net-next 4/7] e1000e: Add support for S0ix

2019-10-16 Thread Neftin, Sasha

On 10/17/2019 03:02, Jakub Kicinski wrote:

On Wed, 16 Oct 2019 16:47:08 -0700, Jeff Kirsher wrote:

  static int e1000e_pm_freeze(struct device *dev)
  {
struct net_device *netdev = dev_get_drvdata(dev);
@@ -6650,6 +6822,9 @@ static int e1000e_pm_thaw(struct device *dev)
  static int e1000e_pm_suspend(struct device *dev)
  {
struct pci_dev *pdev = to_pci_dev(dev);
+   struct net_device *netdev = pci_get_drvdata(to_pci_dev(dev));
+   struct e1000_adapter *adapter = netdev_priv(netdev);
+   struct e1000_hw *hw = &adapter->hw;
int rc;


reverse xmas tree?


What do you suggest, move the 'struct pci_dev *pdev' two lines below?
  
  	e1000e_flush_lpic(pdev);

@@ -6660,14 +6835,25 @@ static int e1000e_pm_suspend(struct device *dev)
if (rc)
e1000e_pm_thaw(dev);
  
+	/* Introduce S0ix implementation */

+   if (hw->mac.type >= e1000_pch_cnp)
+   e1000e_s0ix_entry_flow(adapter);


the entry/exit functions never fail, you can make them return void

Thanks Jakub for yor comment.This is only initial flow. We continue 
development S0ix support for next product and I consider add error code 
if it will required.

return rc;
  }
  
  static int e1000e_pm_resume(struct device *dev)

  {
struct pci_dev *pdev = to_pci_dev(dev);
+   struct net_device *netdev = pci_get_drvdata(to_pci_dev(dev));
+   struct e1000_adapter *adapter = netdev_priv(netdev);
+   struct e1000_hw *hw = &adapter->hw;
int rc;
  
+	/* Introduce S0ix implementation */

+   if (hw->mac.type >= e1000_pch_cnp)
+   e1000e_s0ix_exit_flow(adapter);
+
rc = __e1000_resume(pdev);
if (rc)
return rc;




Re: [Intel-wired-lan] [PATCH] e1000e: Work around hardware unit hang by disabling TSO

2019-05-22 Thread Neftin, Sasha

On 5/21/2019 18:42, Juliana Rodrigueiro wrote:

So I ask myself, how actually feasible is it to gamble the usage of "ethtool"
to turn on or off TSO every time the network configuration changes?

Hello Juliana,
There are many PCH2 devices with different SKU's.  Not all devices have 
this problem (Tx hand). We do not want to set disabling TSO as the 
default version. Let's keep this option for all other users.
Also, this is very old known HW bug - unfortunately we didn't fixed it. 
Our more new devices have not this problem.


Re: [Intel-wired-lan] [PATCH] e1000e: Work around hardware unit hang by disabling TSO

2019-05-14 Thread Neftin, Sasha

On 5/9/2019 13:34, Juliana Rodrigueiro wrote:

When forwarding traffic to a client behind NAT, some e1000e devices
become unstable, hanging and then being reset by the watchdog.

Output from syslog:

kernel: e1000e :00:19.0 eth0: Detected Hardware Unit Hang:
kernel:  TDH  <5f>
kernel:  TDT  <8d>
kernel:  next_to_use  <8d>
kernel:  next_to_clean<5c>
kernel: buffer_info[next_to_clean]:
kernel:  time_stamp   <6bd7b>
kernel:  next_to_watch<5f>
kernel:  jiffies  <6c180>
kernel:  next_to_watch.status <0>
kernel: MAC Status <40080083>
kernel: PHY Status <796d>
kernel: PHY 1000BASE-T Status  <7800>
kernel: PHY Extended Status<3000>
kernel: PCI Status <10>
kernel: e1000e :00:19.0 eth0: Reset adapter unexpectedly

This repeats several times and never recovers.

Disabling TCP segmentation offload (TSO) seems to be the only way to
work around this problem on the affected devices.

This issue was first reported in 14.01.2015:
https://marc.info/?l=linux-netdev&m=142124954120315

Signed-off-by: Juliana Rodrigueiro 
---
  drivers/net/ethernet/intel/e1000e/netdev.c | 6 ++
  1 file changed, 6 insertions(+)

diff --git a/drivers/net/ethernet/intel/e1000e/netdev.c 
b/drivers/net/ethernet/intel/e1000e/netdev.c
index 8b11682ebba2..4781a45c1047 100644
--- a/drivers/net/ethernet/intel/e1000e/netdev.c
+++ b/drivers/net/ethernet/intel/e1000e/netdev.c
@@ -6936,6 +6936,12 @@ static netdev_features_t e1000_fix_features(struct 
net_device *netdev,
if ((hw->mac.type >= e1000_pch2lan) && (netdev->mtu > ETH_DATA_LEN))
features &= ~NETIF_F_RXFCS;
  
+	if (adapter->pdev->device == E1000_DEV_ID_PCH2_LV_V) {

+   e_info("Disabling TSO on problematic device to avoid hardware unit 
hang.\n");
+   features &= ~NETIF_F_TSO;
+   features &= ~NETIF_F_TSO6;
+   }
+
/* Since there is no support for separate Rx/Tx vlan accel
 * enable/disable make sure Tx flag is always in same state as Rx.
 */

You are right, in some particular configurations e1000e devices stuck at 
Tx hang while TCP segmentation offload is on. But for all other users we 
should keep the TCP segmentation option is enabled as default. I suggest 
to use 'ethtool' command: ethtool -K  tso on/off to workaround 
Tx hang in your situation.

Thanks,
Sasha


Re: [PATCH] igc: remove unused igc_priv_flags_strings array

2019-03-07 Thread Neftin, Sasha

On 3/7/2019 09:29, David Miller wrote:

From: Arnd Bergmann 
Date: Thu,  7 Mar 2019 10:29:57 +0100


clang points out that the igc_priv_flags_strings[] array is never
referenced, aside from being used for calculating its length:

drivers/net/ethernet/intel/igc/igc_ethtool.c:9:19: error: variable 
'igc_priv_flags_strings' is not needed and will not
   be emitted [-Werror,-Wunneeded-internal-declaration]
static const char igc_priv_flags_strings[][ETH_GSTRING_LEN] = {

A similar array is present in several other intel ethernet drivers,
but all the others use it in their .get_strings() callback, which
igc does not implement (yet).

Probably it should be implemented, but as I have no way of testing
it, this does the simpler alternative of removing the array to
get rid of the warning.

Fixes: 8c5ad0dae93c ("igc: Add ethtool support")
Signed-off-by: Arnd Bergmann 


Jeff, I assume you will pick this up.

I've submitted igc: Add support for statistics commit 
7aa2547613233a232b9b2e1942d67da4997e6ad0. This patch in Jeff's Kirsher 
next-queue and used igc_priv_flags_strings.

Thanks,
Sasha


Re: [Intel-wired-lan] [PATCH net-next 3/4] e1000e: Fix -Wformat-truncation warnings

2019-02-25 Thread Neftin, Sasha

On 22/02/2019 6:09, Florian Fainelli wrote:

Provide precision hints to snprintf() since we know the destination
buffer size of the RX/TX ring names are IFNAMSIZ + 5 - 1. This fixes the
following warnings:

drivers/net/ethernet/intel/e1000e/netdev.c: In function
'e1000_request_msix':
drivers/net/ethernet/intel/e1000e/netdev.c:2109:13: warning: 'snprintf'
output may be truncated before the last format character
[-Wformat-truncation=]
  "%s-rx-0", netdev->name);
  ^
drivers/net/ethernet/intel/e1000e/netdev.c:2107:3: note: 'snprintf'
output between 6 and 21 bytes into a destination of size 20
snprintf(adapter->rx_ring->name,
^~~~
  sizeof(adapter->rx_ring->name) - 1,
  ~~~
  "%s-rx-0", netdev->name);
  
drivers/net/ethernet/intel/e1000e/netdev.c:2125:13: warning: 'snprintf'
output may be truncated before the last format character
[-Wformat-truncation=]
  "%s-tx-0", netdev->name);
  ^
drivers/net/ethernet/intel/e1000e/netdev.c:2123:3: note: 'snprintf'
output between 6 and 21 bytes into a destination of size 20
snprintf(adapter->tx_ring->name,
^~~~
  sizeof(adapter->tx_ring->name) - 1,
  ~~~
  "%s-tx-0", netdev->name);
  

Signed-off-by: Florian Fainelli 
---
  drivers/net/ethernet/intel/e1000e/netdev.c | 4 ++--
  1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/intel/e1000e/netdev.c 
b/drivers/net/ethernet/intel/e1000e/netdev.c
index 736fa51878f8..7acc61e4f645 100644
--- a/drivers/net/ethernet/intel/e1000e/netdev.c
+++ b/drivers/net/ethernet/intel/e1000e/netdev.c
@@ -2106,7 +2106,7 @@ static int e1000_request_msix(struct e1000_adapter 
*adapter)
if (strlen(netdev->name) < (IFNAMSIZ - 5))
snprintf(adapter->rx_ring->name,
 sizeof(adapter->rx_ring->name) - 1,
-"%s-rx-0", netdev->name);
+"%.14s-rx-0", netdev->name);
else
memcpy(adapter->rx_ring->name, netdev->name, IFNAMSIZ);
err = request_irq(adapter->msix_entries[vector].vector,
@@ -2122,7 +2122,7 @@ static int e1000_request_msix(struct e1000_adapter 
*adapter)
if (strlen(netdev->name) < (IFNAMSIZ - 5))
snprintf(adapter->tx_ring->name,
 sizeof(adapter->tx_ring->name) - 1,
-"%s-tx-0", netdev->name);
+"%.14s-tx-0", netdev->name);
else
memcpy(adapter->tx_ring->name, netdev->name, IFNAMSIZ);
err = request_irq(adapter->msix_entries[vector].vector,


Acked-by: Sasha Neftin 


Re: [Intel-wired-lan] [PATCH net-next] igc: Make function igc_write_rss_indir_tbl() static

2019-02-17 Thread Neftin, Sasha

On 2/17/2019 12:06, Neftin, Sasha wrote:

On 2/16/2019 04:04, Wei Yongjun wrote:

Fixes the following sparse warning:

drivers/net/ethernet/intel/igc/igc_ethtool.c:646:6: warning:
  symbol 'igc_write_rss_indir_tbl' was not declared. Should it be static?

Fixes: 8c5ad0dae93c ("igc: Add ethtool support")
Signed-off-by: Wei Yongjun 
---
  drivers/net/ethernet/intel/igc/igc_ethtool.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/intel/igc/igc_ethtool.c 
b/drivers/net/ethernet/intel/igc/igc_ethtool.c

index eff37a6c0afa..544239422577 100644
--- a/drivers/net/ethernet/intel/igc/igc_ethtool.c
+++ b/drivers/net/ethernet/intel/igc/igc_ethtool.c
@@ -643,7 +643,7 @@ static int igc_set_coalesce(struct net_device 
*netdev,

  return 0;
  }
-void igc_write_rss_indir_tbl(struct igc_adapter *adapter)
+static void igc_write_rss_indir_tbl(struct igc_adapter *adapter)
  {
  struct igc_hw *hw = &adapter->hw;
  u32 reg = IGC_RETA(0);


NACK
'igc_write_rss_indir_tbl' method declared in igc.h file. This method 
used in both igc_ethtool.c and igc_main.c and can't be a 'static'

__
please, refer to commit 429abdf0cd3c... (Add multiple receive queues 
control supporting)



_

Intel-wired-lan mailing list
intel-wired-...@osuosl.org
https://lists.osuosl.org/mailman/listinfo/intel-wired-lan




Re: [PATCH net-next] igc: Make function igc_write_rss_indir_tbl() static

2019-02-17 Thread Neftin, Sasha

On 2/16/2019 04:04, Wei Yongjun wrote:

Fixes the following sparse warning:

drivers/net/ethernet/intel/igc/igc_ethtool.c:646:6: warning:
  symbol 'igc_write_rss_indir_tbl' was not declared. Should it be static?

Fixes: 8c5ad0dae93c ("igc: Add ethtool support")
Signed-off-by: Wei Yongjun 
---
  drivers/net/ethernet/intel/igc/igc_ethtool.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/intel/igc/igc_ethtool.c 
b/drivers/net/ethernet/intel/igc/igc_ethtool.c
index eff37a6c0afa..544239422577 100644
--- a/drivers/net/ethernet/intel/igc/igc_ethtool.c
+++ b/drivers/net/ethernet/intel/igc/igc_ethtool.c
@@ -643,7 +643,7 @@ static int igc_set_coalesce(struct net_device *netdev,
return 0;
  }
  
-void igc_write_rss_indir_tbl(struct igc_adapter *adapter)

+static void igc_write_rss_indir_tbl(struct igc_adapter *adapter)
  {
struct igc_hw *hw = &adapter->hw;
u32 reg = IGC_RETA(0);


NACK
'igc_write_rss_indir_tbl' method declared in igc.h file. This method 
used in both igc_ethtool.c and igc_main.c and can't be a 'static'


Re: [Intel-wired-lan] [PATCH] e1000e: fix cyclic resets at link up with active tx

2019-01-16 Thread Neftin, Sasha

On 1/14/2019 15:29, Konstantin Khlebnikov wrote:

I'm seeing series of e1000e resets (sometimes endless) at system boot
if something generates tx traffic at this time. In my case this is
netconsole who sends message "e1000e :02:00.0: Some CPU C-states
have been disabled in order to enable jumbo frames" from e1000e itself.
As result e1000_watchdog_task sees used tx buffer while carrier is off
and start this reset cycle again.

[   17.794359] e1000e: eth1 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: 
None
[   17.794714] IPv6: ADDRCONF(NETDEV_CHANGE): eth1: link becomes ready
[   22.936455] e1000e :02:00.0 eth1: changing MTU from 1500 to 9000
[   23.06] e1000e :02:00.0: Some CPU C-states have been disabled in 
order to enable jumbo frames
[   26.102364] e1000e: eth1 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: 
None
[   27.174495] 8021q: 802.1Q VLAN Support v1.8
[   27.174513] 8021q: adding VLAN 0 to HW filter on device eth1
[   30.671724] cgroup: cgroup: disabling cgroup2 socket matching due to 
net_prio or net_cls activation
[   30.898564] netpoll: netconsole: local port 
[   30.898566] netpoll: netconsole: local IPv6 address 
2a02:6b8:0:80b:beae:c5ff:fe28:23f8
[   30.898567] netpoll: netconsole: interface 'eth1'
[   30.898568] netpoll: netconsole: remote port 
[   30.898568] netpoll: netconsole: remote IPv6 address 
2a02:6b8:b000:605c:e61d:2dff:fe03:3790
[   30.898569] netpoll: netconsole: remote ethernet address b0:a8:6e:f4:ff:c0
[   30.917747] console [netcon0] enabled
[   30.917749] netconsole: network logging started
[   31.453353] e1000e :02:00.0: Some CPU C-states have been disabled in 
order to enable jumbo frames
[   34.185730] e1000e :02:00.0: Some CPU C-states have been disabled in 
order to enable jumbo frames
[   34.321840] e1000e :02:00.0: Some CPU C-states have been disabled in 
order to enable jumbo frames
[   34.465822] e1000e :02:00.0: Some CPU C-states have been disabled in 
order to enable jumbo frames
[   34.597423] e1000e :02:00.0: Some CPU C-states have been disabled in 
order to enable jumbo frames
[   34.745417] e1000e :02:00.0: Some CPU C-states have been disabled in 
order to enable jumbo frames
[   34.877356] e1000e :02:00.0: Some CPU C-states have been disabled in 
order to enable jumbo frames
[   35.005441] e1000e :02:00.0: Some CPU C-states have been disabled in 
order to enable jumbo frames
[   35.157376] e1000e :02:00.0: Some CPU C-states have been disabled in 
order to enable jumbo frames
[   35.289362] e1000e :02:00.0: Some CPU C-states have been disabled in 
order to enable jumbo frames
[   35.417441] e1000e :02:00.0: Some CPU C-states have been disabled in 
order to enable jumbo frames
[   37.790342] e1000e: eth1 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: 
None

This patch flushes tx buffers only once when carrier is off
rather than at each watchdog iteration.

Signed-off-by: Konstantin Khlebnikov 
---
  drivers/net/ethernet/intel/e1000e/netdev.c |   15 ++-
  1 file changed, 6 insertions(+), 9 deletions(-)

diff --git a/drivers/net/ethernet/intel/e1000e/netdev.c 
b/drivers/net/ethernet/intel/e1000e/netdev.c
index 189f231075c2..d10083beec83 100644
--- a/drivers/net/ethernet/intel/e1000e/netdev.c
+++ b/drivers/net/ethernet/intel/e1000e/netdev.c
@@ -5309,8 +5309,13 @@ static void e1000_watchdog_task(struct work_struct *work)
/* 8000ES2LAN requires a Rx packet buffer work-around
 * on link down event; reset the controller to flush
 * the Rx packet buffer.
+*
+* If the link is lost the controller stops DMA, but
+* if there is queued Tx work it cannot be done.  So
+* reset the controller to flush the Tx packet buffers.
 */
-   if (adapter->flags & FLAG_RX_NEEDS_RESTART)
+   if ((adapter->flags & FLAG_RX_NEEDS_RESTART) ||
+   e1000_desc_unused(tx_ring) + 1 < tx_ring->count)
adapter->flags |= FLAG_RESTART_NOW;
else
pm_schedule_suspend(netdev->dev.parent,
@@ -5333,14 +5338,6 @@ static void e1000_watchdog_task(struct work_struct *work)
adapter->gotc_old = adapter->stats.gotc;
spin_unlock(&adapter->stats64_lock);
  
-	/* If the link is lost the controller stops DMA, but

-* if there is queued Tx work it cannot be done.  So
-* reset the controller to flush the Tx packet buffers.
-*/
-   if (!netif_carrier_ok(netdev) &&
-   (e1000_desc_unused(tx_ring) + 1 < tx_ring->count))
-   adapter->flags |= FLAG_RESTART_NOW;
-
/* If reset is necessary, do it outside of interrupt context. */
if (adapter->flags & FLAG_RESTART_NOW) {
schedule_work(&adapter->reset_t

Re: [net-next 12/12] igc: Clean up code

2018-11-08 Thread Neftin, Sasha

On 11/8/2018 13:00, Joe Perches wrote:

On Wed, 2018-11-07 at 14:48 -0800, Jeff Kirsher wrote:

From: Sasha Neftin 

Address few community comments.
Remove unused code, will be added per demand.
Remove blank lines and unneeded includes.

Signed-off-by: Sasha Neftin 
Tested-by: Aaron Brown 
Signed-off-by: Jeff Kirsher 
---
  drivers/net/ethernet/intel/igc/igc.h  |  9 -
  drivers/net/ethernet/intel/igc/igc_main.c | 15 ---
  2 files changed, 24 deletions(-)

diff --git a/drivers/net/ethernet/intel/igc/igc.h 
b/drivers/net/ethernet/intel/igc/igc.h

[]

  #define IGC_ERR(args...) pr_err("igc: " args)


This is used just once and should probably be removed.

maybe:

---
  drivers/net/ethernet/intel/igc/igc_main.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/intel/igc/igc_main.c 
b/drivers/net/ethernet/intel/igc/igc_main.c
index 9d85707e8a81..b58542b20623 100644
--- a/drivers/net/ethernet/intel/igc/igc_main.c
+++ b/drivers/net/ethernet/intel/igc/igc_main.c
@@ -3551,7 +3551,7 @@ static int igc_probe(struct pci_dev *pdev,
err = dma_set_coherent_mask(&pdev->dev,
DMA_BIT_MASK(32));
if (err) {
-   IGC_ERR("Wrong DMA configuration, aborting\n");
+   dev_err(&pdev->dev, "igc: Wrong DMA configuration, 
aborting\n");
goto err_dma;
}
}



Good note, thanks. I will process another patch address this suggestion.


Re: [net-next 03/11] igc: Add netdev

2018-10-24 Thread Neftin, Sasha

On 10/18/2018 20:15, Jakub Kicinski wrote:

On Wed, 17 Oct 2018 15:23:14 -0700, Jeff Kirsher wrote:

+/**
+ * igc_ioctl - I/O control method
+ * @netdev: network interface device structure
+ * @ifreq: frequency


Is it? :)


Ah... Good catch. I will fix that and submit patch.

+ * @cmd: command
+ */
+static int igc_ioctl(struct net_device *netdev, struct ifreq *ifr, int cmd)
+{
+   switch (cmd) {
+   default:
+   return -EOPNOTSUPP;
+   }
+}


You don't seem to be adding anything to this function in the series.
Why add the stub?


Right. Still not in use. I will remove and add per demand.
Thanks for your comments.


Re: [PATCH][next] igc: fix error return handling from call to netif_set_real_num_tx_queues

2018-10-24 Thread Neftin, Sasha

On 10/19/2018 21:16, Colin King wrote:

From: Colin Ian King 

The call to netif_set_real_num_tx_queues is not assigning the error
return to variable err even though the next line checks err for an
error.  Fix this by adding the missing err assignment.

Detected by CoverityScan, CID#1474551 ("Logically dead code")

Fixes: 3df25e4c1e66 ("igc: Add interrupt support")
Signed-off-by: Colin Ian King 
---
  drivers/net/ethernet/intel/igc/igc_main.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/intel/igc/igc_main.c 
b/drivers/net/ethernet/intel/igc/igc_main.c
index 9d85707e8a81..80ddbd987764 100644
--- a/drivers/net/ethernet/intel/igc/igc_main.c
+++ b/drivers/net/ethernet/intel/igc/igc_main.c
@@ -3358,7 +3358,7 @@ static int __igc_open(struct net_device *netdev, bool 
resuming)
goto err_req_irq;
  
  	/* Notify the stack of the actual queue counts. */

-   netif_set_real_num_tx_queues(netdev, adapter->num_tx_queues);
+   err = netif_set_real_num_tx_queues(netdev, adapter->num_tx_queues);
if (err)
goto err_set_queues;
  


Thanks for the patch. Good catch of my typo.

Acked-by: Sasha Neftin 


Re: [PATCH net-next] igc: Remove set but not used variables 'ctrl_ext, link_mode'

2018-10-24 Thread Neftin, Sasha

On 10/19/2018 15:40, YueHaibing wrote:

Fixes gcc '-Wunused-but-set-variable' warning:

drivers/net/ethernet/intel/igc/igc_base.c: In function 
'igc_init_phy_params_base':
drivers/net/ethernet/intel/igc/igc_base.c:240:6: warning:
  variable 'ctrl_ext' set but not used [-Wunused-but-set-variable]
   u32 ctrl_ext;
   
drivers/net/ethernet/intel/igc/igc_base.c: In function 'igc_get_invariants_base':

drivers/net/ethernet/intel/igc/igc_base.c:290:6: warning:
  variable 'link_mode' set but not used [-Wunused-but-set-variable]
   u32 link_mode = 0;

It never used since introduction in
commit c0071c7aa5fe ("igc: Add HW initialization code")

Signed-off-by: YueHaibing 
---
I'm not sure that reading IGC_CTRL_EXT is necessary.
---
  drivers/net/ethernet/intel/igc/igc_base.c | 8 
  1 file changed, 8 deletions(-)

diff --git a/drivers/net/ethernet/intel/igc/igc_base.c 
b/drivers/net/ethernet/intel/igc/igc_base.c
index 832da609..df40af7 100644
--- a/drivers/net/ethernet/intel/igc/igc_base.c
+++ b/drivers/net/ethernet/intel/igc/igc_base.c
@@ -237,7 +237,6 @@ static s32 igc_init_phy_params_base(struct igc_hw *hw)
  {
struct igc_phy_info *phy = &hw->phy;
s32 ret_val = 0;
-   u32 ctrl_ext;
  
  	if (hw->phy.media_type != igc_media_type_copper) {

phy->type = igc_phy_none;
@@ -247,8 +246,6 @@ static s32 igc_init_phy_params_base(struct igc_hw *hw)
phy->autoneg_mask= AUTONEG_ADVERTISE_SPEED_DEFAULT_2500;
phy->reset_delay_us  = 100;
  
-	ctrl_ext = rd32(IGC_CTRL_EXT);

-
/* set lan id */
hw->bus.func = (rd32(IGC_STATUS) & IGC_STATUS_FUNC_MASK) >>
IGC_STATUS_FUNC_SHIFT;
@@ -287,8 +284,6 @@ static s32 igc_init_phy_params_base(struct igc_hw *hw)
  static s32 igc_get_invariants_base(struct igc_hw *hw)
  {
struct igc_mac_info *mac = &hw->mac;
-   u32 link_mode = 0;
-   u32 ctrl_ext = 0;
s32 ret_val = 0;
  
  	switch (hw->device_id) {

@@ -302,9 +297,6 @@ static s32 igc_get_invariants_base(struct igc_hw *hw)
  
  	hw->phy.media_type = igc_media_type_copper;
  
-	ctrl_ext = rd32(IGC_CTRL_EXT);

-   link_mode = ctrl_ext & IGC_CTRL_EXT_LINK_MODE_MASK;
-
/* mac initialization and operations */
ret_val = igc_init_mac_params_base(hw);
if (ret_val)


Thanks for the patch. Good.

Acked-by: Sasha Neftin 


Re: [PATCH net-next] igc: Remove set but not used variable 'pci_using_dac'

2018-10-24 Thread Neftin, Sasha

On 10/19/2018 15:48, YueHaibing wrote:

Fixes gcc '-Wunused-but-set-variable' warning:

drivers/net/ethernet/intel/igc/igc_main.c: In function 'igc_probe':
drivers/net/ethernet/intel/igc/igc_main.c:3535:11: warning:
  variable 'pci_using_dac' set but not used [-Wunused-but-set-variable]

It never used since introduction in commit
d89f88419f99 ("igc: Add skeletal frame for Intel(R) 2.5G Ethernet Controller 
support")

Signed-off-by: YueHaibing 
---
  drivers/net/ethernet/intel/igc/igc_main.c | 5 +
  1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/intel/igc/igc_main.c 
b/drivers/net/ethernet/intel/igc/igc_main.c
index 9d85707..06a4afbe 100644
--- a/drivers/net/ethernet/intel/igc/igc_main.c
+++ b/drivers/net/ethernet/intel/igc/igc_main.c
@@ -3532,19 +3532,16 @@ static int igc_probe(struct pci_dev *pdev,
struct net_device *netdev;
struct igc_hw *hw;
const struct igc_info *ei = igc_info_tbl[ent->driver_data];
-   int err, pci_using_dac;
+   int err;
  
  	err = pci_enable_device_mem(pdev);

if (err)
return err;
  
-	pci_using_dac = 0;

err = dma_set_mask(&pdev->dev, DMA_BIT_MASK(64));
if (!err) {
err = dma_set_coherent_mask(&pdev->dev,
DMA_BIT_MASK(64));
-   if (!err)
-   pci_using_dac = 1;
} else {
err = dma_set_mask(&pdev->dev, DMA_BIT_MASK(32));
if (err) {


Thanks for the patch. Good.

Acked-by: Sasha Neftin 



Re: [net-next 01/11] igc: Add skeletal frame for Intel(R) 2.5G Ethernet Controller support

2018-10-24 Thread Neftin, Sasha

On 10/18/2018 20:14, Jakub Kicinski wrote:

On Wed, 17 Oct 2018 15:23:12 -0700, Jeff Kirsher wrote:

From: Sasha Neftin 

This patch adds the beginning framework onto which I am going to add
the igc driver which supports the Intel(R) I225-LM/I225-V 2.5G
Ethernet Controller.

Signed-off-by: Sasha Neftin 
Tested-by: Aaron Brown 
Signed-off-by: Jeff Kirsher 


bunch of minor nit picks


diff --git a/drivers/net/ethernet/intel/igc/igc.h 
b/drivers/net/ethernet/intel/igc/igc.h
new file mode 100644
index ..afe595cfcf63
--- /dev/null
+++ b/drivers/net/ethernet/intel/igc/igc.h
@@ -0,0 +1,29 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/* Copyright (c)  2018 Intel Corporation */
+
+#ifndef _IGC_H_
+#define _IGC_H_
+
+#include 
+
+#include 
+#include 
+#include 
+
+#include 
+
+#include 
+
+#define IGC_ERR(args...) pr_err("igc: " args)


Looks like you're reimplementing pr_fmt()

yes, it is convenient for a debug. Same methodological approach I saw in 
other drivers.

+#define PFX "igc: "
+
+#include 
+#include 
+#include 


Splitting the includes looks fairly weird.  Also, you're not using any
of them here.

Good catch. I'll remove splits and unused "include"'s. All "include"'s 
will be add per demand. I will send patch to fix that.

+/* main */
+extern char igc_driver_name[];
+extern char igc_driver_version[];
+
+#endif /* _IGC_H_ */
diff --git a/drivers/net/ethernet/intel/igc/igc_hw.h 
b/drivers/net/ethernet/intel/igc/igc_hw.h
new file mode 100644
index ..aa68b4516700
--- /dev/null
+++ b/drivers/net/ethernet/intel/igc/igc_hw.h
@@ -0,0 +1,10 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/* Copyright (c)  2018 Intel Corporation */
+
+#ifndef _IGC_HW_H_
+#define _IGC_HW_H_
+
+#define IGC_DEV_ID_I225_LM 0x15F2
+#define IGC_DEV_ID_I225_V  0x15F3
+
+#endif /* _IGC_HW_H_ */
diff --git a/drivers/net/ethernet/intel/igc/igc_main.c 
b/drivers/net/ethernet/intel/igc/igc_main.c
new file mode 100644
index ..753749ce5ae0
--- /dev/null
+++ b/drivers/net/ethernet/intel/igc/igc_main.c
@@ -0,0 +1,146 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c)  2018 Intel Corporation */
+
+#include 
+#include 
+
+#include "igc.h"
+#include "igc_hw.h"
+
+#define DRV_VERSION"0.0.1-k"


You can just use the kernel version, it works pretty well in presence
of backports.


Good idea, I think I will do it too.

+#define DRV_SUMMARY"Intel(R) 2.5G Ethernet Linux Driver"
+
+MODULE_AUTHOR("Intel Corporation, ");
+MODULE_DESCRIPTION(DRV_SUMMARY);
+MODULE_LICENSE("GPL v2");
+MODULE_VERSION(DRV_VERSION);
+
+char igc_driver_name[] = "igc";
+char igc_driver_version[] = DRV_VERSION;
+static const char igc_driver_string[] = DRV_SUMMARY;
+static const char igc_copyright[] =
+   "Copyright(c) 2018 Intel Corporation.";
+
+static const struct pci_device_id igc_pci_tbl[] = {
+   { PCI_VDEVICE(INTEL, IGC_DEV_ID_I225_LM) },
+   { PCI_VDEVICE(INTEL, IGC_DEV_ID_I225_V) },
+   /* required last entry */
+   {0, }
+};
+
+MODULE_DEVICE_TABLE(pci, igc_pci_tbl);
+
+/**
+ * igc_probe - Device Initialization Routine
+ * @pdev: PCI device information struct
+ * @ent: entry in igc_pci_tbl
+ *
+ * Returns 0 on success, negative on failure
+ *
+ * igc_probe initializes an adapter identified by a pci_dev structure.
+ * The OS initialization, configuring the adapter private structure,
+ * and a hardware reset occur.
+ */
+static int igc_probe(struct pci_dev *pdev,
+const struct pci_device_id *ent)
+{
+   int err, pci_using_dac;
+
+   err = pci_enable_device_mem(pdev);
+   if (err)
+   return err;
+
+   pci_using_dac = 0;
+   err = dma_set_mask(&pdev->dev, DMA_BIT_MASK(64));
+   if (!err) {
+   err = dma_set_coherent_mask(&pdev->dev,
+   DMA_BIT_MASK(64));
+   if (!err)
+   pci_using_dac = 1;


You never use this pci_using_dac.  dma_set_mask_and_coherent() maybe?


Right. I saw already somebody sent out patch to fix that.

+   } else {
+   err = dma_set_mask(&pdev->dev, DMA_BIT_MASK(32));
+   if (err) {
+   err = dma_set_coherent_mask(&pdev->dev,
+   DMA_BIT_MASK(32));
+   if (err) {
+   IGC_ERR("Wrong DMA configuration, aborting\n");
+   goto err_dma;
+   }
+   }
+   }
+
+   err = pci_request_selected_regions(pdev,
+  pci_select_bars(pdev,
+  IORESOURCE_MEM),
+  igc_driver_name);
+   if (err)
+   goto err_pci_reg;
+
+   pci_set_master(pdev);
+   err = pci_save_state(pdev);


And you never look at err?


Same as above.

+   return 0;
+
+err_pci_reg:
+err_dma:


The err

Re: [Intel-wired-lan] e1000e driver stuck at 10Mbps after reconnection

2018-08-08 Thread Neftin, Sasha

On 8/8/2018 17:24, Neftin, Sasha wrote:

On 8/7/2018 09:42, Camille Bordignon wrote:

Le lundi 06 août 2018 à 15:45:29 (-0700), Alexander Duyck a écrit :

On Mon, Aug 6, 2018 at 4:59 AM, Camille Bordignon
 wrote:

Hello,

Recently we experienced some issues with intel NIC (I219-LM and 
I219-V).

It seems that after a wire reconnection, auto-negotation "fails" and
link speed drips to 10 Mbps.

 From kernel logs:
[17616.346150] e1000e: enp0s31f6 NIC Link is Down
[17627.003322] e1000e: enp0s31f6 NIC Link is Up 10 Mbps Full Duplex, 
Flow Control: None
[17627.003325] e1000e :00:1f.6 enp0s31f6: 10/100 speed: 
disabling TSO



$ethtool enp0s31f6
Settings for enp0s31f6:
 Supported ports: [ TP ]
 Supported link modes:   10baseT/Half 10baseT/Full
 100baseT/Half 100baseT/Full
 1000baseT/Full
 Supported pause frame use: No
 Supports auto-negotiation: Yes
 Supported FEC modes: Not reported
 Advertised link modes:  10baseT/Half 10baseT/Full
 100baseT/Half 100baseT/Full
 1000baseT/Full
 Advertised pause frame use: No
 Advertised auto-negotiation: Yes
 Advertised FEC modes: Not reported
 Speed: 10Mb/s
 Duplex: Full
 Port: Twisted Pair
 PHYAD: 1
 Transceiver: internal
 Auto-negotiation: on
 MDI-X: on (auto)
 Supports Wake-on: pumbg
 Wake-on: g
 Current message level: 0x0007 (7)
    drv probe link
 Link detected: yes


Notice that if disconnection last less than about 5 seconds,
nothing wrong happens.
And if after last failure, disconnection / connection occurs again and
last less than 5 seconds, link speed is back to 1000 Mbps.

[18075.350678] e1000e: enp0s31f6 NIC Link is Down
[18078.716245] e1000e: enp0s31f6 NIC Link is Up 1000 Mbps Full 
Duplex, Flow Control: None


The following patch seems to fix this issue.
However I don't clearly understand why.

diff --git a/drivers/net/ethernet/intel/e1000e/netdev.c 
b/drivers/net/ethernet/intel/e1000e/netdev.c

index 3ba0c90e7055..763c013960f1 100644
--- a/drivers/net/ethernet/intel/e1000e/netdev.c
+++ b/drivers/net/ethernet/intel/e1000e/netdev.c
@@ -5069,7 +5069,7 @@ static bool e1000e_has_link(struct 
e1000_adapter *adapter)

 case e1000_media_type_copper:
 if (hw->mac.get_link_status) {
 ret_val = hw->mac.ops.check_for_link(hw);
-   link_active = !hw->mac.get_link_status;
+   link_active = false;
 } else {
 link_active = true;
 }

Maybe this is related to watchdog task.

I've found out this fix by comparing with last commit that works fine :
commit 0b76aae741abb9d16d2c0e67f8b1e766576f897d.
However I don't know if this information is relevant.

Thank you.
Camille Bordignon


What kernel were you testing this on? I know there have been a number
of changes over the past few months in this area and it would be
useful to know exactly what code base you started out with and what
the latest version of the kernel is you have tested.

Looking over the code change the net effect of it should be to add a 2
second delay from the time the link has changed until you actually
check the speed/duplex configuration. It is possible we could be
seeing some sort of timing issue and adding the 2 second delay after
the link event is enough time for things to stabilize and detect the
link at 1000 instead of 10/100.

- Alex


We've found out this issue using Fedora 27 (4.17.11-100.fc27.x86_64).

Then I've tested wth a more recent version of the driver v4.18-rc7 but
behavior looks the same.

Thanks for you reply.

Camille Bordignon
___
Intel-wired-lan mailing list
intel-wired-...@osuosl.org
https://lists.osuosl.org/mailman/listinfo/intel-wired-lan

I've agree with Alex. Let's try add 2s delay after a link event. Please, 
let us know if it will solve your problem.
Also, I would like recommend try work with different link partner and 
see if you see same problem.

___
Intel-wired-lan mailing list
intel-wired-...@osuosl.org
https://lists.osuosl.org/mailman/listinfo/intel-wired-lan

Camille,
My apologies, I wrong understand Alex. Please, do not try add delay. 
Please, check if you see same problem with different link partners.

Thanks,
Sasha


Re: [Intel-wired-lan] e1000e driver stuck at 10Mbps after reconnection

2018-08-08 Thread Neftin, Sasha

On 8/7/2018 09:42, Camille Bordignon wrote:

Le lundi 06 août 2018 à 15:45:29 (-0700), Alexander Duyck a écrit :

On Mon, Aug 6, 2018 at 4:59 AM, Camille Bordignon
 wrote:

Hello,

Recently we experienced some issues with intel NIC (I219-LM and I219-V).
It seems that after a wire reconnection, auto-negotation "fails" and
link speed drips to 10 Mbps.

 From kernel logs:
[17616.346150] e1000e: enp0s31f6 NIC Link is Down
[17627.003322] e1000e: enp0s31f6 NIC Link is Up 10 Mbps Full Duplex, Flow 
Control: None
[17627.003325] e1000e :00:1f.6 enp0s31f6: 10/100 speed: disabling TSO


$ethtool enp0s31f6
Settings for enp0s31f6:
 Supported ports: [ TP ]
 Supported link modes:   10baseT/Half 10baseT/Full
 100baseT/Half 100baseT/Full
 1000baseT/Full
 Supported pause frame use: No
 Supports auto-negotiation: Yes
 Supported FEC modes: Not reported
 Advertised link modes:  10baseT/Half 10baseT/Full
 100baseT/Half 100baseT/Full
 1000baseT/Full
 Advertised pause frame use: No
 Advertised auto-negotiation: Yes
 Advertised FEC modes: Not reported
 Speed: 10Mb/s
 Duplex: Full
 Port: Twisted Pair
 PHYAD: 1
 Transceiver: internal
 Auto-negotiation: on
 MDI-X: on (auto)
 Supports Wake-on: pumbg
 Wake-on: g
 Current message level: 0x0007 (7)
drv probe link
 Link detected: yes


Notice that if disconnection last less than about 5 seconds,
nothing wrong happens.
And if after last failure, disconnection / connection occurs again and
last less than 5 seconds, link speed is back to 1000 Mbps.

[18075.350678] e1000e: enp0s31f6 NIC Link is Down
[18078.716245] e1000e: enp0s31f6 NIC Link is Up 1000 Mbps Full Duplex, Flow 
Control: None

The following patch seems to fix this issue.
However I don't clearly understand why.

diff --git a/drivers/net/ethernet/intel/e1000e/netdev.c 
b/drivers/net/ethernet/intel/e1000e/netdev.c
index 3ba0c90e7055..763c013960f1 100644
--- a/drivers/net/ethernet/intel/e1000e/netdev.c
+++ b/drivers/net/ethernet/intel/e1000e/netdev.c
@@ -5069,7 +5069,7 @@ static bool e1000e_has_link(struct e1000_adapter *adapter)
 case e1000_media_type_copper:
 if (hw->mac.get_link_status) {
 ret_val = hw->mac.ops.check_for_link(hw);
-   link_active = !hw->mac.get_link_status;
+   link_active = false;
 } else {
 link_active = true;
 }

Maybe this is related to watchdog task.

I've found out this fix by comparing with last commit that works fine :
commit 0b76aae741abb9d16d2c0e67f8b1e766576f897d.
However I don't know if this information is relevant.

Thank you.
Camille Bordignon


What kernel were you testing this on? I know there have been a number
of changes over the past few months in this area and it would be
useful to know exactly what code base you started out with and what
the latest version of the kernel is you have tested.

Looking over the code change the net effect of it should be to add a 2
second delay from the time the link has changed until you actually
check the speed/duplex configuration. It is possible we could be
seeing some sort of timing issue and adding the 2 second delay after
the link event is enough time for things to stabilize and detect the
link at 1000 instead of 10/100.

- Alex


We've found out this issue using Fedora 27 (4.17.11-100.fc27.x86_64).

Then I've tested wth a more recent version of the driver v4.18-rc7 but
behavior looks the same.

Thanks for you reply.

Camille Bordignon
___
Intel-wired-lan mailing list
intel-wired-...@osuosl.org
https://lists.osuosl.org/mailman/listinfo/intel-wired-lan

I've agree with Alex. Let's try add 2s delay after a link event. Please, 
let us know if it will solve your problem.
Also, I would like recommend try work with different link partner and 
see if you see same problem.


Re: [Intel-wired-lan] [PATCH] e1000e: Ignore TSYNCRXCTL when getting I219 clock attributes

2018-05-12 Thread Neftin, Sasha

On 5/10/2018 21:42, Keller, Jacob E wrote:

-Original Message-
From: Benjamin Poirier [mailto:bpoir...@suse.com]
Sent: Thursday, May 10, 2018 12:29 AM
To: Kirsher, Jeffrey T 
Cc: Keller, Jacob E ; Achim Mildenberger
; olouvig...@gmail.com;
jaya...@goubiq.com; ehabk...@redhat.com; postmodern.m...@gmail.com;
bart.vanass...@wdc.com; intel-wired-...@lists.osuosl.org;
netdev@vger.kernel.org; linux-ker...@vger.kernel.org
Subject: [PATCH] e1000e: Ignore TSYNCRXCTL when getting I219 clock attributes

There have been multiple reports of crashes that look like
kernel: RIP: 0010:[] timecounter_read+0xf/0x50
[...]
kernel: Call Trace:
kernel:  [] e1000e_phc_gettime+0x2f/0x60 [e1000e]
kernel:  [] e1000e_systim_overflow_work+0x1d/0x80 [e1000e]
kernel:  [] process_one_work+0x155/0x440
kernel:  [] worker_thread+0x116/0x4b0
kernel:  [] kthread+0xd2/0xf0
kernel:  [] ret_from_fork+0x3f/0x70

These can be traced back to the fact that e1000e_systim_reset() skips the
timecounter_init() call if e1000e_get_base_timinca() returns -EINVAL, which
leads to a null deref in timecounter_read().

Commit 83129b37ef35 ("e1000e: fix systim issues", v4.2-rc1) reworked
e1000e_get_base_timinca() in such a way that it can return -EINVAL for
e1000_pch_spt if the SYSCFI bit is not set in TSYNCRXCTL.

Some experimentation has shown that on I219 (e1000_pch_spt, "MAC: 12")
adapters, the E1000_TSYNCRXCTL_SYSCFI flag is unstable; TSYNCRXCTL reads
sometimes don't have the SYSCFI bit set. Retrying the read shortly after
finds the bit to be set. This was observed at boot (probe) but also link up
and link down.

Moreover, the phc (PTP Hardware Clock) seems to operate normally even after
reads where SYSCFI=0. Therefore, remove this register read and
unconditionally set the clock parameters.

Reported-by: Achim Mildenberger 
Message-Id: <20180425065243.g5mqewg5irkwgwgv@f2>
Bugzilla: https://bugzilla.suse.com/show_bug.cgi?id=1075876
Fixes: 83129b37ef35 ("e1000e: fix systim issues")
Signed-off-by: Benjamin Poirier 
---
  drivers/net/ethernet/intel/e1000e/netdev.c | 15 ++-
  1 file changed, 6 insertions(+), 9 deletions(-)

diff --git a/drivers/net/ethernet/intel/e1000e/netdev.c
b/drivers/net/ethernet/intel/e1000e/netdev.c
index ec4a9759a6f2..3afb1f3b6f91 100644
--- a/drivers/net/ethernet/intel/e1000e/netdev.c
+++ b/drivers/net/ethernet/intel/e1000e/netdev.c
@@ -3546,15 +3546,12 @@ s32 e1000e_get_base_timinca(struct e1000_adapter
*adapter, u32 *timinca)
}
break;
case e1000_pch_spt:
-   if (er32(TSYNCRXCTL) & E1000_TSYNCRXCTL_SYSCFI) {
-   /* Stable 24MHz frequency */
-   incperiod = INCPERIOD_24MHZ;
-   incvalue = INCVALUE_24MHZ;
-   shift = INCVALUE_SHIFT_24MHZ;
-   adapter->cc.shift = shift;
-   break;
-   }
-   return -EINVAL;
+   /* Stable 24MHz frequency */
+   incperiod = INCPERIOD_24MHZ;
+   incvalue = INCVALUE_24MHZ;
+   shift = INCVALUE_SHIFT_24MHZ;
+   adapter->cc.shift = shift;
+   break;
case e1000_pch_cnp:
if (er32(TSYNCRXCTL) & E1000_TSYNCRXCTL_SYSCFI) {
/* Stable 24MHz frequency */
--
2.16.3


Given testing showing that the clock operates fine regardless of the register 
read, I think this is probably fine. Normally I believe the register was used 
to check which frequency was in use, but it doesn't seem to serve that purpose 
here.

Thanks,
Jake
___
Intel-wired-lan mailing list
intel-wired-...@osuosl.org
https://lists.osuosl.org/mailman/listinfo/intel-wired-lan

I've checked our specification, looks only 24MHz used for this product. 
Hope no different platform with another clock support has been 
distributed. So, let's pick up this change.


Re: [Intel-wired-lan] [PATCH net-queue] e1000e: Fix check_for_link return value with autoneg off.

2018-02-20 Thread Neftin, Sasha

On 2/19/2018 22:12, Benjamin Poirier wrote:

When autoneg is off, the .check_for_link callback functions clear the
get_link_status flag and systematically return a "pseudo-error". This means
that the link is not detected as up until the next execution of the
e1000_watchdog_task() 2 seconds later.

Fixes: 19110cfbb34d ("e1000e: Separate signaling for link check/link up")
Signed-off-by: Benjamin Poirier 
---
  drivers/net/ethernet/intel/e1000e/ich8lan.c | 2 +-
  drivers/net/ethernet/intel/e1000e/mac.c | 2 +-
  2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/intel/e1000e/ich8lan.c 
b/drivers/net/ethernet/intel/e1000e/ich8lan.c
index 31277d3bb7dc..ff308b05d68c 100644
--- a/drivers/net/ethernet/intel/e1000e/ich8lan.c
+++ b/drivers/net/ethernet/intel/e1000e/ich8lan.c
@@ -1602,7 +1602,7 @@ static s32 e1000_check_for_copper_link_ich8lan(struct 
e1000_hw *hw)
 * we have already determined whether we have link or not.
 */
if (!mac->autoneg)
-   return -E1000_ERR_CONFIG;
+   return 1;
  
  	/* Auto-Neg is enabled.  Auto Speed Detection takes care

 * of MAC speed/duplex configuration.  So we only need to
diff --git a/drivers/net/ethernet/intel/e1000e/mac.c 
b/drivers/net/ethernet/intel/e1000e/mac.c
index f457c5703d0c..db735644b312 100644
--- a/drivers/net/ethernet/intel/e1000e/mac.c
+++ b/drivers/net/ethernet/intel/e1000e/mac.c
@@ -450,7 +450,7 @@ s32 e1000e_check_for_copper_link(struct e1000_hw *hw)
 * we have already determined whether we have link or not.
 */
if (!mac->autoneg)
-   return -E1000_ERR_CONFIG;
+   return 1;
  
  	/* Auto-Neg is enabled.  Auto Speed Detection takes care

 * of MAC speed/duplex configuration.  So we only need to


Acked-by: Sasha Neftin 


Re: [Intel-wired-lan] [PATCH] e1000e: allocate ring descriptors with dma_zalloc_coherent

2018-02-04 Thread Neftin, Sasha

On 1/26/2018 12:24, Pierre-Yves Kerbrat wrote:

Descriptor rings were not initialized at zero when allocated
When area contained garbage data, it caused skb_over_panic in
e1000_clean_rx_irq (if data had E1000_RXD_STAT_DD bit set)

This patch makes use of dma_zalloc_coherent to make sure the
ring is memset at 0 to prevent the area from containing garbage.

Following is the signature of the panic:
IODDR0@0.0: skbuff: skb_over_panic: text:80407b20 len:64010 put:64010 
head:ab46d800 data:ab46d842 tail:0xab47d24c end:0xab46df40 dev:eth0
IODDR0@0.0: BUG: failure at net/core/skbuff.c:105/skb_panic()!
IODDR0@0.0: Kernel panic - not syncing: BUG!
IODDR0@0.0:
IODDR0@0.0: Process swapper/0 (pid: 0, threadinfo=81728000, task=8173cc00 ,cpu: 
0)
IODDR0@0.0: SP = <815a1c0c>
IODDR0@0.0: Stack:  0001
IODDR0@0.0: b2d89800 815e33ac
IODDR0@0.0: ea73c040 0001
IODDR0@0.0: 60040003 fa0a
IODDR0@0.0: 0002
IODDR0@0.0:
IODDR0@0.0: 804540c0 815a1c70
IODDR0@0.0: b2744000 602ac070
IODDR0@0.0: 815a1c44 b2d89800
IODDR0@0.0: 8173cc00 815a1c08
IODDR0@0.0:
IODDR0@0.0: 0006
IODDR0@0.0: 815a1b50 
IODDR0@0.0: 80079434 0001
IODDR0@0.0: ab46df40 b2744000
IODDR0@0.0: b2d89800
IODDR0@0.0:
IODDR0@0.0: fa0a 8045745c
IODDR0@0.0: 815a1c88 fa0a
IODDR0@0.0: 80407b20 b2789f80
IODDR0@0.0: 0005 80407b20
IODDR0@0.0:
IODDR0@0.0:
IODDR0@0.0: Call Trace:
IODDR0@0.0: [<804540bc>] skb_panic+0xa4/0xa8
IODDR0@0.0: [<80079430>] console_unlock+0x2f8/0x6d0
IODDR0@0.0: [<80457458>] skb_put+0xa0/0xc0
IODDR0@0.0: [<80407b1c>] e1000_clean_rx_irq+0x2dc/0x3e8
IODDR0@0.0: [<80407b1c>] e1000_clean_rx_irq+0x2dc/0x3e8
IODDR0@0.0: [<804079c8>] e1000_clean_rx_irq+0x188/0x3e8
IODDR0@0.0: [<80407b1c>] e1000_clean_rx_irq+0x2dc/0x3e8
IODDR0@0.0: [<80468b48>] __dev_kfree_skb_any+0x88/0xa8
IODDR0@0.0: [<804101ac>] e1000e_poll+0x94/0x288
IODDR0@0.0: [<8046e9d4>] net_rx_action+0x19c/0x4e8
IODDR0@0.0:   ...
IODDR0@0.0: Maximum depth to print reached. Use kstack= 
To specify a custom value (where 0 means to display the full backtrace)
IODDR0@0.0: ---[ end Kernel panic - not syncing: BUG!

Signed-off-by: Pierre-Yves Kerbrat 
Signed-off-by: Marius Gligor 
---
  drivers/net/ethernet/intel/e1000e/netdev.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/intel/e1000e/netdev.c 
b/drivers/net/ethernet/intel/e1000e/netdev.c
index 1298b69..26121ed 100644
--- a/drivers/net/ethernet/intel/e1000e/netdev.c
+++ b/drivers/net/ethernet/intel/e1000e/netdev.c
@@ -2333,7 +2333,7 @@ static int e1000_alloc_ring_dma(struct e1000_adapter 
*adapter,
  {
struct pci_dev *pdev = adapter->pdev;
  
-	ring->desc = dma_alloc_coherent(&pdev->dev, ring->size, &ring->dma,

+   ring->desc = dma_zalloc_coherent(&pdev->dev, ring->size, &ring->dma,
GFP_KERNEL);
if (!ring->desc)
return -ENOMEM;


Looks good. Prefer get another opinion from somebody else.


Re: e1000e hardware unit hangs

2018-01-25 Thread Neftin, Sasha

On 1/24/2018 20:41, Ben Greear wrote:

On 01/24/2018 10:38 AM, Denys Fedoryshchenko wrote:

On 2018-01-24 20:31, Ben Greear wrote:

On 01/24/2018 08:34 AM, Neftin, Sasha wrote:

On 1/24/2018 18:11, Alexander Duyck wrote:
On Tue, Jan 23, 2018 at 3:46 PM, Ben Greear 
 wrote:

Hello,

Anyone have any more suggestions for making e1000e work better?  
This is

from a 4.9.65+ kernel,
with these additional e1000e patches applied:

e1000e: Fix error path in link detection
e1000e: Fix wrong comment related to link detection
e1000e: Fix return value test
e1000e: Separate signaling for link check/link up
e1000e: Avoid receiver overrun interrupt bursts


Most of these patches shouldn't address anything that would trigger Tx
hangs. They are mostly related to just link detection.

Test case is simply to run 3 tcp connections each trying to 
send 56Kbps

of bi-directional
data between a pair of e1000e interfaces :)

No OOM related issues are seen on this kernel...similar test on 
4.13 showed

some OOM
issues, but I have not debugged that yet...


Really a question like this probably belongs on e1000-devel or
intel-wired-lan so I have added those lists and the e1000e maintainer
to the thread.

It would be useful if you could provide more information about the
device itself such as the ID and the kind of test you are running.
Keep in mind the e1000e driver supports a pretty broad swath of
devices so we need to narrow things down a bit.


please, also re-check if your kernel include:
e1000e: fix buffer overrun while the I219 is processing DMA 
transactions

e1000e: fix the use of magic numbers for buffer overrun issue
where you take fresh version of kernel?


Hello,

I tried adding those two patches, but I still see this splat shortly
after starting
my test.  The kernel I am using is here:

https://github.com/greearb/linux-ct-4.13

I've seen similar issues at least back to the 4.0 kernel, including
stock kernels and my
own kernels with additional patches.

Jan 24 10:19:42 lf1003-e3v2-13100124-f20x64 kernel: NETDEV WATCHDOG:
eth2 (e1000e): transmit queue 0 timed out, trans_start: 4295298499,
wd-timeout: 5000 jiffies: 4295304192 tx-queues: 1
Jan 24 10:19:42 lf1003-e3v2-13100124-f20x64 kernel: [ cut
here ]
Jan 24 10:19:42 lf1003-e3v2-13100124-f20x64 kernel: WARNING: CPU: 0
PID: 0 at
/home/greearb/git/linux-4.13.dev.y/net/sched/sch_generic.c:322
dev_watchdog+0x228/0x250
Jan 24 10:19:42 lf1003-e3v2-13100124-f20x64 kernel: Modules linked in:
nf_conntrack_netlink nf_conntrack nfnetlink nf_defrag_ipv4 libcrc32c
cfg80211 macvlan wanlink(O) pktgen bnep bluetooth f...ss tpm_tis ip
Jan 24 10:19:42 lf1003-e3v2-13100124-f20x64 kernel: CPU: 0 PID: 0
Comm: swapper/0 Tainted: G   O    4.13.16+ #22
Jan 24 10:19:42 lf1003-e3v2-13100124-f20x64 kernel: Hardware name:
Supermicro X9SCI/X9SCA/X9SCI/X9SCA, BIOS 2.0b 09/17/2012
Jan 24 10:19:42 lf1003-e3v2-13100124-f20x64 kernel: task:
81e104c0 task.stack: 81e0
Jan 24 10:19:42 lf1003-e3v2-13100124-f20x64 kernel: RIP:
0010:dev_watchdog+0x228/0x250
Jan 24 10:19:42 lf1003-e3v2-13100124-f20x64 kernel: RSP:
0018:88042fc03e50 EFLAGS: 00010282
Jan 24 10:19:42 lf1003-e3v2-13100124-f20x64 kernel: RAX:
0086 RBX:  RCX: 
Jan 24 10:19:42 lf1003-e3v2-13100124-f20x64 kernel: RDX:
88042fc15b40 RSI: 88042fc0dbf8 RDI: 88042fc0dbf8
Jan 24 10:19:42 lf1003-e3v2-13100124-f20x64 kernel: RBP:
88042fc03e98 R08: 0001 R09: 03c4
Jan 24 10:19:42 lf1003-e3v2-13100124-f20x64 kernel: R10:
 R11: 03c4 R12: 1388
Jan 24 10:19:42 lf1003-e3v2-13100124-f20x64 kernel: R13:
000100050dc3 R14: 88041767 R15: 000100052400
Jan 24 10:19:42 lf1003-e3v2-13100124-f20x64 kernel: FS:
() GS:88042fc0()
knlGS:
Jan 24 10:19:42 lf1003-e3v2-13100124-f20x64 kernel: CS:  0010 DS: 
ES:  CR0: 80050033
Jan 24 10:19:42 lf1003-e3v2-13100124-f20x64 kernel: CR2:
01d14000 CR3: 01e09000 CR4: 001406f0
Jan 24 10:19:42 lf1003-e3v2-13100124-f20x64 kernel: Call Trace:
Jan 24 10:19:42 lf1003-e3v2-13100124-f20x64 kernel:  
Jan 24 10:19:42 lf1003-e3v2-13100124-f20x64 kernel:  ? 
qdisc_rcu_free+0x40/0x40
Jan 24 10:19:42 lf1003-e3v2-13100124-f20x64 kernel:  
call_timer_fn+0x30/0x160
Jan 24 10:19:42 lf1003-e3v2-13100124-f20x64 kernel:  ? 
qdisc_rcu_free+0x40/0x40

Jan 24 10:19:42 lf1003-e3v2-13100124-f20x64 kernel:
run_timer_softirq+0x1f0/0x450
Jan 24 10:19:42 lf1003-e3v2-13100124-f20x64 kernel:  ?
lapic_next_deadline+0x21/0x30
Jan 24 10:19:42 lf1003-e3v2-13100124-f20x64 kernel:  ?
clockevents_program_event+0x78/0xf0
Jan 24 10:19:42 lf1003-e3v2-13100124-f20x64 kernel:  
__do_softirq+0xc1/0x2c0

Jan 24 10:19:42 lf1003-e3v2-13100124-f20x64 kernel:  irq_exit+0xb1/0xc0
Jan 24 10:19:42 lf1003-e3v2-13100124-f20x64 kernel:
smp_apic_timer_interrupt+0x38/0x50
Jan 24 10:19:42 lf1003-e3v2-13100

Re: e1000e hardware unit hangs

2018-01-24 Thread Neftin, Sasha

On 1/24/2018 18:11, Alexander Duyck wrote:

On Tue, Jan 23, 2018 at 3:46 PM, Ben Greear  wrote:

Hello,

Anyone have any more suggestions for making e1000e work better?  This is
from a 4.9.65+ kernel,
with these additional e1000e patches applied:

e1000e: Fix error path in link detection
e1000e: Fix wrong comment related to link detection
e1000e: Fix return value test
e1000e: Separate signaling for link check/link up
e1000e: Avoid receiver overrun interrupt bursts


Most of these patches shouldn't address anything that would trigger Tx
hangs. They are mostly related to just link detection.


Test case is simply to run 3 tcp connections each trying to send 56Kbps
of bi-directional
data between a pair of e1000e interfaces :)

No OOM related issues are seen on this kernel...similar test on 4.13 showed
some OOM
issues, but I have not debugged that yet...


Really a question like this probably belongs on e1000-devel or
intel-wired-lan so I have added those lists and the e1000e maintainer
to the thread.

It would be useful if you could provide more information about the
device itself such as the ID and the kind of test you are running.
Keep in mind the e1000e driver supports a pretty broad swath of
devices so we need to narrow things down a bit.


please, also re-check if your kernel include:
e1000e: fix buffer overrun while the I219 is processing DMA transactions
e1000e: fix the use of magic numbers for buffer overrun issue
where you take fresh version of kernel?


Jan 23 15:38:59 lf1003-e3v2-13100124-f20x64 kernel: NETDEV WATCHDOG: eth3
(e1000e): transmit queue 0 timed out, trans_start: 4294737199, wd-timeout:
5000 jiffies: 4294745088 tx-queues: 1
Jan 23 15:38:59 lf1003-e3v2-13100124-f20x64 kernel: NETDEV WATCHDOG: eth2
(e1000e): transmit queue 0 timed out, trans_start: 4294737200, wd-timeout:
5000 jiffies: 4294745088 tx-queues: 1
Jan 23 15:38:59 lf1003-e3v2-13100124-f20x64 kernel: [ cut here
]
Jan 23 15:38:59 lf1003-e3v2-13100124-f20x64 kernel: WARNING: CPU: 7 PID: 0
at /home/greearb/git/linux-4.9.dev.y/net/sched/sch_generic.c:322
dev_watchdog+0x267/0x270
Jan 23 15:38:59 lf1003-e3v2-13100124-f20x64 kernel: Modules linked in:
nf_conntrack_netlink nf_conntrack nfnetlink nf_defrag_ipv4 cfg80211 bnep
bluetooth macvlan wanlink(O) pktgen fuse corete...sunrpc ipmi_d
Jan 23 15:38:59 lf1003-e3v2-13100124-f20x64 kernel: CPU: 7 PID: 0 Comm:
swapper/7 Tainted: G   O4.9.65+ #21
Jan 23 15:38:59 lf1003-e3v2-13100124-f20x64 kernel: Hardware name:
Supermicro X9SCI/X9SCA/X9SCI/X9SCA, BIOS 2.0b 09/17/2012
Jan 23 15:38:59 lf1003-e3v2-13100124-f20x64 kernel:  88042fdc3df0
8142d791  
Jan 23 15:38:59 lf1003-e3v2-13100124-f20x64 kernel:  88042fdc3e30
8110f266 01422fdc3e08 
Jan 23 15:38:59 lf1003-e3v2-13100124-f20x64 kernel:  1388
fffc7d30 880417d0c000 fffc9c00
Jan 23 15:38:59 lf1003-e3v2-13100124-f20x64 kernel: Call Trace:
Jan 23 15:38:59 lf1003-e3v2-13100124-f20x64 kernel:  
Jan 23 15:38:59 lf1003-e3v2-13100124-f20x64 kernel:  []
dump_stack+0x63/0x82
Jan 23 15:38:59 lf1003-e3v2-13100124-f20x64 kernel:  []
__warn+0xc6/0xe0
Jan 23 15:38:59 lf1003-e3v2-13100124-f20x64 kernel:  []
warn_slowpath_null+0x18/0x20
Jan 23 15:38:59 lf1003-e3v2-13100124-f20x64 kernel:  []
dev_watchdog+0x267/0x270
Jan 23 15:38:59 lf1003-e3v2-13100124-f20x64 kernel:  [] ?
qdisc_rcu_free+0x40/0x40
Jan 23 15:38:59 lf1003-e3v2-13100124-f20x64 kernel:  []
call_timer_fn+0x30/0x150
Jan 23 15:38:59 lf1003-e3v2-13100124-f20x64 kernel:  [] ?
qdisc_rcu_free+0x40/0x40
Jan 23 15:38:59 lf1003-e3v2-13100124-f20x64 kernel:  []
run_timer_softirq+0x1f0/0x450
Jan 23 15:38:59 lf1003-e3v2-13100124-f20x64 kernel:  [] ?
lapic_next_deadline+0x21/0x30
Jan 23 15:38:59 lf1003-e3v2-13100124-f20x64 kernel:  [] ?
clockevents_program_event+0x7d/0x120
Jan 23 15:38:59 lf1003-e3v2-13100124-f20x64 kernel:  []
__do_softirq+0xc1/0x2c0
Jan 23 15:38:59 lf1003-e3v2-13100124-f20x64 kernel:  []
irq_exit+0xb1/0xc0
Jan 23 15:38:59 lf1003-e3v2-13100124-f20x64 kernel:  []
smp_apic_timer_interrupt+0x3d/0x50
Jan 23 15:38:59 lf1003-e3v2-13100124-f20x64 kernel:  []
apic_timer_interrupt+0x82/0x90
Jan 23 15:38:59 lf1003-e3v2-13100124-f20x64 kernel:  
Jan 23 15:38:59 lf1003-e3v2-13100124-f20x64 kernel:  [] ?
cpuidle_enter_state+0x126/0x300
Jan 23 15:38:59 lf1003-e3v2-13100124-f20x64 kernel:  []
cpuidle_enter+0x12/0x20
Jan 23 15:38:59 lf1003-e3v2-13100124-f20x64 kernel:  []
call_cpuidle+0x1e/0x40
Jan 23 15:38:59 lf1003-e3v2-13100124-f20x64 kernel:  []
cpu_startup_entry+0x13a/0x220
Jan 23 15:38:59 lf1003-e3v2-13100124-f20x64 kernel:  []
start_secondary+0x149/0x170
Jan 23 15:38:59 lf1003-e3v2-13100124-f20x64 kernel: ---[ end trace
69e31de175b59d4f ]---
Jan 23 15:38:59 lf1003-e3v2-13100124-f20x64 kernel: e1000e :06:00.0
eth2: Reset adapter unexpectedly
Jan 23 15:38:59 lf1003-e3v2-13100124-f20x64 kernel: e1000e :07:00.0
eth3: Reset adapter unexpectedly
Jan 23 15:3

Re: [Intel-wired-lan] v4.15-rc2 on thinkpad x60: ethernet stopped working

2017-12-20 Thread Neftin, Sasha

On 20/12/2017 18:01, Pavel Machek wrote:

On Wed 2017-12-20 16:54:21, Pavel Machek wrote:

Hi!


Before ask for reverting 19110cfbb..., please, check if follow patch
of Benjamin work for you http://patchwork.ozlabs.org/patch/846825/

Pavel, before ask for revert - let's check Benjamin's patch following to his 
previous patch. Previous patch was not competed and latest one come to complete 
changes.


v4.15-rc4+:

Ethernet works with 19110cfbb reverted.

Ethernet works With patchwork.ozlabs.org/patch/846825/ applied.

Hmm. So... ethernet originally did not work with patch/846825/ applied
or 19110cfbb reverted, so I re-plugged ethernet cables. Now it works
even with plain v4.15-rc4+.

So it looks like the bug was fixed in the mainline in the meantime...?

Sorry for the noise,
Pavel


Good Pavel, thanks for update us, let's keep both patch applied and see 
ethernet adapter behavior.




Re: [Intel-wired-lan] [PATCH] e1000e: Fix e1000_check_for_copper_link_ich8lan return value.

2017-12-20 Thread Neftin, Sasha

On 11/12/2017 9:26, Benjamin Poirier wrote:

e1000e_check_for_copper_link() and e1000_check_for_copper_link_ich8lan()
are the two functions that may be assigned to mac.ops.check_for_link when
phy.media_type == e1000_media_type_copper. Commit 19110cfbb34d ("e1000e:
Separate signaling for link check/link up") changed the meaning of the
return value of check_for_link for copper media but only adjusted the first
function. This patch adjusts the second function likewise.

Reported-by: Christian Hesse 
Reported-by: Gabriel C 
Link: https://bugzilla.kernel.org/show_bug.cgi?id=198047
Fixes: 19110cfbb34d ("e1000e: Separate signaling for link check/link up")
Tested-by: Christian Hesse 
Signed-off-by: Benjamin Poirier 
---
  drivers/net/ethernet/intel/e1000e/ich8lan.c | 11 ---
  1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/intel/e1000e/ich8lan.c 
b/drivers/net/ethernet/intel/e1000e/ich8lan.c
index d6d4ed7acf03..31277d3bb7dc 100644
--- a/drivers/net/ethernet/intel/e1000e/ich8lan.c
+++ b/drivers/net/ethernet/intel/e1000e/ich8lan.c
@@ -1367,6 +1367,9 @@ static s32 e1000_disable_ulp_lpt_lp(struct e1000_hw *hw, 
bool force)
   *  Checks to see of the link status of the hardware has changed.  If a
   *  change in link status has been detected, then we read the PHY registers
   *  to get the current speed/duplex if link exists.
+ *
+ *  Returns a negative error code (-E1000_ERR_*) or 0 (link down) or 1 (link
+ *  up).
   **/
  static s32 e1000_check_for_copper_link_ich8lan(struct e1000_hw *hw)
  {
@@ -1382,7 +1385,7 @@ static s32 e1000_check_for_copper_link_ich8lan(struct 
e1000_hw *hw)
 * Change or Rx Sequence Error interrupt.
 */
if (!mac->get_link_status)
-   return 0;
+   return 1;
  
  	/* First we want to see if the MII Status Register reports

 * link.  If so, then we want to get the current speed/duplex
@@ -1613,10 +1616,12 @@ static s32 e1000_check_for_copper_link_ich8lan(struct 
e1000_hw *hw)
 * different link partner.
 */
ret_val = e1000e_config_fc_after_link_up(hw);
-   if (ret_val)
+   if (ret_val) {
e_dbg("Error configuring flow control\n");
+   return ret_val;
+   }
  
-	return ret_val;

+   return 1;
  }
  
  static s32 e1000_get_variants_ich8lan(struct e1000_adapter *adapter)


Acked by sasha.nef...@intel.com



Re: [Intel-wired-lan] v4.15-rc2 on thinkpad x60: ethernet stopped working

2017-12-19 Thread Neftin, Sasha

On 12/18/2017 17:50, Neftin, Sasha wrote:

On 12/18/2017 13:58, Pavel Machek wrote:

On Mon 2017-12-18 13:24:40, Neftin, Sasha wrote:

On 12/18/2017 12:26, Pavel Machek wrote:

Hi!


In v4.15-rc2+, network manager can not see my ethernet card, and
manual attempts to ifconfig it up did not really help, either.

Card is:

02:00.0 Ethernet controller: Intel Corporation 82573L Gigabit 
Ethernet

Controller



Any ideas ?

Yes , 19110cfbb34d4af0cdfe14cd243f3b09dc95b013 broke it.

See:
https://bugzilla.kernel.org/show_bug.cgi?id=198047

Fix there :
https://marc.info/?l=linux-kernel&m=151272209903675&w=2

I don't see the patch in latest mainline. Not having ethernet
is... somehow annoying. What is going on there?
Generally speaking, e1000 maintainence has been handled very 
poorly over

the past few years, I have to say.

Fixes take forever to propagate even when someone other than the
maintainer provides a working and tested fix, just like this case.

Jeff, please take e1000 maintainence seriously and get these critical
bug fixes propagated.

No response AFAICT. I guess I should test reverting
19110cfbb34d4af0cdfe14cd243f3b09dc95b013, then ask you for revert?

Hello Pavel,

Before ask for reverting 19110cfbb..., please, check if follow patch of
Benjamin work for you http://patchwork.ozlabs.org/patch/846825/

Jacob said, in another email:

# Digging into this, the problem is complicated. The original bug
# assumed behavior of the .check_for_link call, which is universally not
# implemented.
#
# I think the correct fix is to revert 19110cfbb34d ("e1000e: Separate
# signaling for link check/link up", 2017-10-10) and find a more 
proper solution.


...which makes me think that revert is preffered?

    Pavel

Pavel, before ask for revert - let's check Benjamin's patch following 
to his previous patch. Previous patch was not competed and latest one 
come to complete changes.


___
Intel-wired-lan mailing list
intel-wired-...@osuosl.org
https://lists.osuosl.org/mailman/listinfo/intel-wired-lan


Pavel, any update? Is Benjamin's last patch solved your network problem?



Re: [Intel-wired-lan] v4.15-rc2 on thinkpad x60: ethernet stopped working

2017-12-18 Thread Neftin, Sasha

On 12/18/2017 13:58, Pavel Machek wrote:

On Mon 2017-12-18 13:24:40, Neftin, Sasha wrote:

On 12/18/2017 12:26, Pavel Machek wrote:

Hi!


In v4.15-rc2+, network manager can not see my ethernet card, and
manual attempts to ifconfig it up did not really help, either.

Card is:

02:00.0 Ethernet controller: Intel Corporation 82573L Gigabit Ethernet
Controller



Any ideas ?

Yes , 19110cfbb34d4af0cdfe14cd243f3b09dc95b013 broke it.

See:
https://bugzilla.kernel.org/show_bug.cgi?id=198047

Fix there :
https://marc.info/?l=linux-kernel&m=151272209903675&w=2

I don't see the patch in latest mainline. Not having ethernet
is... somehow annoying. What is going on there?

Generally speaking, e1000 maintainence has been handled very poorly over
the past few years, I have to say.

Fixes take forever to propagate even when someone other than the
maintainer provides a working and tested fix, just like this case.

Jeff, please take e1000 maintainence seriously and get these critical
bug fixes propagated.

No response AFAICT. I guess I should test reverting
19110cfbb34d4af0cdfe14cd243f3b09dc95b013, then ask you for revert?

Hello Pavel,

Before ask for reverting 19110cfbb..., please, check if follow patch of
Benjamin work for you http://patchwork.ozlabs.org/patch/846825/

Jacob said, in another email:

# Digging into this, the problem is complicated. The original bug
# assumed behavior of the .check_for_link call, which is universally not
# implemented.
#
# I think the correct fix is to revert 19110cfbb34d ("e1000e: Separate
# signaling for link check/link up", 2017-10-10) and find a more proper 
solution.

...which makes me think that revert is preffered?

Pavel

Pavel, before ask for revert - let's check Benjamin's patch following to 
his previous patch. Previous patch was not competed and latest one come 
to complete changes.




Re: [net-next 6/9] e1000e: fix buffer overrun while the I219 is processing DMA transactions

2017-10-16 Thread Neftin, Sasha

On 10/11/2017 12:07, David Laight wrote:

From: Jeff Kirsher

Sent: 10 October 2017 18:22
Intel 100/200 Series Chipset platforms reduced the round-trip
latency for the LAN Controller DMA accesses, causing in some high
performance cases a buffer overrun while the I219 LAN Connected
Device is processing the DMA transactions. I219LM and I219V devices
can fall into unrecovered Tx hang under very stressfully UDP traffic
and multiple reconnection of Ethernet cable. This Tx hang of the LAN
Controller is only recovered if the system is rebooted. Slightly slow
down DMA access by reducing the number of outstanding requests.
This workaround could have an impact on TCP traffic performance
on the platform. Disabling TSO eliminates performance loss for TCP
traffic without a noticeable impact on CPU performance.

Please, refer to I218/I219 specification update:
https://www.intel.com/content/www/us/en/embedded/products/networking/
ethernet-connection-i218-family-documentation.html

Signed-off-by: Sasha Neftin 
Reviewed-by: Dima Ruinskiy 
Reviewed-by: Raanan Avargil 
Tested-by: Aaron Brown 
Signed-off-by: Jeff Kirsher 
---
  drivers/net/ethernet/intel/e1000e/netdev.c | 8 +---
  1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/intel/e1000e/netdev.c 
b/drivers/net/ethernet/intel/e1000e/netdev.c
index ee9de3500331..14b096f3d1da 100644
--- a/drivers/net/ethernet/intel/e1000e/netdev.c
+++ b/drivers/net/ethernet/intel/e1000e/netdev.c
@@ -3021,8 +3021,8 @@ static void e1000_configure_tx(struct e1000_adapter 
*adapter)

hw->mac.ops.config_collision_dist(hw);

-   /* SPT and CNP Si errata workaround to avoid data corruption */
-   if (hw->mac.type >= e1000_pch_spt) {
+   /* SPT and KBL Si errata workaround to avoid data corruption */
+   if (hw->mac.type == e1000_pch_spt) {
u32 reg_val;

reg_val = er32(IOSFPC);
@@ -3030,7 +3030,9 @@ static void e1000_configure_tx(struct e1000_adapter 
*adapter)
ew32(IOSFPC, reg_val);

reg_val = er32(TARC(0));
-   reg_val |= E1000_TARC0_CB_MULTIQ_3_REQ;
+   /* SPT and KBL Si errata workaround to avoid Tx hang */
+   reg_val &= ~BIT(28);
+   reg_val |= BIT(29);

Shouldn't some more of the commit message about what this is doing
be in the comment?
There is provided link on specification update: 
https://www.intel.com/content/dam/www/public/us/en/documents/specification-updates/i218-i219-ethernet-connection-spec-update.pdf?asset=9561. 
This is Intel's public release.

And shouldn't the 28 and 28 be named constants?
(28 and 29) - you can easy understand from code that same value has been 
changed from 3 to 2. There is no point add flag here I thought.



ew32(TARC(0), reg_val);

David


Thanks,

Sasha



Re: [net-next 6/9] e1000e: fix buffer overrun while the I219 is processing DMA transactions

2017-10-16 Thread Neftin, Sasha

On 10/11/2017 12:07, David Laight wrote:

From: Jeff Kirsher

Sent: 10 October 2017 18:22
Intel 100/200 Series Chipset platforms reduced the round-trip
latency for the LAN Controller DMA accesses, causing in some high
performance cases a buffer overrun while the I219 LAN Connected
Device is processing the DMA transactions. I219LM and I219V devices
can fall into unrecovered Tx hang under very stressfully UDP traffic
and multiple reconnection of Ethernet cable. This Tx hang of the LAN
Controller is only recovered if the system is rebooted. Slightly slow
down DMA access by reducing the number of outstanding requests.
This workaround could have an impact on TCP traffic performance
on the platform. Disabling TSO eliminates performance loss for TCP
traffic without a noticeable impact on CPU performance.

Please, refer to I218/I219 specification update:
https://www.intel.com/content/www/us/en/embedded/products/networking/
ethernet-connection-i218-family-documentation.html

Signed-off-by: Sasha Neftin 
Reviewed-by: Dima Ruinskiy 
Reviewed-by: Raanan Avargil 
Tested-by: Aaron Brown 
Signed-off-by: Jeff Kirsher 
---
  drivers/net/ethernet/intel/e1000e/netdev.c | 8 +---
  1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/intel/e1000e/netdev.c 
b/drivers/net/ethernet/intel/e1000e/netdev.c
index ee9de3500331..14b096f3d1da 100644
--- a/drivers/net/ethernet/intel/e1000e/netdev.c
+++ b/drivers/net/ethernet/intel/e1000e/netdev.c
@@ -3021,8 +3021,8 @@ static void e1000_configure_tx(struct e1000_adapter 
*adapter)

hw->mac.ops.config_collision_dist(hw);

-   /* SPT and CNP Si errata workaround to avoid data corruption */
-   if (hw->mac.type >= e1000_pch_spt) {
+   /* SPT and KBL Si errata workaround to avoid data corruption */
+   if (hw->mac.type == e1000_pch_spt) {
u32 reg_val;

reg_val = er32(IOSFPC);
@@ -3030,7 +3030,9 @@ static void e1000_configure_tx(struct e1000_adapter 
*adapter)
ew32(IOSFPC, reg_val);

reg_val = er32(TARC(0));
-   reg_val |= E1000_TARC0_CB_MULTIQ_3_REQ;
+   /* SPT and KBL Si errata workaround to avoid Tx hang */
+   reg_val &= ~BIT(28);
+   reg_val |= BIT(29);

Shouldn't some more of the commit message about what this is doing
be in the comment?
There is provided link on specification update: 
https://www.intel.com/content/dam/www/public/us/en/documents/specification-updates/i218-i219-ethernet-connection-spec-update.pdf?asset=9561. 
This is Intel's public edition.

And shouldn't the 28 and 28 be named constants?
(28 and 29) you can easy understand from the code that value has been 
changed from 3 to 2. There is no point add flags here I thought.



ew32(TARC(0), reg_val);

David


Thanks,

Sasha



Re: [Intel-wired-lan] [PATCH net-next v3] e1000e: Be drop monitor friendly

2017-09-06 Thread Neftin, Sasha

On 8/26/2017 04:14, Florian Fainelli wrote:

e1000e_put_txbuf() can be called from normal reclamation path as well as
when a DMA mapping failure, so we need to differentiate these two cases
when freeing SKBs to be drop monitor friendly. e1000e_tx_hwtstamp_work()
and e1000_remove() are processing TX timestamped SKBs and those should
not be accounted as drops either.

Signed-off-by: Florian Fainelli 
---
Changes in v3:

- differentiate normal reclamation from TX DMA fragment mapping errors
- removed a few invalid dev_kfree_skb() replacements (those are already
   drop monitor friendly)

Changes in v2:

- make it compile

  drivers/net/ethernet/intel/e1000e/netdev.c | 18 +++---
  1 file changed, 11 insertions(+), 7 deletions(-)

diff --git a/drivers/net/ethernet/intel/e1000e/netdev.c 
b/drivers/net/ethernet/intel/e1000e/netdev.c
index 327dfe5bedc0..cfd21858c095 100644
--- a/drivers/net/ethernet/intel/e1000e/netdev.c
+++ b/drivers/net/ethernet/intel/e1000e/netdev.c
@@ -1071,7 +1071,8 @@ static bool e1000_clean_rx_irq(struct e1000_ring 
*rx_ring, int *work_done,
  }
  
  static void e1000_put_txbuf(struct e1000_ring *tx_ring,

-   struct e1000_buffer *buffer_info)
+   struct e1000_buffer *buffer_info,
+   bool drop)
  {
struct e1000_adapter *adapter = tx_ring->adapter;
  
@@ -1085,7 +1086,10 @@ static void e1000_put_txbuf(struct e1000_ring *tx_ring,

buffer_info->dma = 0;
}
if (buffer_info->skb) {
-   dev_kfree_skb_any(buffer_info->skb);
+   if (drop)
+   dev_kfree_skb_any(buffer_info->skb);
+   else
+   dev_consume_skb_any(buffer_info->skb);
buffer_info->skb = NULL;
}
buffer_info->time_stamp = 0;
@@ -1199,7 +1203,7 @@ static void e1000e_tx_hwtstamp_work(struct work_struct 
*work)
wmb(); /* force write prior to skb_tstamp_tx */
  
  		skb_tstamp_tx(skb, &shhwtstamps);

-   dev_kfree_skb_any(skb);
+   dev_consume_skb_any(skb);
} else if (time_after(jiffies, adapter->tx_hwtstamp_start
  + adapter->tx_timeout_factor * HZ)) {
dev_kfree_skb_any(adapter->tx_hwtstamp_skb);
@@ -1254,7 +1258,7 @@ static bool e1000_clean_tx_irq(struct e1000_ring *tx_ring)
}
}
  
-			e1000_put_txbuf(tx_ring, buffer_info);

+   e1000_put_txbuf(tx_ring, buffer_info, false);
tx_desc->upper.data = 0;
  
  			i++;

@@ -2421,7 +2425,7 @@ static void e1000_clean_tx_ring(struct e1000_ring 
*tx_ring)
  
  	for (i = 0; i < tx_ring->count; i++) {

buffer_info = &tx_ring->buffer_info[i];
-   e1000_put_txbuf(tx_ring, buffer_info);
+   e1000_put_txbuf(tx_ring, buffer_info, false);
}
  
  	netdev_reset_queue(adapter->netdev);

@@ -5614,7 +5618,7 @@ static int e1000_tx_map(struct e1000_ring *tx_ring, 
struct sk_buff *skb,
i += tx_ring->count;
i--;
buffer_info = &tx_ring->buffer_info[i];
-   e1000_put_txbuf(tx_ring, buffer_info);
+   e1000_put_txbuf(tx_ring, buffer_info, true);
}
  
  	return 0;

@@ -7411,7 +7415,7 @@ static void e1000_remove(struct pci_dev *pdev)
if (adapter->flags & FLAG_HAS_HW_TIMESTAMP) {
cancel_work_sync(&adapter->tx_hwtstamp_work);
if (adapter->tx_hwtstamp_skb) {
-   dev_kfree_skb_any(adapter->tx_hwtstamp_skb);
+   dev_consume_skb_any(adapter->tx_hwtstamp_skb);
adapter->tx_hwtstamp_skb = NULL;
}
}


i am ok with this patch



Re: [Intel-wired-lan] [PATCH] e1000e: changed some expensive calls of udelay to usleep_range

2017-08-29 Thread Neftin, Sasha

On 8/23/2017 18:59, Matthew Tan wrote:

 Calls to udelay are not preemtable by userspace so userspace
 applications experience a large (~200us) latency when running on core
 0. Instead usleep_range can be used to be more friendly to userspace
 since it is preemtable. This is due to udelay using busy-wait loops
 while usleep_rang uses hrtimers instead. It is recommended to use
 udelay when the delay is <10us since at that precision overhead of
 usleep_range hrtimer setup causes issues. However, the replaced calls
 are for 50us and 100us so this should not be not an issue.

Signed-off-by: Matthew Tan 
---
  drivers/net/ethernet/intel/e1000e/phy.c | 8 
  1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/intel/e1000e/phy.c 
b/drivers/net/ethernet/intel/e1000e/phy.c
index de13aea..e318fdc 100644
--- a/drivers/net/ethernet/intel/e1000e/phy.c
+++ b/drivers/net/ethernet/intel/e1000e/phy.c
@@ -158,7 +158,7 @@ s32 e1000e_read_phy_reg_mdic(struct e1000_hw *hw, u32 
offset, u16 *data)
 * the lower time out
 */
for (i = 0; i < (E1000_GEN_POLL_TIMEOUT * 3); i++) {
-   udelay(50);
+   usleep_range(40, 60);
mdic = er32(MDIC);
if (mdic & E1000_MDIC_READY)
break;
@@ -183,7 +183,7 @@ s32 e1000e_read_phy_reg_mdic(struct e1000_hw *hw, u32 
offset, u16 *data)
 * reading duplicate data in the next MDIC transaction.
 */
if (hw->mac.type == e1000_pch2lan)
-   udelay(100);
+   usleep_range(90, 100);
  
  	return 0;

  }
@@ -222,7 +222,7 @@ s32 e1000e_write_phy_reg_mdic(struct e1000_hw *hw, u32 
offset, u16 data)
 * the lower time out
 */
for (i = 0; i < (E1000_GEN_POLL_TIMEOUT * 3); i++) {
-   udelay(50);
+   usleep_range(40, 60);
mdic = er32(MDIC);
if (mdic & E1000_MDIC_READY)
break;
@@ -246,7 +246,7 @@ s32 e1000e_write_phy_reg_mdic(struct e1000_hw *hw, u32 
offset, u16 data)
 * reading duplicate data in the next MDIC transaction.
 */
if (hw->mac.type == e1000_pch2lan)
-   udelay(100);
+   usleep_range(90, 110);
  
  	return 0;

  }


Reasonable. Do you have any open bug or other reference describe this 
problem?




Re: [Intel-wired-lan] [PATCH] e1000e: apply burst mode settings only on default

2017-08-27 Thread Neftin, Sasha

On 8/27/2017 11:32, Neftin, Sasha wrote:

On 8/27/2017 11:30, Neftin, Sasha wrote:

On 8/25/2017 18:06, Willem de Bruijn wrote:

From: Willem de Bruijn 

Devices that support FLAG2_DMA_BURST have different default values
for RDTR and RADV. Apply burst mode default settings only when no
explicit value was passed at module load.

The RDTR default is zero. If the module is loaded for low latency
operation with RxIntDelay=0, do not override this value with a burst
default of 32.

Move the decision to apply burst values earlier, where explicitly
initialized module variables can be distinguished from defaults.

Signed-off-by: Willem de Bruijn 
---
  drivers/net/ethernet/intel/e1000e/e1000.h  |  4 
  drivers/net/ethernet/intel/e1000e/netdev.c |  8 
  drivers/net/ethernet/intel/e1000e/param.c  | 16 +++-
  3 files changed, 15 insertions(+), 13 deletions(-)

diff --git a/drivers/net/ethernet/intel/e1000e/e1000.h 
b/drivers/net/ethernet/intel/e1000e/e1000.h

index 98e6abb1..2311b31bdcac 100644
--- a/drivers/net/ethernet/intel/e1000e/e1000.h
+++ b/drivers/net/ethernet/intel/e1000e/e1000.h
@@ -94,10 +94,6 @@ struct e1000_info;
   */
  #define E1000_CHECK_RESET_COUNT25
  -#define DEFAULT_RDTR0
-#define DEFAULT_RADV8
-#define BURST_RDTR0x20
-#define BURST_RADV0x20
  #define PCICFG_DESC_RING_STATUS0xe4
  #define FLUSH_DESC_REQUIRED0x100
  diff --git a/drivers/net/ethernet/intel/e1000e/netdev.c 
b/drivers/net/ethernet/intel/e1000e/netdev.c

index 327dfe5bedc0..47b89aac7969 100644
--- a/drivers/net/ethernet/intel/e1000e/netdev.c
+++ b/drivers/net/ethernet/intel/e1000e/netdev.c
@@ -3223,14 +3223,6 @@ static void e1000_configure_rx(struct 
e1000_adapter *adapter)

   */
  ew32(RXDCTL(0), E1000_RXDCTL_DMA_BURST_ENABLE);
  ew32(RXDCTL(1), E1000_RXDCTL_DMA_BURST_ENABLE);
-
-/* override the delay timers for enabling bursting, only if
- * the value was not set by the user via module options
- */
-if (adapter->rx_int_delay == DEFAULT_RDTR)
-adapter->rx_int_delay = BURST_RDTR;
-if (adapter->rx_abs_int_delay == DEFAULT_RADV)
-adapter->rx_abs_int_delay = BURST_RADV;
  }
/* set the Receive Delay Timer Register */
diff --git a/drivers/net/ethernet/intel/e1000e/param.c 
b/drivers/net/ethernet/intel/e1000e/param.c

index 6d8c39abee16..bb696c98f9b0 100644
--- a/drivers/net/ethernet/intel/e1000e/param.c
+++ b/drivers/net/ethernet/intel/e1000e/param.c
@@ -73,17 +73,25 @@ E1000_PARAM(TxAbsIntDelay, "Transmit Absolute 
Interrupt Delay");

  /* Receive Interrupt Delay in units of 1.024 microseconds
   * hardware will likely hang if you set this to anything but zero.
   *
+ * Burst variant is used as default if device has FLAG2_DMA_BURST.
+ *
   * Valid Range: 0-65535
   */
  E1000_PARAM(RxIntDelay, "Receive Interrupt Delay");
+#define DEFAULT_RDTR0
+#define BURST_RDTR0x20
  #define MAX_RXDELAY 0x
  #define MIN_RXDELAY 0
/* Receive Absolute Interrupt Delay in units of 1.024 microseconds
+ *
+ * Burst variant is used as default if device has FLAG2_DMA_BURST.
   *
   * Valid Range: 0-65535
   */
  E1000_PARAM(RxAbsIntDelay, "Receive Absolute Interrupt Delay");
+#define DEFAULT_RADV8
+#define BURST_RADV0x20
  #define MAX_RXABSDELAY 0x
  #define MIN_RXABSDELAY 0
  @@ -297,6 +305,9 @@ void e1000e_check_options(struct e1000_adapter 
*adapter)

   .max = MAX_RXDELAY } }
  };
  +if (adapter->flags2 & FLAG2_DMA_BURST)
+opt.def = BURST_RDTR;
+
  if (num_RxIntDelay > bd) {
  adapter->rx_int_delay = RxIntDelay[bd];
e1000_validate_option(&adapter->rx_int_delay, &opt,
@@ -307,7 +318,7 @@ void e1000e_check_options(struct e1000_adapter 
*adapter)

  }
  /* Receive Absolute Interrupt Delay */
  {
-static const struct e1000_option opt = {
+static struct e1000_option opt = {
  .type = range_option,
  .name = "Receive Absolute Interrupt Delay",
  .err  = "using default of "
@@ -317,6 +328,9 @@ void e1000e_check_options(struct e1000_adapter 
*adapter)

   .max = MAX_RXABSDELAY } }
  };
  +if (adapter->flags2 & FLAG2_DMA_BURST)
+opt.def = BURST_RADV;
+
  if (num_RxAbsIntDelay > bd) {
  adapter->rx_abs_int_delay = RxAbsIntDelay[bd];
e1000_validate_option(&adapter->rx_abs_int_delay, &opt,


This patch looks good for me, but I would like hear second opinion.

___
Intel-wired-lan mailing list
intel-wired-...@osuosl.org
https://lists.osuosl.org/mailman/listinfo/intel-wired-lan



___
Intel-wired-lan mailing list
intel-wired-...@osuosl.org
https://lists.osuosl.org/mailman/listinfo/intel-wired-lan





Re: [Intel-wired-lan] [PATCH] e1000e: apply burst mode settings only on default

2017-08-27 Thread Neftin, Sasha

On 8/27/2017 11:30, Neftin, Sasha wrote:

On 8/25/2017 18:06, Willem de Bruijn wrote:

From: Willem de Bruijn 

Devices that support FLAG2_DMA_BURST have different default values
for RDTR and RADV. Apply burst mode default settings only when no
explicit value was passed at module load.

The RDTR default is zero. If the module is loaded for low latency
operation with RxIntDelay=0, do not override this value with a burst
default of 32.

Move the decision to apply burst values earlier, where explicitly
initialized module variables can be distinguished from defaults.

Signed-off-by: Willem de Bruijn 
---
  drivers/net/ethernet/intel/e1000e/e1000.h  |  4 
  drivers/net/ethernet/intel/e1000e/netdev.c |  8 
  drivers/net/ethernet/intel/e1000e/param.c  | 16 +++-
  3 files changed, 15 insertions(+), 13 deletions(-)

diff --git a/drivers/net/ethernet/intel/e1000e/e1000.h 
b/drivers/net/ethernet/intel/e1000e/e1000.h

index 98e6abb1..2311b31bdcac 100644
--- a/drivers/net/ethernet/intel/e1000e/e1000.h
+++ b/drivers/net/ethernet/intel/e1000e/e1000.h
@@ -94,10 +94,6 @@ struct e1000_info;
   */
  #define E1000_CHECK_RESET_COUNT25
  -#define DEFAULT_RDTR0
-#define DEFAULT_RADV8
-#define BURST_RDTR0x20
-#define BURST_RADV0x20
  #define PCICFG_DESC_RING_STATUS0xe4
  #define FLUSH_DESC_REQUIRED0x100
  diff --git a/drivers/net/ethernet/intel/e1000e/netdev.c 
b/drivers/net/ethernet/intel/e1000e/netdev.c

index 327dfe5bedc0..47b89aac7969 100644
--- a/drivers/net/ethernet/intel/e1000e/netdev.c
+++ b/drivers/net/ethernet/intel/e1000e/netdev.c
@@ -3223,14 +3223,6 @@ static void e1000_configure_rx(struct 
e1000_adapter *adapter)

   */
  ew32(RXDCTL(0), E1000_RXDCTL_DMA_BURST_ENABLE);
  ew32(RXDCTL(1), E1000_RXDCTL_DMA_BURST_ENABLE);
-
-/* override the delay timers for enabling bursting, only if
- * the value was not set by the user via module options
- */
-if (adapter->rx_int_delay == DEFAULT_RDTR)
-adapter->rx_int_delay = BURST_RDTR;
-if (adapter->rx_abs_int_delay == DEFAULT_RADV)
-adapter->rx_abs_int_delay = BURST_RADV;
  }
/* set the Receive Delay Timer Register */
diff --git a/drivers/net/ethernet/intel/e1000e/param.c 
b/drivers/net/ethernet/intel/e1000e/param.c

index 6d8c39abee16..bb696c98f9b0 100644
--- a/drivers/net/ethernet/intel/e1000e/param.c
+++ b/drivers/net/ethernet/intel/e1000e/param.c
@@ -73,17 +73,25 @@ E1000_PARAM(TxAbsIntDelay, "Transmit Absolute 
Interrupt Delay");

  /* Receive Interrupt Delay in units of 1.024 microseconds
   * hardware will likely hang if you set this to anything but zero.
   *
+ * Burst variant is used as default if device has FLAG2_DMA_BURST.
+ *
   * Valid Range: 0-65535
   */
  E1000_PARAM(RxIntDelay, "Receive Interrupt Delay");
+#define DEFAULT_RDTR0
+#define BURST_RDTR0x20
  #define MAX_RXDELAY 0x
  #define MIN_RXDELAY 0
/* Receive Absolute Interrupt Delay in units of 1.024 microseconds
+ *
+ * Burst variant is used as default if device has FLAG2_DMA_BURST.
   *
   * Valid Range: 0-65535
   */
  E1000_PARAM(RxAbsIntDelay, "Receive Absolute Interrupt Delay");
+#define DEFAULT_RADV8
+#define BURST_RADV0x20
  #define MAX_RXABSDELAY 0x
  #define MIN_RXABSDELAY 0
  @@ -297,6 +305,9 @@ void e1000e_check_options(struct e1000_adapter 
*adapter)

   .max = MAX_RXDELAY } }
  };
  +if (adapter->flags2 & FLAG2_DMA_BURST)
+opt.def = BURST_RDTR;
+
  if (num_RxIntDelay > bd) {
  adapter->rx_int_delay = RxIntDelay[bd];
e1000_validate_option(&adapter->rx_int_delay, &opt,
@@ -307,7 +318,7 @@ void e1000e_check_options(struct e1000_adapter 
*adapter)

  }
  /* Receive Absolute Interrupt Delay */
  {
-static const struct e1000_option opt = {
+static struct e1000_option opt = {
  .type = range_option,
  .name = "Receive Absolute Interrupt Delay",
  .err  = "using default of "
@@ -317,6 +328,9 @@ void e1000e_check_options(struct e1000_adapter 
*adapter)

   .max = MAX_RXABSDELAY } }
  };
  +if (adapter->flags2 & FLAG2_DMA_BURST)
+opt.def = BURST_RADV;
+
  if (num_RxAbsIntDelay > bd) {
  adapter->rx_abs_int_delay = RxAbsIntDelay[bd];
e1000_validate_option(&adapter->rx_abs_int_delay, &opt,


This patch looks good for me, but I would like hear second opinion.

___
Intel-wired-lan mailing list
intel-wired-...@osuosl.org
https://lists.osuosl.org/mailman/listinfo/intel-wired-lan





Re: [Intel-wired-lan] [PATCH] e1000e: apply burst mode settings only on default

2017-08-27 Thread Neftin, Sasha

On 8/25/2017 18:06, Willem de Bruijn wrote:

From: Willem de Bruijn 

Devices that support FLAG2_DMA_BURST have different default values
for RDTR and RADV. Apply burst mode default settings only when no
explicit value was passed at module load.

The RDTR default is zero. If the module is loaded for low latency
operation with RxIntDelay=0, do not override this value with a burst
default of 32.

Move the decision to apply burst values earlier, where explicitly
initialized module variables can be distinguished from defaults.

Signed-off-by: Willem de Bruijn 
---
  drivers/net/ethernet/intel/e1000e/e1000.h  |  4 
  drivers/net/ethernet/intel/e1000e/netdev.c |  8 
  drivers/net/ethernet/intel/e1000e/param.c  | 16 +++-
  3 files changed, 15 insertions(+), 13 deletions(-)

diff --git a/drivers/net/ethernet/intel/e1000e/e1000.h 
b/drivers/net/ethernet/intel/e1000e/e1000.h
index 98e6abb1..2311b31bdcac 100644
--- a/drivers/net/ethernet/intel/e1000e/e1000.h
+++ b/drivers/net/ethernet/intel/e1000e/e1000.h
@@ -94,10 +94,6 @@ struct e1000_info;
   */
  #define E1000_CHECK_RESET_COUNT   25
  
-#define DEFAULT_RDTR			0

-#define DEFAULT_RADV   8
-#define BURST_RDTR 0x20
-#define BURST_RADV 0x20
  #define PCICFG_DESC_RING_STATUS   0xe4
  #define FLUSH_DESC_REQUIRED   0x100
  
diff --git a/drivers/net/ethernet/intel/e1000e/netdev.c b/drivers/net/ethernet/intel/e1000e/netdev.c

index 327dfe5bedc0..47b89aac7969 100644
--- a/drivers/net/ethernet/intel/e1000e/netdev.c
+++ b/drivers/net/ethernet/intel/e1000e/netdev.c
@@ -3223,14 +3223,6 @@ static void e1000_configure_rx(struct e1000_adapter 
*adapter)
 */
ew32(RXDCTL(0), E1000_RXDCTL_DMA_BURST_ENABLE);
ew32(RXDCTL(1), E1000_RXDCTL_DMA_BURST_ENABLE);
-
-   /* override the delay timers for enabling bursting, only if
-* the value was not set by the user via module options
-*/
-   if (adapter->rx_int_delay == DEFAULT_RDTR)
-   adapter->rx_int_delay = BURST_RDTR;
-   if (adapter->rx_abs_int_delay == DEFAULT_RADV)
-   adapter->rx_abs_int_delay = BURST_RADV;
}
  
  	/* set the Receive Delay Timer Register */

diff --git a/drivers/net/ethernet/intel/e1000e/param.c 
b/drivers/net/ethernet/intel/e1000e/param.c
index 6d8c39abee16..bb696c98f9b0 100644
--- a/drivers/net/ethernet/intel/e1000e/param.c
+++ b/drivers/net/ethernet/intel/e1000e/param.c
@@ -73,17 +73,25 @@ E1000_PARAM(TxAbsIntDelay, "Transmit Absolute Interrupt 
Delay");
  /* Receive Interrupt Delay in units of 1.024 microseconds
   * hardware will likely hang if you set this to anything but zero.
   *
+ * Burst variant is used as default if device has FLAG2_DMA_BURST.
+ *
   * Valid Range: 0-65535
   */
  E1000_PARAM(RxIntDelay, "Receive Interrupt Delay");
+#define DEFAULT_RDTR   0
+#define BURST_RDTR 0x20
  #define MAX_RXDELAY 0x
  #define MIN_RXDELAY 0
  
  /* Receive Absolute Interrupt Delay in units of 1.024 microseconds

+ *
+ * Burst variant is used as default if device has FLAG2_DMA_BURST.
   *
   * Valid Range: 0-65535
   */
  E1000_PARAM(RxAbsIntDelay, "Receive Absolute Interrupt Delay");
+#define DEFAULT_RADV   8
+#define BURST_RADV 0x20
  #define MAX_RXABSDELAY 0x
  #define MIN_RXABSDELAY 0
  
@@ -297,6 +305,9 @@ void e1000e_check_options(struct e1000_adapter *adapter)

 .max = MAX_RXDELAY } }
};
  
+		if (adapter->flags2 & FLAG2_DMA_BURST)

+   opt.def = BURST_RDTR;
+
if (num_RxIntDelay > bd) {
adapter->rx_int_delay = RxIntDelay[bd];
e1000_validate_option(&adapter->rx_int_delay, &opt,
@@ -307,7 +318,7 @@ void e1000e_check_options(struct e1000_adapter *adapter)
}
/* Receive Absolute Interrupt Delay */
{
-   static const struct e1000_option opt = {
+   static struct e1000_option opt = {
.type = range_option,
.name = "Receive Absolute Interrupt Delay",
.err  = "using default of "
@@ -317,6 +328,9 @@ void e1000e_check_options(struct e1000_adapter *adapter)
 .max = MAX_RXABSDELAY } }
};
  
+		if (adapter->flags2 & FLAG2_DMA_BURST)

+   opt.def = BURST_RADV;
+
if (num_RxAbsIntDelay > bd) {
adapter->rx_abs_int_delay = RxAbsIntDelay[bd];
e1000_validate_option(&adapter->rx_abs_int_delay, &opt,


This patch looks good for me, but I would like hear second opinion.



Re: [Intel-wired-lan] [PATCH 4/5] e1000e: Separate signaling for link check/link up

2017-08-02 Thread Neftin, Sasha

On 7/21/2017 21:36, Benjamin Poirier wrote:

Lennart reported the following race condition:

\ e1000_watchdog_task
 \ e1000e_has_link
 \ hw->mac.ops.check_for_link() === e1000e_check_for_copper_link
 /* link is up */
 mac->get_link_status = false;

 /* interrupt */
 \ e1000_msix_other
 hw->mac.get_link_status = true;

 link_active = !hw->mac.get_link_status
 /* link_active is false, wrongly */

This problem arises because the single flag get_link_status is used to
signal two different states: link status needs checking and link status is
down.

Avoid the problem by using the return value of .check_for_link to signal
the link status to e1000e_has_link().

Reported-by: Lennart Sorensen 
Signed-off-by: Benjamin Poirier 
---
  drivers/net/ethernet/intel/e1000e/mac.c| 11 ---
  drivers/net/ethernet/intel/e1000e/netdev.c |  2 +-
  2 files changed, 9 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/intel/e1000e/mac.c 
b/drivers/net/ethernet/intel/e1000e/mac.c
index b322011ec282..f457c5703d0c 100644
--- a/drivers/net/ethernet/intel/e1000e/mac.c
+++ b/drivers/net/ethernet/intel/e1000e/mac.c
@@ -410,6 +410,9 @@ void e1000e_clear_hw_cntrs_base(struct e1000_hw *hw)
   *  Checks to see of the link status of the hardware has changed.  If a
   *  change in link status has been detected, then we read the PHY registers
   *  to get the current speed/duplex if link exists.
+ *
+ *  Returns a negative error code (-E1000_ERR_*) or 0 (link down) or 1 (link
+ *  up).
   **/
  s32 e1000e_check_for_copper_link(struct e1000_hw *hw)
  {
@@ -423,7 +426,7 @@ s32 e1000e_check_for_copper_link(struct e1000_hw *hw)
 * Change or Rx Sequence Error interrupt.
 */
if (!mac->get_link_status)
-   return 0;
+   return 1;
  
  	/* First we want to see if the MII Status Register reports

 * link.  If so, then we want to get the current speed/duplex
@@ -461,10 +464,12 @@ s32 e1000e_check_for_copper_link(struct e1000_hw *hw)
 * different link partner.
 */
ret_val = e1000e_config_fc_after_link_up(hw);
-   if (ret_val)
+   if (ret_val) {
e_dbg("Error configuring flow control\n");
+   return ret_val;
+   }
  
-	return ret_val;

+   return 1;
  }
  
  /**

diff --git a/drivers/net/ethernet/intel/e1000e/netdev.c 
b/drivers/net/ethernet/intel/e1000e/netdev.c
index fc6a1db2..5a8ab1136566 100644
--- a/drivers/net/ethernet/intel/e1000e/netdev.c
+++ b/drivers/net/ethernet/intel/e1000e/netdev.c
@@ -5081,7 +5081,7 @@ static bool e1000e_has_link(struct e1000_adapter *adapter)
case e1000_media_type_copper:
if (hw->mac.get_link_status) {
ret_val = hw->mac.ops.check_for_link(hw);
-   link_active = !hw->mac.get_link_status;
+   link_active = ret_val > 0;
} else {
link_active = true;
}


Hello Benjamin,

Will this patch fix any serious problem with link indication? Is it 
necessary? Can we consider your patch series without 4/5 part?




Re: [Intel-wired-lan] [PATCH] net: intel: e1000e: add check on e1e_wphy() return value

2017-06-21 Thread Neftin, Sasha

On 21/06/2017 22:52, Gustavo A. R. Silva wrote:

Hi Ethan,

Quoting Ethan Zhao :


Gustavo,

   The return value of ret_val seems used to check if the access to 
PHY/NVM

got its semaphore,  generally speaking, it is needed for every PHY
access of this driver.

Reviewed-by: Ethan Zhao 



Thank you very much for the clarification.


On Wed, Jun 21, 2017 at 5:22 AM, Gustavo A. R. Silva
 wrote:

Check return value from call to e1e_wphy(). This value is being
checked during previous calls to function e1e_wphy() and it seems
a check was missing here.

Addresses-Coverity-ID: 1226905
Signed-off-by: Gustavo A. R. Silva 
---
 drivers/net/ethernet/intel/e1000e/ich8lan.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/net/ethernet/intel/e1000e/ich8lan.c 
b/drivers/net/ethernet/intel/e1000e/ich8lan.c

index 68ea8b4..d6d4ed7 100644
--- a/drivers/net/ethernet/intel/e1000e/ich8lan.c
+++ b/drivers/net/ethernet/intel/e1000e/ich8lan.c
@@ -2437,6 +2437,8 @@ static s32 
e1000_hv_phy_workarounds_ich8lan(struct e1000_hw *hw)

if (hw->phy.revision < 2) {
e1000e_phy_sw_reset(hw);
ret_val = e1e_wphy(hw, MII_BMCR, 0x3140);
+   if (ret_val)
+   return ret_val;
}
}

--
2.5.0



--
Gustavo A. R. Silva





___
Intel-wired-lan mailing list
intel-wired-...@osuosl.org
https://lists.osuosl.org/mailman/listinfo/intel-wired-lan


We will accept this patch.



Re: [Intel-wired-lan] [PATCH v2 1/1] e1000e: Undo e1000e_pm_freeze if __e1000_shutdown fails

2017-06-04 Thread Neftin, Sasha

On 5/31/2017 18:50, Jani Nikula wrote:

From: Chris Wilson 

An error during suspend (e100e_pm_suspend),

[  429.994338] ACPI : EC: event blocked
[  429.994633] e1000e: EEE TX LPI TIMER: 0011
[  430.955451] pci_pm_suspend(): e1000e_pm_suspend+0x0/0x30 [e1000e] returns -2
[  430.955454] dpm_run_callback(): pci_pm_suspend+0x0/0x140 returns -2
[  430.955458] PM: Device :00:19.0 failed to suspend async: error -2
[  430.955581] PM: Some devices failed to suspend, or early wake event detected
[  430.957709] ACPI : EC: event unblocked

lead to complete failure:

[  432.585002] [ cut here ]
[  432.585013] WARNING: CPU: 3 PID: 8372 at kernel/irq/manage.c:1478 
__free_irq+0x9f/0x280
[  432.585015] Trying to free already-free IRQ 20
[  432.585016] Modules linked in: cdc_ncm usbnet x86_pkg_temp_thermal 
intel_powerclamp coretemp mii crct10dif_pclmul crc32_pclmul ghash_clmulni_intel 
snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic snd_hda_intel 
snd_hda_codec snd_hwdep lpc_ich snd_hda_core snd_pcm mei_me mei sdhci_pci sdhci 
i915 mmc_core e1000e ptp pps_core prime_numbers
[  432.585042] CPU: 3 PID: 8372 Comm: kworker/u16:40 Tainted: G U  
4.10.0-rc8-CI-Patchwork_3870+ #1
[  432.585044] Hardware name: LENOVO 2356GCG/2356GCG, BIOS G7ET31WW (1.13 ) 
07/02/2012
[  432.585050] Workqueue: events_unbound async_run_entry_fn
[  432.585051] Call Trace:
[  432.585058]  dump_stack+0x67/0x92
[  432.585062]  __warn+0xc6/0xe0
[  432.585065]  warn_slowpath_fmt+0x4a/0x50
[  432.585070]  ? _raw_spin_lock_irqsave+0x49/0x60
[  432.585072]  __free_irq+0x9f/0x280
[  432.585075]  free_irq+0x34/0x80
[  432.585089]  e1000_free_irq+0x65/0x70 [e1000e]
[  432.585098]  e1000e_pm_freeze+0x7a/0xb0 [e1000e]
[  432.585106]  e1000e_pm_suspend+0x21/0x30 [e1000e]
[  432.585113]  pci_pm_suspend+0x71/0x140
[  432.585118]  dpm_run_callback+0x6f/0x330
[  432.585122]  ? pci_pm_freeze+0xe0/0xe0
[  432.585125]  __device_suspend+0xea/0x330
[  432.585128]  async_suspend+0x1a/0x90
[  432.585132]  async_run_entry_fn+0x34/0x160
[  432.585137]  process_one_work+0x1f4/0x6d0
[  432.585140]  ? process_one_work+0x16e/0x6d0
[  432.585143]  worker_thread+0x49/0x4a0
[  432.585145]  kthread+0x107/0x140
[  432.585148]  ? process_one_work+0x6d0/0x6d0
[  432.585150]  ? kthread_create_on_node+0x40/0x40
[  432.585154]  ret_from_fork+0x2e/0x40
[  432.585156] ---[ end trace 6712df7f8c4b9124 ]---

The unwind failures stems from commit 2800209994f8 ("e1000e: Refactor PM
flows"), but it may be a later patch that introduced the non-recoverable
behaviour.

Fixes: 2800209994f8 ("e1000e: Refactor PM flows")
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=99847
Cc: Tvrtko Ursulin 
Cc: Jeff Kirsher 
Cc: Dave Ertman 
Cc: Bruce Allan 
Cc: intel-wired-...@lists.osuosl.org
Cc: netdev@vger.kernel.org
Signed-off-by: Chris Wilson 
[Jani: bikeshed repainted]
Signed-off-by: Jani Nikula 
---
  drivers/net/ethernet/intel/e1000e/netdev.c | 7 ++-
  1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/intel/e1000e/netdev.c 
b/drivers/net/ethernet/intel/e1000e/netdev.c
index b3679728caac..5cad688be609 100644
--- a/drivers/net/ethernet/intel/e1000e/netdev.c
+++ b/drivers/net/ethernet/intel/e1000e/netdev.c
@@ -6630,12 +6630,17 @@ static int e1000e_pm_thaw(struct device *dev)
  static int e1000e_pm_suspend(struct device *dev)
  {
struct pci_dev *pdev = to_pci_dev(dev);
+   int rc;
  
  	e1000e_flush_lpic(pdev);
  
  	e1000e_pm_freeze(dev);
  
-	return __e1000_shutdown(pdev, false);

+   rc = __e1000_shutdown(pdev, false);
+   if (rc)
+   e1000e_pm_thaw(dev);
+
+   return rc;
  }
  
  static int e1000e_pm_resume(struct device *dev)


Good. Let's pick up this patch.



Re: [Intel-wired-lan] [PATCH] e1000e: use disable_hardirq() also for MSIX vectors in e1000_netpoll()

2017-05-23 Thread Neftin, Sasha

On 5/19/2017 21:34, Cong Wang wrote:

On Fri, May 19, 2017 at 12:18 AM, Konstantin Khlebnikov
 wrote:

Replace disable_irq() which waits for threaded irq handlers with
disable_hardirq() which waits only for hardirq part.

Signed-off-by: Konstantin Khlebnikov 
Fixes: 311191297125 ("e1000: use disable_hardirq() for e1000_netpoll()")

Thomas had a similar patch, I don't know why he never sends it
out formally. Anyway,

Acked-by: Cong Wang 
___
Intel-wired-lan mailing list
intel-wired-...@osuosl.org
https://lists.osuosl.org/mailman/listinfo/intel-wired-lan


Ack. Let's pick up this patch.



Re: [Intel-wired-lan] [PATCH 1/2] e1000e: Don't return uninitialized stats

2017-04-24 Thread Neftin, Sasha

On 4/23/2017 15:53, Neftin, Sasha wrote:

-Original Message-
From: Intel-wired-lan [mailto:intel-wired-lan-boun...@lists.osuosl.org] On 
Behalf Of Benjamin Poirier
Sent: Saturday, April 22, 2017 00:20
To: Kirsher, Jeffrey T 
Cc: netdev@vger.kernel.org; intel-wired-...@lists.osuosl.org; Stefan Priebe 

Subject: [Intel-wired-lan] [PATCH 1/2] e1000e: Don't return uninitialized stats

Some statistics passed to ethtool are garbage because e1000e_get_stats64() 
doesn't write them, for example: tx_heartbeat_errors. This leaks kernel memory 
to userspace and confuses users.

Do like ixgbe and use dev_get_stats() which first zeroes out rtnl_link_stats64.

Reported-by: Stefan Priebe 
Signed-off-by: Benjamin Poirier 
---
  drivers/net/ethernet/intel/e1000e/ethtool.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/intel/e1000e/ethtool.c 
b/drivers/net/ethernet/intel/e1000e/ethtool.c
index 7aff68a4a4df..f117b90cdc2f 100644
--- a/drivers/net/ethernet/intel/e1000e/ethtool.c
+++ b/drivers/net/ethernet/intel/e1000e/ethtool.c
@@ -2063,7 +2063,7 @@ static void e1000_get_ethtool_stats(struct net_device 
*netdev,
  
  	pm_runtime_get_sync(netdev->dev.parent);
  
-	e1000e_get_stats64(netdev, &net_stats);

+   dev_get_stats(netdev, &net_stats);
  
  	pm_runtime_put_sync(netdev->dev.parent);
  
--

2.12.2

___
Intel-wired-lan mailing list
intel-wired-...@lists.osuosl.org
http://lists.osuosl.org/mailman/listinfo/intel-wired-lan


Hello,

We would like to not accept this patch. Suggested generic method 
'*dev_get_stats' (net/core/dev.c) calls 'ops->ndo_get_stats64' method 
which eventually calls e1000e_get_stats64 (netdev.c) - so there is same 
functionality. Also, see that 'e1000e_get_stats64' method in netdev.c 
(line 5928) calls 'memset' with 0's before update statistics.  Local 
sanity check in our lab shows 'tx_heartbeat_errors' counter reported as 0.


Thanks,

Sasha



Re: [Intel-wired-lan] NFS over NAT causes e1000e transmit hangs

2017-04-22 Thread Neftin, Sasha

On 4/20/2017 00:15, Florian Fainelli wrote:

On 04/19/2017 01:52 AM, Neftin, Sasha wrote:

On 4/18/2017 22:05, Florian Fainelli wrote:

On 04/18/2017 12:03 PM, Eric Dumazet wrote:

On Tue, 2017-04-18 at 11:18 -0700, Florian Fainelli wrote:

Hi,

I am using NFS over a NAT with two e1000e adapters and with eth1 being
the LAN interface and eth0 the WAN interface. The kernel is Ubuntu's
16.10 kernel: 4.8.0-46-generic. The device doing NAT over NFS is just
mounting a remote folder and doing normal execution/file accesses. It's
enough to untar a file from this device onto a NFS share to expose the
problem.

The transmit hangs look like the ones below, doing a rmmod/insmod does
not help eliminated the problem, nor does a power cycle. Stopping the
NFS over NAT definitively does let the adapter recover.

Is this NFS over TCP or UDP ?

This is NFS over TCP mounted with the following:

type nfs
(rw,relatime,vers=3,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,port=2049,timeo=70,retrans=3,sec=sys,local_lock=none,addr=X.X.X.X)


Thanks Eric!

Please, try disable TCP segmentation offload: ethtool -K  tso off.

I am not able to reproduce the hangs with TSO turned off. Is there a
specific patch you would want me to try?


Please, work with TSO turned off so. There is no patch for this specific 
problem.




Re: [Intel-wired-lan] NFS over NAT causes e1000e transmit hangs

2017-04-19 Thread Neftin, Sasha

On 4/18/2017 22:05, Florian Fainelli wrote:

On 04/18/2017 12:03 PM, Eric Dumazet wrote:

On Tue, 2017-04-18 at 11:18 -0700, Florian Fainelli wrote:

Hi,

I am using NFS over a NAT with two e1000e adapters and with eth1 being
the LAN interface and eth0 the WAN interface. The kernel is Ubuntu's
16.10 kernel: 4.8.0-46-generic. The device doing NAT over NFS is just
mounting a remote folder and doing normal execution/file accesses. It's
enough to untar a file from this device onto a NFS share to expose the
problem.

The transmit hangs look like the ones below, doing a rmmod/insmod does
not help eliminated the problem, nor does a power cycle. Stopping the
NFS over NAT definitively does let the adapter recover.

Is this NFS over TCP or UDP ?

This is NFS over TCP mounted with the following:

type nfs
(rw,relatime,vers=3,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,port=2049,timeo=70,retrans=3,sec=sys,local_lock=none,addr=X.X.X.X)

Thanks Eric!


Please, try disable TCP segmentation offload: ethtool -K  tso off.



Re: [Intel-wired-lan] [PATCH] e1000e: fix timing for 82579 Gigabit Ethernet controller

2017-02-27 Thread Neftin, Sasha

On 2/26/2017 11:08, Neftin, Sasha wrote:

On 2/19/2017 14:55, Neftin, Sasha wrote:

On 2/16/2017 20:42, Bernd Faust wrote:

After an upgrade to Linux kernel v4.x the hardware timestamps of the
82579 Gigabit Ethernet Controller are different than expected.
The values that are being read are almost four times as big as before
the kernel upgrade.

The difference is that after the upgrade the driver sets the clock
frequency to 25MHz, where before the upgrade it was set to 96MHz. Intel
confirmed that the correct frequency for this network adapter is 96MHz.

Signed-off-by: Bernd Faust 
---
  drivers/net/ethernet/intel/e1000e/netdev.c | 6 ++
  1 file changed, 6 insertions(+)

diff --git a/drivers/net/ethernet/intel/e1000e/netdev.c 
b/drivers/net/ethernet/intel/e1000e/netdev.c

index 7017281..8b7113d 100644
--- a/drivers/net/ethernet/intel/e1000e/netdev.c
+++ b/drivers/net/ethernet/intel/e1000e/netdev.c
@@ -3511,6 +3511,12 @@ s32 e1000e_get_base_timinca(struct 
e1000_adapter *adapter, u32 *timinca)


  switch (hw->mac.type) {
  case e1000_pch2lan:
+/* Stable 96MHz frequency */
+incperiod = INCPERIOD_96MHz;
+incvalue = INCVALUE_96MHz;
+shift = INCVALUE_SHIFT_96MHz;
+adapter->cc.shift = shift + INCPERIOD_SHIFT_96MHz;
+break;
  case e1000_pch_lpt:
  if (er32(TSYNCRXCTL) & E1000_TSYNCRXCTL_SYSCFI) {
  /* Stable 96MHz frequency */
--
2.7.4
___
Intel-wired-lan mailing list
intel-wired-...@lists.osuosl.org
http://lists.osuosl.org/mailman/listinfo/intel-wired-lan


Hello,

e1000_pch2lan mac type corresponds to 82579LM and 82579V network 
adapters. System clock frequency indication (SYSCFI) for these 
devices supports both 25MHz and 96MHz frequency. By default 
TSYNCRXCTL.SYSCFI is set to 1 and that means 96MHz frequency is picked.


It is better to keep the current implementation as it covers all 
options.


Thanks,

Sasha


Hello,

During last couple of weeks I saw few  complaints from community on 
same timing problem with 82579. I will double check clock definition 
with HW architecture.


Sasha

I've double checked - 82579 support 96MHz frequency only. So, let's 
accept this suggestion to upstream.


Ack.



Re: [Intel-wired-lan] [PATCH] e1000e: fix timing for 82579 Gigabit Ethernet controller

2017-02-27 Thread Neftin, Sasha

On 2/26/2017 11:08, Neftin, Sasha wrote:

On 2/19/2017 14:55, Neftin, Sasha wrote:

On 2/16/2017 20:42, Bernd Faust wrote:

After an upgrade to Linux kernel v4.x the hardware timestamps of the
82579 Gigabit Ethernet Controller are different than expected.
The values that are being read are almost four times as big as before
the kernel upgrade.

The difference is that after the upgrade the driver sets the clock
frequency to 25MHz, where before the upgrade it was set to 96MHz. Intel
confirmed that the correct frequency for this network adapter is 96MHz.

Signed-off-by: Bernd Faust 
---
  drivers/net/ethernet/intel/e1000e/netdev.c | 6 ++
  1 file changed, 6 insertions(+)

diff --git a/drivers/net/ethernet/intel/e1000e/netdev.c 
b/drivers/net/ethernet/intel/e1000e/netdev.c

index 7017281..8b7113d 100644
--- a/drivers/net/ethernet/intel/e1000e/netdev.c
+++ b/drivers/net/ethernet/intel/e1000e/netdev.c
@@ -3511,6 +3511,12 @@ s32 e1000e_get_base_timinca(struct 
e1000_adapter *adapter, u32 *timinca)


  switch (hw->mac.type) {
  case e1000_pch2lan:
+/* Stable 96MHz frequency */
+incperiod = INCPERIOD_96MHz;
+incvalue = INCVALUE_96MHz;
+shift = INCVALUE_SHIFT_96MHz;
+adapter->cc.shift = shift + INCPERIOD_SHIFT_96MHz;
+break;
  case e1000_pch_lpt:
  if (er32(TSYNCRXCTL) & E1000_TSYNCRXCTL_SYSCFI) {
  /* Stable 96MHz frequency */
--
2.7.4
___
Intel-wired-lan mailing list
intel-wired-...@lists.osuosl.org
http://lists.osuosl.org/mailman/listinfo/intel-wired-lan


Hello,

e1000_pch2lan mac type corresponds to 82579LM and 82579V network 
adapters. System clock frequency indication (SYSCFI) for these 
devices supports both 25MHz and 96MHz frequency. By default 
TSYNCRXCTL.SYSCFI is set to 1 and that means 96MHz frequency is picked.


It is better to keep the current implementation as it covers all 
options.


Thanks,

Sasha


Hello,

During last couple of weeks I saw few  complaints from community on 
same timing problem with 82579. I will double check clock definition 
with HW architecture.


Sasha

___
Intel-wired-lan mailing list
intel-wired-...@lists.osuosl.org
http://lists.osuosl.org/mailman/listinfo/intel-wired-lan


I've double checked - 82579 support 96MHz frequency only. So, let's 
accept this suggestion to upstream.


Ack.




Re: [Intel-wired-lan] [PATCH] e1000e: fix timing for 82579 Gigabit Ethernet controller

2017-02-26 Thread Neftin, Sasha

On 2/19/2017 14:55, Neftin, Sasha wrote:

On 2/16/2017 20:42, Bernd Faust wrote:

After an upgrade to Linux kernel v4.x the hardware timestamps of the
82579 Gigabit Ethernet Controller are different than expected.
The values that are being read are almost four times as big as before
the kernel upgrade.

The difference is that after the upgrade the driver sets the clock
frequency to 25MHz, where before the upgrade it was set to 96MHz. Intel
confirmed that the correct frequency for this network adapter is 96MHz.

Signed-off-by: Bernd Faust 
---
  drivers/net/ethernet/intel/e1000e/netdev.c | 6 ++
  1 file changed, 6 insertions(+)

diff --git a/drivers/net/ethernet/intel/e1000e/netdev.c 
b/drivers/net/ethernet/intel/e1000e/netdev.c

index 7017281..8b7113d 100644
--- a/drivers/net/ethernet/intel/e1000e/netdev.c
+++ b/drivers/net/ethernet/intel/e1000e/netdev.c
@@ -3511,6 +3511,12 @@ s32 e1000e_get_base_timinca(struct 
e1000_adapter *adapter, u32 *timinca)


  switch (hw->mac.type) {
  case e1000_pch2lan:
+/* Stable 96MHz frequency */
+incperiod = INCPERIOD_96MHz;
+incvalue = INCVALUE_96MHz;
+shift = INCVALUE_SHIFT_96MHz;
+adapter->cc.shift = shift + INCPERIOD_SHIFT_96MHz;
+break;
  case e1000_pch_lpt:
  if (er32(TSYNCRXCTL) & E1000_TSYNCRXCTL_SYSCFI) {
  /* Stable 96MHz frequency */
--
2.7.4
___
Intel-wired-lan mailing list
intel-wired-...@lists.osuosl.org
http://lists.osuosl.org/mailman/listinfo/intel-wired-lan


Hello,

e1000_pch2lan mac type corresponds to 82579LM and 82579V network 
adapters. System clock frequency indication (SYSCFI) for these devices 
supports both 25MHz and 96MHz frequency. By default TSYNCRXCTL.SYSCFI 
is set to 1 and that means 96MHz frequency is picked.


It is better to keep the current implementation as it covers all options.

Thanks,

Sasha


Hello,

During last couple of weeks I saw few  complaints from community on same 
timing problem with 82579. I will double check clock definition with HW 
architecture.


Sasha



Re: [Intel-wired-lan] [PATCH] e1000e: Undo e1000e_pm_freeze if __e1000_shutdown fails

2017-02-21 Thread Neftin, Sasha

On 2/20/2017 15:02, Chris Wilson wrote:

Chris Wilson, Intel Open Source Technology Centre


Chris,

Please, check patch with fewer code lines. If short patch good and works 
well for you, please, submit it and we will check then on our side.





Re: [Intel-wired-lan] [PATCH] e1000e: fix timing for 82579 Gigabit Ethernet controller

2017-02-19 Thread Neftin, Sasha

On 2/16/2017 20:42, Bernd Faust wrote:

After an upgrade to Linux kernel v4.x the hardware timestamps of the
82579 Gigabit Ethernet Controller are different than expected.
The values that are being read are almost four times as big as before
the kernel upgrade.

The difference is that after the upgrade the driver sets the clock
frequency to 25MHz, where before the upgrade it was set to 96MHz. Intel
confirmed that the correct frequency for this network adapter is 96MHz.

Signed-off-by: Bernd Faust 
---
  drivers/net/ethernet/intel/e1000e/netdev.c | 6 ++
  1 file changed, 6 insertions(+)

diff --git a/drivers/net/ethernet/intel/e1000e/netdev.c 
b/drivers/net/ethernet/intel/e1000e/netdev.c
index 7017281..8b7113d 100644
--- a/drivers/net/ethernet/intel/e1000e/netdev.c
+++ b/drivers/net/ethernet/intel/e1000e/netdev.c
@@ -3511,6 +3511,12 @@ s32 e1000e_get_base_timinca(struct e1000_adapter 
*adapter, u32 *timinca)

switch (hw->mac.type) {
case e1000_pch2lan:
+   /* Stable 96MHz frequency */
+   incperiod = INCPERIOD_96MHz;
+   incvalue = INCVALUE_96MHz;
+   shift = INCVALUE_SHIFT_96MHz;
+   adapter->cc.shift = shift + INCPERIOD_SHIFT_96MHz;
+   break;
case e1000_pch_lpt:
if (er32(TSYNCRXCTL) & E1000_TSYNCRXCTL_SYSCFI) {
/* Stable 96MHz frequency */
--
2.7.4
___
Intel-wired-lan mailing list
intel-wired-...@lists.osuosl.org
http://lists.osuosl.org/mailman/listinfo/intel-wired-lan


Hello,

e1000_pch2lan mac type corresponds to 82579LM and 82579V network 
adapters. System clock frequency indication (SYSCFI) for these devices 
supports both 25MHz and 96MHz frequency. By default TSYNCRXCTL.SYSCFI is 
set to 1 and that means 96MHz frequency is picked.


It is better to keep the current implementation as it covers all options.

Thanks,

Sasha



Re: [Intel-wired-lan] [PATCH] net: intel: e1000e: use new api ethtool_{get|set}_link_ksettings

2017-02-02 Thread Neftin, Sasha

On 2/1/2017 00:01, Philippe Reynes wrote:

Hi Sasha,

On 1/31/17, Neftin, Sasha  wrote:

Philippe,

We will look into and try test this patch. I would like ask question. I
see that this thread has been started from implementation for e1000
code. Why do you decide shift implementation to e1000e?

I've just sent two patch for two drivers (e1000 and e1000e).



Sasha



Philippe


Hello Philippe,

Great. We are responsible for e1000e. We will check patch for e1000e.

Sasha



Re: [Intel-wired-lan] [PATCH] net: intel: e1000e: use new api ethtool_{get|set}_link_ksettings

2017-01-30 Thread Neftin, Sasha

On 1/26/2017 23:19, Philippe Reynes wrote:

The ethtool api {get|set}_settings is deprecated.
We move this driver to new api {get|set}_link_ksettings.

As I don't have the hardware, I'd be very pleased if
someone may test this patch.

Signed-off-by: Philippe Reynes 
---
  drivers/net/ethernet/intel/e1000e/ethtool.c |   91 ++
  1 files changed, 49 insertions(+), 42 deletions(-)

diff --git a/drivers/net/ethernet/intel/e1000e/ethtool.c 
b/drivers/net/ethernet/intel/e1000e/ethtool.c
index 7aff68a..3768a5c 100644
--- a/drivers/net/ethernet/intel/e1000e/ethtool.c
+++ b/drivers/net/ethernet/intel/e1000e/ethtool.c
@@ -117,15 +117,15 @@ struct e1000_stats {
  
  #define E1000_TEST_LEN ARRAY_SIZE(e1000_gstrings_test)
  
-static int e1000_get_settings(struct net_device *netdev,

- struct ethtool_cmd *ecmd)
+static int e1000_get_link_ksettings(struct net_device *netdev,
+   struct ethtool_link_ksettings *cmd)
  {
struct e1000_adapter *adapter = netdev_priv(netdev);
struct e1000_hw *hw = &adapter->hw;
-   u32 speed;
+   u32 speed, supported, advertising;
  
  	if (hw->phy.media_type == e1000_media_type_copper) {

-   ecmd->supported = (SUPPORTED_10baseT_Half |
+   supported = (SUPPORTED_10baseT_Half |
   SUPPORTED_10baseT_Full |
   SUPPORTED_100baseT_Half |
   SUPPORTED_100baseT_Full |
@@ -133,39 +133,36 @@ static int e1000_get_settings(struct net_device *netdev,
   SUPPORTED_Autoneg |
   SUPPORTED_TP);
if (hw->phy.type == e1000_phy_ife)
-   ecmd->supported &= ~SUPPORTED_1000baseT_Full;
-   ecmd->advertising = ADVERTISED_TP;
+   supported &= ~SUPPORTED_1000baseT_Full;
+   advertising = ADVERTISED_TP;
  
  		if (hw->mac.autoneg == 1) {

-   ecmd->advertising |= ADVERTISED_Autoneg;
+   advertising |= ADVERTISED_Autoneg;
/* the e1000 autoneg seems to match ethtool nicely */
-   ecmd->advertising |= hw->phy.autoneg_advertised;
+   advertising |= hw->phy.autoneg_advertised;
}
  
-		ecmd->port = PORT_TP;

-   ecmd->phy_address = hw->phy.addr;
-   ecmd->transceiver = XCVR_INTERNAL;
-
+   cmd->base.port = PORT_TP;
+   cmd->base.phy_address = hw->phy.addr;
} else {
-   ecmd->supported   = (SUPPORTED_1000baseT_Full |
+   supported   = (SUPPORTED_1000baseT_Full |
 SUPPORTED_FIBRE |
 SUPPORTED_Autoneg);
  
-		ecmd->advertising = (ADVERTISED_1000baseT_Full |

+   advertising = (ADVERTISED_1000baseT_Full |
 ADVERTISED_FIBRE |
 ADVERTISED_Autoneg);
  
-		ecmd->port = PORT_FIBRE;

-   ecmd->transceiver = XCVR_EXTERNAL;
+   cmd->base.port = PORT_FIBRE;
}
  
  	speed = SPEED_UNKNOWN;

-   ecmd->duplex = DUPLEX_UNKNOWN;
+   cmd->base.duplex = DUPLEX_UNKNOWN;
  
  	if (netif_running(netdev)) {

if (netif_carrier_ok(netdev)) {
speed = adapter->link_speed;
-   ecmd->duplex = adapter->link_duplex - 1;
+   cmd->base.duplex = adapter->link_duplex - 1;
}
} else if (!pm_runtime_suspended(netdev->dev.parent)) {
u32 status = er32(STATUS);
@@ -179,30 +176,36 @@ static int e1000_get_settings(struct net_device *netdev,
speed = SPEED_10;
  
  			if (status & E1000_STATUS_FD)

-   ecmd->duplex = DUPLEX_FULL;
+   cmd->base.duplex = DUPLEX_FULL;
else
-   ecmd->duplex = DUPLEX_HALF;
+   cmd->base.duplex = DUPLEX_HALF;
}
}
  
-	ethtool_cmd_speed_set(ecmd, speed);

-   ecmd->autoneg = ((hw->phy.media_type == e1000_media_type_fiber) ||
+   cmd->base.speed = speed;
+   cmd->base.autoneg = ((hw->phy.media_type == e1000_media_type_fiber) ||
 hw->mac.autoneg) ? AUTONEG_ENABLE : AUTONEG_DISABLE;
  
  	/* MDI-X => 2; MDI =>1; Invalid =>0 */

if ((hw->phy.media_type == e1000_media_type_copper) &&
netif_carrier_ok(netdev))
-   ecmd->eth_tp_mdix = hw->phy.is_mdix ? ETH_TP_MDI_X : ETH_TP_MDI;
+   cmd->base.eth_tp_mdix = hw->phy.is_mdix ?
+   ETH_TP_MDI_X : ETH_TP_MDI;
else
-   ecmd->eth_tp_mdix = ETH_TP_MDI_INVALID;
+   cmd->base.eth_tp_mdix = ETH_TP_MDI_INVALID;
  
  	if (hw->

Re: [Intel-wired-lan] [PATCH v2] e1000e: free IRQ regardless of __E1000_DOWN

2016-12-03 Thread Neftin, Sasha
On 12/2/2016 7:02 PM, Baicar, Tyler wrote:
> Hello Sasha,
> 
> Were you able to reproduce this issue?
> 
> Do you have a patch fixing the close function inconsistencies that you
> mentioned which I could try out?
> 
> Thanks,
> Tyler
> 
> On 11/21/2016 1:40 PM, Baicar, Tyler wrote:
>> On 11/17/2016 6:31 AM, Neftin, Sasha wrote:
>>> On 11/13/2016 10:34 AM, Neftin, Sasha wrote:
>>>> On 11/11/2016 12:35 AM, Baicar, Tyler wrote:
>>>>> Hello Sasha,
>>>>>
>>>>> On 11/9/2016 11:19 PM, Neftin, Sasha wrote:
>>>>>> On 11/9/2016 11:41 PM, Tyler Baicar wrote:
>>>>>>> Move IRQ free code so that it will happen regardless of the
>>>>>>> __E1000_DOWN bit. Currently the e1000e driver only releases its IRQ
>>>>>>> if the __E1000_DOWN bit is cleared. This is not sufficient because
>>>>>>> it is possible for __E1000_DOWN to be set without releasing the IRQ.
>>>>>>> In such a situation, we will hit a kernel bug later in e1000_remove
>>>>>>> because the IRQ still has action since it was never freed. A
>>>>>>> secondary bus reset can cause this case to happen.
>>>>>>>
>>>>>>> Signed-off-by: Tyler Baicar 
>>>>>>> ---
>>>>>>>drivers/net/ethernet/intel/e1000e/netdev.c | 3 ++-
>>>>>>>1 file changed, 2 insertions(+), 1 deletion(-)
>>>>>>>
>>>>>>> diff --git a/drivers/net/ethernet/intel/e1000e/netdev.c
>>>>>>> b/drivers/net/ethernet/intel/e1000e/netdev.c
>>>>>>> index 7017281..36cfcb0 100644
>>>>>>> --- a/drivers/net/ethernet/intel/e1000e/netdev.c
>>>>>>> +++ b/drivers/net/ethernet/intel/e1000e/netdev.c
>>>>>>> @@ -4679,12 +4679,13 @@ int e1000e_close(struct net_device *netdev)
>>>>>>>  if (!test_bit(__E1000_DOWN, &adapter->state)) {
>>>>>>>e1000e_down(adapter, true);
>>>>>>> -e1000_free_irq(adapter);
>>>>>>>  /* Link status message must follow this format */
>>>>>>>pr_info("%s NIC Link is Down\n", adapter->netdev->name);
>>>>>>>}
>>>>>>>+e1000_free_irq(adapter);
>>>>>>> +
>>>>>>>napi_disable(&adapter->napi);
>>>>>>>  e1000e_free_tx_resources(adapter->tx_ring);
>>>>>>>
>>>>>> I would like not recommend insert this change. This change related
>>>>>> driver state machine, we afraid from lot of synchronization
>>>>>> problem and
>>>>>> issues.
>>>>>> We need keep e1000_free_irq in loop and check for 'test_bit' ready.
>>>>> What do you mean here? There is no loop. If __E1000_DOWN is set
>>>>> then we
>>>>> will never free the IRQ.
>>>>>
>>>>>> Another point, does before execute secondary bus reset your SW
>>>>>> back up
>>>>>> pcie configuration space as properly?
>>>>> After a secondary bus reset, the link needs to recover and go back
>>>>> to a
>>>>> working state after 1 second.
>>>>>
>>>>>  From the callstack, the issue is happening while removing the
>>>>> endpoint
>>>>> from the system, before applying the secondary bus reset.
>>>>>
>>>>> The order of events is
>>>>> 1. remove the drivers
>>>>> 2. cause a secondary bus reset
>>>>> 3. wait 1 second
>>>> Actually, this is too much, usually link up in less than 100ms.You can
>>>> check Data Link Layer indication.
>>>>> 4. recover the link
>>>>>
>>>>> callstack:
>>>>> free_msi_irqs+0x6c/0x1a8
>>>>> pci_disable_msi+0xb0/0x148
>>>>> e1000e_reset_interrupt_capability+0x60/0x78
>>>>> e1000_remove+0xc8/0x180
>>>>> pci_device_remove+0x48/0x118
>>>>> __device_release_driver+0x80/0x108
>>>>> device_release_driver+0x2c/0x40
>>>>> pci_stop_bus_device+0xa0/0xb0
>>>>> pci_stop_bus_device+0x3c/0xb0
>>>>> pci_stop_root_bus+0x54/0x80
>>>>> acpi_pci_root_remove+0x28/0x64
>>>>> acpi_bus_trim+0x6c/0xa4
>

Fwd:[Intel-wired-lan] [PATCH v2] e1000e: free IRQ regardless of __E1000_DOWN

2016-11-17 Thread Neftin, Sasha
From: Baicar, Tyler [mailto:tbai...@codeaurora.org] Sent: Tuesday,
November 15, 2016 11:50 PM
To: Neftin, Sasha ; Kirsher, Jeffrey T
; intel-wired-...@lists.osuosl.org;
netdev@vger.kernel.org; linux-ker...@vger.kernel.org;
ok...@codeaurora.org; ti...@codeaurora.org
Subject: Re: [Intel-wired-lan] [PATCH v2] e1000e: free IRQ regardless of
__E1000_DOWN

On 11/13/2016 2:25 AM, Neftin, Sasha wrote:
> On 11/13/2016 10:34 AM, Neftin, Sasha wrote:
>> On 11/11/2016 12:35 AM, Baicar, Tyler wrote:
>>> Hello Sasha,
>>>
>>> On 11/9/2016 11:19 PM, Neftin, Sasha wrote:
>>>> On 11/9/2016 11:41 PM, Tyler Baicar wrote:
>>>>> Move IRQ free code so that it will happen regardless of the 
>>>>> __E1000_DOWN bit. Currently the e1000e driver only releases its 
>>>>> IRQ if the __E1000_DOWN bit is cleared. This is not sufficient 
>>>>> because it is possible for __E1000_DOWN to be set without releasing the 
>>>>> IRQ.
>>>>> In such a situation, we will hit a kernel bug later in 
>>>>> e1000_remove because the IRQ still has action since it was never 
>>>>> freed. A secondary bus reset can cause this case to happen.
>>>>>
>>>>> Signed-off-by: Tyler Baicar 
>>>>> ---
>>>>>drivers/net/ethernet/intel/e1000e/netdev.c | 3 ++-
>>>>>1 file changed, 2 insertions(+), 1 deletion(-)
>>>>>
>>>>> diff --git a/drivers/net/ethernet/intel/e1000e/netdev.c
>>>>> b/drivers/net/ethernet/intel/e1000e/netdev.c
>>>>> index 7017281..36cfcb0 100644
>>>>> --- a/drivers/net/ethernet/intel/e1000e/netdev.c
>>>>> +++ b/drivers/net/ethernet/intel/e1000e/netdev.c
>>>>> @@ -4679,12 +4679,13 @@ int e1000e_close(struct net_device *netdev)
>>>>>  if (!test_bit(__E1000_DOWN, &adapter->state)) {
>>>>>e1000e_down(adapter, true);
>>>>> -e1000_free_irq(adapter);
>>>>>  /* Link status message must follow this format */
>>>>>pr_info("%s NIC Link is Down\n", adapter->netdev->name);
>>>>>}
>>>>>+e1000_free_irq(adapter);
>>>>> +
>>>>>napi_disable(&adapter->napi);
>>>>>  e1000e_free_tx_resources(adapter->tx_ring);
>>>>>
>>>> I would like not recommend insert this change. This change related 
>>>> driver state machine, we afraid from lot of synchronization problem 
>>>> and issues.
>>>> We need keep e1000_free_irq in loop and check for 'test_bit' ready.
>>> What do you mean here? There is no loop. If __E1000_DOWN is set then 
>>> we will never free the IRQ.
>>>
>>>> Another point, does before execute secondary bus reset your SW back 
>>>> up pcie configuration space as properly?
>>> After a secondary bus reset, the link needs to recover and go back 
>>> to a working state after 1 second.
>>>
>>>  From the callstack, the issue is happening while removing the 
>>> endpoint from the system, before applying the secondary bus reset.
>>>
>>> The order of events is
>>> 1. remove the drivers
>>> 2. cause a secondary bus reset
>>> 3. wait 1 second
>> Actually, this is too much, usually link up in less than 100ms.You 
>> can check Data Link Layer indication.
>>> 4. recover the link
>>>
>>> callstack:
>>> free_msi_irqs+0x6c/0x1a8
>>> pci_disable_msi+0xb0/0x148
>>> e1000e_reset_interrupt_capability+0x60/0x78
>>> e1000_remove+0xc8/0x180
>>> pci_device_remove+0x48/0x118
>>> __device_release_driver+0x80/0x108
>>> device_release_driver+0x2c/0x40
>>> pci_stop_bus_device+0xa0/0xb0
>>> pci_stop_bus_device+0x3c/0xb0
>>> pci_stop_root_bus+0x54/0x80
>>> acpi_pci_root_remove+0x28/0x64
>>> acpi_bus_trim+0x6c/0xa4
>>> acpi_device_hotplug+0x19c/0x3f4
>>> acpi_hotplug_work_fn+0x28/0x3c
>>> process_one_work+0x150/0x460
>>> worker_thread+0x50/0x4b8
>>> kthread+0xd4/0xe8
>>> ret_from_fork+0x10/0x50
>>>
>>> Thanks,
>>> Tyler
>>>
>> Hello Tyler,
>> Okay, we need consult more about this suggestion.
>> May I ask what is setup you run? Is there NIC or on board LAN? I 
>> would like try reproduce this issue in our lab's too.
>> Also, is same issue observed with same scenario and others NIC's too?
>> Sasha
>>

Re: [Intel-wired-lan] [PATCH v2] e1000e: free IRQ regardless of __E1000_DOWN

2016-11-17 Thread Neftin, Sasha
On 11/13/2016 10:34 AM, Neftin, Sasha wrote:
> On 11/11/2016 12:35 AM, Baicar, Tyler wrote:
>> Hello Sasha,
>>
>> On 11/9/2016 11:19 PM, Neftin, Sasha wrote:
>>> On 11/9/2016 11:41 PM, Tyler Baicar wrote:
>>>> Move IRQ free code so that it will happen regardless of the
>>>> __E1000_DOWN bit. Currently the e1000e driver only releases its IRQ
>>>> if the __E1000_DOWN bit is cleared. This is not sufficient because
>>>> it is possible for __E1000_DOWN to be set without releasing the IRQ.
>>>> In such a situation, we will hit a kernel bug later in e1000_remove
>>>> because the IRQ still has action since it was never freed. A
>>>> secondary bus reset can cause this case to happen.
>>>>
>>>> Signed-off-by: Tyler Baicar 
>>>> ---
>>>>   drivers/net/ethernet/intel/e1000e/netdev.c | 3 ++-
>>>>   1 file changed, 2 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/drivers/net/ethernet/intel/e1000e/netdev.c
>>>> b/drivers/net/ethernet/intel/e1000e/netdev.c
>>>> index 7017281..36cfcb0 100644
>>>> --- a/drivers/net/ethernet/intel/e1000e/netdev.c
>>>> +++ b/drivers/net/ethernet/intel/e1000e/netdev.c
>>>> @@ -4679,12 +4679,13 @@ int e1000e_close(struct net_device *netdev)
>>>> if (!test_bit(__E1000_DOWN, &adapter->state)) {
>>>>   e1000e_down(adapter, true);
>>>> -e1000_free_irq(adapter);
>>>> /* Link status message must follow this format */
>>>>   pr_info("%s NIC Link is Down\n", adapter->netdev->name);
>>>>   }   
>>>>   +e1000_free_irq(adapter);
>>>> +
>>>>   napi_disable(&adapter->napi);
>>>> e1000e_free_tx_resources(adapter->tx_ring);
>>>>
>>> I would like not recommend insert this change. This change related
>>> driver state machine, we afraid from lot of synchronization problem and
>>> issues.
>>> We need keep e1000_free_irq in loop and check for 'test_bit' ready.
>>
>> What do you mean here? There is no loop. If __E1000_DOWN is set then we
>> will never free the IRQ.
>>
>>> Another point, does before execute secondary bus reset your SW back up
>>> pcie configuration space as properly?
>>
>> After a secondary bus reset, the link needs to recover and go back to a
>> working state after 1 second.
>>
>> From the callstack, the issue is happening while removing the endpoint
>> from the system, before applying the secondary bus reset.
>>
>> The order of events is
>> 1. remove the drivers
>> 2. cause a secondary bus reset
>> 3. wait 1 second
> Actually, this is too much, usually link up in less than 100ms.You can
> check Data Link Layer indication.
>> 4. recover the link
>>
>> callstack:
>> free_msi_irqs+0x6c/0x1a8
>> pci_disable_msi+0xb0/0x148
>> e1000e_reset_interrupt_capability+0x60/0x78
>> e1000_remove+0xc8/0x180
>> pci_device_remove+0x48/0x118
>> __device_release_driver+0x80/0x108
>> device_release_driver+0x2c/0x40
>> pci_stop_bus_device+0xa0/0xb0
>> pci_stop_bus_device+0x3c/0xb0
>> pci_stop_root_bus+0x54/0x80
>> acpi_pci_root_remove+0x28/0x64
>> acpi_bus_trim+0x6c/0xa4
>> acpi_device_hotplug+0x19c/0x3f4
>> acpi_hotplug_work_fn+0x28/0x3c
>> process_one_work+0x150/0x460
>> worker_thread+0x50/0x4b8
>> kthread+0xd4/0xe8
>> ret_from_fork+0x10/0x50
>>
>> Thanks,
>> Tyler
>>
> Hello Tyler,
> Okay, we need consult more about this suggestion.
> May I ask what is setup you run? Is there NIC or on board LAN? I would
> like try reproduce this issue in our lab's too.
> Also, is same issue observed with same scenario and others NIC's too?
> Sasha
> ___
> Intel-wired-lan mailing list
> intel-wired-...@lists.osuosl.org
> http://lists.osuosl.org/mailman/listinfo/intel-wired-lan
> 
Hello Tyler,
I see some in consistent implementation of __*_close methods in our
drivers. Do you have any igb NIC to check if same problem persist there?
Thanks,
Sasha


Re: [Intel-wired-lan] [PATCH v2] e1000e: free IRQ regardless of __E1000_DOWN

2016-11-13 Thread Neftin, Sasha
On 11/13/2016 10:34 AM, Neftin, Sasha wrote:
> On 11/11/2016 12:35 AM, Baicar, Tyler wrote:
>> Hello Sasha,
>>
>> On 11/9/2016 11:19 PM, Neftin, Sasha wrote:
>>> On 11/9/2016 11:41 PM, Tyler Baicar wrote:
>>>> Move IRQ free code so that it will happen regardless of the
>>>> __E1000_DOWN bit. Currently the e1000e driver only releases its IRQ
>>>> if the __E1000_DOWN bit is cleared. This is not sufficient because
>>>> it is possible for __E1000_DOWN to be set without releasing the IRQ.
>>>> In such a situation, we will hit a kernel bug later in e1000_remove
>>>> because the IRQ still has action since it was never freed. A
>>>> secondary bus reset can cause this case to happen.
>>>>
>>>> Signed-off-by: Tyler Baicar 
>>>> ---
>>>>   drivers/net/ethernet/intel/e1000e/netdev.c | 3 ++-
>>>>   1 file changed, 2 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/drivers/net/ethernet/intel/e1000e/netdev.c
>>>> b/drivers/net/ethernet/intel/e1000e/netdev.c
>>>> index 7017281..36cfcb0 100644
>>>> --- a/drivers/net/ethernet/intel/e1000e/netdev.c
>>>> +++ b/drivers/net/ethernet/intel/e1000e/netdev.c
>>>> @@ -4679,12 +4679,13 @@ int e1000e_close(struct net_device *netdev)
>>>> if (!test_bit(__E1000_DOWN, &adapter->state)) {
>>>>   e1000e_down(adapter, true);
>>>> -e1000_free_irq(adapter);
>>>> /* Link status message must follow this format */
>>>>   pr_info("%s NIC Link is Down\n", adapter->netdev->name);
>>>>   }   
>>>>   +e1000_free_irq(adapter);
>>>> +
>>>>   napi_disable(&adapter->napi);
>>>> e1000e_free_tx_resources(adapter->tx_ring);
>>>>
>>> I would like not recommend insert this change. This change related
>>> driver state machine, we afraid from lot of synchronization problem and
>>> issues.
>>> We need keep e1000_free_irq in loop and check for 'test_bit' ready.
>>
>> What do you mean here? There is no loop. If __E1000_DOWN is set then we
>> will never free the IRQ.
>>
>>> Another point, does before execute secondary bus reset your SW back up
>>> pcie configuration space as properly?
>>
>> After a secondary bus reset, the link needs to recover and go back to a
>> working state after 1 second.
>>
>> From the callstack, the issue is happening while removing the endpoint
>> from the system, before applying the secondary bus reset.
>>
>> The order of events is
>> 1. remove the drivers
>> 2. cause a secondary bus reset
>> 3. wait 1 second
> Actually, this is too much, usually link up in less than 100ms.You can
> check Data Link Layer indication.
>> 4. recover the link
>>
>> callstack:
>> free_msi_irqs+0x6c/0x1a8
>> pci_disable_msi+0xb0/0x148
>> e1000e_reset_interrupt_capability+0x60/0x78
>> e1000_remove+0xc8/0x180
>> pci_device_remove+0x48/0x118
>> __device_release_driver+0x80/0x108
>> device_release_driver+0x2c/0x40
>> pci_stop_bus_device+0xa0/0xb0
>> pci_stop_bus_device+0x3c/0xb0
>> pci_stop_root_bus+0x54/0x80
>> acpi_pci_root_remove+0x28/0x64
>> acpi_bus_trim+0x6c/0xa4
>> acpi_device_hotplug+0x19c/0x3f4
>> acpi_hotplug_work_fn+0x28/0x3c
>> process_one_work+0x150/0x460
>> worker_thread+0x50/0x4b8
>> kthread+0xd4/0xe8
>> ret_from_fork+0x10/0x50
>>
>> Thanks,
>> Tyler
>>
> Hello Tyler,
> Okay, we need consult more about this suggestion.
> May I ask what is setup you run? Is there NIC or on board LAN? I would
> like try reproduce this issue in our lab's too.
> Also, is same issue observed with same scenario and others NIC's too?
> Sasha
> ___
> Intel-wired-lan mailing list
> intel-wired-...@lists.osuosl.org
> http://lists.osuosl.org/mailman/listinfo/intel-wired-lan
> 
Please, specify what is device used.


Re: [Intel-wired-lan] [PATCH v2] e1000e: free IRQ regardless of __E1000_DOWN

2016-11-13 Thread Neftin, Sasha
On 11/11/2016 12:35 AM, Baicar, Tyler wrote:
> Hello Sasha,
> 
> On 11/9/2016 11:19 PM, Neftin, Sasha wrote:
>> On 11/9/2016 11:41 PM, Tyler Baicar wrote:
>>> Move IRQ free code so that it will happen regardless of the
>>> __E1000_DOWN bit. Currently the e1000e driver only releases its IRQ
>>> if the __E1000_DOWN bit is cleared. This is not sufficient because
>>> it is possible for __E1000_DOWN to be set without releasing the IRQ.
>>> In such a situation, we will hit a kernel bug later in e1000_remove
>>> because the IRQ still has action since it was never freed. A
>>> secondary bus reset can cause this case to happen.
>>>
>>> Signed-off-by: Tyler Baicar 
>>> ---
>>>   drivers/net/ethernet/intel/e1000e/netdev.c | 3 ++-
>>>   1 file changed, 2 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/drivers/net/ethernet/intel/e1000e/netdev.c
>>> b/drivers/net/ethernet/intel/e1000e/netdev.c
>>> index 7017281..36cfcb0 100644
>>> --- a/drivers/net/ethernet/intel/e1000e/netdev.c
>>> +++ b/drivers/net/ethernet/intel/e1000e/netdev.c
>>> @@ -4679,12 +4679,13 @@ int e1000e_close(struct net_device *netdev)
>>> if (!test_bit(__E1000_DOWN, &adapter->state)) {
>>>   e1000e_down(adapter, true);
>>> -e1000_free_irq(adapter);
>>> /* Link status message must follow this format */
>>>   pr_info("%s NIC Link is Down\n", adapter->netdev->name);
>>>   }   
>>>   +e1000_free_irq(adapter);
>>> +
>>>   napi_disable(&adapter->napi);
>>> e1000e_free_tx_resources(adapter->tx_ring);
>>>
>> I would like not recommend insert this change. This change related
>> driver state machine, we afraid from lot of synchronization problem and
>> issues.
>> We need keep e1000_free_irq in loop and check for 'test_bit' ready.
> 
> What do you mean here? There is no loop. If __E1000_DOWN is set then we
> will never free the IRQ.
> 
>> Another point, does before execute secondary bus reset your SW back up
>> pcie configuration space as properly?
> 
> After a secondary bus reset, the link needs to recover and go back to a
> working state after 1 second.
> 
> From the callstack, the issue is happening while removing the endpoint
> from the system, before applying the secondary bus reset.
> 
> The order of events is
> 1. remove the drivers
> 2. cause a secondary bus reset
> 3. wait 1 second
Actually, this is too much, usually link up in less than 100ms.You can
check Data Link Layer indication.
> 4. recover the link
> 
> callstack:
> free_msi_irqs+0x6c/0x1a8
> pci_disable_msi+0xb0/0x148
> e1000e_reset_interrupt_capability+0x60/0x78
> e1000_remove+0xc8/0x180
> pci_device_remove+0x48/0x118
> __device_release_driver+0x80/0x108
> device_release_driver+0x2c/0x40
> pci_stop_bus_device+0xa0/0xb0
> pci_stop_bus_device+0x3c/0xb0
> pci_stop_root_bus+0x54/0x80
> acpi_pci_root_remove+0x28/0x64
> acpi_bus_trim+0x6c/0xa4
> acpi_device_hotplug+0x19c/0x3f4
> acpi_hotplug_work_fn+0x28/0x3c
> process_one_work+0x150/0x460
> worker_thread+0x50/0x4b8
> kthread+0xd4/0xe8
> ret_from_fork+0x10/0x50
> 
> Thanks,
> Tyler
> 
Hello Tyler,
Okay, we need consult more about this suggestion.
May I ask what is setup you run? Is there NIC or on board LAN? I would
like try reproduce this issue in our lab's too.
Also, is same issue observed with same scenario and others NIC's too?
Sasha


Re: [Intel-wired-lan] [PATCH v2] e1000e: free IRQ regardless of __E1000_DOWN

2016-11-09 Thread Neftin, Sasha
On 11/9/2016 11:41 PM, Tyler Baicar wrote:
> Move IRQ free code so that it will happen regardless of the
> __E1000_DOWN bit. Currently the e1000e driver only releases its IRQ
> if the __E1000_DOWN bit is cleared. This is not sufficient because
> it is possible for __E1000_DOWN to be set without releasing the IRQ.
> In such a situation, we will hit a kernel bug later in e1000_remove
> because the IRQ still has action since it was never freed. A
> secondary bus reset can cause this case to happen.
> 
> Signed-off-by: Tyler Baicar 
> ---
>  drivers/net/ethernet/intel/e1000e/netdev.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/net/ethernet/intel/e1000e/netdev.c 
> b/drivers/net/ethernet/intel/e1000e/netdev.c
> index 7017281..36cfcb0 100644
> --- a/drivers/net/ethernet/intel/e1000e/netdev.c
> +++ b/drivers/net/ethernet/intel/e1000e/netdev.c
> @@ -4679,12 +4679,13 @@ int e1000e_close(struct net_device *netdev)
>  
>   if (!test_bit(__E1000_DOWN, &adapter->state)) {
>   e1000e_down(adapter, true);
> - e1000_free_irq(adapter);
>  
>   /* Link status message must follow this format */
>   pr_info("%s NIC Link is Down\n", adapter->netdev->name);
>   }   
>  
> + e1000_free_irq(adapter);
> +
>   napi_disable(&adapter->napi);
>  
>   e1000e_free_tx_resources(adapter->tx_ring);
> 
I would like not recommend insert this change. This change related
driver state machine, we afraid from lot of synchronization problem and
issues.
We need keep e1000_free_irq in loop and check for 'test_bit' ready.
Another point, does before execute secondary bus reset your SW back up
pcie configuration space as properly?

Sasha



RE: [net-next 5/5] PCI: disable FLR for 82579 device

2016-09-28 Thread Neftin, Sasha

Since I worked with Sasha on this I will provide a bit of information from what 
I understand of this bug as well.

On Tue, Sep 27, 2016 at 12:13 PM, Alex Williamson  
wrote:
> On Tue, 27 Sep 2016 13:17:02 -0500
> Bjorn Helgaas  wrote:
>
>> On Sun, Sep 25, 2016 at 10:02:43AM +0300, Neftin, Sasha wrote:
>> > On 9/24/2016 12:05 AM, Jeff Kirsher wrote:
>> > >On Fri, 2016-09-23 at 09:01 -0500, Bjorn Helgaas wrote:
>> > >>On Thu, Sep 22, 2016 at 11:39:01PM -0700, Jeff Kirsher wrote:
>> > >>>From: Sasha Neftin 
>> > >>>
>> > >>>82579 has a problem reattaching itself after the device is detached.
>> > >>>The bug was reported by Redhat. The suggested fix is to disable 
>> > >>>FLR capability in PCIe configuration space.
>> > >>>
>> > >>>Reproduction:
>> > >>>Attach the device to a VM, then detach and try to attach again.
>> > >>>
>> > >>>Fix:
>> > >>>Disable FLR capability to prevent the 82579 from hanging.
>> > >>Is there a bugzilla or other reference URL to include here?  
>> > >>Should this be marked for stable?
>> > >So the author is in Israel, meaning it is their weekend now.  I do 
>> > >not believe Sasha monitors email over the weekend, so a response 
>> > >to your questions won't happen for a few days.
>> > >
>> > >I tried searching my archives for more information, but had no 
>> > >luck finding any additional information.
>> > >

I agree that we do probably need to update the patch description since it isn't 
exactly clear what this is fixing or what was actually broken.

>> > >>>Signed-off-by: Sasha Neftin 
>> > >>>Tested-by: Aaron Brown 
>> > >>>Signed-off-by: Jeff Kirsher 
>> > >>>---
>> > >>>  drivers/pci/quirks.c | 21 +
>> > >>>  1 file changed, 21 insertions(+)
>> > >>>
>> > >>>diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c index 
>> > >>>44e0ff3..59fba6e 100644
>> > >>>--- a/drivers/pci/quirks.c
>> > >>>+++ b/drivers/pci/quirks.c
>> > >>>@@ -4431,3 +4431,24 @@ static void quirk_intel_qat_vf_cap(struct 
>> > >>>pci_dev *pdev)
>> > >>> }
>> > >>>  }
>> > >>>  DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, 0x443, 
>> > >>>quirk_intel_qat_vf_cap);
>> > >>>+/*
>> > >>>+ * Workaround FLR issues for 82579
>> > >>>+ * This code disables the FLR (Function Level Reset) via PCIe, 
>> > >>>+in
>> > >>>order
>> > >>>+ * to workaround a bug found while using device passthrough, 
>> > >>>+ where the
>> > >>>+ * interface would become non-responsive.
>> > >>>+ * NOTE: the FLR bit is Read/Write Once (RWO) in config space, 
>> > >>>+ so if
>> > >>>+ * the BIOS or kernel writes this register * then this 
>> > >>>+ workaround will
>> > >>>+ * not work.
>> > >>This doesn't sound like a root cause.  Is the issue a hardware 
>> > >>erratum?  Linux PCI core bug?  VFIO bug?  Device firmware bug?
>> > >>
>> > >>The changelog suggests that the problem only affects passthrough, 
>> > >>which suggests some sort of kernel bug related to how passthrough 
>> > >>is implemented.
>>
>> If this bug affects all scenarios, not just passthrough, the 
>> changelog should not mention passthrough.
>>
>> > >>>+ */
>> > >>>+static void quirk_intel_flr_cap_dis(struct pci_dev *dev) {
>> > >>>+int pos = pci_find_capability(dev, PCI_CAP_ID_AF);
>> > >>>+if (pos) {
>> > >>>+u8 cap;
>> > >>>+pci_read_config_byte(dev, pos + PCI_AF_CAP, &cap);
>> > >>>+cap = cap & (~PCI_AF_CAP_FLR);
>> > >>>+pci_write_config_byte(dev, pos + PCI_AF_CAP, cap);
>> > >>>+}
>> > >>>+}
>> > >>>+DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, 0x1502,
>> > >>>quirk_intel_flr_cap_dis);
>> > >>>+DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, 0x1503,
>> > >>>quirk_intel_flr_cap_dis);
>&

Re: [net-next 5/5] PCI: disable FLR for 82579 device

2016-09-25 Thread Neftin, Sasha

On 9/24/2016 12:05 AM, Jeff Kirsher wrote:

On Fri, 2016-09-23 at 09:01 -0500, Bjorn Helgaas wrote:

On Thu, Sep 22, 2016 at 11:39:01PM -0700, Jeff Kirsher wrote:

From: Sasha Neftin 

82579 has a problem reattaching itself after the device is detached.
The bug was reported by Redhat. The suggested fix is to disable
FLR capability in PCIe configuration space.

Reproduction:
Attach the device to a VM, then detach and try to attach again.

Fix:
Disable FLR capability to prevent the 82579 from hanging.

Is there a bugzilla or other reference URL to include here?  Should
this be marked for stable?

So the author is in Israel, meaning it is their weekend now.  I do not
believe Sasha monitors email over the weekend, so a response to your
questions won't happen for a few days.

I tried searching my archives for more information, but had no luck finding
any additional information.


Signed-off-by: Sasha Neftin 
Tested-by: Aaron Brown 
Signed-off-by: Jeff Kirsher 
---
  drivers/pci/quirks.c | 21 +
  1 file changed, 21 insertions(+)

diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
index 44e0ff3..59fba6e 100644
--- a/drivers/pci/quirks.c
+++ b/drivers/pci/quirks.c
@@ -4431,3 +4431,24 @@ static void quirk_intel_qat_vf_cap(struct
pci_dev *pdev)
}
  }
  DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, 0x443,
quirk_intel_qat_vf_cap);
+/*
+ * Workaround FLR issues for 82579
+ * This code disables the FLR (Function Level Reset) via PCIe, in
order
+ * to workaround a bug found while using device passthrough, where the
+ * interface would become non-responsive.
+ * NOTE: the FLR bit is Read/Write Once (RWO) in config space, so if
+ * the BIOS or kernel writes this register * then this workaround will
+ * not work.

This doesn't sound like a root cause.  Is the issue a hardware
erratum?  Linux PCI core bug?  VFIO bug?  Device firmware bug?

The changelog suggests that the problem only affects passthrough,
which suggests some sort of kernel bug related to how passthrough is
implemented.


+ */
+static void quirk_intel_flr_cap_dis(struct pci_dev *dev)
+{
+   int pos = pci_find_capability(dev, PCI_CAP_ID_AF);
+   if (pos) {
+   u8 cap;
+   pci_read_config_byte(dev, pos + PCI_AF_CAP, &cap);
+   cap = cap & (~PCI_AF_CAP_FLR);
+   pci_write_config_byte(dev, pos + PCI_AF_CAP, cap);
+   }
+}
+DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, 0x1502,
quirk_intel_flr_cap_dis);
+DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, 0x1503,
quirk_intel_flr_cap_dis);
--
2.7.4

--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Hello,

Original bugzilla thread could be found here: 
https://bugzilla.redhat.com/show_bug.cgi?format=multiple&id=966840


This is our HW bug, exist only in 82579 devices. More new devices have 
no such problem. We have found root cause and suggested this solution. 
This solution should work for a 95% of cases, so I do not think that 
this is fragile. For another cases possible solution is get up working 
system and manually disable FLR, before VM start use our adapter.


Thanks,

Sasha