Hi Ferruh & Harish,

On 10/20/2017 9:15 AM, Ferruh Yigit wrote:
On 10/19/2017 3:43 PM, Patil, Harish wrote:
-----Original Message-----
From: Harish Patil <harish.pa...@cavium.com>
Date: Tuesday, October 17, 2017 at 9:50 PM
To: Thomas Monjalon <tho...@monjalon.net>, Ferruh Yigit
<ferruh.yi...@intel.com>
Cc: "dev@dpdk.org" <dev@dpdk.org>, Jianfeng Tan <jianfeng....@intel.com>,
Jingjing Wu <jingjing...@intel.com>, "Thotton, Shijith"
<shijith.thot...@cavium.com>, Gregory Etelson <greg...@weka.io>, George
Prekas <george.pre...@epfl.ch>, "sta...@dpdk.org" <sta...@dpdk.org>
Subject: Re: [PATCH] igb_uio: revert open and release operations

-----Original Message-----
From: Thomas Monjalon <tho...@monjalon.net>
Date: Tuesday, October 17, 2017 at 1:33 PM
To: Ferruh Yigit <ferruh.yi...@intel.com>, Harish Patil
<harish.pa...@cavium.com>
Cc: "dev@dpdk.org" <dev@dpdk.org>, Jianfeng Tan <jianfeng....@intel.com>,
Jingjing Wu <jingjing...@intel.com>, "Thotton, Shijith"
<shijith.thot...@cavium.com>, Gregory Etelson <greg...@weka.io>, George
Prekas <george.pre...@epfl.ch>, "sta...@dpdk.org" <sta...@dpdk.org>
Subject: Re: [PATCH] igb_uio: revert open and release operations

17/10/2017 22:14, Ferruh Yigit:
There were bug reports about terminated application may leave device in
undesired state:
http://dpdk.org/ml/archives/dev/2016-November/049745.html
http://dpdk.org/ml/archives/dev/2016-November/050932.html

And a proposal to fix:
http://dpdk.org/ml/archives/dev/2016-December/051844.html

Later another proposal triggered the discussion:
http://dpdk.org/ml/archives/dev/2017-May/066317.html

Finally a fix patch pushed into v17.08:
Commit: b58eedfc7dd5 ("igb_uio: issue FLR during open and release of
device file")

Later a regression report sent related to the pushed patch:
http://dpdk.org/ml/archives/dev/2017-September/075236.html

And a fix for regression integrated into v17.11-rc1:
http://dpdk.org/ml/archives/dev/2017-October/079166.html
Commit: 5f6ff30dc507 ("igb_uio: fix interrupt enablement after FLR in
VM")
Commit: 6b9ed026a870 ("igb_uio: fix build with kernel <= 3.17")

Even after the fix qede PMD reported to be broken:
http://dpdk.org/ml/archives/dev/2017-October/079359.html

So this patch reverts original fix and related commits. The related
igb_uio code part turns back to v17.05 base.
[...]
---
It would be nice to solve this issue in LTS release, but being close to
the release and the error report without details makes it hard to work
more on this issue.
With this revert, we are back to the original issue.
We must really try the proposed solution.
Harish, please describe your issue and think how it could be fixed.
Jingjing made it work for i40e.
I know it is less effort to request a simple revert.
Please let's try to fix it once for all.
Hi Ferruh/Thomas,
I’m discussing it internally, so please hold on committing this patch till
I revert back to you.

Thanks.
[Harish]
1) With the introduction of:
Commit: b58eedfc7dd5 ("igb_uio: issue FLR during open and release of
device file”)
    We saw failures with qede PF & SR-IOV VF initialization in PCI passthru
mode.

PF PCI passthru mode initialization failure was resolved by:
“Commit: 5f6ff30dc507 ("igb_uio: fix interrupt enablement after FLR in
VM")
Thank you for the update.

SR-IOV VF PCI passthru mode initialization issue is that PCI FLR and
related device cleanup is not completed by the time VF driver starts
loading. It results in the mbox command failure sent over the HW channel
between VF and PF.
This seems same reason why i40e added a check and wait loop.

Even though pci_reset_function() waits for the stipulated amount of time
per standards, VF FLR takes longer than that and pci_reset_function() &
igb_uio_open() call returns before FLR completes and VF PMD driver tries
to load before FLR completes leading to VF PMD initialization failure.


We can work around this problem by adding driver delay/retry logic since
there is no deterministic way of detecting FLR completion. But this is
going to increase the driver load time.

2) With the above patch ("igb_uio: issue FLR during open and release of
device file), FLR is going to be issued unconditionally on all devices
during igb_uio_open. We think it’s an over kill. FLR is required only for
previous abnormal app termination. We already handle the abnormal app
termination by doing necessary cleanup in the driver during load. This
cleanup is more efficient as it is done only when required. So we feel
that the drivers/devices needing such cleanup (the two cases listed
below)  should do it conditionally when required rather than
igb_uio_open() unconditionally performing FLR all the time.

e.g.,
    - cdb166963cae ("net/liquidio: add API for VF FLR”)
Both 1) and 2) related to the pci_reset during open().
But the main functionality we are looking for is the pci_reset in release().
So we can remove reset during open() [1].

Will disabling pci_reset in open() [2] solve your problems?

Jingjing, Jianfeng,
What do you think?

If [2] can work for all drivers, I agree with this option.

Thanks,
Jianfeng



[1]
Perhaps will need to revert or partially revert liquidio patch after this.

[2]
disable line "pci_reset_function(dev);" in the igbuio_pci_open()
http://dpdk.org/browse/dpdk/tree/lib/librte_eal/linuxapp/igb_uio/igb_uio.c#n339


Reply via email to