Re: [PATCH v8 0/7] netdev: intel: Eliminate duplicate barriers on weakly-ordered archs

2018-07-30 Thread Alexander Duyck
On Mon, Jul 30, 2018 at 12:20 PM, Sinan Kaya  wrote:
> +netdev,
>
> On 7/30/2018 9:45 AM, Alexander Duyck wrote:
>>
>> I haven't had a chance to work on this much myself. My understanding
>> is that igb has had the barriers updated, but I don't think any of the
>> other drivers have been worked over yet.
>
>
> Unfortunately, I have recently changed jobs and I no longer have the
> hardware to test my changes. I thought that you wanted to handle this
> yourself.
>
> I haven't seen any follow ups. I wanted to double check.

As I said so far igb has been the only one updated, and that was done
by a third party:
73017f4e051c8 igb: Use dma_wmb() instead of wmb() before doorbell writes

> I worked with several architecture leads on 4.17. All architectures
> support the updated barrier semantics now. It is time to clean up the
> network drivers.

Thanks for that update. As I said for now igb has the barriers
updated. The idea being that igb is the test vehicle for this so if we
go a kernel version or so without triggering any issues then we can
follow up with the other drivers.

The other thing we have to keep in mind is that unlike many other NICs
we have to also deal with emulations of our devices (e1000 and e1000e)
that may rely on certain barriers being used to enforce things like
SMP synchronization between CPUs, so we have to be careful as we roll
this out.

Thanks.

- Alex


Re: [PATCH v8 0/7] netdev: intel: Eliminate duplicate barriers on weakly-ordered archs

2018-07-30 Thread Sinan Kaya

+netdev,

On 7/30/2018 9:45 AM, Alexander Duyck wrote:

I haven't had a chance to work on this much myself. My understanding
is that igb has had the barriers updated, but I don't think any of the
other drivers have been worked over yet.


Unfortunately, I have recently changed jobs and I no longer have the
hardware to test my changes. I thought that you wanted to handle this
yourself.

I haven't seen any follow ups. I wanted to double check.

I worked with several architecture leads on 4.17. All architectures
support the updated barrier semantics now. It is time to clean up the
network drivers.

92d7223a7423 alpha: io: reorder barriers to guarantee writeX() and 
iowriteX() ordering #2

a1cc7034e33d MIPS: io: Add barrier after register read in readX()
f6b7aeee8f16 MIPS: io: Prevent compiler reordering writeX()
a71e7c44ffb7 io: change writeX_relaxed() to remove barriers
8875c5543761 io: change readX_relaxed() to remove barriers
cd0e00c10672 alpha: io: reorder barriers to guarantee writeX() and 
iowriteX() ordering

87fe2d543f81 io: change inX() to have their own IO barrier overrides
a7851aa54c0c io: change outX() to have their own IO barrier overrides
755bd04aaf4b io: define stronger ordering for the default writeX() 
implementation
032d59e1cde9 io: define stronger ordering for the default readX() 
implementation
64e2c6738b4d io: define several IO & PIO barrier types for the 
asm-generic version







Re: [PATCH v8 0/7] netdev: intel: Eliminate duplicate barriers on weakly-ordered archs

2018-04-03 Thread Jeff Kirsher
On Tue, 2018-04-03 at 13:50 -0400, Sinan Kaya wrote:
> > > What do you think about this version? Did I miss any SMP
> > > barriers?
> > 
> > I would say we should probably just drop the whole set and start
> > over.
> > If we don't need the wmb() we are going to need to go through and
> > clean up all of these paths and consider the barriers when updating
> > the layout of the code.
> > 
> > For example I have been thinking about it and in the case of x86 we
> > are probably better off not bothering with the wmb() and
> > writel_relaxed() and instead switch over to the smp_wmb() and
> > writel()
> > since in the case of a strongly ordered system like x86 or sparc
> > this
> > ends up translating out to a couple of compile barriers.
> > 
> > I will also need time to reevaluate the Rx paths since dropping the
> > wmb() may have other effects which I need to verify.
> 
> Sounds good, I'll let you work on it.
> 
> @Jeff Kirsher: can you drop this version from your development branch
> until
> Alex posts the next version?

Already on it, I will work with Alex on any possible future versions of
these patches.

signature.asc
Description: This is a digitally signed message part


Re: [PATCH v8 0/7] netdev: intel: Eliminate duplicate barriers on weakly-ordered archs

2018-04-03 Thread Sinan Kaya
On 4/3/2018 1:47 PM, Alexander Duyck wrote:
> On Mon, Apr 2, 2018 at 7:59 PM, Sinan Kaya  wrote:
>> Alex,
>>
>> On 4/2/2018 3:06 PM, Sinan Kaya wrote:
>>> Code includes wmb() followed by writel() in multiple places. writel()
>>> already has a barrier on some architectures like arm64.
>>>
>>> This ends up CPU observing two barriers back to back before executing the
>>> register write.
>>>
>>> Since code already has an explicit barrier call, changing writel() to
>>> writel_relaxed().
>>>
>>> I did a regex search for wmb() followed by writel() in each drivers
>>> directory. I scrubbed the ones I care about in this series.
>>>
>>> I considered "ease of change", "popular usage" and "performance critical
>>> path" as the determining criteria for my filtering.
>>>
>>> We used relaxed API heavily on ARM for a long time but
>>> it did not exist on other architectures. For this reason, relaxed
>>> architectures have been paying double penalty in order to use the common
>>> drivers.
>>>
>>> Now that relaxed API is present on all architectures, we can go and scrub
>>> all drivers to see what needs to change and what can remain.
>>>
>>> We start with mostly used ones and hope to increase the coverage over time.
>>> It will take a while to cover all drivers.
>>>
>>> Feel free to apply patches individually.
>>>
>>> Change since 7:
>>>
>>> * API clarification with Linus and several architecture folks regarding
>>> writel()
>>>
>>> https://www.mail-archive.com/netdev@vger.kernel.org/msg225806.html
>>>
>>> * removed wmb() in front of writel() at several places.
>>> * remove old IA64 comments regarding ordering.
>>>
>>
>> What do you think about this version? Did I miss any SMP barriers?
> 
> I would say we should probably just drop the whole set and start over.
> If we don't need the wmb() we are going to need to go through and
> clean up all of these paths and consider the barriers when updating
> the layout of the code.
> 
> For example I have been thinking about it and in the case of x86 we
> are probably better off not bothering with the wmb() and
> writel_relaxed() and instead switch over to the smp_wmb() and writel()
> since in the case of a strongly ordered system like x86 or sparc this
> ends up translating out to a couple of compile barriers.
> 
> I will also need time to reevaluate the Rx paths since dropping the
> wmb() may have other effects which I need to verify.

Sounds good, I'll let you work on it.

@Jeff Kirsher: can you drop this version from your development branch until
Alex posts the next version?

> 
> Thanks.
> 
> - Alex
> 


-- 
Sinan Kaya
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm 
Technologies, Inc.
Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a Linux 
Foundation Collaborative Project.


Re: [PATCH v8 0/7] netdev: intel: Eliminate duplicate barriers on weakly-ordered archs

2018-04-03 Thread Alexander Duyck
On Mon, Apr 2, 2018 at 7:59 PM, Sinan Kaya  wrote:
> Alex,
>
> On 4/2/2018 3:06 PM, Sinan Kaya wrote:
>> Code includes wmb() followed by writel() in multiple places. writel()
>> already has a barrier on some architectures like arm64.
>>
>> This ends up CPU observing two barriers back to back before executing the
>> register write.
>>
>> Since code already has an explicit barrier call, changing writel() to
>> writel_relaxed().
>>
>> I did a regex search for wmb() followed by writel() in each drivers
>> directory. I scrubbed the ones I care about in this series.
>>
>> I considered "ease of change", "popular usage" and "performance critical
>> path" as the determining criteria for my filtering.
>>
>> We used relaxed API heavily on ARM for a long time but
>> it did not exist on other architectures. For this reason, relaxed
>> architectures have been paying double penalty in order to use the common
>> drivers.
>>
>> Now that relaxed API is present on all architectures, we can go and scrub
>> all drivers to see what needs to change and what can remain.
>>
>> We start with mostly used ones and hope to increase the coverage over time.
>> It will take a while to cover all drivers.
>>
>> Feel free to apply patches individually.
>>
>> Change since 7:
>>
>> * API clarification with Linus and several architecture folks regarding
>> writel()
>>
>> https://www.mail-archive.com/netdev@vger.kernel.org/msg225806.html
>>
>> * removed wmb() in front of writel() at several places.
>> * remove old IA64 comments regarding ordering.
>>
>
> What do you think about this version? Did I miss any SMP barriers?

I would say we should probably just drop the whole set and start over.
If we don't need the wmb() we are going to need to go through and
clean up all of these paths and consider the barriers when updating
the layout of the code.

For example I have been thinking about it and in the case of x86 we
are probably better off not bothering with the wmb() and
writel_relaxed() and instead switch over to the smp_wmb() and writel()
since in the case of a strongly ordered system like x86 or sparc this
ends up translating out to a couple of compile barriers.

I will also need time to reevaluate the Rx paths since dropping the
wmb() may have other effects which I need to verify.

Thanks.

- Alex


Re: [PATCH v8 0/7] netdev: intel: Eliminate duplicate barriers on weakly-ordered archs

2018-04-02 Thread Sinan Kaya
Alex,

On 4/2/2018 3:06 PM, Sinan Kaya wrote:
> Code includes wmb() followed by writel() in multiple places. writel()
> already has a barrier on some architectures like arm64.
> 
> This ends up CPU observing two barriers back to back before executing the
> register write.
> 
> Since code already has an explicit barrier call, changing writel() to
> writel_relaxed().
> 
> I did a regex search for wmb() followed by writel() in each drivers
> directory. I scrubbed the ones I care about in this series.
> 
> I considered "ease of change", "popular usage" and "performance critical
> path" as the determining criteria for my filtering.
> 
> We used relaxed API heavily on ARM for a long time but
> it did not exist on other architectures. For this reason, relaxed
> architectures have been paying double penalty in order to use the common
> drivers.
> 
> Now that relaxed API is present on all architectures, we can go and scrub
> all drivers to see what needs to change and what can remain.
> 
> We start with mostly used ones and hope to increase the coverage over time.
> It will take a while to cover all drivers.
> 
> Feel free to apply patches individually.
> 
> Change since 7:
> 
> * API clarification with Linus and several architecture folks regarding
> writel()
> 
> https://www.mail-archive.com/netdev@vger.kernel.org/msg225806.html
> 
> * removed wmb() in front of writel() at several places.
> * remove old IA64 comments regarding ordering.
> 

What do you think about this version? Did I miss any SMP barriers?


> Sinan Kaya (7):
>   i40e/i40evf: Eliminate duplicate barriers on weakly-ordered archs
>   ixgbe: eliminate duplicate barriers on weakly-ordered archs
>   igbvf: eliminate duplicate barriers on weakly-ordered archs
>   igb: eliminate duplicate barriers on weakly-ordered archs
>   fm10k: Eliminate duplicate barriers on weakly-ordered archs
>   ixgbevf: keep writel() closer to wmb()
>   ixgbevf: eliminate duplicate barriers on weakly-ordered archs
> 
>  drivers/net/ethernet/intel/fm10k/fm10k_main.c | 15 ++---
>  drivers/net/ethernet/intel/i40e/i40e_txrx.c   | 22 --
>  drivers/net/ethernet/intel/i40evf/i40e_txrx.c |  8 +--
>  drivers/net/ethernet/intel/igb/igb_main.c | 14 ++--
>  drivers/net/ethernet/intel/igbvf/netdev.c | 16 +-
>  drivers/net/ethernet/intel/ixgbe/ixgbe_main.c | 23 ++-
>  drivers/net/ethernet/intel/ixgbevf/ixgbevf.h  |  5 -
>  drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c | 27 
> ---
>  8 files changed, 30 insertions(+), 100 deletions(-)
> 

Sinan

-- 
Sinan Kaya
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm 
Technologies, Inc.
Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a Linux 
Foundation Collaborative Project.


[PATCH v8 0/7] netdev: intel: Eliminate duplicate barriers on weakly-ordered archs

2018-04-02 Thread Sinan Kaya
Code includes wmb() followed by writel() in multiple places. writel()
already has a barrier on some architectures like arm64.

This ends up CPU observing two barriers back to back before executing the
register write.

Since code already has an explicit barrier call, changing writel() to
writel_relaxed().

I did a regex search for wmb() followed by writel() in each drivers
directory. I scrubbed the ones I care about in this series.

I considered "ease of change", "popular usage" and "performance critical
path" as the determining criteria for my filtering.

We used relaxed API heavily on ARM for a long time but
it did not exist on other architectures. For this reason, relaxed
architectures have been paying double penalty in order to use the common
drivers.

Now that relaxed API is present on all architectures, we can go and scrub
all drivers to see what needs to change and what can remain.

We start with mostly used ones and hope to increase the coverage over time.
It will take a while to cover all drivers.

Feel free to apply patches individually.

Change since 7:

* API clarification with Linus and several architecture folks regarding
writel()

https://www.mail-archive.com/netdev@vger.kernel.org/msg225806.html

* removed wmb() in front of writel() at several places.
* remove old IA64 comments regarding ordering.

Sinan Kaya (7):
  i40e/i40evf: Eliminate duplicate barriers on weakly-ordered archs
  ixgbe: eliminate duplicate barriers on weakly-ordered archs
  igbvf: eliminate duplicate barriers on weakly-ordered archs
  igb: eliminate duplicate barriers on weakly-ordered archs
  fm10k: Eliminate duplicate barriers on weakly-ordered archs
  ixgbevf: keep writel() closer to wmb()
  ixgbevf: eliminate duplicate barriers on weakly-ordered archs

 drivers/net/ethernet/intel/fm10k/fm10k_main.c | 15 ++---
 drivers/net/ethernet/intel/i40e/i40e_txrx.c   | 22 --
 drivers/net/ethernet/intel/i40evf/i40e_txrx.c |  8 +--
 drivers/net/ethernet/intel/igb/igb_main.c | 14 ++--
 drivers/net/ethernet/intel/igbvf/netdev.c | 16 +-
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c | 23 ++-
 drivers/net/ethernet/intel/ixgbevf/ixgbevf.h  |  5 -
 drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c | 27 ---
 8 files changed, 30 insertions(+), 100 deletions(-)

-- 
2.7.4