Re: [PATCH] net: bonding: Fix transmit load balancing in balance-alb mode if specified by sysfs

2017-09-08 Thread Kosuke Tatsukawa
Hi,

> On 08/09/17 13:10, Nikolay Aleksandrov wrote:
>> On 08/09/17 05:06, Kosuke Tatsukawa wrote:
>>> Hi,
>>>
>>>> On  7.09.2017 01:47, Kosuke Tatsukawa wrote:
>>>>> Commit cbf5ecb30560 ("net: bonding: Fix transmit load balancing in
>>>>> balance-alb mode") tried to fix transmit dynamic load balancing in
>>>>> balance-alb mode, which wasn't working after commit 8b426dc54cf4
>>>>> ("bonding: remove hardcoded value").
>>>>>
>>>>> It turned out that my previous patch only fixed the case when
>>>>> balance-alb was specified as bonding module parameter, and not when
>>>>> balance-alb mode was set using /sys/class/net/*/bonding/mode (the most
>>>>> common usage).  In the latter case, tlb_dynamic_lb was set up according
>>>>> to the default mode of the bonding interface, which happens to be
>>>>> balance-rr.
>>>>>
>>>>> This additional patch addresses this issue by setting up tlb_dynamic_lb
>>>>> to 1 if "mode" is set to balance-alb through the sysfs interface.
>>>>>
>>>>> I didn't add code to change tlb_balance_lb back to the default value for
>>>>> other modes, because "mode" is usually set up only once during
>>>>> initialization, and it's not worthwhile to change the static variable
>>>>> bonding_defaults in bond_main.c to a global variable just for this
>>>>> purpose.
>>>>>
>>>>> Commit 8b426dc54cf4 also changes the value of tlb_dynamic_lb for
>>>>> balance-tlb mode if it is set up using the sysfs interface.  I didn't
>>>>> change that behavior, because the value of tlb_balance_lb can be changed
>>>>> using the sysfs interface for balance-tlb, and I didn't like changing
>>>>> the default value back and forth for balance-tlb.
>>>>>
>>>>> As for balance-alb, /sys/class/net/*/bonding/tlb_balance_lb cannot be
>>>>> written to.  However, I think balance-alb with tlb_dynamic_lb set to 0
>>>>> is not an intended usage, so there is little use making it writable at
>>>>> this moment.
>>>>>
>>>>> Fixes: 8b426dc54cf4 ("bonding: remove hardcoded value")
>>>>> Reported-by: Reinis Rozitis <r...@roze.lv>
>>>>> Signed-off-by: Kosuke Tatsukawa <ta...@ab.jp.nec.com>
>>>>> Cc: sta...@vger.kernel.org  # v4.12+
>>>>> ---
>>>>>  drivers/net/bonding/bond_options.c |3 +++
>>>>>  1 files changed, 3 insertions(+), 0 deletions(-)
>>>>>
>>>>
>>>> I don't believe this to be the right solution, hardcoding it like this
>>>> changes user-visible behaviour. The issue is that if someone configured
>>>> it to be 0 in tlb mode, suddenly it will become 1 and will silently
>>>> override their config if they switch the mode to alb. Also it robs users
>>>> from their choice.
>>>>
>>>> If you think this should be settable in ALB mode, then IMO you should
>>>> edit tlb_dynamic_lb option's unsuppmodes and allow it to be set in ALB.
>>>> That would also be consistent with how it's handled in TLB mode.
>>>
>>> No, I don't think tlb_dynamic_lb should be settable in balance-alb at
>>> this point.  All the current commits regarding tlb_dynamic_lb are for
>>> balance-tlb mode, so I don't think balance-alb with tlb_dynamic_lb set
>>> to 0 is an intended usage.
>>>
>>>
>>>> Going back and looking at your previous fix I'd argue that it is also
>>>> wrong, you should've removed the mode check altogether to return the
>>>> original behaviour where the dynamic_lb is set to 1 (enabled) by
>>>> default and then ALB mode would've had it, of course that would've left
>>>> the case of setting it to 0 in TLB mode and switching to ALB, but that
>>>> is a different issue.
>>>
>>> Maybe balance-alb shouldn't be dependent on tlb_dynamic_lb.
>>> tlb_dynamic_lb is referenced in the following functions.
>>>
>>>  + bond_do_alb_xmit()  -- Used by both balance-tlb and balance-alb
>>>  + bond_tlb_xmit()  -- Only used by balance-tlb
>>>  + bond_open()  -- Used by both balance-tlb and balance-alb
>>>  + bond_check_params()  -- Used during module initialization
>>>  + bond_fill_info()  -- Used to get/set value
>>>  + bond_option_tlb_dynamic

Re: [PATCH] net: bonding: Fix transmit load balancing in balance-alb mode if specified by sysfs

2017-09-07 Thread Kosuke Tatsukawa
Hi,

> On  7.09.2017 01:47, Kosuke Tatsukawa wrote:
>> Commit cbf5ecb30560 ("net: bonding: Fix transmit load balancing in
>> balance-alb mode") tried to fix transmit dynamic load balancing in
>> balance-alb mode, which wasn't working after commit 8b426dc54cf4
>> ("bonding: remove hardcoded value").
>> 
>> It turned out that my previous patch only fixed the case when
>> balance-alb was specified as bonding module parameter, and not when
>> balance-alb mode was set using /sys/class/net/*/bonding/mode (the most
>> common usage).  In the latter case, tlb_dynamic_lb was set up according
>> to the default mode of the bonding interface, which happens to be
>> balance-rr.
>> 
>> This additional patch addresses this issue by setting up tlb_dynamic_lb
>> to 1 if "mode" is set to balance-alb through the sysfs interface.
>> 
>> I didn't add code to change tlb_balance_lb back to the default value for
>> other modes, because "mode" is usually set up only once during
>> initialization, and it's not worthwhile to change the static variable
>> bonding_defaults in bond_main.c to a global variable just for this
>> purpose.
>> 
>> Commit 8b426dc54cf4 also changes the value of tlb_dynamic_lb for
>> balance-tlb mode if it is set up using the sysfs interface.  I didn't
>> change that behavior, because the value of tlb_balance_lb can be changed
>> using the sysfs interface for balance-tlb, and I didn't like changing
>> the default value back and forth for balance-tlb.
>> 
>> As for balance-alb, /sys/class/net/*/bonding/tlb_balance_lb cannot be
>> written to.  However, I think balance-alb with tlb_dynamic_lb set to 0
>> is not an intended usage, so there is little use making it writable at
>> this moment.
>> 
>> Fixes: 8b426dc54cf4 ("bonding: remove hardcoded value")
>> Reported-by: Reinis Rozitis <r...@roze.lv>
>> Signed-off-by: Kosuke Tatsukawa <ta...@ab.jp.nec.com>
>> Cc: sta...@vger.kernel.org  # v4.12+
>> ---
>>  drivers/net/bonding/bond_options.c |3 +++
>>  1 files changed, 3 insertions(+), 0 deletions(-)
>> 
> 
> I don't believe this to be the right solution, hardcoding it like this
> changes user-visible behaviour. The issue is that if someone configured
> it to be 0 in tlb mode, suddenly it will become 1 and will silently
> override their config if they switch the mode to alb. Also it robs users
> from their choice.
> 
> If you think this should be settable in ALB mode, then IMO you should
> edit tlb_dynamic_lb option's unsuppmodes and allow it to be set in ALB.
> That would also be consistent with how it's handled in TLB mode.

No, I don't think tlb_dynamic_lb should be settable in balance-alb at
this point.  All the current commits regarding tlb_dynamic_lb are for
balance-tlb mode, so I don't think balance-alb with tlb_dynamic_lb set
to 0 is an intended usage.


> Going back and looking at your previous fix I'd argue that it is also
> wrong, you should've removed the mode check altogether to return the
> original behaviour where the dynamic_lb is set to 1 (enabled) by
> default and then ALB mode would've had it, of course that would've left
> the case of setting it to 0 in TLB mode and switching to ALB, but that
> is a different issue.

Maybe balance-alb shouldn't be dependent on tlb_dynamic_lb.
tlb_dynamic_lb is referenced in the following functions.

 + bond_do_alb_xmit()  -- Used by both balance-tlb and balance-alb
 + bond_tlb_xmit()  -- Only used by balance-tlb
 + bond_open()  -- Used by both balance-tlb and balance-alb
 + bond_check_params()  -- Used during module initialization
 + bond_fill_info()  -- Used to get/set value
 + bond_option_tlb_dynamic_lb_set()  -- Used to get/set value
 + bonding_show_tlb_dynamic_lb()  -- Used to get/set value
 + bond_is_nondyn_tlb()  -- Only referenced if balance-tlb mode

The following untested patch adds code to make balance-alb work as if
tlb_dynamic_lb==1 for the functions which affect balance-alb mode.  It
also reverts my previous patch.

What do you think about this approach?
---
Kosuke TATSUKAWA  | 1st Platform Software Division
  | NEC Solution Innovators
  | ta...@ab.jp.nec.com


 drivers/net/bonding/bond_alb.c  |6 --
 drivers/net/bonding/bond_main.c |5 +++--
 2 files changed, 7 insertions(+), 4 deletions(-)

diff --git a/drivers/net/bonding/bond_alb.c b/drivers/net/bonding/bond_alb.c
index c02cc81..a9229b5 100644
--- a/drivers/net/bonding/bond_alb.c
+++ b/drivers/net/bonding/bond_alb.c
@@ -1325,7 +1325,8 @@ static int bond_do_alb_xmit(struct sk_buff *skb, struct 
bonding *bond,
if (!tx_slave) {
   

[PATCH] net: bonding: Fix transmit load balancing in balance-alb mode if specified by sysfs

2017-09-06 Thread Kosuke Tatsukawa
Commit cbf5ecb30560 ("net: bonding: Fix transmit load balancing in
balance-alb mode") tried to fix transmit dynamic load balancing in
balance-alb mode, which wasn't working after commit 8b426dc54cf4
("bonding: remove hardcoded value").

It turned out that my previous patch only fixed the case when
balance-alb was specified as bonding module parameter, and not when
balance-alb mode was set using /sys/class/net/*/bonding/mode (the most
common usage).  In the latter case, tlb_dynamic_lb was set up according
to the default mode of the bonding interface, which happens to be
balance-rr.

This additional patch addresses this issue by setting up tlb_dynamic_lb
to 1 if "mode" is set to balance-alb through the sysfs interface.

I didn't add code to change tlb_balance_lb back to the default value for
other modes, because "mode" is usually set up only once during
initialization, and it's not worthwhile to change the static variable
bonding_defaults in bond_main.c to a global variable just for this
purpose.

Commit 8b426dc54cf4 also changes the value of tlb_dynamic_lb for
balance-tlb mode if it is set up using the sysfs interface.  I didn't
change that behavior, because the value of tlb_balance_lb can be changed
using the sysfs interface for balance-tlb, and I didn't like changing
the default value back and forth for balance-tlb.

As for balance-alb, /sys/class/net/*/bonding/tlb_balance_lb cannot be
written to.  However, I think balance-alb with tlb_dynamic_lb set to 0
is not an intended usage, so there is little use making it writable at
this moment.

Fixes: 8b426dc54cf4 ("bonding: remove hardcoded value")
Reported-by: Reinis Rozitis <r...@roze.lv>
Signed-off-by: Kosuke Tatsukawa <ta...@ab.jp.nec.com>
Cc: sta...@vger.kernel.org  # v4.12+
---
 drivers/net/bonding/bond_options.c |3 +++
 1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/drivers/net/bonding/bond_options.c 
b/drivers/net/bonding/bond_options.c
index a12d603..5931aa2 100644
--- a/drivers/net/bonding/bond_options.c
+++ b/drivers/net/bonding/bond_options.c
@@ -754,6 +754,9 @@ static int bond_option_mode_set(struct bonding *bond,
   bond->params.miimon);
}
 
+   if (newval->value == BOND_MODE_ALB)
+   bond->params.tlb_dynamic_lb = 1;
+
/* don't cache arp_validate between modes */
bond->params.arp_validate = BOND_ARP_VALIDATE_NONE;
bond->params.mode = newval->value;



Re: [PATCH] net: bonding: Fix transmit load balancing in balance-alb mode

2017-09-06 Thread Kosuke Tatsukawa
Reinis Rozitis reported that tlb_dynamic_lb was still 0 in balance-alb
mode even when using linux-4.12.10 kernels, which includes this patch.

It turned out that my previous patch only fixed the case when
balance-alb mode was specified as bonding module parameter, and not when
balance-alb mode was set using /sys/class/net/*/bonding/mode (the most
common usage).  In the latter case, tlb_dynamic_lb was set up according
to the default mode of the bonding interface, which happens to be
balance-rr.

I'll send another patch to address this case.

In the meantime, a workaround for this issue is to specify balance-tlb
mode as a module parameter for bonding.  This will change the value of
tlb_balance_lb to 1, and make balance-alb work like pre-4.12 kernels.

Best regards.


> balance-alb mode used to have transmit dynamic load balancing feature
> enabled by default.  However, transmit dynamic load balancing no longer
> works in balance-alb after commit 8b426dc54cf4 ("bonding: remove
> hardcoded value").
> 
> Both balance-tlb and balance-alb use the function bond_do_alb_xmit() to
> send packets.  This function uses the parameter tlb_dynamic_lb.
> tlb_dynamic_lb used to have the default value of 1 for balance-alb, but
> now the value is set to 0 except in balance-tlb.
> 
> Re-enable transmit dyanmic load balancing by initializing tlb_dynamic_lb
> for balance-alb similar to balance-tlb.
> 
> Signed-off-by: Kosuke Tatsukawa <ta...@ab.jp.nec.com>
> Cc: sta...@vger.kernel.org
> ---
>  drivers/net/bonding/bond_main.c |2 +-
>  1 files changed, 1 insertions(+), 1 deletions(-)
> 
> diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
> index 14ff622..181839d 100644
> --- a/drivers/net/bonding/bond_main.c
> +++ b/drivers/net/bonding/bond_main.c
> @@ -4596,7 +4596,7 @@ static int bond_check_params(struct bond_params *params)
>   }
>   ad_user_port_key = valptr->value;
>  
> - if (bond_mode == BOND_MODE_TLB) {
> + if ((bond_mode == BOND_MODE_TLB) || (bond_mode == BOND_MODE_ALB)) {
>   bond_opt_initstr(, "default");
>       valptr = bond_opt_parse(bond_opt_get(BOND_OPT_TLB_DYNAMIC_LB),
>   );
---
Kosuke TATSUKAWA  | 1st Platform Software Division
  | NEC Solution Innovators
  | ta...@ab.jp.nec.com



[PATCH] net: bonding: Fix transmit load balancing in balance-alb mode

2017-07-19 Thread Kosuke Tatsukawa
balance-alb mode used to have transmit dynamic load balancing feature
enabled by default.  However, transmit dynamic load balancing no longer
works in balance-alb after commit 8b426dc54cf4 ("bonding: remove
hardcoded value").

Both balance-tlb and balance-alb use the function bond_do_alb_xmit() to
send packets.  This function uses the parameter tlb_dynamic_lb.
tlb_dynamic_lb used to have the default value of 1 for balance-alb, but
now the value is set to 0 except in balance-tlb.

Re-enable transmit dyanmic load balancing by initializing tlb_dynamic_lb
for balance-alb similar to balance-tlb.

Signed-off-by: Kosuke Tatsukawa <ta...@ab.jp.nec.com>
Cc: sta...@vger.kernel.org
---
 drivers/net/bonding/bond_main.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index 14ff622..181839d 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -4596,7 +4596,7 @@ static int bond_check_params(struct bond_params *params)
}
ad_user_port_key = valptr->value;
 
-   if (bond_mode == BOND_MODE_TLB) {
+   if ((bond_mode == BOND_MODE_TLB) || (bond_mode == BOND_MODE_ALB)) {
bond_opt_initstr(, "default");
valptr = bond_opt_parse(bond_opt_get(BOND_OPT_TLB_DYNAMIC_LB),
);



Re: [PATCH v2] sunrpc: fix waitqueue_active without memory barrier in sunrpc

2015-10-23 Thread Kosuke Tatsukawa
J. Bruce Fields wrote:
> On Fri, Oct 23, 2015 at 04:14:10AM +0000, Kosuke Tatsukawa wrote:
>> J. Bruce Fields wrote:
>> > On Fri, Oct 16, 2015 at 02:28:10AM +, Kosuke Tatsukawa wrote:
>> >> Tatsukawa Kosuke wrote:
>> >> > J. Bruce Fields wrote:
>> >> >> On Thu, Oct 15, 2015 at 11:44:20AM +, Kosuke Tatsukawa wrote:
>> >> >>> Tatsukawa Kosuke wrote:
>> >> >>> > J. Bruce Fields wrote:
>> >> >>> >> Thanks for the detailed investigation.
>> >> >>> >> 
>> >> >>> >> I think it would be worth adding a comment if that might help 
>> >> >>> >> someone
>> >> >>> >> having to reinvestigate this again some day.
>> >> >>> > 
>> >> >>> > It would be nice, but I find it difficult to write a comment in the
>> >> >>> > sunrpc layer why a memory barrier isn't necessary, using the 
>> >> >>> > knowledge
>> >> >>> > of how nfsd uses it, and the current implementation of the network 
>> >> >>> > code.
>> >> >>> > 
>> >> >>> > Personally, I would prefer removing the call to waitqueue_active() 
>> >> >>> > which
>> >> >>> > would make the memory barrier totally unnecessary at the cost of a
>> >> >>> > spin_lock + spin_unlock by unconditionally calling
>> >> >>> > wake_up_interruptible.
>> >> >>> 
>> >> >>> On second thought, the callbacks will be called frequently from the 
>> >> >>> tcp
>> >> >>> code, so it wouldn't be a good idea.
>> >> >> 
>> >> >> So, I was even considering documenting it like this, if it's not
>> >> >> overkill.
>> >> >> 
>> >> >> Hmm... but if this is right, then we may as well ask why we're doing 
>> >> >> the
>> >> >> wakeups at all.  Might be educational to test the code with them
>> >> >> removed.
>> >> > 
>> >> > sk_write_space will be called in sock_wfree() with UDP/IP each time
>> >> > kfree_skb() is called.  With TCP/IP, sk_write_space is only called if
>> >> > SOCK_NOSPACE has been set.
>> >> > 
>> >> > sk_data_ready will be called in both tcp_rcv_established() for TCP/IP
>> >> > and in sock_queue_rcv_skb() for UDP/IP.  The latter lacks a memory
>> >> > barrier with sk_data_ready called right after __skb_queue_tail().
>> >> > I think this hasn't caused any problems because sk_data_ready wasn't
>> >> > used.
>> >> 
>> >> Actually, svc_udp_data_ready() calls set_bit() which is an atomic
>> >> operation.  So there won't be a problem unless svsk is NULL.
>> > 
>> > So is it true that every caller of these socket callbacks has adequate
>> > memory barriers between the time the change is made visible and the time
>> > the callback is called?
>> > 
>> > If so, then there's nothing really specific about nfsd here.
>> > 
>> > In that case maybe it's the networking code that use some documentation,
>> > if it doesn't already?  (Or maybe common helper functions for this
>> > 
>> >if (waitqueue_active(wq))
>> >wake_up(wq)
>> > 
>> > pattern?)
>> 
>> Some of the other places defining these callback functions are using
>>   static inline bool wq_has_sleeper(struct socket_wq *wq)
>> defined in include/net/sock.h
>> 
>> The comment above the function explains that it was introduced for
>> exactly this purpose.
>> 
>> Even thought the argument variable uses the same name "wq", it has a
>> different type from the wq used in svcsock.c (struct socket_wq *
>> vs. wait_queue_head_t *).
> 
> OK, thanks.  So, I guess it still sounds like the code is OK as is, but
> maybe my comment wasn't.  Here's another attempt.

Thank you.  By now the patch looks completely different from my original
patch, so I don't think I deserve to be mentioned in the Author line.


> --b.
> 
> commit b805ca58a81a
> Author: Kosuke Tatsukawa <ta...@ab.jp.nec.com>
> Date:   Fri Oct 9 01:44:07 2015 +
> 
> svcrpc: document lack of some memory barriers
> 
> We're missing memory barriers in net/sunrpc/svcsock.c in some spots we'd
>

Re: [PATCH v2] sunrpc: fix waitqueue_active without memory barrier in sunrpc

2015-10-22 Thread Kosuke Tatsukawa
J. Bruce Fields wrote:
> On Fri, Oct 16, 2015 at 02:28:10AM +0000, Kosuke Tatsukawa wrote:
>> Tatsukawa Kosuke wrote:
>> > J. Bruce Fields wrote:
>> >> On Thu, Oct 15, 2015 at 11:44:20AM +, Kosuke Tatsukawa wrote:
>> >>> Tatsukawa Kosuke wrote:
>> >>> > J. Bruce Fields wrote:
>> >>> >> Thanks for the detailed investigation.
>> >>> >> 
>> >>> >> I think it would be worth adding a comment if that might help someone
>> >>> >> having to reinvestigate this again some day.
>> >>> > 
>> >>> > It would be nice, but I find it difficult to write a comment in the
>> >>> > sunrpc layer why a memory barrier isn't necessary, using the knowledge
>> >>> > of how nfsd uses it, and the current implementation of the network 
>> >>> > code.
>> >>> > 
>> >>> > Personally, I would prefer removing the call to waitqueue_active() 
>> >>> > which
>> >>> > would make the memory barrier totally unnecessary at the cost of a
>> >>> > spin_lock + spin_unlock by unconditionally calling
>> >>> > wake_up_interruptible.
>> >>> 
>> >>> On second thought, the callbacks will be called frequently from the tcp
>> >>> code, so it wouldn't be a good idea.
>> >> 
>> >> So, I was even considering documenting it like this, if it's not
>> >> overkill.
>> >> 
>> >> Hmm... but if this is right, then we may as well ask why we're doing the
>> >> wakeups at all.  Might be educational to test the code with them
>> >> removed.
>> > 
>> > sk_write_space will be called in sock_wfree() with UDP/IP each time
>> > kfree_skb() is called.  With TCP/IP, sk_write_space is only called if
>> > SOCK_NOSPACE has been set.
>> > 
>> > sk_data_ready will be called in both tcp_rcv_established() for TCP/IP
>> > and in sock_queue_rcv_skb() for UDP/IP.  The latter lacks a memory
>> > barrier with sk_data_ready called right after __skb_queue_tail().
>> > I think this hasn't caused any problems because sk_data_ready wasn't
>> > used.
>> 
>> Actually, svc_udp_data_ready() calls set_bit() which is an atomic
>> operation.  So there won't be a problem unless svsk is NULL.
> 
> So is it true that every caller of these socket callbacks has adequate
> memory barriers between the time the change is made visible and the time
> the callback is called?
> 
> If so, then there's nothing really specific about nfsd here.
> 
> In that case maybe it's the networking code that use some documentation,
> if it doesn't already?  (Or maybe common helper functions for this
> 
>   if (waitqueue_active(wq))
>   wake_up(wq)
> 
> pattern?)

Some of the other places defining these callback functions are using
  static inline bool wq_has_sleeper(struct socket_wq *wq)
defined in include/net/sock.h

The comment above the function explains that it was introduced for
exactly this purpose.

Even thought the argument variable uses the same name "wq", it has a
different type from the wq used in svcsock.c (struct socket_wq *
vs. wait_queue_head_t *).


> --b.
> 
>> 
>> 
>> >> --b.
>> >> 
>> >> commit 0882cfeb39e0
>> >> Author: J. Bruce Fields <bfie...@redhat.com>
>> >> Date:   Thu Oct 15 16:53:41 2015 -0400
>> >> 
>> >> svcrpc: document lack of some memory barriers.
>> >> 
>> >> Kosuke Tatsukawa points out an odd lack of memory barriers in some 
>> >> sites
>> >> here.  I think the code's correct, but it's probably worth 
>> >> documenting.
>> >> 
>> >> Reported-by: Kosuke Tatsukawa <ta...@ab.jp.nec.com>
>> >> 
>> >> diff --git a/net/sunrpc/svcsock.c b/net/sunrpc/svcsock.c
>> >> index 856407fa085e..90480993ec4a 100644
>> >> --- a/net/sunrpc/svcsock.c
>> >> +++ b/net/sunrpc/svcsock.c
>> >> @@ -399,6 +399,25 @@ static int svc_sock_secure_port(struct svc_rqst 
>> >> *rqstp)
>> >>   return svc_port_is_privileged(svc_addr(rqstp));
>> >>  }
>> >>  
>> >> +static void svc_no_smp_mb(void)
>> >> +{
>> >> + /*
>> >> +  * Kosuke Tatsukawa points out there should normally be an
>> >> +  * smp_mb() at the callsites of this function.  (Either that or
>> >> +  * we could just drop 

Re: [PATCH v2] sunrpc: fix waitqueue_active without memory barrier in sunrpc

2015-10-15 Thread Kosuke Tatsukawa
Tatsukawa Kosuke wrote:
> J. Bruce Fields wrote:
>> On Wed, Oct 14, 2015 at 03:57:13AM +0000, Kosuke Tatsukawa wrote:
>>> J. Bruce Fields wrote:
>>> > On Mon, Oct 12, 2015 at 10:41:06AM +, Kosuke Tatsukawa wrote:
>>> >> J. Bruce Fields wrote:
>>> >> > On Fri, Oct 09, 2015 at 06:29:44AM +, Kosuke Tatsukawa wrote:
>>> >> >> Neil Brown wrote:
>>> >> >> > Kosuke Tatsukawa <ta...@ab.jp.nec.com> writes:
>>> >> >> > 
>>> >> >> >> There are several places in net/sunrpc/svcsock.c which calls
>>> >> >> >> waitqueue_active() without calling a memory barrier.  Add a memory
>>> >> >> >> barrier just as in wq_has_sleeper().
>>> >> >> >>
>>> >> >> >> I found this issue when I was looking through the linux source code
>>> >> >> >> for places calling waitqueue_active() before wake_up*(), but 
>>> >> >> >> without
>>> >> >> >> preceding memory barriers, after sending a patch to fix a similar
>>> >> >> >> issue in drivers/tty/n_tty.c  (Details about the original issue 
>>> >> >> >> can be
>>> >> >> >> found here: https://lkml.org/lkml/2015/9/28/849).
>>> >> >> > 
>>> >> >> > hi,
>>> >> >> > this feels like the wrong approach to the problem.  It requires 
>>> >> >> > extra
>>> >> >> > 'smb_mb's to be spread around which are hard to understand as easy 
>>> >> >> > to
>>> >> >> > forget.
>>> >> >> > 
>>> >> >> > A quick look seems to suggest that (nearly) every waitqueue_active()
>>> >> >> > will need an smb_mb.  Could we just put the smb_mb() inside
>>> >> >> > waitqueue_active()??
>>> >> >> 
>>> >> >> 
>>> >> >> There are around 200 occurrences of waitqueue_active() in the kernel
>>> >> >> source, and most of the places which use it before wake_up are either
>>> >> >> protected by some spin lock, or already has a memory barrier or some
>>> >> >> kind of atomic operation before it.
>>> >> >> 
>>> >> >> Simply adding smp_mb() to waitqueue_active() would incur extra cost in
>>> >> >> many cases and won't be a good idea.
>>> >> >> 
>>> >> >> Another way to solve this problem is to remove the waitqueue_active(),
>>> >> >> making the code look like this;
>>> >> >>   if (wq)
>>> >> >>   wake_up_interruptible(wq);
>>> >> >> This also fixes the problem because the spinlock in the wake_up*() 
>>> >> >> acts
>>> >> >> as a memory barrier and prevents the code from being reordered by the
>>> >> >> CPU (and it also makes the resulting code is much simpler).
>>> >> > 
>>> >> > I might not care which we did, except I don't have the means to test
>>> >> > this quickly, and I guess this is some of our most frequently called
>>> >> > code.
>>> >> > 
>>> >> > I suppose your patch is the most conservative approach, as the
>>> >> > alternative is a spinlock/unlock in wake_up_interruptible, which I
>>> >> > assume is necessarily more expensive than an smp_mb().
>>> >> > 
>>> >> > As far as I can tell it's been this way since forever.  (Well, since a
>>> >> > 2002 patch "NFSD: TCP: rationalise locking in RPC server routines" 
>>> >> > which
>>> >> > removed some spinlocks from the data_ready routines.)
>>> >> > 
>>> >> > I don't understand what the actual race is yet (which code exactly is
>>> >> > missing the wakeup in this case?  nfsd threads seem to instead get
>>> >> > woken up by the wake_up_process() in svc_xprt_do_enqueue().)
>>> >> 
>>> >> Thank you for the reply.  I tried looking into this.
>>> >> 
>>> >> The callbacks in net/sunrpc/svcsock.c are set up in svc_tcp_init() and
>>> >> svc_udp_init(), which are both called from svc_setup_socket().
>>> >> svc_s

Re: [PATCH v2] sunrpc: fix waitqueue_active without memory barrier in sunrpc

2015-10-15 Thread Kosuke Tatsukawa
J. Bruce Fields wrote:
> On Thu, Oct 15, 2015 at 11:44:20AM +0000, Kosuke Tatsukawa wrote:
>> Tatsukawa Kosuke wrote:
>> > J. Bruce Fields wrote:
>> >> Thanks for the detailed investigation.
>> >> 
>> >> I think it would be worth adding a comment if that might help someone
>> >> having to reinvestigate this again some day.
>> > 
>> > It would be nice, but I find it difficult to write a comment in the
>> > sunrpc layer why a memory barrier isn't necessary, using the knowledge
>> > of how nfsd uses it, and the current implementation of the network code.
>> > 
>> > Personally, I would prefer removing the call to waitqueue_active() which
>> > would make the memory barrier totally unnecessary at the cost of a
>> > spin_lock + spin_unlock by unconditionally calling
>> > wake_up_interruptible.
>> 
>> On second thought, the callbacks will be called frequently from the tcp
>> code, so it wouldn't be a good idea.
> 
> So, I was even considering documenting it like this, if it's not
> overkill.
> 
> Hmm... but if this is right, then we may as well ask why we're doing the
> wakeups at all.  Might be educational to test the code with them
> removed.

sk_write_space will be called in sock_wfree() with UDP/IP each time
kfree_skb() is called.  With TCP/IP, sk_write_space is only called if
SOCK_NOSPACE has been set.

sk_data_ready will be called in both tcp_rcv_established() for TCP/IP
and in sock_queue_rcv_skb() for UDP/IP.  The latter lacks a memory
barrier with sk_data_ready called right after __skb_queue_tail().
I think this hasn't caused any problems because sk_data_ready wasn't
used.


> --b.
> 
> commit 0882cfeb39e0
> Author: J. Bruce Fields <bfie...@redhat.com>
> Date:   Thu Oct 15 16:53:41 2015 -0400
> 
> svcrpc: document lack of some memory barriers.
> 
> Kosuke Tatsukawa points out an odd lack of memory barriers in some sites
> here.  I think the code's correct, but it's probably worth documenting.
> 
> Reported-by: Kosuke Tatsukawa <ta...@ab.jp.nec.com>
> 
> diff --git a/net/sunrpc/svcsock.c b/net/sunrpc/svcsock.c
> index 856407fa085e..90480993ec4a 100644
> --- a/net/sunrpc/svcsock.c
> +++ b/net/sunrpc/svcsock.c
> @@ -399,6 +399,25 @@ static int svc_sock_secure_port(struct svc_rqst *rqstp)
>   return svc_port_is_privileged(svc_addr(rqstp));
>  }
>  
> +static void svc_no_smp_mb(void)
> +{
> + /*
> +  * Kosuke Tatsukawa points out there should normally be an
> +  * smp_mb() at the callsites of this function.  (Either that or
> +  * we could just drop the waitqueue_active() checks.)
> +  *
> +  * It appears they aren't currently necessary, though, basically
> +  * because nfsd does non-blocking reads from these sockets, so
> +  * the only places we wait on this waitqueue is in sendpage and
> +  * sendmsg, which won't be waiting for wakeups on newly arrived
> +  * data.
> +  *
> +  * Maybe we should add the memory barriers anyway, but these are
> +  * hot paths so we'd need to be convinced there's no sigificant
> +  * penalty.
> +  */
> +}
> +
>  /*
>   * INET callback when data has been received on the socket.
>   */
> @@ -414,7 +433,7 @@ static void svc_udp_data_ready(struct sock *sk)
>   set_bit(XPT_DATA, >sk_xprt.xpt_flags);
>   svc_xprt_enqueue(>sk_xprt);
>   }
> - smp_mb();
> + svc_no_smp_mb();
>   if (wq && waitqueue_active(wq))
>   wake_up_interruptible(wq);
>  }
> @@ -433,7 +452,7 @@ static void svc_write_space(struct sock *sk)
>   svc_xprt_enqueue(>sk_xprt);
>   }
>  
> - smp_mb();
> + svc_no_smp_mb();
>   if (wq && waitqueue_active(wq)) {
>   dprintk("RPC svc_write_space: someone sleeping on %p\n",
>  svsk);
> @@ -789,7 +808,7 @@ static void svc_tcp_listen_data_ready(struct sock *sk)
>   }
>  
>   wq = sk_sleep(sk);
> - smp_mb();
> + svc_no_smp_mb();
>   if (wq && waitqueue_active(wq))
>   wake_up_interruptible_all(wq);
>  }
> @@ -811,7 +830,7 @@ static void svc_tcp_state_change(struct sock *sk)
>   set_bit(XPT_CLOSE, >sk_xprt.xpt_flags);
>   svc_xprt_enqueue(>sk_xprt);
>   }
> - smp_mb();
> + svc_no_smp_mb();
>   if (wq && waitqueue_active(wq))
>   wake_up_interruptible_all(wq);
>  }
> @@ -827,7 +846,7 @@ static void svc_tcp_data_ready(struct sock *sk)
>   set_bit(XPT_DATA, >sk_xprt.xpt_flags);
>  

Re: [PATCH v2] sunrpc: fix waitqueue_active without memory barrier in sunrpc

2015-10-15 Thread Kosuke Tatsukawa
Tatsukawa Kosuke wrote:
> J. Bruce Fields wrote:
>> On Thu, Oct 15, 2015 at 11:44:20AM +0000, Kosuke Tatsukawa wrote:
>>> Tatsukawa Kosuke wrote:
>>> > J. Bruce Fields wrote:
>>> >> Thanks for the detailed investigation.
>>> >> 
>>> >> I think it would be worth adding a comment if that might help someone
>>> >> having to reinvestigate this again some day.
>>> > 
>>> > It would be nice, but I find it difficult to write a comment in the
>>> > sunrpc layer why a memory barrier isn't necessary, using the knowledge
>>> > of how nfsd uses it, and the current implementation of the network code.
>>> > 
>>> > Personally, I would prefer removing the call to waitqueue_active() which
>>> > would make the memory barrier totally unnecessary at the cost of a
>>> > spin_lock + spin_unlock by unconditionally calling
>>> > wake_up_interruptible.
>>> 
>>> On second thought, the callbacks will be called frequently from the tcp
>>> code, so it wouldn't be a good idea.
>> 
>> So, I was even considering documenting it like this, if it's not
>> overkill.
>> 
>> Hmm... but if this is right, then we may as well ask why we're doing the
>> wakeups at all.  Might be educational to test the code with them
>> removed.
> 
> sk_write_space will be called in sock_wfree() with UDP/IP each time
> kfree_skb() is called.  With TCP/IP, sk_write_space is only called if
> SOCK_NOSPACE has been set.
> 
> sk_data_ready will be called in both tcp_rcv_established() for TCP/IP
> and in sock_queue_rcv_skb() for UDP/IP.  The latter lacks a memory
> barrier with sk_data_ready called right after __skb_queue_tail().
> I think this hasn't caused any problems because sk_data_ready wasn't
> used.

Actually, svc_udp_data_ready() calls set_bit() which is an atomic
operation.  So there won't be a problem unless svsk is NULL.


>> --b.
>> 
>> commit 0882cfeb39e0
>> Author: J. Bruce Fields <bfie...@redhat.com>
>> Date:   Thu Oct 15 16:53:41 2015 -0400
>> 
>> svcrpc: document lack of some memory barriers.
>> 
>> Kosuke Tatsukawa points out an odd lack of memory barriers in some sites
>> here.  I think the code's correct, but it's probably worth documenting.
>> 
>> Reported-by: Kosuke Tatsukawa <ta...@ab.jp.nec.com>
>> 
>> diff --git a/net/sunrpc/svcsock.c b/net/sunrpc/svcsock.c
>> index 856407fa085e..90480993ec4a 100644
>> --- a/net/sunrpc/svcsock.c
>> +++ b/net/sunrpc/svcsock.c
>> @@ -399,6 +399,25 @@ static int svc_sock_secure_port(struct svc_rqst *rqstp)
>>  return svc_port_is_privileged(svc_addr(rqstp));
>>  }
>>  
>> +static void svc_no_smp_mb(void)
>> +{
>> +/*
>> + * Kosuke Tatsukawa points out there should normally be an
>> + * smp_mb() at the callsites of this function.  (Either that or
>> + * we could just drop the waitqueue_active() checks.)
>> + *
>> + * It appears they aren't currently necessary, though, basically
>> + * because nfsd does non-blocking reads from these sockets, so
>> + * the only places we wait on this waitqueue is in sendpage and
>> + * sendmsg, which won't be waiting for wakeups on newly arrived
>> + * data.
>> + *
>> + * Maybe we should add the memory barriers anyway, but these are
>> + * hot paths so we'd need to be convinced there's no sigificant
>> + * penalty.
>> + */
>> +}
>> +
>>  /*
>>   * INET callback when data has been received on the socket.
>>   */
>> @@ -414,7 +433,7 @@ static void svc_udp_data_ready(struct sock *sk)
>>  set_bit(XPT_DATA, >sk_xprt.xpt_flags);
>>  svc_xprt_enqueue(>sk_xprt);
>>  }
>> -smp_mb();
>> +svc_no_smp_mb();
>>  if (wq && waitqueue_active(wq))
>>  wake_up_interruptible(wq);
>>  }
>> @@ -433,7 +452,7 @@ static void svc_write_space(struct sock *sk)
>>  svc_xprt_enqueue(>sk_xprt);
>>  }
>>  
>> -smp_mb();
>> +svc_no_smp_mb();
>>  if (wq && waitqueue_active(wq)) {
>>  dprintk("RPC svc_write_space: someone sleeping on %p\n",
>> svsk);
>> @@ -789,7 +808,7 @@ static void svc_tcp_listen_data_ready(struct sock *sk)
>>  }
>>  
>>  wq = sk_sleep(sk);
>> -smp_mb();
>> +svc_no_smp_mb();
>>  if (wq && waitqueue_active(wq))
>>  wake

Re: [PATCH v2] sunrpc: fix waitqueue_active without memory barrier in sunrpc

2015-10-14 Thread Kosuke Tatsukawa
J. Bruce Fields wrote:
> On Wed, Oct 14, 2015 at 03:57:13AM +0000, Kosuke Tatsukawa wrote:
>> J. Bruce Fields wrote:
>> > On Mon, Oct 12, 2015 at 10:41:06AM +, Kosuke Tatsukawa wrote:
>> >> J. Bruce Fields wrote:
>> >> > On Fri, Oct 09, 2015 at 06:29:44AM +, Kosuke Tatsukawa wrote:
>> >> >> Neil Brown wrote:
>> >> >> > Kosuke Tatsukawa <ta...@ab.jp.nec.com> writes:
>> >> >> > 
>> >> >> >> There are several places in net/sunrpc/svcsock.c which calls
>> >> >> >> waitqueue_active() without calling a memory barrier.  Add a memory
>> >> >> >> barrier just as in wq_has_sleeper().
>> >> >> >>
>> >> >> >> I found this issue when I was looking through the linux source code
>> >> >> >> for places calling waitqueue_active() before wake_up*(), but without
>> >> >> >> preceding memory barriers, after sending a patch to fix a similar
>> >> >> >> issue in drivers/tty/n_tty.c  (Details about the original issue can 
>> >> >> >> be
>> >> >> >> found here: https://lkml.org/lkml/2015/9/28/849).
>> >> >> > 
>> >> >> > hi,
>> >> >> > this feels like the wrong approach to the problem.  It requires extra
>> >> >> > 'smb_mb's to be spread around which are hard to understand as easy to
>> >> >> > forget.
>> >> >> > 
>> >> >> > A quick look seems to suggest that (nearly) every waitqueue_active()
>> >> >> > will need an smb_mb.  Could we just put the smb_mb() inside
>> >> >> > waitqueue_active()??
>> >> >> 
>> >> >> 
>> >> >> There are around 200 occurrences of waitqueue_active() in the kernel
>> >> >> source, and most of the places which use it before wake_up are either
>> >> >> protected by some spin lock, or already has a memory barrier or some
>> >> >> kind of atomic operation before it.
>> >> >> 
>> >> >> Simply adding smp_mb() to waitqueue_active() would incur extra cost in
>> >> >> many cases and won't be a good idea.
>> >> >> 
>> >> >> Another way to solve this problem is to remove the waitqueue_active(),
>> >> >> making the code look like this;
>> >> >>if (wq)
>> >> >>wake_up_interruptible(wq);
>> >> >> This also fixes the problem because the spinlock in the wake_up*() acts
>> >> >> as a memory barrier and prevents the code from being reordered by the
>> >> >> CPU (and it also makes the resulting code is much simpler).
>> >> > 
>> >> > I might not care which we did, except I don't have the means to test
>> >> > this quickly, and I guess this is some of our most frequently called
>> >> > code.
>> >> > 
>> >> > I suppose your patch is the most conservative approach, as the
>> >> > alternative is a spinlock/unlock in wake_up_interruptible, which I
>> >> > assume is necessarily more expensive than an smp_mb().
>> >> > 
>> >> > As far as I can tell it's been this way since forever.  (Well, since a
>> >> > 2002 patch "NFSD: TCP: rationalise locking in RPC server routines" which
>> >> > removed some spinlocks from the data_ready routines.)
>> >> > 
>> >> > I don't understand what the actual race is yet (which code exactly is
>> >> > missing the wakeup in this case?  nfsd threads seem to instead get
>> >> > woken up by the wake_up_process() in svc_xprt_do_enqueue().)
>> >> 
>> >> Thank you for the reply.  I tried looking into this.
>> >> 
>> >> The callbacks in net/sunrpc/svcsock.c are set up in svc_tcp_init() and
>> >> svc_udp_init(), which are both called from svc_setup_socket().
>> >> svc_setup_socket() is called (indirectly) from lockd, nfsd, and nfsv4
>> >> callback port related code.
>> >> 
>> >> Maybe I'm wrong, but there might not be any kernel code that is using
>> >> the socket's wait queue in this case.
>> > 
>> > As Trond points out there are probably waiters internal to the
>> > networking code.
>> 
>> Trond and Bruce, thank you for the comment.  I was abl

Re: [PATCH v2] sunrpc: fix waitqueue_active without memory barrier in sunrpc

2015-10-13 Thread Kosuke Tatsukawa
J. Bruce Fields wrote:
> On Mon, Oct 12, 2015 at 10:41:06AM +0000, Kosuke Tatsukawa wrote:
>> J. Bruce Fields wrote:
>> > On Fri, Oct 09, 2015 at 06:29:44AM +, Kosuke Tatsukawa wrote:
>> >> Neil Brown wrote:
>> >> > Kosuke Tatsukawa <ta...@ab.jp.nec.com> writes:
>> >> > 
>> >> >> There are several places in net/sunrpc/svcsock.c which calls
>> >> >> waitqueue_active() without calling a memory barrier.  Add a memory
>> >> >> barrier just as in wq_has_sleeper().
>> >> >>
>> >> >> I found this issue when I was looking through the linux source code
>> >> >> for places calling waitqueue_active() before wake_up*(), but without
>> >> >> preceding memory barriers, after sending a patch to fix a similar
>> >> >> issue in drivers/tty/n_tty.c  (Details about the original issue can be
>> >> >> found here: https://lkml.org/lkml/2015/9/28/849).
>> >> > 
>> >> > hi,
>> >> > this feels like the wrong approach to the problem.  It requires extra
>> >> > 'smb_mb's to be spread around which are hard to understand as easy to
>> >> > forget.
>> >> > 
>> >> > A quick look seems to suggest that (nearly) every waitqueue_active()
>> >> > will need an smb_mb.  Could we just put the smb_mb() inside
>> >> > waitqueue_active()??
>> >> 
>> >> 
>> >> There are around 200 occurrences of waitqueue_active() in the kernel
>> >> source, and most of the places which use it before wake_up are either
>> >> protected by some spin lock, or already has a memory barrier or some
>> >> kind of atomic operation before it.
>> >> 
>> >> Simply adding smp_mb() to waitqueue_active() would incur extra cost in
>> >> many cases and won't be a good idea.
>> >> 
>> >> Another way to solve this problem is to remove the waitqueue_active(),
>> >> making the code look like this;
>> >>   if (wq)
>> >>   wake_up_interruptible(wq);
>> >> This also fixes the problem because the spinlock in the wake_up*() acts
>> >> as a memory barrier and prevents the code from being reordered by the
>> >> CPU (and it also makes the resulting code is much simpler).
>> > 
>> > I might not care which we did, except I don't have the means to test
>> > this quickly, and I guess this is some of our most frequently called
>> > code.
>> > 
>> > I suppose your patch is the most conservative approach, as the
>> > alternative is a spinlock/unlock in wake_up_interruptible, which I
>> > assume is necessarily more expensive than an smp_mb().
>> > 
>> > As far as I can tell it's been this way since forever.  (Well, since a
>> > 2002 patch "NFSD: TCP: rationalise locking in RPC server routines" which
>> > removed some spinlocks from the data_ready routines.)
>> > 
>> > I don't understand what the actual race is yet (which code exactly is
>> > missing the wakeup in this case?  nfsd threads seem to instead get
>> > woken up by the wake_up_process() in svc_xprt_do_enqueue().)
>> 
>> Thank you for the reply.  I tried looking into this.
>> 
>> The callbacks in net/sunrpc/svcsock.c are set up in svc_tcp_init() and
>> svc_udp_init(), which are both called from svc_setup_socket().
>> svc_setup_socket() is called (indirectly) from lockd, nfsd, and nfsv4
>> callback port related code.
>> 
>> Maybe I'm wrong, but there might not be any kernel code that is using
>> the socket's wait queue in this case.
> 
> As Trond points out there are probably waiters internal to the
> networking code.

Trond and Bruce, thank you for the comment.  I was able to find the call
to the wait function that was called from nfsd.

sk_stream_wait_connect() and sk_stream_wait_memory() were called from
either do_tcp_sendpages() or tcp_sendmsg() called from within
svc_send().  sk_stream_wait_connect() shouldn't be called at this point,
because the socket has already been used to receive the rpc request.

On the wake_up side, sk_write_space() is called from the following
locations.  The relevant ones seems to be preceded by atomic_sub or a
memory barrier.
+ ksocknal_write_space 
[drivers/staging/lustre/lnet/klnds/socklnd/socklnd_lib.c:633]
+ atm_pop_raw [net/atm/raw.c:40]
+ sock_setsockopt [net/core/sock.c:740]
+ sock_wfree [net/core/sock.c:1630]
  Preceded by atomic_sub in sock_wfree()
+ ccid3_hc_tx_packet_recv [net/dccp/ccids/ccid3.c:442]
+ do_t

Re: [PATCH v2] sunrpc: fix waitqueue_active without memory barrier in sunrpc

2015-10-12 Thread Kosuke Tatsukawa
J. Bruce Fields wrote:
> On Fri, Oct 09, 2015 at 06:29:44AM +0000, Kosuke Tatsukawa wrote:
>> Neil Brown wrote:
>> > Kosuke Tatsukawa <ta...@ab.jp.nec.com> writes:
>> > 
>> >> There are several places in net/sunrpc/svcsock.c which calls
>> >> waitqueue_active() without calling a memory barrier.  Add a memory
>> >> barrier just as in wq_has_sleeper().
>> >>
>> >> I found this issue when I was looking through the linux source code
>> >> for places calling waitqueue_active() before wake_up*(), but without
>> >> preceding memory barriers, after sending a patch to fix a similar
>> >> issue in drivers/tty/n_tty.c  (Details about the original issue can be
>> >> found here: https://lkml.org/lkml/2015/9/28/849).
>> > 
>> > hi,
>> > this feels like the wrong approach to the problem.  It requires extra
>> > 'smb_mb's to be spread around which are hard to understand as easy to
>> > forget.
>> > 
>> > A quick look seems to suggest that (nearly) every waitqueue_active()
>> > will need an smb_mb.  Could we just put the smb_mb() inside
>> > waitqueue_active()??
>> 
>> 
>> There are around 200 occurrences of waitqueue_active() in the kernel
>> source, and most of the places which use it before wake_up are either
>> protected by some spin lock, or already has a memory barrier or some
>> kind of atomic operation before it.
>> 
>> Simply adding smp_mb() to waitqueue_active() would incur extra cost in
>> many cases and won't be a good idea.
>> 
>> Another way to solve this problem is to remove the waitqueue_active(),
>> making the code look like this;
>>  if (wq)
>>  wake_up_interruptible(wq);
>> This also fixes the problem because the spinlock in the wake_up*() acts
>> as a memory barrier and prevents the code from being reordered by the
>> CPU (and it also makes the resulting code is much simpler).
> 
> I might not care which we did, except I don't have the means to test
> this quickly, and I guess this is some of our most frequently called
> code.
> 
> I suppose your patch is the most conservative approach, as the
> alternative is a spinlock/unlock in wake_up_interruptible, which I
> assume is necessarily more expensive than an smp_mb().
> 
> As far as I can tell it's been this way since forever.  (Well, since a
> 2002 patch "NFSD: TCP: rationalise locking in RPC server routines" which
> removed some spinlocks from the data_ready routines.)
> 
> I don't understand what the actual race is yet (which code exactly is
> missing the wakeup in this case?  nfsd threads seem to instead get
> woken up by the wake_up_process() in svc_xprt_do_enqueue().)

Thank you for the reply.  I tried looking into this.

The callbacks in net/sunrpc/svcsock.c are set up in svc_tcp_init() and
svc_udp_init(), which are both called from svc_setup_socket().
svc_setup_socket() is called (indirectly) from lockd, nfsd, and nfsv4
callback port related code.

Maybe I'm wrong, but there might not be any kernel code that is using
the socket's wait queue in this case.

Best regards.
---
Kosuke TATSUKAWA  | 3rd IT Platform Department
  | IT Platform Division, NEC Corporation
  | ta...@ab.jp.nec.com
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2] sunrpc: fix waitqueue_active without memory barrier in sunrpc

2015-10-09 Thread Kosuke Tatsukawa
Neil Brown wrote:
> Kosuke Tatsukawa <ta...@ab.jp.nec.com> writes:
> 
>> There are several places in net/sunrpc/svcsock.c which calls
>> waitqueue_active() without calling a memory barrier.  Add a memory
>> barrier just as in wq_has_sleeper().
>>
>> I found this issue when I was looking through the linux source code
>> for places calling waitqueue_active() before wake_up*(), but without
>> preceding memory barriers, after sending a patch to fix a similar
>> issue in drivers/tty/n_tty.c  (Details about the original issue can be
>> found here: https://lkml.org/lkml/2015/9/28/849).
> 
> hi,
> this feels like the wrong approach to the problem.  It requires extra
> 'smb_mb's to be spread around which are hard to understand as easy to
> forget.
> 
> A quick look seems to suggest that (nearly) every waitqueue_active()
> will need an smb_mb.  Could we just put the smb_mb() inside
> waitqueue_active()??


There are around 200 occurrences of waitqueue_active() in the kernel
source, and most of the places which use it before wake_up are either
protected by some spin lock, or already has a memory barrier or some
kind of atomic operation before it.

Simply adding smp_mb() to waitqueue_active() would incur extra cost in
many cases and won't be a good idea.

Another way to solve this problem is to remove the waitqueue_active(),
making the code look like this;
if (wq)
wake_up_interruptible(wq);
This also fixes the problem because the spinlock in the wake_up*() acts
as a memory barrier and prevents the code from being reordered by the
CPU (and it also makes the resulting code is much simpler).
---
Kosuke TATSUKAWA  | 3rd IT Platform Department
  | IT Platform Division, NEC Corporation
  | ta...@ab.jp.nec.com
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v2] sunrpc: fix waitqueue_active without memory barrier in sunrpc

2015-10-08 Thread Kosuke Tatsukawa
There are several places in net/sunrpc/svcsock.c which calls
waitqueue_active() without calling a memory barrier.  Add a memory
barrier just as in wq_has_sleeper().

I found this issue when I was looking through the linux source code
for places calling waitqueue_active() before wake_up*(), but without
preceding memory barriers, after sending a patch to fix a similar
issue in drivers/tty/n_tty.c  (Details about the original issue can be
found here: https://lkml.org/lkml/2015/9/28/849).

Signed-off-by: Kosuke Tatsukawa <ta...@ab.jp.nec.com>
---
v2:
  - Fixed compiler warnings caused by type mismatch
v1:
  - https://lkml.org/lkml/2015/10/8/993
---
 net/sunrpc/svcsock.c |6 ++
 1 files changed, 6 insertions(+), 0 deletions(-)

diff --git a/net/sunrpc/svcsock.c b/net/sunrpc/svcsock.c
index 0c81202..ec19444 100644
--- a/net/sunrpc/svcsock.c
+++ b/net/sunrpc/svcsock.c
@@ -414,6 +414,7 @@ static void svc_udp_data_ready(struct sock *sk)
set_bit(XPT_DATA, >sk_xprt.xpt_flags);
svc_xprt_enqueue(>sk_xprt);
}
+   smp_mb();
if (wq && waitqueue_active(wq))
wake_up_interruptible(wq);
 }
@@ -432,6 +433,7 @@ static void svc_write_space(struct sock *sk)
svc_xprt_enqueue(>sk_xprt);
}
 
+   smp_mb();
if (wq && waitqueue_active(wq)) {
dprintk("RPC svc_write_space: someone sleeping on %p\n",
   svsk);
@@ -787,6 +789,7 @@ static void svc_tcp_listen_data_ready(struct sock *sk)
}
 
wq = sk_sleep(sk);
+   smp_mb();
if (wq && waitqueue_active(wq))
wake_up_interruptible_all(wq);
 }
@@ -808,6 +811,7 @@ static void svc_tcp_state_change(struct sock *sk)
set_bit(XPT_CLOSE, >sk_xprt.xpt_flags);
svc_xprt_enqueue(>sk_xprt);
}
+   smp_mb();
if (wq && waitqueue_active(wq))
wake_up_interruptible_all(wq);
 }
@@ -823,6 +827,7 @@ static void svc_tcp_data_ready(struct sock *sk)
set_bit(XPT_DATA, >sk_xprt.xpt_flags);
svc_xprt_enqueue(>sk_xprt);
}
+   smp_mb();
if (wq && waitqueue_active(wq))
wake_up_interruptible(wq);
 }
@@ -1594,6 +1599,7 @@ static void svc_sock_detach(struct svc_xprt *xprt)
sk->sk_write_space = svsk->sk_owspace;
 
wq = sk_sleep(sk);
+   smp_mb();
if (wq && waitqueue_active(wq))
wake_up_interruptible(wq);
 }
-- 
1.7.1
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] brcmfmac: fix waitqueue_active without memory barrier in brcmfmac driver

2015-10-08 Thread Kosuke Tatsukawa
brcmf_msgbuf_ioctl_resp_wake() seems to be missing a memory barrier
which might cause the waker to not notice the waiter and miss sending a
wake_up as in the following figure.

  brcmf_msgbuf_ioctl_resp_wake  brcmf_msgbuf_ioctl_resp_wait

if (waitqueue_active(>ioctl_resp_wait))
/* The CPU might reorder the test for
   the waitqueue up here, before
   prior writes complete */
   /* wait_event_timeout */
/* __wait_event_timeout */
 /* ___wait_event */
 prepare_to_wait_event(, &__wait,
   state);
 if (msgbuf->ctl_completed)
 ...
msgbuf->ctl_completed = true;
 schedule_timeout(__ret))


There are three other place in drivers/net/wireless/brcm80211/brcmfmac/
which have similar code.  The attached patch removes the call to
waitqueue_active() leaving just wake_up() behind.  This fixes the
problem because the call to spin_lock_irqsave() in wake_up() will be an
ACQUIRE operation.

I found this issue when I was looking through the linux source code
for places calling waitqueue_active() before wake_up*(), but without
preceding memory barriers, after sending a patch to fix a similar
issue in drivers/tty/n_tty.c  (Details about the original issue can be
found here: https://lkml.org/lkml/2015/9/28/849).

Signed-off-by: Kosuke Tatsukawa <ta...@ab.jp.nec.com>
---
 drivers/net/wireless/brcm80211/brcmfmac/msgbuf.c |3 +--
 drivers/net/wireless/brcm80211/brcmfmac/sdio.c   |6 ++
 drivers/net/wireless/brcm80211/brcmfmac/usb.c|3 +--
 3 files changed, 4 insertions(+), 8 deletions(-)

diff --git a/drivers/net/wireless/brcm80211/brcmfmac/msgbuf.c 
b/drivers/net/wireless/brcm80211/brcmfmac/msgbuf.c
index 7b2136c..648151e 100644
--- a/drivers/net/wireless/brcm80211/brcmfmac/msgbuf.c
+++ b/drivers/net/wireless/brcm80211/brcmfmac/msgbuf.c
@@ -473,8 +473,7 @@ static int brcmf_msgbuf_ioctl_resp_wait(struct brcmf_msgbuf 
*msgbuf)
 static void brcmf_msgbuf_ioctl_resp_wake(struct brcmf_msgbuf *msgbuf)
 {
msgbuf->ctl_completed = true;
-   if (waitqueue_active(>ioctl_resp_wait))
-   wake_up(>ioctl_resp_wait);
+   wake_up(>ioctl_resp_wait);
 }
 
 
diff --git a/drivers/net/wireless/brcm80211/brcmfmac/sdio.c 
b/drivers/net/wireless/brcm80211/brcmfmac/sdio.c
index f990e3d..332c4c8 100644
--- a/drivers/net/wireless/brcm80211/brcmfmac/sdio.c
+++ b/drivers/net/wireless/brcm80211/brcmfmac/sdio.c
@@ -1785,8 +1785,7 @@ static int brcmf_sdio_dcmd_resp_wait(struct brcmf_sdio 
*bus, uint *condition,
 
 static int brcmf_sdio_dcmd_resp_wake(struct brcmf_sdio *bus)
 {
-   if (waitqueue_active(>dcmd_resp_wait))
-   wake_up_interruptible(>dcmd_resp_wait);
+   wake_up_interruptible(>dcmd_resp_wait);
 
return 0;
 }
@@ -2110,8 +2109,7 @@ static uint brcmf_sdio_readframes(struct brcmf_sdio *bus, 
uint maxframes)
 static void
 brcmf_sdio_wait_event_wakeup(struct brcmf_sdio *bus)
 {
-   if (waitqueue_active(>ctrl_wait))
-   wake_up_interruptible(>ctrl_wait);
+   wake_up_interruptible(>ctrl_wait);
return;
 }
 
diff --git a/drivers/net/wireless/brcm80211/brcmfmac/usb.c 
b/drivers/net/wireless/brcm80211/brcmfmac/usb.c
index daba86d..7f5889c 100644
--- a/drivers/net/wireless/brcm80211/brcmfmac/usb.c
+++ b/drivers/net/wireless/brcm80211/brcmfmac/usb.c
@@ -184,8 +184,7 @@ static int brcmf_usb_ioctl_resp_wait(struct 
brcmf_usbdev_info *devinfo)
 
 static void brcmf_usb_ioctl_resp_wake(struct brcmf_usbdev_info *devinfo)
 {
-   if (waitqueue_active(>ioctl_resp_wait))
-   wake_up(>ioctl_resp_wait);
+   wake_up(>ioctl_resp_wait);
 }
 
 static void
-- 
1.7.1
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] sunrpc: fix waitqueue_active without memory barrier in sunrpc

2015-10-08 Thread Kosuke Tatsukawa
There are several places in net/sunrpc/svcsock.c which calls
waitqueue_active() without calling a memory barrier.  Change the code
to call wq_has_sleeper() instead, which other networking code uses in
similar places.

I found this issue when I was looking through the linux source code
for places calling waitqueue_active() before wake_up*(), but without
preceding memory barriers, after sending a patch to fix a similar
issue in drivers/tty/n_tty.c  (Details about the original issue can be
found here: https://lkml.org/lkml/2015/9/28/849).

Signed-off-by: Kosuke Tatsukawa <ta...@ab.jp.nec.com>
---
 net/sunrpc/svcsock.c |   12 ++--
 1 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/net/sunrpc/svcsock.c b/net/sunrpc/svcsock.c
index 0c81202..cf081b8 100644
--- a/net/sunrpc/svcsock.c
+++ b/net/sunrpc/svcsock.c
@@ -414,7 +414,7 @@ static void svc_udp_data_ready(struct sock *sk)
set_bit(XPT_DATA, >sk_xprt.xpt_flags);
svc_xprt_enqueue(>sk_xprt);
}
-   if (wq && waitqueue_active(wq))
+   if (wq_has_sleeper(wq))
wake_up_interruptible(wq);
 }
 
@@ -432,7 +432,7 @@ static void svc_write_space(struct sock *sk)
svc_xprt_enqueue(>sk_xprt);
}
 
-   if (wq && waitqueue_active(wq)) {
+   if (wq_has_sleeper(wq)) {
dprintk("RPC svc_write_space: someone sleeping on %p\n",
   svsk);
wake_up_interruptible(wq);
@@ -787,7 +787,7 @@ static void svc_tcp_listen_data_ready(struct sock *sk)
}
 
wq = sk_sleep(sk);
-   if (wq && waitqueue_active(wq))
+   if (wq_has_sleeper(wq))
wake_up_interruptible_all(wq);
 }
 
@@ -808,7 +808,7 @@ static void svc_tcp_state_change(struct sock *sk)
set_bit(XPT_CLOSE, >sk_xprt.xpt_flags);
svc_xprt_enqueue(>sk_xprt);
}
-   if (wq && waitqueue_active(wq))
+   if (wq_has_sleeper(wq))
wake_up_interruptible_all(wq);
 }
 
@@ -823,7 +823,7 @@ static void svc_tcp_data_ready(struct sock *sk)
set_bit(XPT_DATA, >sk_xprt.xpt_flags);
svc_xprt_enqueue(>sk_xprt);
}
-   if (wq && waitqueue_active(wq))
+   if (wq_has_sleeper(wq))
wake_up_interruptible(wq);
 }
 
@@ -1594,7 +1594,7 @@ static void svc_sock_detach(struct svc_xprt *xprt)
sk->sk_write_space = svsk->sk_owspace;
 
wq = sk_sleep(sk);
-   if (wq && waitqueue_active(wq))
+   if (wq_has_sleeper(wq))
wake_up_interruptible(wq);
 }
 
-- 
1.7.1
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html