Re: [PATCH] kernel/rcutree.c: deem to be lazy if there are no callbacks.

Chen Gang Mon, 02 Sep 2013 22:42:47 -0700

Hello Maintainers:

Is this issue finished ?


If need additional help from me (e.g. some test things, or others, if
you have no time, can let me try), please let me know, I should try.


Thanks.

On 08/26/2013 10:21 AM, Chen Gang F T wrote:
> 
> Firstly, thank you for your reply with these details. 
> 
> On 08/26/2013 03:18 AM, Paul E. McKenney wrote:
>> On Thu, Aug 22, 2013 at 11:01:53AM +0800, Chen Gang wrote:
>>> On 08/21/2013 10:23 PM, Paul E. McKenney wrote:
>>>> On Wed, Aug 21, 2013 at 01:59:29PM +0800, Chen Gang wrote:
>>
>> [ . . . ]
>>
>>>> Don't get me wrong, I do welcome appropriate patches.  In fact, if
>>>> you look at RCU's git history, you will see that I frequently accept
>>>> patches from a fair number of people.  And if you were willing to
>>>> invest some time and thought, you might eventually be able to generate
>>>> an appropriate (albeit low priority) patch to this function.  However,
>>>> you seem to be motivated to submit small patches with a minimum of
>>>> thought and preparation, perhaps because you need to meet some external
>>>> or self-imposed quota of accepted patches.  And if you are in fact driven
>>>> by a quota that prevents you from taking the time required to carefully
>>>> think things through, you are wasting your time with RCU.
>>>
>>> Hmm... at least, some contents you said above is correct to me.
>>>
>>> At least, I should provide 10 patches per month, it is a necessary
>>> basic requirement to me.
>>
>> OK, that does help explain the otherwise inexplicable approach you have
>> been taking.  Let's see how you have been doing, based on committer date
>> in Linus's tree:
>>
>>       1 2012-11
>>      15 2013-01
>>       7 2013-02
>>      20 2013-03
>>      21 2013-04
>>      12 2013-05
>>      17 2013-06
>>      10 2013-07
>>
>> The last few months might be understated a bit due to patches
>> still being in maintainer trees.  This is a nice contrast from my
>> first impression of you from https://lkml.org/lkml/2013/6/9/64 and
>> https://lkml.org/lkml/2013/8/19/650, neither of which gave me any
>> reason to trust your work, to put it mildly.  And if I cannot trust
>> your work, I obviously cannot accept your patches.
>>
> 
> Hmm... better to check patches independent personal feelings (trust
> some one, or not).
> 
> ;-)
> 
> 
>> You do seem to select for localized bug fixes, which require less work
>> than the performance-motivated patches you were putting forward earlier
>> in this thread.  With a localized bug, you demonstrate the bug, show the
>> fix, and that is that.  From what I can see, part of the problem with
>> your patches in this email thread is that you are trying to move from
>> localized bug fixes to performance issues without doing the additional
>> work required.  Please see below for a rough outline of this additional
>> work.
>>
> 
> Hmm... it seems I need describe my work flow for fixing bugs in details.
> 
>   1. Is it a bug ?
>      if so, I can be marked as Reported-by and continue to 2nd.
>      else, it is a waste mail.
> 
>   2. Try to fix it in simple ways (so can save the maintainers time resource).
>      if it can be accepted by maintainers, it is OK (I can be Signed-off-by).
>      else need continue to 3rd.
> 
>        exception: if I can not find a simple way to fix it, I will send 
> [Suggestion] mail.
> 
>   3. Do the maintainers know how to fix it ?
>      if yes, fix it together with maintainers (may mark me only as 
> Reported-by).
>      else need continue to Last.
> 
>   Last: I should analyze it and fix it (it is my duty to fix it).
> 
> 
> How do you feel about this work flow ? welcome any suggestions or
> completions.
> 
> Thanks.
> 
>>> And what my focus is efficiency: let appliers and maintainers together
>>> to provide contributes to outside with efficiency.
>>
>> Sounds great, but there are many possible definitions of "efficiency".
>> Given your quota, I would expect your definition to involve number of
>> patches accepted.  In contrast, my definition for RCU instead involves
>> maintainability, robustness, scalability, and, for a few critical
>> code paths, performance.  I therefore need you to have thought through
>> and carefully tested your patch.
>>
> 
> Hmm... it seems I need give more description for the 'efficiency' which
> I point to.
> 
> If it is no negative effect with the quality, we need try to use less
> resources (e.g. time resources) to provide more contributions (e.g. fix
> issue).
> 
> 
>>> If you already know about it, why need I continue ?  but if you don't
>>> know either, I should try.
>>
>> What I need you to do in future RCU performance patch submissions is:
>>
>> 1.   Think through your patch and the code that it is modifying.
>>      If you submit a patch to me, you should be able to answer the
>>      sorts of questions that I was asking in this thread.
>>
>> 2.   Tell me what situations your patch helps and not.
>>
>> 3.   Tell me how much your patch improves performance in the
>>      situations where it helps.
>>
>> 4.   Test the code.  If it makes a measurable difference, present
>>      the performance results.  (It would be very surprising if your
>>      early-loop exit patch made a significant difference, expecially
>>      on a CONFIG_PREEMPT=n kernel.)
>>
>> 5.   Rather than randomly dropping into the code, use actual measurements
>>      to determine where to focus your performance-improvement efforts.
>>      Developers, even experienced ones, are really bad at guessing
>>      where the most important performance problems are.
>>
>> 6.   Use your judgement.  For example, 1000-line patch to improve a
>>      slowpath by 0.1% simply isn't worth it.  A high risk of adding
>>      bugs for a microscopic benefit?  Thanks, but no thanks!!!
>>
>> For your patch https://lkml.org/lkml/2013/8/19/651, which was closest
>> of the three to being useful, here are some things about RCU that you
>> should have taken the time to learn -before- submitting the patch:
>>
>> a.   Q:  How many iterations for the for_each_rcu_flavor() loop?
>>      A:  On CONFIG_PREEMPT=n kernels, only two iterations.
>>          On CONFIG_PREEMPT=y kernels, only three iterations.
>>
>> b.   Q:  Which flavor of RCU is most likely to have non-lazy callbacks
>>          queued?
>>          
>>      A:  On CONFIG_PREEMPT=y kernels, the first one in the list.
>>          For CONFIG_PREEMPT=n kernels, it is last in the list.
>>          (In other words, for CONFIG_PREEMPT=n kernels, this change
>>          won't help at all, at least not without also changing the
>>          order of the list.)
>>
>> c.   Q:  Do any of the other for_each_rcu_flavor() loops care what order
>>          the flavors are in?
>>
>>      A:  No.  (In other words, it is OK to reorder the list to improve
>>          the performance.)
>>
>> d.   Q:  What is the performance benefit of this change?
>>
>>      A:  Quite small, for example, much less than an atomic operation
>>          on a shared data item.  It is probably not possible to
>>          measure the performance difference.
>>
>> e.   Q:  Is the change on a hotpath?
>>
>>      A:  Somewhat.  It is not on the read side, but it is on the path
>>          to and from idle, which can be important for latency-sensitive
>>          workloads.
>>
>> f.   Q:  How did you test this patch?
>>
>>      A:  As far as I can see, you did no testing.
>>
>> If I receive a future patch from you that does not convince me that you
>> know the answer to questions like these, I will most likely ignore it.
>>
> 
> Hmm... it sounds reasonable for some cases.
> 
> e.g.
> 
>   when neither you nor me know about how to fix it.
> 
>   As a patch maker, I should continue trying to fix it.
>     (what you said above is valuable reference to me).
> 
>   As an integrator, you should give a necessary check for it.
>     (what you said above is the necessary check for it).
> 
> 
> If the integrator already know about how to fix it, it seems what you
> said above is not quite efficient.
> 
> 
>> Just for practice, let's rework your second patch to make it something
>> that I might accept.  Here is what you had:
>>
>>              for_each_rcu_flavor(rsp) {
>>                      rdp = per_cpu_ptr(rsp->rda, cpu);
>>      -               if (rdp->qlen != rdp->qlen_lazy)
>>      -                       al = false;
>>      -               if (rdp->nxtlist)
>>      +               if (rdp->nxtlist) {
>>                              hc = true;
>>      +                       if (rdp->qlen != rdp->qlen_lazy) {
>>      +                               al = false;
>>      +                               break;
>>      +                       }
>>      +               }
>>              }
>>              if (all_lazy)
>>                      *all_lazy = al;
>>
>> We need to do something about the indentation, perhaps as follows:
>>
>>              for_each_rcu_flavor(rsp) {
>>                      rdp = per_cpu_ptr(rsp->rda, cpu);
>>                      if (!rdp->nxtlist)
>>                              continue;
>>                      hc = true;
>>                      if (rdp->qlen != rdp->qlen_lazy) {
>>                              al = false;
>>                              break;
>>                      }
>>              }
>>              if (all_lazy)
>>                      *all_lazy = al;
>>
>>
>> We also need to change the following code in rcu_init() in the file
>> kernel/rcutree.c:
>>
>>      rcu_init_one(&rcu_sched_state, &rcu_sched_data);
>>      rcu_init_one(&rcu_bh_state, &rcu_bh_data);
>>      __rcu_init_preempt();
>>
>> So that it gets rcu_sched_state in the right place, which I believe is
>> like this:
>>
>>      rcu_init_one(&rcu_bh_state, &rcu_bh_data);
>>      rcu_init_one(&rcu_sched_state, &rcu_sched_data);
>>      __rcu_init_preempt();
>>
>>
> 
> At least for me, it sounds reasonable. It seems you have already know
> about how to fix it (you never directly say you know about it, so I use
> 'seems').
> 
> 
>> If you make these changes, test them with RCU_FAST_NO_HZ both set and
>> not set, and verify that rcu_sched_state is first in the flavor list
>> for kernels with PREEMPT=n and that rcu_preempt_state is first in flavor
>> list for kernels with PREEMPT=y, and send me a the resulting patch by end
>> of day Friday, China time, I will seriously consider it for acceptance.
>> Otherwise, I will author the patch myself with your Reported-by.
>>
> 
> If you have already know about how to fix it, please fix it as soon as
> possible when you have time (mark me as Reported-by is OK).
> 
> If you need additional help from me for this issue, please let me know,
> I should try.
> 
> 
> :-)
> 
> 
> Thanks.
> 
>> Again, good luck!
>>
>>                                                      Thanx, Paul
>>
>>>> Good luck!
>>>>
>>>
>>> Thanks.
>>>
>>>>                                                    Thanx, Paul
>>>>
>>>>> -------------------------------diff begin-------------------------------
>>>>>
>>>>> diff --git a/kernel/rcutree.c b/kernel/rcutree.c
>>>>> index dbf74b5..1d02659 100644
>>>>> --- a/kernel/rcutree.c
>>>>> +++ b/kernel/rcutree.c
>>>>> @@ -2728,6 +2728,7 @@ static int rcu_cpu_has_callbacks(int cpu, bool 
>>>>> *all_lazy)
>>>>>           if (rdp->nxtlist)
>>>>>                   hc = true;
>>>>>   }
>>>>> + BUG_ON(!hc && !al);
>>>>>   if (all_lazy)
>>>>>           *all_lazy = al;
>>>>>   return hc;
>>>>>
>>>>> -------------------------------diff end---------------------------------
>>>>>
>>>>> Thanks.
>>>>>
>>>>>
>>>>> On 08/20/2013 12:45 PM, Chen Gang wrote:
>>>>>> On 08/20/2013 12:43 PM, Chen Gang wrote:
>>>>>>> On 08/20/2013 12:18 PM, Paul E. McKenney wrote:
>>>>>>>> On Tue, Aug 20, 2013 at 11:51:23AM +0800, Chen Gang wrote:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> If 'hc' is false, 'al' will never be false, either (only need check
>>>>>>>>> "irdp->qlen != rdp->qlen_lazy' when 'rdp->nxtlist' existance).
>>>>>>>>>
>>>>>>>>> Recommend to improve the related code, like the diff below.
>>>>>>>>
>>>>>>>> Are you sure that this represents an improvement?  If so, why?
>>>>>>>>
>>>>>>>
>>>>>>> If 'hc' and 'al' really has relationships, better to let 'C code'
>>>>>>> express it, that will make the code clearer.
>>>>>>>
>>>>>>>> Or to put it another way, I see a patch that increases the size of the
>>>>>>>> kernel by three lines.  What is the corresponding benefit given common
>>>>>>>> kernel workloads?
>>>>>>>>
>>>>>>>
>>>>>>> For 'al', need not check for each looping, and for 'hc', may save the
>>>>>>> useless looping (so it can make performance better).
>>>>>>>
>>>>>>> For C code, it really increases 3 lines, but may not for assembly code
>>>>>>> (excuse me, I am not check it, I think it is not important, although it
>>>>>>> is easy to give a comparing for binary).
>>>>>>>
>>>>>>
>>>>>> Oh, sorry, I mean: only for our case, "it is not important".
>>>>>>
>>>>>>
>>>>>>>>                                                        Thanx, Paul
>>>>>>>>
>>>>>>>>> ----------------------------------diff 
>>>>>>>>> begin------------------------------------
>>>>>>>>>
>>>>>>>>> diff --git a/kernel/rcutree.c b/kernel/rcutree.c
>>>>>>>>> index 5b53a89..421caf0 100644
>>>>>>>>> --- a/kernel/rcutree.c
>>>>>>>>> +++ b/kernel/rcutree.c
>>>>>>>>> @@ -2719,10 +2719,13 @@ static int rcd'_cpu_has_callbacks(int cpu, 
>>>>>>>>> bool *all_lazy)
>>>>>>>>>
>>>>>>>>>       for_each_rcu_flavor(rsp) {
>>>>>>>>>               rdp = per_cpu_ptr(rsp->rda, cpu);
>>>>>>>>> -             if (rdp->qlen != rdp->qlen_lazy)
>>>>>>>>> -                     al = false;
>>>>>>>>> -             if (rdp->nxtlist)
>>>>>>>>> +             if (rdp->nxtlist) {
>>>>>>>>>                       hc = true;
>>>>>>>>> +                     if (rdp->qlen != rdp->qlen_lazy) {
>>>>>>>>> +                             al = false;
>>>>>>>>> +                             break;
>>>>>>>>> +                     }
>>>>>>>>> +             }
>>>>>>>>>       }
>>>>>>>>>       if (all_lazy)
>>>>>>>>>               *all_lazy = al;
>>>>>>>>>
>>>>>>>>> ----------------------------------diff 
>>>>>>>>> end--------------------------------------
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 08/20/2013 11:50 AM, Chen Gang wrote:
>>>>>>>>>> According to the comment above rcu_cpu_has_callbacks(): "If there are
>>>>>>>>>> no callbacks, all of them are deemed to be lazy".
>>>>>>>>>>
>>>>>>>>>> So when both 'hc' and 'al' are false, '*all_lazy' should be true, not
>>>>>>>>>> false.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Signed-off-by: Chen Gang <[email protected]>
>>>>>>>>>> ---
>>>>>>>>>>  kernel/rcutree.c |    2 +-
>>>>>>>>>>  1 files changed, 1 insertions(+), 1 deletions(-)
>>>>>>>>>>
>>>>>>>>>> diff --git a/kernel/rcutree.c b/kernel/rcutree.c
>>>>>>>>>> index 5b53a89..9ee9565 100644
>>>>>>>>>> --- a/kernel/rcutree.c
>>>>>>>>>> +++ b/kernel/rcutree.c
>>>>>>>>>> @@ -2725,7 +2725,7 @@ static int rcu_cpu_has_callbacks(int cpu, bool 
>>>>>>>>>> *all_lazy)
>>>>>>>>>>                      hc = true;
>>>>>>>>>>      }
>>>>>>>>>>      if (all_lazy)
>>>>>>>>>> -            *all_lazy = al;
>>>>>>>>>> +            *all_lazy = !hc ? true : al;
>>>>>>>>>>      return hc;
>>>>>>>>>>  }
>>>>>>>>>>  
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> -- 
>>>>>>>>> Chen Gang
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>> -- 
>>>>> Chen Gang
>>>>>
>>>>
>>>>
>>>>
>>>
>>>
>>> -- 
>>> Chen Gang
>>>
>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
>> the body of a message to [email protected]
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>> Please read the FAQ at  http://www.tux.org/lkml/
>>
> 
> 


-- 
Chen Gang
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [email protected]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] kernel/rcutree.c: deem to be lazy if there are no callbacks.

Reply via email to