Re: [PATCH 7/8] cpufreq: Preserve policy structure across suspend/resume

2013-07-16 Thread Srivatsa S. Bhat
On 07/16/2013 03:05 PM, Viresh Kumar wrote:
> On 16 July 2013 14:59, Srivatsa S. Bhat
>  wrote:
>> On 07/16/2013 02:40 PM, Viresh Kumar wrote:
> 
>>> So, even if you don't keep the fallback storage, things should work
>>> without any issue (probably worth trying as this will get rid of a per
>>> cpu variable :))
>>>
>>
>> No, I already tried that and it didn't work ;-( The thing is, we need the
>> __cpufreq_add_dev() code to call the ->init() routines of drivers etc. But if
>> it finds the policy structure, it will skip all of that initialization and 
>> happily
>> proceed. Which is precisely the cause of all the erratic behaviour we are 
>> seeing
>> (ie., lack of proper initialization post-resume).
> 
> I missed that point. :)
> 
>> So this approach keeps the memory preserved in a fallback storage and lets 
>> the
>> init code run to full completion without any issues.
>>
>> Perhaps we could do some _more_ code reorganization in the future to take 
>> this
>> issue into account etc., but IMHO that might be non-trivial. I'm trying to 
>> keep
>> this as simple and straight-forward as possible as a first step, to atleast 
>> get
>> it properly working. (Changing the order in which init is done is kinda scary
>> since its hard to comprehend what assumptions we might be breaking!).
>>
>> We can perhaps revisit your idea later and optimize out the extra per-cpu 
>> data.
> 
> No, we don't need to optimize it that way. Current design looks good
> for now.

Cool! Thanks :)

Regards,
Srivatsa S. Bhat

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 7/8] cpufreq: Preserve policy structure across suspend/resume

2013-07-16 Thread Viresh Kumar
On 16 July 2013 14:59, Srivatsa S. Bhat
 wrote:
> On 07/16/2013 02:40 PM, Viresh Kumar wrote:

>> So, even if you don't keep the fallback storage, things should work
>> without any issue (probably worth trying as this will get rid of a per
>> cpu variable :))
>>
>
> No, I already tried that and it didn't work ;-( The thing is, we need the
> __cpufreq_add_dev() code to call the ->init() routines of drivers etc. But if
> it finds the policy structure, it will skip all of that initialization and 
> happily
> proceed. Which is precisely the cause of all the erratic behaviour we are 
> seeing
> (ie., lack of proper initialization post-resume).

I missed that point. :)

> So this approach keeps the memory preserved in a fallback storage and lets the
> init code run to full completion without any issues.
>
> Perhaps we could do some _more_ code reorganization in the future to take this
> issue into account etc., but IMHO that might be non-trivial. I'm trying to 
> keep
> this as simple and straight-forward as possible as a first step, to atleast 
> get
> it properly working. (Changing the order in which init is done is kinda scary
> since its hard to comprehend what assumptions we might be breaking!).
>
> We can perhaps revisit your idea later and optimize out the extra per-cpu 
> data.

No, we don't need to optimize it that way. Current design looks good
for now.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 7/8] cpufreq: Preserve policy structure across suspend/resume

2013-07-16 Thread Srivatsa S. Bhat
On 07/16/2013 02:40 PM, Viresh Kumar wrote:
> On 16 July 2013 14:26, Srivatsa S. Bhat
>  wrote:
>> On 07/16/2013 11:45 AM, Viresh Kumar wrote:
> 
>>> To understand it I actually applied your patches to get better view of the 
>>> code.
>>> (Haven't tested it though).. And found that your code is doing the right 
>>> thing
>>> and we shouldn't get a mismatch.. This is the sequence of events I can draw:
>>>
>>> - __cpu_add_dev() for first cpu. sets the refcount to 'x', where x are
>>> the no. of
>>> cpus in its clock domain.
>>> - _cpu_add_dev() for other cpus: doesn't change anything in refcount
>>>
>>> - Suspend:
>>>  - cpu_remove_dev() for all cpus, due to frozen flag we don't touch the 
>>> value
>>> of count
>>> - Resume:
>>>  - cpu_add_dev() for all cpus, due to frozen flag we don't touch the
>>> value of count.
>>>
>>
>> Actually this one is tricky (I took a look again). So we have this code in 
>> the
>> beginning of _cpufreq_add_dev():
>>
>>
>> 1008 #ifdef CONFIG_SMP
>> 1009 /* check whether a different CPU already registered this
>> 1010  * CPU because it is in the same boat. */
>> 1011 policy = cpufreq_cpu_get(cpu);
>> 1012 if (unlikely(policy)) {
>> 1013 cpufreq_cpu_put(policy);
>> 1014 return 0;
>> 1015 }
>>
>> The _get() is not controlled by the frozen flag, but it still doesn't take a
>> refcount because of a subtle reason: per_cpu(cpufreq_cpu_data, cpu) was set 
>> to
>> NULL in __cpufreq_remove_dev() and the memory was saved away in fallback 
>> storage.
>> So, when __cpufreq_cpu_get() executes, it sees:
>>
>>  204 /* get the CPU */
>>  205 data = per_cpu(cpufreq_cpu_data, cpu);
>>  206
>>  207 if (!data)
>>  208 goto err_out_put_module;
>>
>> Thus, since data is NULL, cpufreq_cpu_get() won't take a refcount and will 
>> return
>> silently.
> 
> Even if this wouldn't have happened, refcount wouldn't have been
> touched due to this code:
> 
>> 1012 if (unlikely(policy)) {
>> 1013 cpufreq_cpu_put(policy);
>> 1014 return 0;
>> 1015 }
> 
> i.e. If we get a valid policy structure, we siimply put the policy again
> and so decrement the incremented refcount.

Ah, yes!

> 
> So, even if you don't keep the fallback storage, things should work
> without any issue (probably worth trying as this will get rid of a per
> cpu variable :))
>

No, I already tried that and it didn't work ;-( The thing is, we need the
__cpufreq_add_dev() code to call the ->init() routines of drivers etc. But if
it finds the policy structure, it will skip all of that initialization and 
happily
proceed. Which is precisely the cause of all the erratic behaviour we are seeing
(ie., lack of proper initialization post-resume).

So this approach keeps the memory preserved in a fallback storage and lets the
init code run to full completion without any issues.

Perhaps we could do some _more_ code reorganization in the future to take this
issue into account etc., but IMHO that might be non-trivial. I'm trying to keep
this as simple and straight-forward as possible as a first step, to atleast get
it properly working. (Changing the order in which init is done is kinda scary
since its hard to comprehend what assumptions we might be breaking!).

We can perhaps revisit your idea later and optimize out the extra per-cpu data.
 
>> Further down in __cpufreq_add_dev(), we restore the original memory, using
>> the frozen flag:
>>
>> 1037 if (frozen)
>> 1038 /* Restore the saved policy when doing light-weight 
>> init */
>> 1039 policy = cpufreq_policy_restore(cpu);
>> 1040 else
>> 1041 policy = cpufreq_policy_alloc();
>>
>>
>> So that is how we manage to fool cpufreq_cpu_get() into not taking a fresh
>> refcount while resuming :)
 
Regards,
Srivatsa S. Bhat

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 7/8] cpufreq: Preserve policy structure across suspend/resume

2013-07-16 Thread Viresh Kumar
On 16 July 2013 14:26, Srivatsa S. Bhat
 wrote:
> On 07/16/2013 11:45 AM, Viresh Kumar wrote:

>> To understand it I actually applied your patches to get better view of the 
>> code.
>> (Haven't tested it though).. And found that your code is doing the right 
>> thing
>> and we shouldn't get a mismatch.. This is the sequence of events I can draw:
>>
>> - __cpu_add_dev() for first cpu. sets the refcount to 'x', where x are
>> the no. of
>> cpus in its clock domain.
>> - _cpu_add_dev() for other cpus: doesn't change anything in refcount
>>
>> - Suspend:
>>  - cpu_remove_dev() for all cpus, due to frozen flag we don't touch the value
>> of count
>> - Resume:
>>  - cpu_add_dev() for all cpus, due to frozen flag we don't touch the
>> value of count.
>>
>
> Actually this one is tricky (I took a look again). So we have this code in the
> beginning of _cpufreq_add_dev():
>
>
> 1008 #ifdef CONFIG_SMP
> 1009 /* check whether a different CPU already registered this
> 1010  * CPU because it is in the same boat. */
> 1011 policy = cpufreq_cpu_get(cpu);
> 1012 if (unlikely(policy)) {
> 1013 cpufreq_cpu_put(policy);
> 1014 return 0;
> 1015 }
>
> The _get() is not controlled by the frozen flag, but it still doesn't take a
> refcount because of a subtle reason: per_cpu(cpufreq_cpu_data, cpu) was set to
> NULL in __cpufreq_remove_dev() and the memory was saved away in fallback 
> storage.
> So, when __cpufreq_cpu_get() executes, it sees:
>
>  204 /* get the CPU */
>  205 data = per_cpu(cpufreq_cpu_data, cpu);
>  206
>  207 if (!data)
>  208 goto err_out_put_module;
>
> Thus, since data is NULL, cpufreq_cpu_get() won't take a refcount and will 
> return
> silently.

Even if this wouldn't have happened, refcount wouldn't have been
touched due to this code:

> 1012 if (unlikely(policy)) {
> 1013 cpufreq_cpu_put(policy);
> 1014 return 0;
> 1015 }

i.e. If we get a valid policy structure, we siimply put the policy again
and so decrement the incremented refcount.

So, even if you don't keep the fallback storage, things should work
without any issue (probably worth trying as this will get rid of a per
cpu variable :))

> Further down in __cpufreq_add_dev(), we restore the original memory, using
> the frozen flag:
>
> 1037 if (frozen)
> 1038 /* Restore the saved policy when doing light-weight init 
> */
> 1039 policy = cpufreq_policy_restore(cpu);
> 1040 else
> 1041 policy = cpufreq_policy_alloc();
>
>
> So that is how we manage to fool cpufreq_cpu_get() into not taking a fresh
> refcount while resuming :)
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 7/8] cpufreq: Preserve policy structure across suspend/resume

2013-07-16 Thread Srivatsa S. Bhat
On 07/16/2013 11:45 AM, Viresh Kumar wrote:
> On 15 July 2013 15:35, Srivatsa S. Bhat
>  wrote:
>> Actually even I was wondering about this while writing the patch and
>> I even tested shutdown after multiple suspend/resume cycles, to verify that
>> the refcount is messed up. But surprisingly, things worked just fine.
>>
>> Logically there should've been a refcount mismatch and things should have
>> failed, but everything worked fine during my tests. Apart from suspend/resume
>> and shutdown tests, I even tried mixing a few regular CPU hotplug operations
>> (echo 0/1 to sysfs online files), but nothing stood out.
>>
>> Sorry, I forgot to document this in the patch. Either the patch is wrong
>> or something else is silently fixing this up. Not sure what is the exact
>> situation.
> 
> To understand it I actually applied your patches to get better view of the 
> code.
> (Haven't tested it though).. And found that your code is doing the right thing
> and we shouldn't get a mismatch.. This is the sequence of events I can draw:
> 
> - __cpu_add_dev() for first cpu. sets the refcount to 'x', where x are
> the no. of
> cpus in its clock domain.
> - _cpu_add_dev() for other cpus: doesn't change anything in refcount
> 
> - Suspend:
>  - cpu_remove_dev() for all cpus, due to frozen flag we don't touch the value
> of count
> - Resume:
>  - cpu_add_dev() for all cpus, due to frozen flag we don't touch the
> value of count.
>

Actually this one is tricky (I took a look again). So we have this code in the
beginning of _cpufreq_add_dev():


1008 #ifdef CONFIG_SMP
1009 /* check whether a different CPU already registered this
1010  * CPU because it is in the same boat. */
1011 policy = cpufreq_cpu_get(cpu);
1012 if (unlikely(policy)) {
1013 cpufreq_cpu_put(policy);
1014 return 0;
1015 }

The _get() is not controlled by the frozen flag, but it still doesn't take a
refcount because of a subtle reason: per_cpu(cpufreq_cpu_data, cpu) was set to
NULL in __cpufreq_remove_dev() and the memory was saved away in fallback 
storage.
So, when __cpufreq_cpu_get() executes, it sees:

 204 /* get the CPU */
 205 data = per_cpu(cpufreq_cpu_data, cpu);
 206 
 207 if (!data)
 208 goto err_out_put_module;

Thus, since data is NULL, cpufreq_cpu_get() won't take a refcount and will 
return
silently.

Further down in __cpufreq_add_dev(), we restore the original memory, using
the frozen flag:

1037 if (frozen)
1038 /* Restore the saved policy when doing light-weight init */
1039 policy = cpufreq_policy_restore(cpu);
1040 else
1041 policy = cpufreq_policy_alloc();


So that is how we manage to fool cpufreq_cpu_get() into not taking a fresh
refcount while resuming :)
 
> And so things work as expected. That's why your code isn't breaking anything I
> believe.
> 

Thanks a lot for the code inspection and your detailed analysis!

> But can no. of cpus change inbetween suspend/resume? Then count would be
> tricky as we are using the same policy structure.
> 

No, number of CPUs won't change in between suspend/resume. Even if somebody
tried that, that would be an eccentric case and we won't handle that.
Besides, *many more* things will break than just cpufreq, if somebody actually
tries that out!

Regards,
Srivatsa S. Bhat

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 7/8] cpufreq: Preserve policy structure across suspend/resume

2013-07-16 Thread Viresh Kumar
On 15 July 2013 15:35, Srivatsa S. Bhat
 wrote:
> Actually even I was wondering about this while writing the patch and
> I even tested shutdown after multiple suspend/resume cycles, to verify that
> the refcount is messed up. But surprisingly, things worked just fine.
>
> Logically there should've been a refcount mismatch and things should have
> failed, but everything worked fine during my tests. Apart from suspend/resume
> and shutdown tests, I even tried mixing a few regular CPU hotplug operations
> (echo 0/1 to sysfs online files), but nothing stood out.
>
> Sorry, I forgot to document this in the patch. Either the patch is wrong
> or something else is silently fixing this up. Not sure what is the exact
> situation.

To understand it I actually applied your patches to get better view of the code.
(Haven't tested it though).. And found that your code is doing the right thing
and we shouldn't get a mismatch.. This is the sequence of events I can draw:

- __cpu_add_dev() for first cpu. sets the refcount to 'x', where x are
the no. of
cpus in its clock domain.
- _cpu_add_dev() for other cpus: doesn't change anything in refcount

- Suspend:
 - cpu_remove_dev() for all cpus, due to frozen flag we don't touch the value
of count
- Resume:
 - cpu_add_dev() for all cpus, due to frozen flag we don't touch the
value of count.

And so things work as expected. That's why your code isn't breaking anything I
believe.

But can no. of cpus change inbetween suspend/resume? Then count would be
tricky as we are using the same policy structure.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 7/8] cpufreq: Preserve policy structure across suspend/resume

2013-07-16 Thread Viresh Kumar
On 15 July 2013 15:35, Srivatsa S. Bhat
srivatsa.b...@linux.vnet.ibm.com wrote:
 Actually even I was wondering about this while writing the patch and
 I even tested shutdown after multiple suspend/resume cycles, to verify that
 the refcount is messed up. But surprisingly, things worked just fine.

 Logically there should've been a refcount mismatch and things should have
 failed, but everything worked fine during my tests. Apart from suspend/resume
 and shutdown tests, I even tried mixing a few regular CPU hotplug operations
 (echo 0/1 to sysfs online files), but nothing stood out.

 Sorry, I forgot to document this in the patch. Either the patch is wrong
 or something else is silently fixing this up. Not sure what is the exact
 situation.

To understand it I actually applied your patches to get better view of the code.
(Haven't tested it though).. And found that your code is doing the right thing
and we shouldn't get a mismatch.. This is the sequence of events I can draw:

- __cpu_add_dev() for first cpu. sets the refcount to 'x', where x are
the no. of
cpus in its clock domain.
- _cpu_add_dev() for other cpus: doesn't change anything in refcount

- Suspend:
 - cpu_remove_dev() for all cpus, due to frozen flag we don't touch the value
of count
- Resume:
 - cpu_add_dev() for all cpus, due to frozen flag we don't touch the
value of count.

And so things work as expected. That's why your code isn't breaking anything I
believe.

But can no. of cpus change inbetween suspend/resume? Then count would be
tricky as we are using the same policy structure.
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 7/8] cpufreq: Preserve policy structure across suspend/resume

2013-07-16 Thread Srivatsa S. Bhat
On 07/16/2013 11:45 AM, Viresh Kumar wrote:
 On 15 July 2013 15:35, Srivatsa S. Bhat
 srivatsa.b...@linux.vnet.ibm.com wrote:
 Actually even I was wondering about this while writing the patch and
 I even tested shutdown after multiple suspend/resume cycles, to verify that
 the refcount is messed up. But surprisingly, things worked just fine.

 Logically there should've been a refcount mismatch and things should have
 failed, but everything worked fine during my tests. Apart from suspend/resume
 and shutdown tests, I even tried mixing a few regular CPU hotplug operations
 (echo 0/1 to sysfs online files), but nothing stood out.

 Sorry, I forgot to document this in the patch. Either the patch is wrong
 or something else is silently fixing this up. Not sure what is the exact
 situation.
 
 To understand it I actually applied your patches to get better view of the 
 code.
 (Haven't tested it though).. And found that your code is doing the right thing
 and we shouldn't get a mismatch.. This is the sequence of events I can draw:
 
 - __cpu_add_dev() for first cpu. sets the refcount to 'x', where x are
 the no. of
 cpus in its clock domain.
 - _cpu_add_dev() for other cpus: doesn't change anything in refcount
 
 - Suspend:
  - cpu_remove_dev() for all cpus, due to frozen flag we don't touch the value
 of count
 - Resume:
  - cpu_add_dev() for all cpus, due to frozen flag we don't touch the
 value of count.


Actually this one is tricky (I took a look again). So we have this code in the
beginning of _cpufreq_add_dev():


1008 #ifdef CONFIG_SMP
1009 /* check whether a different CPU already registered this
1010  * CPU because it is in the same boat. */
1011 policy = cpufreq_cpu_get(cpu);
1012 if (unlikely(policy)) {
1013 cpufreq_cpu_put(policy);
1014 return 0;
1015 }

The _get() is not controlled by the frozen flag, but it still doesn't take a
refcount because of a subtle reason: per_cpu(cpufreq_cpu_data, cpu) was set to
NULL in __cpufreq_remove_dev() and the memory was saved away in fallback 
storage.
So, when __cpufreq_cpu_get() executes, it sees:

 204 /* get the CPU */
 205 data = per_cpu(cpufreq_cpu_data, cpu);
 206 
 207 if (!data)
 208 goto err_out_put_module;

Thus, since data is NULL, cpufreq_cpu_get() won't take a refcount and will 
return
silently.

Further down in __cpufreq_add_dev(), we restore the original memory, using
the frozen flag:

1037 if (frozen)
1038 /* Restore the saved policy when doing light-weight init */
1039 policy = cpufreq_policy_restore(cpu);
1040 else
1041 policy = cpufreq_policy_alloc();


So that is how we manage to fool cpufreq_cpu_get() into not taking a fresh
refcount while resuming :)
 
 And so things work as expected. That's why your code isn't breaking anything I
 believe.
 

Thanks a lot for the code inspection and your detailed analysis!

 But can no. of cpus change inbetween suspend/resume? Then count would be
 tricky as we are using the same policy structure.
 

No, number of CPUs won't change in between suspend/resume. Even if somebody
tried that, that would be an eccentric case and we won't handle that.
Besides, *many more* things will break than just cpufreq, if somebody actually
tries that out!

Regards,
Srivatsa S. Bhat

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 7/8] cpufreq: Preserve policy structure across suspend/resume

2013-07-16 Thread Viresh Kumar
On 16 July 2013 14:26, Srivatsa S. Bhat
srivatsa.b...@linux.vnet.ibm.com wrote:
 On 07/16/2013 11:45 AM, Viresh Kumar wrote:

 To understand it I actually applied your patches to get better view of the 
 code.
 (Haven't tested it though).. And found that your code is doing the right 
 thing
 and we shouldn't get a mismatch.. This is the sequence of events I can draw:

 - __cpu_add_dev() for first cpu. sets the refcount to 'x', where x are
 the no. of
 cpus in its clock domain.
 - _cpu_add_dev() for other cpus: doesn't change anything in refcount

 - Suspend:
  - cpu_remove_dev() for all cpus, due to frozen flag we don't touch the value
 of count
 - Resume:
  - cpu_add_dev() for all cpus, due to frozen flag we don't touch the
 value of count.


 Actually this one is tricky (I took a look again). So we have this code in the
 beginning of _cpufreq_add_dev():


 1008 #ifdef CONFIG_SMP
 1009 /* check whether a different CPU already registered this
 1010  * CPU because it is in the same boat. */
 1011 policy = cpufreq_cpu_get(cpu);
 1012 if (unlikely(policy)) {
 1013 cpufreq_cpu_put(policy);
 1014 return 0;
 1015 }

 The _get() is not controlled by the frozen flag, but it still doesn't take a
 refcount because of a subtle reason: per_cpu(cpufreq_cpu_data, cpu) was set to
 NULL in __cpufreq_remove_dev() and the memory was saved away in fallback 
 storage.
 So, when __cpufreq_cpu_get() executes, it sees:

  204 /* get the CPU */
  205 data = per_cpu(cpufreq_cpu_data, cpu);
  206
  207 if (!data)
  208 goto err_out_put_module;

 Thus, since data is NULL, cpufreq_cpu_get() won't take a refcount and will 
 return
 silently.

Even if this wouldn't have happened, refcount wouldn't have been
touched due to this code:

 1012 if (unlikely(policy)) {
 1013 cpufreq_cpu_put(policy);
 1014 return 0;
 1015 }

i.e. If we get a valid policy structure, we siimply put the policy again
and so decrement the incremented refcount.

So, even if you don't keep the fallback storage, things should work
without any issue (probably worth trying as this will get rid of a per
cpu variable :))

 Further down in __cpufreq_add_dev(), we restore the original memory, using
 the frozen flag:

 1037 if (frozen)
 1038 /* Restore the saved policy when doing light-weight init 
 */
 1039 policy = cpufreq_policy_restore(cpu);
 1040 else
 1041 policy = cpufreq_policy_alloc();


 So that is how we manage to fool cpufreq_cpu_get() into not taking a fresh
 refcount while resuming :)
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 7/8] cpufreq: Preserve policy structure across suspend/resume

2013-07-16 Thread Srivatsa S. Bhat
On 07/16/2013 02:40 PM, Viresh Kumar wrote:
 On 16 July 2013 14:26, Srivatsa S. Bhat
 srivatsa.b...@linux.vnet.ibm.com wrote:
 On 07/16/2013 11:45 AM, Viresh Kumar wrote:
 
 To understand it I actually applied your patches to get better view of the 
 code.
 (Haven't tested it though).. And found that your code is doing the right 
 thing
 and we shouldn't get a mismatch.. This is the sequence of events I can draw:

 - __cpu_add_dev() for first cpu. sets the refcount to 'x', where x are
 the no. of
 cpus in its clock domain.
 - _cpu_add_dev() for other cpus: doesn't change anything in refcount

 - Suspend:
  - cpu_remove_dev() for all cpus, due to frozen flag we don't touch the 
 value
 of count
 - Resume:
  - cpu_add_dev() for all cpus, due to frozen flag we don't touch the
 value of count.


 Actually this one is tricky (I took a look again). So we have this code in 
 the
 beginning of _cpufreq_add_dev():


 1008 #ifdef CONFIG_SMP
 1009 /* check whether a different CPU already registered this
 1010  * CPU because it is in the same boat. */
 1011 policy = cpufreq_cpu_get(cpu);
 1012 if (unlikely(policy)) {
 1013 cpufreq_cpu_put(policy);
 1014 return 0;
 1015 }

 The _get() is not controlled by the frozen flag, but it still doesn't take a
 refcount because of a subtle reason: per_cpu(cpufreq_cpu_data, cpu) was set 
 to
 NULL in __cpufreq_remove_dev() and the memory was saved away in fallback 
 storage.
 So, when __cpufreq_cpu_get() executes, it sees:

  204 /* get the CPU */
  205 data = per_cpu(cpufreq_cpu_data, cpu);
  206
  207 if (!data)
  208 goto err_out_put_module;

 Thus, since data is NULL, cpufreq_cpu_get() won't take a refcount and will 
 return
 silently.
 
 Even if this wouldn't have happened, refcount wouldn't have been
 touched due to this code:
 
 1012 if (unlikely(policy)) {
 1013 cpufreq_cpu_put(policy);
 1014 return 0;
 1015 }
 
 i.e. If we get a valid policy structure, we siimply put the policy again
 and so decrement the incremented refcount.

Ah, yes!

 
 So, even if you don't keep the fallback storage, things should work
 without any issue (probably worth trying as this will get rid of a per
 cpu variable :))


No, I already tried that and it didn't work ;-( The thing is, we need the
__cpufreq_add_dev() code to call the -init() routines of drivers etc. But if
it finds the policy structure, it will skip all of that initialization and 
happily
proceed. Which is precisely the cause of all the erratic behaviour we are seeing
(ie., lack of proper initialization post-resume).

So this approach keeps the memory preserved in a fallback storage and lets the
init code run to full completion without any issues.

Perhaps we could do some _more_ code reorganization in the future to take this
issue into account etc., but IMHO that might be non-trivial. I'm trying to keep
this as simple and straight-forward as possible as a first step, to atleast get
it properly working. (Changing the order in which init is done is kinda scary
since its hard to comprehend what assumptions we might be breaking!).

We can perhaps revisit your idea later and optimize out the extra per-cpu data.
 
 Further down in __cpufreq_add_dev(), we restore the original memory, using
 the frozen flag:

 1037 if (frozen)
 1038 /* Restore the saved policy when doing light-weight 
 init */
 1039 policy = cpufreq_policy_restore(cpu);
 1040 else
 1041 policy = cpufreq_policy_alloc();


 So that is how we manage to fool cpufreq_cpu_get() into not taking a fresh
 refcount while resuming :)
 
Regards,
Srivatsa S. Bhat

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 7/8] cpufreq: Preserve policy structure across suspend/resume

2013-07-16 Thread Viresh Kumar
On 16 July 2013 14:59, Srivatsa S. Bhat
srivatsa.b...@linux.vnet.ibm.com wrote:
 On 07/16/2013 02:40 PM, Viresh Kumar wrote:

 So, even if you don't keep the fallback storage, things should work
 without any issue (probably worth trying as this will get rid of a per
 cpu variable :))


 No, I already tried that and it didn't work ;-( The thing is, we need the
 __cpufreq_add_dev() code to call the -init() routines of drivers etc. But if
 it finds the policy structure, it will skip all of that initialization and 
 happily
 proceed. Which is precisely the cause of all the erratic behaviour we are 
 seeing
 (ie., lack of proper initialization post-resume).

I missed that point. :)

 So this approach keeps the memory preserved in a fallback storage and lets the
 init code run to full completion without any issues.

 Perhaps we could do some _more_ code reorganization in the future to take this
 issue into account etc., but IMHO that might be non-trivial. I'm trying to 
 keep
 this as simple and straight-forward as possible as a first step, to atleast 
 get
 it properly working. (Changing the order in which init is done is kinda scary
 since its hard to comprehend what assumptions we might be breaking!).

 We can perhaps revisit your idea later and optimize out the extra per-cpu 
 data.

No, we don't need to optimize it that way. Current design looks good
for now.
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 7/8] cpufreq: Preserve policy structure across suspend/resume

2013-07-16 Thread Srivatsa S. Bhat
On 07/16/2013 03:05 PM, Viresh Kumar wrote:
 On 16 July 2013 14:59, Srivatsa S. Bhat
 srivatsa.b...@linux.vnet.ibm.com wrote:
 On 07/16/2013 02:40 PM, Viresh Kumar wrote:
 
 So, even if you don't keep the fallback storage, things should work
 without any issue (probably worth trying as this will get rid of a per
 cpu variable :))


 No, I already tried that and it didn't work ;-( The thing is, we need the
 __cpufreq_add_dev() code to call the -init() routines of drivers etc. But if
 it finds the policy structure, it will skip all of that initialization and 
 happily
 proceed. Which is precisely the cause of all the erratic behaviour we are 
 seeing
 (ie., lack of proper initialization post-resume).
 
 I missed that point. :)
 
 So this approach keeps the memory preserved in a fallback storage and lets 
 the
 init code run to full completion without any issues.

 Perhaps we could do some _more_ code reorganization in the future to take 
 this
 issue into account etc., but IMHO that might be non-trivial. I'm trying to 
 keep
 this as simple and straight-forward as possible as a first step, to atleast 
 get
 it properly working. (Changing the order in which init is done is kinda scary
 since its hard to comprehend what assumptions we might be breaking!).

 We can perhaps revisit your idea later and optimize out the extra per-cpu 
 data.
 
 No, we don't need to optimize it that way. Current design looks good
 for now.

Cool! Thanks :)

Regards,
Srivatsa S. Bhat

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 7/8] cpufreq: Preserve policy structure across suspend/resume

2013-07-15 Thread Srivatsa S. Bhat
On 07/15/2013 05:05 PM, Rafael J. Wysocki wrote:
> On Monday, July 15, 2013 03:35:04 PM Srivatsa S. Bhat wrote:
>> On 07/15/2013 03:25 PM, Viresh Kumar wrote:
>>> Hi Srivatsa,
>>>
>>> I may be wrong but it looks something is wrong in this patch.
>>>
>>> On 12 July 2013 03:47, Srivatsa S. Bhat
>>>  wrote:
 diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
>>>
 @@ -1239,29 +1263,40 @@ static int __cpufreq_remove_dev(struct device *dev,
 if ((cpus == 1) && (cpufreq_driver->target))
 __cpufreq_governor(data, CPUFREQ_GOV_POLICY_EXIT);

 -   pr_debug("%s: removing link, cpu: %d\n", __func__, cpu);
 -   cpufreq_cpu_put(data);
 +   if (!frozen) {
 +   pr_debug("%s: removing link, cpu: %d\n", __func__, cpu);
 +   cpufreq_cpu_put(data);
>>>
>>> So, we don't decrement usage count here. But we are still increasing
>>> counts on cpufreq_add_dev after resume, isn't it?
>>>
>>> So, we wouldn't be able to free policy struct once all the cpus of a
>>> policy are removed after suspend/resume has happened once.
>>>
>>
>> Actually even I was wondering about this while writing the patch and
>> I even tested shutdown after multiple suspend/resume cycles, to verify that
>> the refcount is messed up. But surprisingly, things worked just fine.
>>
>> Logically there should've been a refcount mismatch and things should have
>> failed, but everything worked fine during my tests. Apart from suspend/resume
>> and shutdown tests, I even tried mixing a few regular CPU hotplug operations
>> (echo 0/1 to sysfs online files), but nothing stood out.
>>
>> Sorry, I forgot to document this in the patch. Either the patch is wrong
>> or something else is silently fixing this up. Not sure what is the exact
>> situation.
> 
> OK, so I'm not going to queue [2-8/8] up until we find out what's going on
> here (and until Toralf tells me that it doesn't break his system any more).
> 

Ok, that sounds good.

> I've queued up [1/8] for 3.11 already.
> 

Thank you!
 
Regards,
Srivatsa S. Bhat

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 7/8] cpufreq: Preserve policy structure across suspend/resume

2013-07-15 Thread Srivatsa S. Bhat
On 07/15/2013 03:51 PM, Viresh Kumar wrote:
> On 15 July 2013 15:35, Srivatsa S. Bhat
>  wrote:
>> Actually even I was wondering about this while writing the patch and
>> I even tested shutdown after multiple suspend/resume cycles, to verify that
>> the refcount is messed up. But surprisingly, things worked just fine.
> 
> What kind of system have you tested it on?
> 

The system has 2 sockets with 8 cores each, and has Intel Sandybridge
CPUs. I had used a local patch to simulate CPU hotplug in the suspend-to-ram
path using the freeze state of pm_test (because I had other problems in
using the 'processors' state of pm_test). The patch is shown below:


diff --git a/kernel/power/suspend.c b/kernel/power/suspend.c
index ece0422..fe07b77 100644
--- a/kernel/power/suspend.c
+++ b/kernel/power/suspend.c
@@ -342,8 +342,13 @@ static int enter_state(suspend_state_t state)
if (error)
goto Unlock;
 
-   if (suspend_test(TEST_FREEZER))
+   if (suspend_test(TEST_FREEZER)) {
+   pr_debug("Disabling nonboot CPUs\n");
+   disable_nonboot_cpus();
+   pr_debug("Enabling nonboot CPUs\n");
+   enable_nonboot_cpus();
goto Finish;
+   }
 
pr_debug("PM: Entering %s sleep\n", pm_states[state]);
pm_restrict_gfp_mask();


 
Regards,
Srivatsa S. Bhat

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 7/8] cpufreq: Preserve policy structure across suspend/resume

2013-07-15 Thread Rafael J. Wysocki
On Monday, July 15, 2013 03:35:04 PM Srivatsa S. Bhat wrote:
> On 07/15/2013 03:25 PM, Viresh Kumar wrote:
> > Hi Srivatsa,
> > 
> > I may be wrong but it looks something is wrong in this patch.
> > 
> > On 12 July 2013 03:47, Srivatsa S. Bhat
> >  wrote:
> >> diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
> > 
> >> @@ -1239,29 +1263,40 @@ static int __cpufreq_remove_dev(struct device *dev,
> >> if ((cpus == 1) && (cpufreq_driver->target))
> >> __cpufreq_governor(data, CPUFREQ_GOV_POLICY_EXIT);
> >>
> >> -   pr_debug("%s: removing link, cpu: %d\n", __func__, cpu);
> >> -   cpufreq_cpu_put(data);
> >> +   if (!frozen) {
> >> +   pr_debug("%s: removing link, cpu: %d\n", __func__, cpu);
> >> +   cpufreq_cpu_put(data);
> > 
> > So, we don't decrement usage count here. But we are still increasing
> > counts on cpufreq_add_dev after resume, isn't it?
> > 
> > So, we wouldn't be able to free policy struct once all the cpus of a
> > policy are removed after suspend/resume has happened once.
> > 
> 
> Actually even I was wondering about this while writing the patch and
> I even tested shutdown after multiple suspend/resume cycles, to verify that
> the refcount is messed up. But surprisingly, things worked just fine.
> 
> Logically there should've been a refcount mismatch and things should have
> failed, but everything worked fine during my tests. Apart from suspend/resume
> and shutdown tests, I even tried mixing a few regular CPU hotplug operations
> (echo 0/1 to sysfs online files), but nothing stood out.
> 
> Sorry, I forgot to document this in the patch. Either the patch is wrong
> or something else is silently fixing this up. Not sure what is the exact
> situation.

OK, so I'm not going to queue [2-8/8] up until we find out what's going on
here (and until Toralf tells me that it doesn't break his system any more).

I've queued up [1/8] for 3.11 already.

Thanks,
Rafael


-- 
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 7/8] cpufreq: Preserve policy structure across suspend/resume

2013-07-15 Thread Viresh Kumar
On 15 July 2013 15:35, Srivatsa S. Bhat
 wrote:
> Actually even I was wondering about this while writing the patch and
> I even tested shutdown after multiple suspend/resume cycles, to verify that
> the refcount is messed up. But surprisingly, things worked just fine.

What kind of system have you tested it on?
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 7/8] cpufreq: Preserve policy structure across suspend/resume

2013-07-15 Thread Srivatsa S. Bhat
On 07/15/2013 03:25 PM, Viresh Kumar wrote:
> Hi Srivatsa,
> 
> I may be wrong but it looks something is wrong in this patch.
> 
> On 12 July 2013 03:47, Srivatsa S. Bhat
>  wrote:
>> diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
> 
>> @@ -1239,29 +1263,40 @@ static int __cpufreq_remove_dev(struct device *dev,
>> if ((cpus == 1) && (cpufreq_driver->target))
>> __cpufreq_governor(data, CPUFREQ_GOV_POLICY_EXIT);
>>
>> -   pr_debug("%s: removing link, cpu: %d\n", __func__, cpu);
>> -   cpufreq_cpu_put(data);
>> +   if (!frozen) {
>> +   pr_debug("%s: removing link, cpu: %d\n", __func__, cpu);
>> +   cpufreq_cpu_put(data);
> 
> So, we don't decrement usage count here. But we are still increasing
> counts on cpufreq_add_dev after resume, isn't it?
> 
> So, we wouldn't be able to free policy struct once all the cpus of a
> policy are removed after suspend/resume has happened once.
> 

Actually even I was wondering about this while writing the patch and
I even tested shutdown after multiple suspend/resume cycles, to verify that
the refcount is messed up. But surprisingly, things worked just fine.

Logically there should've been a refcount mismatch and things should have
failed, but everything worked fine during my tests. Apart from suspend/resume
and shutdown tests, I even tried mixing a few regular CPU hotplug operations
(echo 0/1 to sysfs online files), but nothing stood out.

Sorry, I forgot to document this in the patch. Either the patch is wrong
or something else is silently fixing this up. Not sure what is the exact
situation.

Regards,
Srivatsa S. Bhat

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 7/8] cpufreq: Preserve policy structure across suspend/resume

2013-07-15 Thread Viresh Kumar
Hi Srivatsa,

I may be wrong but it looks something is wrong in this patch.

On 12 July 2013 03:47, Srivatsa S. Bhat
 wrote:
> diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c

> @@ -1239,29 +1263,40 @@ static int __cpufreq_remove_dev(struct device *dev,
> if ((cpus == 1) && (cpufreq_driver->target))
> __cpufreq_governor(data, CPUFREQ_GOV_POLICY_EXIT);
>
> -   pr_debug("%s: removing link, cpu: %d\n", __func__, cpu);
> -   cpufreq_cpu_put(data);
> +   if (!frozen) {
> +   pr_debug("%s: removing link, cpu: %d\n", __func__, cpu);
> +   cpufreq_cpu_put(data);

So, we don't decrement usage count here. But we are still increasing
counts on cpufreq_add_dev after resume, isn't it?

So, we wouldn't be able to free policy struct once all the cpus of a
policy are removed after suspend/resume has happened once.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 7/8] cpufreq: Preserve policy structure across suspend/resume

2013-07-15 Thread Viresh Kumar
Hi Srivatsa,

I may be wrong but it looks something is wrong in this patch.

On 12 July 2013 03:47, Srivatsa S. Bhat
srivatsa.b...@linux.vnet.ibm.com wrote:
 diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c

 @@ -1239,29 +1263,40 @@ static int __cpufreq_remove_dev(struct device *dev,
 if ((cpus == 1)  (cpufreq_driver-target))
 __cpufreq_governor(data, CPUFREQ_GOV_POLICY_EXIT);

 -   pr_debug(%s: removing link, cpu: %d\n, __func__, cpu);
 -   cpufreq_cpu_put(data);
 +   if (!frozen) {
 +   pr_debug(%s: removing link, cpu: %d\n, __func__, cpu);
 +   cpufreq_cpu_put(data);

So, we don't decrement usage count here. But we are still increasing
counts on cpufreq_add_dev after resume, isn't it?

So, we wouldn't be able to free policy struct once all the cpus of a
policy are removed after suspend/resume has happened once.
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 7/8] cpufreq: Preserve policy structure across suspend/resume

2013-07-15 Thread Srivatsa S. Bhat
On 07/15/2013 03:25 PM, Viresh Kumar wrote:
 Hi Srivatsa,
 
 I may be wrong but it looks something is wrong in this patch.
 
 On 12 July 2013 03:47, Srivatsa S. Bhat
 srivatsa.b...@linux.vnet.ibm.com wrote:
 diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
 
 @@ -1239,29 +1263,40 @@ static int __cpufreq_remove_dev(struct device *dev,
 if ((cpus == 1)  (cpufreq_driver-target))
 __cpufreq_governor(data, CPUFREQ_GOV_POLICY_EXIT);

 -   pr_debug(%s: removing link, cpu: %d\n, __func__, cpu);
 -   cpufreq_cpu_put(data);
 +   if (!frozen) {
 +   pr_debug(%s: removing link, cpu: %d\n, __func__, cpu);
 +   cpufreq_cpu_put(data);
 
 So, we don't decrement usage count here. But we are still increasing
 counts on cpufreq_add_dev after resume, isn't it?
 
 So, we wouldn't be able to free policy struct once all the cpus of a
 policy are removed after suspend/resume has happened once.
 

Actually even I was wondering about this while writing the patch and
I even tested shutdown after multiple suspend/resume cycles, to verify that
the refcount is messed up. But surprisingly, things worked just fine.

Logically there should've been a refcount mismatch and things should have
failed, but everything worked fine during my tests. Apart from suspend/resume
and shutdown tests, I even tried mixing a few regular CPU hotplug operations
(echo 0/1 to sysfs online files), but nothing stood out.

Sorry, I forgot to document this in the patch. Either the patch is wrong
or something else is silently fixing this up. Not sure what is the exact
situation.

Regards,
Srivatsa S. Bhat

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 7/8] cpufreq: Preserve policy structure across suspend/resume

2013-07-15 Thread Viresh Kumar
On 15 July 2013 15:35, Srivatsa S. Bhat
srivatsa.b...@linux.vnet.ibm.com wrote:
 Actually even I was wondering about this while writing the patch and
 I even tested shutdown after multiple suspend/resume cycles, to verify that
 the refcount is messed up. But surprisingly, things worked just fine.

What kind of system have you tested it on?
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 7/8] cpufreq: Preserve policy structure across suspend/resume

2013-07-15 Thread Rafael J. Wysocki
On Monday, July 15, 2013 03:35:04 PM Srivatsa S. Bhat wrote:
 On 07/15/2013 03:25 PM, Viresh Kumar wrote:
  Hi Srivatsa,
  
  I may be wrong but it looks something is wrong in this patch.
  
  On 12 July 2013 03:47, Srivatsa S. Bhat
  srivatsa.b...@linux.vnet.ibm.com wrote:
  diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
  
  @@ -1239,29 +1263,40 @@ static int __cpufreq_remove_dev(struct device *dev,
  if ((cpus == 1)  (cpufreq_driver-target))
  __cpufreq_governor(data, CPUFREQ_GOV_POLICY_EXIT);
 
  -   pr_debug(%s: removing link, cpu: %d\n, __func__, cpu);
  -   cpufreq_cpu_put(data);
  +   if (!frozen) {
  +   pr_debug(%s: removing link, cpu: %d\n, __func__, cpu);
  +   cpufreq_cpu_put(data);
  
  So, we don't decrement usage count here. But we are still increasing
  counts on cpufreq_add_dev after resume, isn't it?
  
  So, we wouldn't be able to free policy struct once all the cpus of a
  policy are removed after suspend/resume has happened once.
  
 
 Actually even I was wondering about this while writing the patch and
 I even tested shutdown after multiple suspend/resume cycles, to verify that
 the refcount is messed up. But surprisingly, things worked just fine.
 
 Logically there should've been a refcount mismatch and things should have
 failed, but everything worked fine during my tests. Apart from suspend/resume
 and shutdown tests, I even tried mixing a few regular CPU hotplug operations
 (echo 0/1 to sysfs online files), but nothing stood out.
 
 Sorry, I forgot to document this in the patch. Either the patch is wrong
 or something else is silently fixing this up. Not sure what is the exact
 situation.

OK, so I'm not going to queue [2-8/8] up until we find out what's going on
here (and until Toralf tells me that it doesn't break his system any more).

I've queued up [1/8] for 3.11 already.

Thanks,
Rafael


-- 
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 7/8] cpufreq: Preserve policy structure across suspend/resume

2013-07-15 Thread Srivatsa S. Bhat
On 07/15/2013 03:51 PM, Viresh Kumar wrote:
 On 15 July 2013 15:35, Srivatsa S. Bhat
 srivatsa.b...@linux.vnet.ibm.com wrote:
 Actually even I was wondering about this while writing the patch and
 I even tested shutdown after multiple suspend/resume cycles, to verify that
 the refcount is messed up. But surprisingly, things worked just fine.
 
 What kind of system have you tested it on?
 

The system has 2 sockets with 8 cores each, and has Intel Sandybridge
CPUs. I had used a local patch to simulate CPU hotplug in the suspend-to-ram
path using the freeze state of pm_test (because I had other problems in
using the 'processors' state of pm_test). The patch is shown below:


diff --git a/kernel/power/suspend.c b/kernel/power/suspend.c
index ece0422..fe07b77 100644
--- a/kernel/power/suspend.c
+++ b/kernel/power/suspend.c
@@ -342,8 +342,13 @@ static int enter_state(suspend_state_t state)
if (error)
goto Unlock;
 
-   if (suspend_test(TEST_FREEZER))
+   if (suspend_test(TEST_FREEZER)) {
+   pr_debug(Disabling nonboot CPUs\n);
+   disable_nonboot_cpus();
+   pr_debug(Enabling nonboot CPUs\n);
+   enable_nonboot_cpus();
goto Finish;
+   }
 
pr_debug(PM: Entering %s sleep\n, pm_states[state]);
pm_restrict_gfp_mask();


 
Regards,
Srivatsa S. Bhat

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 7/8] cpufreq: Preserve policy structure across suspend/resume

2013-07-15 Thread Srivatsa S. Bhat
On 07/15/2013 05:05 PM, Rafael J. Wysocki wrote:
 On Monday, July 15, 2013 03:35:04 PM Srivatsa S. Bhat wrote:
 On 07/15/2013 03:25 PM, Viresh Kumar wrote:
 Hi Srivatsa,

 I may be wrong but it looks something is wrong in this patch.

 On 12 July 2013 03:47, Srivatsa S. Bhat
 srivatsa.b...@linux.vnet.ibm.com wrote:
 diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c

 @@ -1239,29 +1263,40 @@ static int __cpufreq_remove_dev(struct device *dev,
 if ((cpus == 1)  (cpufreq_driver-target))
 __cpufreq_governor(data, CPUFREQ_GOV_POLICY_EXIT);

 -   pr_debug(%s: removing link, cpu: %d\n, __func__, cpu);
 -   cpufreq_cpu_put(data);
 +   if (!frozen) {
 +   pr_debug(%s: removing link, cpu: %d\n, __func__, cpu);
 +   cpufreq_cpu_put(data);

 So, we don't decrement usage count here. But we are still increasing
 counts on cpufreq_add_dev after resume, isn't it?

 So, we wouldn't be able to free policy struct once all the cpus of a
 policy are removed after suspend/resume has happened once.


 Actually even I was wondering about this while writing the patch and
 I even tested shutdown after multiple suspend/resume cycles, to verify that
 the refcount is messed up. But surprisingly, things worked just fine.

 Logically there should've been a refcount mismatch and things should have
 failed, but everything worked fine during my tests. Apart from suspend/resume
 and shutdown tests, I even tried mixing a few regular CPU hotplug operations
 (echo 0/1 to sysfs online files), but nothing stood out.

 Sorry, I forgot to document this in the patch. Either the patch is wrong
 or something else is silently fixing this up. Not sure what is the exact
 situation.
 
 OK, so I'm not going to queue [2-8/8] up until we find out what's going on
 here (and until Toralf tells me that it doesn't break his system any more).
 

Ok, that sounds good.

 I've queued up [1/8] for 3.11 already.
 

Thank you!
 
Regards,
Srivatsa S. Bhat

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH 7/8] cpufreq: Preserve policy structure across suspend/resume

2013-07-11 Thread Srivatsa S. Bhat
To perform light-weight cpu-init and teardown in the cpufreq subsystem
during suspend/resume, we need to separate out the 2 main functionalities
of the cpufreq CPU hotplug callbacks, as outlined below:

1. Init/tear-down of core cpufreq and CPU-specific components, which are
   critical to the correct functioning of the cpufreq subsystem.

2. Init/tear-down of cpufreq sysfs files during suspend/resume.

The first part requires accurate updates to the policy structure such as
its ->cpus and ->related_cpus masks, whereas the second part requires that
the policy->kobj structure is not released or re-initialized during
suspend/resume.

To handle both these requirements, we need to allow updates to the policy
structure throughout suspend/resume, but prevent the structure from getting
freed up. Also, we must have a mechanism by which the cpu-up callbacks can
restore the policy structure, without allocating things afresh. (That also
helps avoid memory leaks).

To achieve this, we use 2 schemes:
a. Use a fallback per-cpu storage area for preserving the policy structures
   during suspend, so that they can be restored during resume appropriately.

b. Use the 'frozen' flag to determine when to free or allocate the policy
   structure vs when to restore the policy from the saved fallback storage.
   Thus we can successfully preserve the structure across suspend/resume.

Effectively, this helps us complete the separation of the 'light-weight'
and the 'full' init/tear-down sequences in the cpufreq subsystem, so that
this can be made use of in the suspend/resume scenario.

Signed-off-by: Srivatsa S. Bhat 
---

 drivers/cpufreq/cpufreq.c |   69 ++---
 1 file changed, 52 insertions(+), 17 deletions(-)

diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
index 1128753..15ced5f 100644
--- a/drivers/cpufreq/cpufreq.c
+++ b/drivers/cpufreq/cpufreq.c
@@ -44,6 +44,7 @@
  */
 static struct cpufreq_driver *cpufreq_driver;
 static DEFINE_PER_CPU(struct cpufreq_policy *, cpufreq_cpu_data);
+static DEFINE_PER_CPU(struct cpufreq_policy *, cpufreq_cpu_data_fallback);
 static DEFINE_RWLOCK(cpufreq_driver_lock);
 static DEFINE_MUTEX(cpufreq_governor_lock);
 
@@ -942,6 +943,20 @@ static int cpufreq_add_policy_cpu(unsigned int cpu, 
unsigned int sibling,
 }
 #endif
 
+static struct cpufreq_policy *cpufreq_policy_restore(unsigned int cpu)
+{
+   struct cpufreq_policy *policy;
+   unsigned long flags;
+
+   write_lock_irqsave(_driver_lock, flags);
+
+   policy = per_cpu(cpufreq_cpu_data_fallback, cpu);
+
+   write_unlock_irqrestore(_driver_lock, flags);
+
+   return policy;
+}
+
 static struct cpufreq_policy *cpufreq_policy_alloc(void)
 {
struct cpufreq_policy *policy;
@@ -1019,7 +1034,12 @@ static int __cpufreq_add_dev(struct device *dev, struct 
subsys_interface *sif,
goto module_out;
}
 
-   policy = cpufreq_policy_alloc();
+   if (frozen)
+   /* Restore the saved policy when doing light-weight init */
+   policy = cpufreq_policy_restore(cpu);
+   else
+   policy = cpufreq_policy_alloc();
+
if (!policy)
goto nomem_out;
 
@@ -1199,6 +1219,10 @@ static int __cpufreq_remove_dev(struct device *dev,
data = per_cpu(cpufreq_cpu_data, cpu);
per_cpu(cpufreq_cpu_data, cpu) = NULL;
 
+   /* Save the policy somewhere when doing a light-weight tear-down */
+   if (frozen)
+   per_cpu(cpufreq_cpu_data_fallback, cpu) = data;
+
write_unlock_irqrestore(_driver_lock, flags);
 
if (!data) {
@@ -1239,29 +1263,40 @@ static int __cpufreq_remove_dev(struct device *dev,
if ((cpus == 1) && (cpufreq_driver->target))
__cpufreq_governor(data, CPUFREQ_GOV_POLICY_EXIT);
 
-   pr_debug("%s: removing link, cpu: %d\n", __func__, cpu);
-   cpufreq_cpu_put(data);
+   if (!frozen) {
+   pr_debug("%s: removing link, cpu: %d\n", __func__, cpu);
+   cpufreq_cpu_put(data);
+   }
 
/* If cpu is last user of policy, free policy */
if (cpus == 1) {
-   lock_policy_rwsem_read(cpu);
-   kobj = >kobj;
-   cmp = >kobj_unregister;
-   unlock_policy_rwsem_read(cpu);
-   kobject_put(kobj);
-
-   /* we need to make sure that the underlying kobj is actually
-* not referenced anymore by anybody before we proceed with
-* unloading.
-*/
-   pr_debug("waiting for dropping of refcount\n");
-   wait_for_completion(cmp);
-   pr_debug("wait complete\n");
+   if (!frozen) {
+   lock_policy_rwsem_read(cpu);
+   kobj = >kobj;
+   cmp = >kobj_unregister;
+   unlock_policy_rwsem_read(cpu);
+   kobject_put(kobj);
+
+  

[PATCH 7/8] cpufreq: Preserve policy structure across suspend/resume

2013-07-11 Thread Srivatsa S. Bhat
To perform light-weight cpu-init and teardown in the cpufreq subsystem
during suspend/resume, we need to separate out the 2 main functionalities
of the cpufreq CPU hotplug callbacks, as outlined below:

1. Init/tear-down of core cpufreq and CPU-specific components, which are
   critical to the correct functioning of the cpufreq subsystem.

2. Init/tear-down of cpufreq sysfs files during suspend/resume.

The first part requires accurate updates to the policy structure such as
its -cpus and -related_cpus masks, whereas the second part requires that
the policy-kobj structure is not released or re-initialized during
suspend/resume.

To handle both these requirements, we need to allow updates to the policy
structure throughout suspend/resume, but prevent the structure from getting
freed up. Also, we must have a mechanism by which the cpu-up callbacks can
restore the policy structure, without allocating things afresh. (That also
helps avoid memory leaks).

To achieve this, we use 2 schemes:
a. Use a fallback per-cpu storage area for preserving the policy structures
   during suspend, so that they can be restored during resume appropriately.

b. Use the 'frozen' flag to determine when to free or allocate the policy
   structure vs when to restore the policy from the saved fallback storage.
   Thus we can successfully preserve the structure across suspend/resume.

Effectively, this helps us complete the separation of the 'light-weight'
and the 'full' init/tear-down sequences in the cpufreq subsystem, so that
this can be made use of in the suspend/resume scenario.

Signed-off-by: Srivatsa S. Bhat srivatsa.b...@linux.vnet.ibm.com
---

 drivers/cpufreq/cpufreq.c |   69 ++---
 1 file changed, 52 insertions(+), 17 deletions(-)

diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
index 1128753..15ced5f 100644
--- a/drivers/cpufreq/cpufreq.c
+++ b/drivers/cpufreq/cpufreq.c
@@ -44,6 +44,7 @@
  */
 static struct cpufreq_driver *cpufreq_driver;
 static DEFINE_PER_CPU(struct cpufreq_policy *, cpufreq_cpu_data);
+static DEFINE_PER_CPU(struct cpufreq_policy *, cpufreq_cpu_data_fallback);
 static DEFINE_RWLOCK(cpufreq_driver_lock);
 static DEFINE_MUTEX(cpufreq_governor_lock);
 
@@ -942,6 +943,20 @@ static int cpufreq_add_policy_cpu(unsigned int cpu, 
unsigned int sibling,
 }
 #endif
 
+static struct cpufreq_policy *cpufreq_policy_restore(unsigned int cpu)
+{
+   struct cpufreq_policy *policy;
+   unsigned long flags;
+
+   write_lock_irqsave(cpufreq_driver_lock, flags);
+
+   policy = per_cpu(cpufreq_cpu_data_fallback, cpu);
+
+   write_unlock_irqrestore(cpufreq_driver_lock, flags);
+
+   return policy;
+}
+
 static struct cpufreq_policy *cpufreq_policy_alloc(void)
 {
struct cpufreq_policy *policy;
@@ -1019,7 +1034,12 @@ static int __cpufreq_add_dev(struct device *dev, struct 
subsys_interface *sif,
goto module_out;
}
 
-   policy = cpufreq_policy_alloc();
+   if (frozen)
+   /* Restore the saved policy when doing light-weight init */
+   policy = cpufreq_policy_restore(cpu);
+   else
+   policy = cpufreq_policy_alloc();
+
if (!policy)
goto nomem_out;
 
@@ -1199,6 +1219,10 @@ static int __cpufreq_remove_dev(struct device *dev,
data = per_cpu(cpufreq_cpu_data, cpu);
per_cpu(cpufreq_cpu_data, cpu) = NULL;
 
+   /* Save the policy somewhere when doing a light-weight tear-down */
+   if (frozen)
+   per_cpu(cpufreq_cpu_data_fallback, cpu) = data;
+
write_unlock_irqrestore(cpufreq_driver_lock, flags);
 
if (!data) {
@@ -1239,29 +1263,40 @@ static int __cpufreq_remove_dev(struct device *dev,
if ((cpus == 1)  (cpufreq_driver-target))
__cpufreq_governor(data, CPUFREQ_GOV_POLICY_EXIT);
 
-   pr_debug(%s: removing link, cpu: %d\n, __func__, cpu);
-   cpufreq_cpu_put(data);
+   if (!frozen) {
+   pr_debug(%s: removing link, cpu: %d\n, __func__, cpu);
+   cpufreq_cpu_put(data);
+   }
 
/* If cpu is last user of policy, free policy */
if (cpus == 1) {
-   lock_policy_rwsem_read(cpu);
-   kobj = data-kobj;
-   cmp = data-kobj_unregister;
-   unlock_policy_rwsem_read(cpu);
-   kobject_put(kobj);
-
-   /* we need to make sure that the underlying kobj is actually
-* not referenced anymore by anybody before we proceed with
-* unloading.
-*/
-   pr_debug(waiting for dropping of refcount\n);
-   wait_for_completion(cmp);
-   pr_debug(wait complete\n);
+   if (!frozen) {
+   lock_policy_rwsem_read(cpu);
+   kobj = data-kobj;
+   cmp = data-kobj_unregister;
+   unlock_policy_rwsem_read(cpu);
+