Re: [PATCH 7/8] cpufreq: Preserve policy structure across suspend/resume
On 07/16/2013 03:05 PM, Viresh Kumar wrote: > On 16 July 2013 14:59, Srivatsa S. Bhat > wrote: >> On 07/16/2013 02:40 PM, Viresh Kumar wrote: > >>> So, even if you don't keep the fallback storage, things should work >>> without any issue (probably worth trying as this will get rid of a per >>> cpu variable :)) >>> >> >> No, I already tried that and it didn't work ;-( The thing is, we need the >> __cpufreq_add_dev() code to call the ->init() routines of drivers etc. But if >> it finds the policy structure, it will skip all of that initialization and >> happily >> proceed. Which is precisely the cause of all the erratic behaviour we are >> seeing >> (ie., lack of proper initialization post-resume). > > I missed that point. :) > >> So this approach keeps the memory preserved in a fallback storage and lets >> the >> init code run to full completion without any issues. >> >> Perhaps we could do some _more_ code reorganization in the future to take >> this >> issue into account etc., but IMHO that might be non-trivial. I'm trying to >> keep >> this as simple and straight-forward as possible as a first step, to atleast >> get >> it properly working. (Changing the order in which init is done is kinda scary >> since its hard to comprehend what assumptions we might be breaking!). >> >> We can perhaps revisit your idea later and optimize out the extra per-cpu >> data. > > No, we don't need to optimize it that way. Current design looks good > for now. Cool! Thanks :) Regards, Srivatsa S. Bhat -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 7/8] cpufreq: Preserve policy structure across suspend/resume
On 16 July 2013 14:59, Srivatsa S. Bhat wrote: > On 07/16/2013 02:40 PM, Viresh Kumar wrote: >> So, even if you don't keep the fallback storage, things should work >> without any issue (probably worth trying as this will get rid of a per >> cpu variable :)) >> > > No, I already tried that and it didn't work ;-( The thing is, we need the > __cpufreq_add_dev() code to call the ->init() routines of drivers etc. But if > it finds the policy structure, it will skip all of that initialization and > happily > proceed. Which is precisely the cause of all the erratic behaviour we are > seeing > (ie., lack of proper initialization post-resume). I missed that point. :) > So this approach keeps the memory preserved in a fallback storage and lets the > init code run to full completion without any issues. > > Perhaps we could do some _more_ code reorganization in the future to take this > issue into account etc., but IMHO that might be non-trivial. I'm trying to > keep > this as simple and straight-forward as possible as a first step, to atleast > get > it properly working. (Changing the order in which init is done is kinda scary > since its hard to comprehend what assumptions we might be breaking!). > > We can perhaps revisit your idea later and optimize out the extra per-cpu > data. No, we don't need to optimize it that way. Current design looks good for now. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 7/8] cpufreq: Preserve policy structure across suspend/resume
On 07/16/2013 02:40 PM, Viresh Kumar wrote: > On 16 July 2013 14:26, Srivatsa S. Bhat > wrote: >> On 07/16/2013 11:45 AM, Viresh Kumar wrote: > >>> To understand it I actually applied your patches to get better view of the >>> code. >>> (Haven't tested it though).. And found that your code is doing the right >>> thing >>> and we shouldn't get a mismatch.. This is the sequence of events I can draw: >>> >>> - __cpu_add_dev() for first cpu. sets the refcount to 'x', where x are >>> the no. of >>> cpus in its clock domain. >>> - _cpu_add_dev() for other cpus: doesn't change anything in refcount >>> >>> - Suspend: >>> - cpu_remove_dev() for all cpus, due to frozen flag we don't touch the >>> value >>> of count >>> - Resume: >>> - cpu_add_dev() for all cpus, due to frozen flag we don't touch the >>> value of count. >>> >> >> Actually this one is tricky (I took a look again). So we have this code in >> the >> beginning of _cpufreq_add_dev(): >> >> >> 1008 #ifdef CONFIG_SMP >> 1009 /* check whether a different CPU already registered this >> 1010 * CPU because it is in the same boat. */ >> 1011 policy = cpufreq_cpu_get(cpu); >> 1012 if (unlikely(policy)) { >> 1013 cpufreq_cpu_put(policy); >> 1014 return 0; >> 1015 } >> >> The _get() is not controlled by the frozen flag, but it still doesn't take a >> refcount because of a subtle reason: per_cpu(cpufreq_cpu_data, cpu) was set >> to >> NULL in __cpufreq_remove_dev() and the memory was saved away in fallback >> storage. >> So, when __cpufreq_cpu_get() executes, it sees: >> >> 204 /* get the CPU */ >> 205 data = per_cpu(cpufreq_cpu_data, cpu); >> 206 >> 207 if (!data) >> 208 goto err_out_put_module; >> >> Thus, since data is NULL, cpufreq_cpu_get() won't take a refcount and will >> return >> silently. > > Even if this wouldn't have happened, refcount wouldn't have been > touched due to this code: > >> 1012 if (unlikely(policy)) { >> 1013 cpufreq_cpu_put(policy); >> 1014 return 0; >> 1015 } > > i.e. If we get a valid policy structure, we siimply put the policy again > and so decrement the incremented refcount. Ah, yes! > > So, even if you don't keep the fallback storage, things should work > without any issue (probably worth trying as this will get rid of a per > cpu variable :)) > No, I already tried that and it didn't work ;-( The thing is, we need the __cpufreq_add_dev() code to call the ->init() routines of drivers etc. But if it finds the policy structure, it will skip all of that initialization and happily proceed. Which is precisely the cause of all the erratic behaviour we are seeing (ie., lack of proper initialization post-resume). So this approach keeps the memory preserved in a fallback storage and lets the init code run to full completion without any issues. Perhaps we could do some _more_ code reorganization in the future to take this issue into account etc., but IMHO that might be non-trivial. I'm trying to keep this as simple and straight-forward as possible as a first step, to atleast get it properly working. (Changing the order in which init is done is kinda scary since its hard to comprehend what assumptions we might be breaking!). We can perhaps revisit your idea later and optimize out the extra per-cpu data. >> Further down in __cpufreq_add_dev(), we restore the original memory, using >> the frozen flag: >> >> 1037 if (frozen) >> 1038 /* Restore the saved policy when doing light-weight >> init */ >> 1039 policy = cpufreq_policy_restore(cpu); >> 1040 else >> 1041 policy = cpufreq_policy_alloc(); >> >> >> So that is how we manage to fool cpufreq_cpu_get() into not taking a fresh >> refcount while resuming :) Regards, Srivatsa S. Bhat -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 7/8] cpufreq: Preserve policy structure across suspend/resume
On 16 July 2013 14:26, Srivatsa S. Bhat wrote: > On 07/16/2013 11:45 AM, Viresh Kumar wrote: >> To understand it I actually applied your patches to get better view of the >> code. >> (Haven't tested it though).. And found that your code is doing the right >> thing >> and we shouldn't get a mismatch.. This is the sequence of events I can draw: >> >> - __cpu_add_dev() for first cpu. sets the refcount to 'x', where x are >> the no. of >> cpus in its clock domain. >> - _cpu_add_dev() for other cpus: doesn't change anything in refcount >> >> - Suspend: >> - cpu_remove_dev() for all cpus, due to frozen flag we don't touch the value >> of count >> - Resume: >> - cpu_add_dev() for all cpus, due to frozen flag we don't touch the >> value of count. >> > > Actually this one is tricky (I took a look again). So we have this code in the > beginning of _cpufreq_add_dev(): > > > 1008 #ifdef CONFIG_SMP > 1009 /* check whether a different CPU already registered this > 1010 * CPU because it is in the same boat. */ > 1011 policy = cpufreq_cpu_get(cpu); > 1012 if (unlikely(policy)) { > 1013 cpufreq_cpu_put(policy); > 1014 return 0; > 1015 } > > The _get() is not controlled by the frozen flag, but it still doesn't take a > refcount because of a subtle reason: per_cpu(cpufreq_cpu_data, cpu) was set to > NULL in __cpufreq_remove_dev() and the memory was saved away in fallback > storage. > So, when __cpufreq_cpu_get() executes, it sees: > > 204 /* get the CPU */ > 205 data = per_cpu(cpufreq_cpu_data, cpu); > 206 > 207 if (!data) > 208 goto err_out_put_module; > > Thus, since data is NULL, cpufreq_cpu_get() won't take a refcount and will > return > silently. Even if this wouldn't have happened, refcount wouldn't have been touched due to this code: > 1012 if (unlikely(policy)) { > 1013 cpufreq_cpu_put(policy); > 1014 return 0; > 1015 } i.e. If we get a valid policy structure, we siimply put the policy again and so decrement the incremented refcount. So, even if you don't keep the fallback storage, things should work without any issue (probably worth trying as this will get rid of a per cpu variable :)) > Further down in __cpufreq_add_dev(), we restore the original memory, using > the frozen flag: > > 1037 if (frozen) > 1038 /* Restore the saved policy when doing light-weight init > */ > 1039 policy = cpufreq_policy_restore(cpu); > 1040 else > 1041 policy = cpufreq_policy_alloc(); > > > So that is how we manage to fool cpufreq_cpu_get() into not taking a fresh > refcount while resuming :) -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 7/8] cpufreq: Preserve policy structure across suspend/resume
On 07/16/2013 11:45 AM, Viresh Kumar wrote: > On 15 July 2013 15:35, Srivatsa S. Bhat > wrote: >> Actually even I was wondering about this while writing the patch and >> I even tested shutdown after multiple suspend/resume cycles, to verify that >> the refcount is messed up. But surprisingly, things worked just fine. >> >> Logically there should've been a refcount mismatch and things should have >> failed, but everything worked fine during my tests. Apart from suspend/resume >> and shutdown tests, I even tried mixing a few regular CPU hotplug operations >> (echo 0/1 to sysfs online files), but nothing stood out. >> >> Sorry, I forgot to document this in the patch. Either the patch is wrong >> or something else is silently fixing this up. Not sure what is the exact >> situation. > > To understand it I actually applied your patches to get better view of the > code. > (Haven't tested it though).. And found that your code is doing the right thing > and we shouldn't get a mismatch.. This is the sequence of events I can draw: > > - __cpu_add_dev() for first cpu. sets the refcount to 'x', where x are > the no. of > cpus in its clock domain. > - _cpu_add_dev() for other cpus: doesn't change anything in refcount > > - Suspend: > - cpu_remove_dev() for all cpus, due to frozen flag we don't touch the value > of count > - Resume: > - cpu_add_dev() for all cpus, due to frozen flag we don't touch the > value of count. > Actually this one is tricky (I took a look again). So we have this code in the beginning of _cpufreq_add_dev(): 1008 #ifdef CONFIG_SMP 1009 /* check whether a different CPU already registered this 1010 * CPU because it is in the same boat. */ 1011 policy = cpufreq_cpu_get(cpu); 1012 if (unlikely(policy)) { 1013 cpufreq_cpu_put(policy); 1014 return 0; 1015 } The _get() is not controlled by the frozen flag, but it still doesn't take a refcount because of a subtle reason: per_cpu(cpufreq_cpu_data, cpu) was set to NULL in __cpufreq_remove_dev() and the memory was saved away in fallback storage. So, when __cpufreq_cpu_get() executes, it sees: 204 /* get the CPU */ 205 data = per_cpu(cpufreq_cpu_data, cpu); 206 207 if (!data) 208 goto err_out_put_module; Thus, since data is NULL, cpufreq_cpu_get() won't take a refcount and will return silently. Further down in __cpufreq_add_dev(), we restore the original memory, using the frozen flag: 1037 if (frozen) 1038 /* Restore the saved policy when doing light-weight init */ 1039 policy = cpufreq_policy_restore(cpu); 1040 else 1041 policy = cpufreq_policy_alloc(); So that is how we manage to fool cpufreq_cpu_get() into not taking a fresh refcount while resuming :) > And so things work as expected. That's why your code isn't breaking anything I > believe. > Thanks a lot for the code inspection and your detailed analysis! > But can no. of cpus change inbetween suspend/resume? Then count would be > tricky as we are using the same policy structure. > No, number of CPUs won't change in between suspend/resume. Even if somebody tried that, that would be an eccentric case and we won't handle that. Besides, *many more* things will break than just cpufreq, if somebody actually tries that out! Regards, Srivatsa S. Bhat -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 7/8] cpufreq: Preserve policy structure across suspend/resume
On 15 July 2013 15:35, Srivatsa S. Bhat wrote: > Actually even I was wondering about this while writing the patch and > I even tested shutdown after multiple suspend/resume cycles, to verify that > the refcount is messed up. But surprisingly, things worked just fine. > > Logically there should've been a refcount mismatch and things should have > failed, but everything worked fine during my tests. Apart from suspend/resume > and shutdown tests, I even tried mixing a few regular CPU hotplug operations > (echo 0/1 to sysfs online files), but nothing stood out. > > Sorry, I forgot to document this in the patch. Either the patch is wrong > or something else is silently fixing this up. Not sure what is the exact > situation. To understand it I actually applied your patches to get better view of the code. (Haven't tested it though).. And found that your code is doing the right thing and we shouldn't get a mismatch.. This is the sequence of events I can draw: - __cpu_add_dev() for first cpu. sets the refcount to 'x', where x are the no. of cpus in its clock domain. - _cpu_add_dev() for other cpus: doesn't change anything in refcount - Suspend: - cpu_remove_dev() for all cpus, due to frozen flag we don't touch the value of count - Resume: - cpu_add_dev() for all cpus, due to frozen flag we don't touch the value of count. And so things work as expected. That's why your code isn't breaking anything I believe. But can no. of cpus change inbetween suspend/resume? Then count would be tricky as we are using the same policy structure. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 7/8] cpufreq: Preserve policy structure across suspend/resume
On 15 July 2013 15:35, Srivatsa S. Bhat srivatsa.b...@linux.vnet.ibm.com wrote: Actually even I was wondering about this while writing the patch and I even tested shutdown after multiple suspend/resume cycles, to verify that the refcount is messed up. But surprisingly, things worked just fine. Logically there should've been a refcount mismatch and things should have failed, but everything worked fine during my tests. Apart from suspend/resume and shutdown tests, I even tried mixing a few regular CPU hotplug operations (echo 0/1 to sysfs online files), but nothing stood out. Sorry, I forgot to document this in the patch. Either the patch is wrong or something else is silently fixing this up. Not sure what is the exact situation. To understand it I actually applied your patches to get better view of the code. (Haven't tested it though).. And found that your code is doing the right thing and we shouldn't get a mismatch.. This is the sequence of events I can draw: - __cpu_add_dev() for first cpu. sets the refcount to 'x', where x are the no. of cpus in its clock domain. - _cpu_add_dev() for other cpus: doesn't change anything in refcount - Suspend: - cpu_remove_dev() for all cpus, due to frozen flag we don't touch the value of count - Resume: - cpu_add_dev() for all cpus, due to frozen flag we don't touch the value of count. And so things work as expected. That's why your code isn't breaking anything I believe. But can no. of cpus change inbetween suspend/resume? Then count would be tricky as we are using the same policy structure. -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 7/8] cpufreq: Preserve policy structure across suspend/resume
On 07/16/2013 11:45 AM, Viresh Kumar wrote: On 15 July 2013 15:35, Srivatsa S. Bhat srivatsa.b...@linux.vnet.ibm.com wrote: Actually even I was wondering about this while writing the patch and I even tested shutdown after multiple suspend/resume cycles, to verify that the refcount is messed up. But surprisingly, things worked just fine. Logically there should've been a refcount mismatch and things should have failed, but everything worked fine during my tests. Apart from suspend/resume and shutdown tests, I even tried mixing a few regular CPU hotplug operations (echo 0/1 to sysfs online files), but nothing stood out. Sorry, I forgot to document this in the patch. Either the patch is wrong or something else is silently fixing this up. Not sure what is the exact situation. To understand it I actually applied your patches to get better view of the code. (Haven't tested it though).. And found that your code is doing the right thing and we shouldn't get a mismatch.. This is the sequence of events I can draw: - __cpu_add_dev() for first cpu. sets the refcount to 'x', where x are the no. of cpus in its clock domain. - _cpu_add_dev() for other cpus: doesn't change anything in refcount - Suspend: - cpu_remove_dev() for all cpus, due to frozen flag we don't touch the value of count - Resume: - cpu_add_dev() for all cpus, due to frozen flag we don't touch the value of count. Actually this one is tricky (I took a look again). So we have this code in the beginning of _cpufreq_add_dev(): 1008 #ifdef CONFIG_SMP 1009 /* check whether a different CPU already registered this 1010 * CPU because it is in the same boat. */ 1011 policy = cpufreq_cpu_get(cpu); 1012 if (unlikely(policy)) { 1013 cpufreq_cpu_put(policy); 1014 return 0; 1015 } The _get() is not controlled by the frozen flag, but it still doesn't take a refcount because of a subtle reason: per_cpu(cpufreq_cpu_data, cpu) was set to NULL in __cpufreq_remove_dev() and the memory was saved away in fallback storage. So, when __cpufreq_cpu_get() executes, it sees: 204 /* get the CPU */ 205 data = per_cpu(cpufreq_cpu_data, cpu); 206 207 if (!data) 208 goto err_out_put_module; Thus, since data is NULL, cpufreq_cpu_get() won't take a refcount and will return silently. Further down in __cpufreq_add_dev(), we restore the original memory, using the frozen flag: 1037 if (frozen) 1038 /* Restore the saved policy when doing light-weight init */ 1039 policy = cpufreq_policy_restore(cpu); 1040 else 1041 policy = cpufreq_policy_alloc(); So that is how we manage to fool cpufreq_cpu_get() into not taking a fresh refcount while resuming :) And so things work as expected. That's why your code isn't breaking anything I believe. Thanks a lot for the code inspection and your detailed analysis! But can no. of cpus change inbetween suspend/resume? Then count would be tricky as we are using the same policy structure. No, number of CPUs won't change in between suspend/resume. Even if somebody tried that, that would be an eccentric case and we won't handle that. Besides, *many more* things will break than just cpufreq, if somebody actually tries that out! Regards, Srivatsa S. Bhat -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 7/8] cpufreq: Preserve policy structure across suspend/resume
On 16 July 2013 14:26, Srivatsa S. Bhat srivatsa.b...@linux.vnet.ibm.com wrote: On 07/16/2013 11:45 AM, Viresh Kumar wrote: To understand it I actually applied your patches to get better view of the code. (Haven't tested it though).. And found that your code is doing the right thing and we shouldn't get a mismatch.. This is the sequence of events I can draw: - __cpu_add_dev() for first cpu. sets the refcount to 'x', where x are the no. of cpus in its clock domain. - _cpu_add_dev() for other cpus: doesn't change anything in refcount - Suspend: - cpu_remove_dev() for all cpus, due to frozen flag we don't touch the value of count - Resume: - cpu_add_dev() for all cpus, due to frozen flag we don't touch the value of count. Actually this one is tricky (I took a look again). So we have this code in the beginning of _cpufreq_add_dev(): 1008 #ifdef CONFIG_SMP 1009 /* check whether a different CPU already registered this 1010 * CPU because it is in the same boat. */ 1011 policy = cpufreq_cpu_get(cpu); 1012 if (unlikely(policy)) { 1013 cpufreq_cpu_put(policy); 1014 return 0; 1015 } The _get() is not controlled by the frozen flag, but it still doesn't take a refcount because of a subtle reason: per_cpu(cpufreq_cpu_data, cpu) was set to NULL in __cpufreq_remove_dev() and the memory was saved away in fallback storage. So, when __cpufreq_cpu_get() executes, it sees: 204 /* get the CPU */ 205 data = per_cpu(cpufreq_cpu_data, cpu); 206 207 if (!data) 208 goto err_out_put_module; Thus, since data is NULL, cpufreq_cpu_get() won't take a refcount and will return silently. Even if this wouldn't have happened, refcount wouldn't have been touched due to this code: 1012 if (unlikely(policy)) { 1013 cpufreq_cpu_put(policy); 1014 return 0; 1015 } i.e. If we get a valid policy structure, we siimply put the policy again and so decrement the incremented refcount. So, even if you don't keep the fallback storage, things should work without any issue (probably worth trying as this will get rid of a per cpu variable :)) Further down in __cpufreq_add_dev(), we restore the original memory, using the frozen flag: 1037 if (frozen) 1038 /* Restore the saved policy when doing light-weight init */ 1039 policy = cpufreq_policy_restore(cpu); 1040 else 1041 policy = cpufreq_policy_alloc(); So that is how we manage to fool cpufreq_cpu_get() into not taking a fresh refcount while resuming :) -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 7/8] cpufreq: Preserve policy structure across suspend/resume
On 07/16/2013 02:40 PM, Viresh Kumar wrote: On 16 July 2013 14:26, Srivatsa S. Bhat srivatsa.b...@linux.vnet.ibm.com wrote: On 07/16/2013 11:45 AM, Viresh Kumar wrote: To understand it I actually applied your patches to get better view of the code. (Haven't tested it though).. And found that your code is doing the right thing and we shouldn't get a mismatch.. This is the sequence of events I can draw: - __cpu_add_dev() for first cpu. sets the refcount to 'x', where x are the no. of cpus in its clock domain. - _cpu_add_dev() for other cpus: doesn't change anything in refcount - Suspend: - cpu_remove_dev() for all cpus, due to frozen flag we don't touch the value of count - Resume: - cpu_add_dev() for all cpus, due to frozen flag we don't touch the value of count. Actually this one is tricky (I took a look again). So we have this code in the beginning of _cpufreq_add_dev(): 1008 #ifdef CONFIG_SMP 1009 /* check whether a different CPU already registered this 1010 * CPU because it is in the same boat. */ 1011 policy = cpufreq_cpu_get(cpu); 1012 if (unlikely(policy)) { 1013 cpufreq_cpu_put(policy); 1014 return 0; 1015 } The _get() is not controlled by the frozen flag, but it still doesn't take a refcount because of a subtle reason: per_cpu(cpufreq_cpu_data, cpu) was set to NULL in __cpufreq_remove_dev() and the memory was saved away in fallback storage. So, when __cpufreq_cpu_get() executes, it sees: 204 /* get the CPU */ 205 data = per_cpu(cpufreq_cpu_data, cpu); 206 207 if (!data) 208 goto err_out_put_module; Thus, since data is NULL, cpufreq_cpu_get() won't take a refcount and will return silently. Even if this wouldn't have happened, refcount wouldn't have been touched due to this code: 1012 if (unlikely(policy)) { 1013 cpufreq_cpu_put(policy); 1014 return 0; 1015 } i.e. If we get a valid policy structure, we siimply put the policy again and so decrement the incremented refcount. Ah, yes! So, even if you don't keep the fallback storage, things should work without any issue (probably worth trying as this will get rid of a per cpu variable :)) No, I already tried that and it didn't work ;-( The thing is, we need the __cpufreq_add_dev() code to call the -init() routines of drivers etc. But if it finds the policy structure, it will skip all of that initialization and happily proceed. Which is precisely the cause of all the erratic behaviour we are seeing (ie., lack of proper initialization post-resume). So this approach keeps the memory preserved in a fallback storage and lets the init code run to full completion without any issues. Perhaps we could do some _more_ code reorganization in the future to take this issue into account etc., but IMHO that might be non-trivial. I'm trying to keep this as simple and straight-forward as possible as a first step, to atleast get it properly working. (Changing the order in which init is done is kinda scary since its hard to comprehend what assumptions we might be breaking!). We can perhaps revisit your idea later and optimize out the extra per-cpu data. Further down in __cpufreq_add_dev(), we restore the original memory, using the frozen flag: 1037 if (frozen) 1038 /* Restore the saved policy when doing light-weight init */ 1039 policy = cpufreq_policy_restore(cpu); 1040 else 1041 policy = cpufreq_policy_alloc(); So that is how we manage to fool cpufreq_cpu_get() into not taking a fresh refcount while resuming :) Regards, Srivatsa S. Bhat -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 7/8] cpufreq: Preserve policy structure across suspend/resume
On 16 July 2013 14:59, Srivatsa S. Bhat srivatsa.b...@linux.vnet.ibm.com wrote: On 07/16/2013 02:40 PM, Viresh Kumar wrote: So, even if you don't keep the fallback storage, things should work without any issue (probably worth trying as this will get rid of a per cpu variable :)) No, I already tried that and it didn't work ;-( The thing is, we need the __cpufreq_add_dev() code to call the -init() routines of drivers etc. But if it finds the policy structure, it will skip all of that initialization and happily proceed. Which is precisely the cause of all the erratic behaviour we are seeing (ie., lack of proper initialization post-resume). I missed that point. :) So this approach keeps the memory preserved in a fallback storage and lets the init code run to full completion without any issues. Perhaps we could do some _more_ code reorganization in the future to take this issue into account etc., but IMHO that might be non-trivial. I'm trying to keep this as simple and straight-forward as possible as a first step, to atleast get it properly working. (Changing the order in which init is done is kinda scary since its hard to comprehend what assumptions we might be breaking!). We can perhaps revisit your idea later and optimize out the extra per-cpu data. No, we don't need to optimize it that way. Current design looks good for now. -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 7/8] cpufreq: Preserve policy structure across suspend/resume
On 07/16/2013 03:05 PM, Viresh Kumar wrote: On 16 July 2013 14:59, Srivatsa S. Bhat srivatsa.b...@linux.vnet.ibm.com wrote: On 07/16/2013 02:40 PM, Viresh Kumar wrote: So, even if you don't keep the fallback storage, things should work without any issue (probably worth trying as this will get rid of a per cpu variable :)) No, I already tried that and it didn't work ;-( The thing is, we need the __cpufreq_add_dev() code to call the -init() routines of drivers etc. But if it finds the policy structure, it will skip all of that initialization and happily proceed. Which is precisely the cause of all the erratic behaviour we are seeing (ie., lack of proper initialization post-resume). I missed that point. :) So this approach keeps the memory preserved in a fallback storage and lets the init code run to full completion without any issues. Perhaps we could do some _more_ code reorganization in the future to take this issue into account etc., but IMHO that might be non-trivial. I'm trying to keep this as simple and straight-forward as possible as a first step, to atleast get it properly working. (Changing the order in which init is done is kinda scary since its hard to comprehend what assumptions we might be breaking!). We can perhaps revisit your idea later and optimize out the extra per-cpu data. No, we don't need to optimize it that way. Current design looks good for now. Cool! Thanks :) Regards, Srivatsa S. Bhat -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 7/8] cpufreq: Preserve policy structure across suspend/resume
On 07/15/2013 05:05 PM, Rafael J. Wysocki wrote: > On Monday, July 15, 2013 03:35:04 PM Srivatsa S. Bhat wrote: >> On 07/15/2013 03:25 PM, Viresh Kumar wrote: >>> Hi Srivatsa, >>> >>> I may be wrong but it looks something is wrong in this patch. >>> >>> On 12 July 2013 03:47, Srivatsa S. Bhat >>> wrote: diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c >>> @@ -1239,29 +1263,40 @@ static int __cpufreq_remove_dev(struct device *dev, if ((cpus == 1) && (cpufreq_driver->target)) __cpufreq_governor(data, CPUFREQ_GOV_POLICY_EXIT); - pr_debug("%s: removing link, cpu: %d\n", __func__, cpu); - cpufreq_cpu_put(data); + if (!frozen) { + pr_debug("%s: removing link, cpu: %d\n", __func__, cpu); + cpufreq_cpu_put(data); >>> >>> So, we don't decrement usage count here. But we are still increasing >>> counts on cpufreq_add_dev after resume, isn't it? >>> >>> So, we wouldn't be able to free policy struct once all the cpus of a >>> policy are removed after suspend/resume has happened once. >>> >> >> Actually even I was wondering about this while writing the patch and >> I even tested shutdown after multiple suspend/resume cycles, to verify that >> the refcount is messed up. But surprisingly, things worked just fine. >> >> Logically there should've been a refcount mismatch and things should have >> failed, but everything worked fine during my tests. Apart from suspend/resume >> and shutdown tests, I even tried mixing a few regular CPU hotplug operations >> (echo 0/1 to sysfs online files), but nothing stood out. >> >> Sorry, I forgot to document this in the patch. Either the patch is wrong >> or something else is silently fixing this up. Not sure what is the exact >> situation. > > OK, so I'm not going to queue [2-8/8] up until we find out what's going on > here (and until Toralf tells me that it doesn't break his system any more). > Ok, that sounds good. > I've queued up [1/8] for 3.11 already. > Thank you! Regards, Srivatsa S. Bhat -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 7/8] cpufreq: Preserve policy structure across suspend/resume
On 07/15/2013 03:51 PM, Viresh Kumar wrote: > On 15 July 2013 15:35, Srivatsa S. Bhat > wrote: >> Actually even I was wondering about this while writing the patch and >> I even tested shutdown after multiple suspend/resume cycles, to verify that >> the refcount is messed up. But surprisingly, things worked just fine. > > What kind of system have you tested it on? > The system has 2 sockets with 8 cores each, and has Intel Sandybridge CPUs. I had used a local patch to simulate CPU hotplug in the suspend-to-ram path using the freeze state of pm_test (because I had other problems in using the 'processors' state of pm_test). The patch is shown below: diff --git a/kernel/power/suspend.c b/kernel/power/suspend.c index ece0422..fe07b77 100644 --- a/kernel/power/suspend.c +++ b/kernel/power/suspend.c @@ -342,8 +342,13 @@ static int enter_state(suspend_state_t state) if (error) goto Unlock; - if (suspend_test(TEST_FREEZER)) + if (suspend_test(TEST_FREEZER)) { + pr_debug("Disabling nonboot CPUs\n"); + disable_nonboot_cpus(); + pr_debug("Enabling nonboot CPUs\n"); + enable_nonboot_cpus(); goto Finish; + } pr_debug("PM: Entering %s sleep\n", pm_states[state]); pm_restrict_gfp_mask(); Regards, Srivatsa S. Bhat -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 7/8] cpufreq: Preserve policy structure across suspend/resume
On Monday, July 15, 2013 03:35:04 PM Srivatsa S. Bhat wrote: > On 07/15/2013 03:25 PM, Viresh Kumar wrote: > > Hi Srivatsa, > > > > I may be wrong but it looks something is wrong in this patch. > > > > On 12 July 2013 03:47, Srivatsa S. Bhat > > wrote: > >> diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c > > > >> @@ -1239,29 +1263,40 @@ static int __cpufreq_remove_dev(struct device *dev, > >> if ((cpus == 1) && (cpufreq_driver->target)) > >> __cpufreq_governor(data, CPUFREQ_GOV_POLICY_EXIT); > >> > >> - pr_debug("%s: removing link, cpu: %d\n", __func__, cpu); > >> - cpufreq_cpu_put(data); > >> + if (!frozen) { > >> + pr_debug("%s: removing link, cpu: %d\n", __func__, cpu); > >> + cpufreq_cpu_put(data); > > > > So, we don't decrement usage count here. But we are still increasing > > counts on cpufreq_add_dev after resume, isn't it? > > > > So, we wouldn't be able to free policy struct once all the cpus of a > > policy are removed after suspend/resume has happened once. > > > > Actually even I was wondering about this while writing the patch and > I even tested shutdown after multiple suspend/resume cycles, to verify that > the refcount is messed up. But surprisingly, things worked just fine. > > Logically there should've been a refcount mismatch and things should have > failed, but everything worked fine during my tests. Apart from suspend/resume > and shutdown tests, I even tried mixing a few regular CPU hotplug operations > (echo 0/1 to sysfs online files), but nothing stood out. > > Sorry, I forgot to document this in the patch. Either the patch is wrong > or something else is silently fixing this up. Not sure what is the exact > situation. OK, so I'm not going to queue [2-8/8] up until we find out what's going on here (and until Toralf tells me that it doesn't break his system any more). I've queued up [1/8] for 3.11 already. Thanks, Rafael -- I speak only for myself. Rafael J. Wysocki, Intel Open Source Technology Center. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 7/8] cpufreq: Preserve policy structure across suspend/resume
On 15 July 2013 15:35, Srivatsa S. Bhat wrote: > Actually even I was wondering about this while writing the patch and > I even tested shutdown after multiple suspend/resume cycles, to verify that > the refcount is messed up. But surprisingly, things worked just fine. What kind of system have you tested it on? -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 7/8] cpufreq: Preserve policy structure across suspend/resume
On 07/15/2013 03:25 PM, Viresh Kumar wrote: > Hi Srivatsa, > > I may be wrong but it looks something is wrong in this patch. > > On 12 July 2013 03:47, Srivatsa S. Bhat > wrote: >> diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c > >> @@ -1239,29 +1263,40 @@ static int __cpufreq_remove_dev(struct device *dev, >> if ((cpus == 1) && (cpufreq_driver->target)) >> __cpufreq_governor(data, CPUFREQ_GOV_POLICY_EXIT); >> >> - pr_debug("%s: removing link, cpu: %d\n", __func__, cpu); >> - cpufreq_cpu_put(data); >> + if (!frozen) { >> + pr_debug("%s: removing link, cpu: %d\n", __func__, cpu); >> + cpufreq_cpu_put(data); > > So, we don't decrement usage count here. But we are still increasing > counts on cpufreq_add_dev after resume, isn't it? > > So, we wouldn't be able to free policy struct once all the cpus of a > policy are removed after suspend/resume has happened once. > Actually even I was wondering about this while writing the patch and I even tested shutdown after multiple suspend/resume cycles, to verify that the refcount is messed up. But surprisingly, things worked just fine. Logically there should've been a refcount mismatch and things should have failed, but everything worked fine during my tests. Apart from suspend/resume and shutdown tests, I even tried mixing a few regular CPU hotplug operations (echo 0/1 to sysfs online files), but nothing stood out. Sorry, I forgot to document this in the patch. Either the patch is wrong or something else is silently fixing this up. Not sure what is the exact situation. Regards, Srivatsa S. Bhat -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 7/8] cpufreq: Preserve policy structure across suspend/resume
Hi Srivatsa, I may be wrong but it looks something is wrong in this patch. On 12 July 2013 03:47, Srivatsa S. Bhat wrote: > diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c > @@ -1239,29 +1263,40 @@ static int __cpufreq_remove_dev(struct device *dev, > if ((cpus == 1) && (cpufreq_driver->target)) > __cpufreq_governor(data, CPUFREQ_GOV_POLICY_EXIT); > > - pr_debug("%s: removing link, cpu: %d\n", __func__, cpu); > - cpufreq_cpu_put(data); > + if (!frozen) { > + pr_debug("%s: removing link, cpu: %d\n", __func__, cpu); > + cpufreq_cpu_put(data); So, we don't decrement usage count here. But we are still increasing counts on cpufreq_add_dev after resume, isn't it? So, we wouldn't be able to free policy struct once all the cpus of a policy are removed after suspend/resume has happened once. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 7/8] cpufreq: Preserve policy structure across suspend/resume
Hi Srivatsa, I may be wrong but it looks something is wrong in this patch. On 12 July 2013 03:47, Srivatsa S. Bhat srivatsa.b...@linux.vnet.ibm.com wrote: diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c @@ -1239,29 +1263,40 @@ static int __cpufreq_remove_dev(struct device *dev, if ((cpus == 1) (cpufreq_driver-target)) __cpufreq_governor(data, CPUFREQ_GOV_POLICY_EXIT); - pr_debug(%s: removing link, cpu: %d\n, __func__, cpu); - cpufreq_cpu_put(data); + if (!frozen) { + pr_debug(%s: removing link, cpu: %d\n, __func__, cpu); + cpufreq_cpu_put(data); So, we don't decrement usage count here. But we are still increasing counts on cpufreq_add_dev after resume, isn't it? So, we wouldn't be able to free policy struct once all the cpus of a policy are removed after suspend/resume has happened once. -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 7/8] cpufreq: Preserve policy structure across suspend/resume
On 07/15/2013 03:25 PM, Viresh Kumar wrote: Hi Srivatsa, I may be wrong but it looks something is wrong in this patch. On 12 July 2013 03:47, Srivatsa S. Bhat srivatsa.b...@linux.vnet.ibm.com wrote: diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c @@ -1239,29 +1263,40 @@ static int __cpufreq_remove_dev(struct device *dev, if ((cpus == 1) (cpufreq_driver-target)) __cpufreq_governor(data, CPUFREQ_GOV_POLICY_EXIT); - pr_debug(%s: removing link, cpu: %d\n, __func__, cpu); - cpufreq_cpu_put(data); + if (!frozen) { + pr_debug(%s: removing link, cpu: %d\n, __func__, cpu); + cpufreq_cpu_put(data); So, we don't decrement usage count here. But we are still increasing counts on cpufreq_add_dev after resume, isn't it? So, we wouldn't be able to free policy struct once all the cpus of a policy are removed after suspend/resume has happened once. Actually even I was wondering about this while writing the patch and I even tested shutdown after multiple suspend/resume cycles, to verify that the refcount is messed up. But surprisingly, things worked just fine. Logically there should've been a refcount mismatch and things should have failed, but everything worked fine during my tests. Apart from suspend/resume and shutdown tests, I even tried mixing a few regular CPU hotplug operations (echo 0/1 to sysfs online files), but nothing stood out. Sorry, I forgot to document this in the patch. Either the patch is wrong or something else is silently fixing this up. Not sure what is the exact situation. Regards, Srivatsa S. Bhat -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 7/8] cpufreq: Preserve policy structure across suspend/resume
On 15 July 2013 15:35, Srivatsa S. Bhat srivatsa.b...@linux.vnet.ibm.com wrote: Actually even I was wondering about this while writing the patch and I even tested shutdown after multiple suspend/resume cycles, to verify that the refcount is messed up. But surprisingly, things worked just fine. What kind of system have you tested it on? -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 7/8] cpufreq: Preserve policy structure across suspend/resume
On Monday, July 15, 2013 03:35:04 PM Srivatsa S. Bhat wrote: On 07/15/2013 03:25 PM, Viresh Kumar wrote: Hi Srivatsa, I may be wrong but it looks something is wrong in this patch. On 12 July 2013 03:47, Srivatsa S. Bhat srivatsa.b...@linux.vnet.ibm.com wrote: diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c @@ -1239,29 +1263,40 @@ static int __cpufreq_remove_dev(struct device *dev, if ((cpus == 1) (cpufreq_driver-target)) __cpufreq_governor(data, CPUFREQ_GOV_POLICY_EXIT); - pr_debug(%s: removing link, cpu: %d\n, __func__, cpu); - cpufreq_cpu_put(data); + if (!frozen) { + pr_debug(%s: removing link, cpu: %d\n, __func__, cpu); + cpufreq_cpu_put(data); So, we don't decrement usage count here. But we are still increasing counts on cpufreq_add_dev after resume, isn't it? So, we wouldn't be able to free policy struct once all the cpus of a policy are removed after suspend/resume has happened once. Actually even I was wondering about this while writing the patch and I even tested shutdown after multiple suspend/resume cycles, to verify that the refcount is messed up. But surprisingly, things worked just fine. Logically there should've been a refcount mismatch and things should have failed, but everything worked fine during my tests. Apart from suspend/resume and shutdown tests, I even tried mixing a few regular CPU hotplug operations (echo 0/1 to sysfs online files), but nothing stood out. Sorry, I forgot to document this in the patch. Either the patch is wrong or something else is silently fixing this up. Not sure what is the exact situation. OK, so I'm not going to queue [2-8/8] up until we find out what's going on here (and until Toralf tells me that it doesn't break his system any more). I've queued up [1/8] for 3.11 already. Thanks, Rafael -- I speak only for myself. Rafael J. Wysocki, Intel Open Source Technology Center. -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 7/8] cpufreq: Preserve policy structure across suspend/resume
On 07/15/2013 03:51 PM, Viresh Kumar wrote: On 15 July 2013 15:35, Srivatsa S. Bhat srivatsa.b...@linux.vnet.ibm.com wrote: Actually even I was wondering about this while writing the patch and I even tested shutdown after multiple suspend/resume cycles, to verify that the refcount is messed up. But surprisingly, things worked just fine. What kind of system have you tested it on? The system has 2 sockets with 8 cores each, and has Intel Sandybridge CPUs. I had used a local patch to simulate CPU hotplug in the suspend-to-ram path using the freeze state of pm_test (because I had other problems in using the 'processors' state of pm_test). The patch is shown below: diff --git a/kernel/power/suspend.c b/kernel/power/suspend.c index ece0422..fe07b77 100644 --- a/kernel/power/suspend.c +++ b/kernel/power/suspend.c @@ -342,8 +342,13 @@ static int enter_state(suspend_state_t state) if (error) goto Unlock; - if (suspend_test(TEST_FREEZER)) + if (suspend_test(TEST_FREEZER)) { + pr_debug(Disabling nonboot CPUs\n); + disable_nonboot_cpus(); + pr_debug(Enabling nonboot CPUs\n); + enable_nonboot_cpus(); goto Finish; + } pr_debug(PM: Entering %s sleep\n, pm_states[state]); pm_restrict_gfp_mask(); Regards, Srivatsa S. Bhat -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 7/8] cpufreq: Preserve policy structure across suspend/resume
On 07/15/2013 05:05 PM, Rafael J. Wysocki wrote: On Monday, July 15, 2013 03:35:04 PM Srivatsa S. Bhat wrote: On 07/15/2013 03:25 PM, Viresh Kumar wrote: Hi Srivatsa, I may be wrong but it looks something is wrong in this patch. On 12 July 2013 03:47, Srivatsa S. Bhat srivatsa.b...@linux.vnet.ibm.com wrote: diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c @@ -1239,29 +1263,40 @@ static int __cpufreq_remove_dev(struct device *dev, if ((cpus == 1) (cpufreq_driver-target)) __cpufreq_governor(data, CPUFREQ_GOV_POLICY_EXIT); - pr_debug(%s: removing link, cpu: %d\n, __func__, cpu); - cpufreq_cpu_put(data); + if (!frozen) { + pr_debug(%s: removing link, cpu: %d\n, __func__, cpu); + cpufreq_cpu_put(data); So, we don't decrement usage count here. But we are still increasing counts on cpufreq_add_dev after resume, isn't it? So, we wouldn't be able to free policy struct once all the cpus of a policy are removed after suspend/resume has happened once. Actually even I was wondering about this while writing the patch and I even tested shutdown after multiple suspend/resume cycles, to verify that the refcount is messed up. But surprisingly, things worked just fine. Logically there should've been a refcount mismatch and things should have failed, but everything worked fine during my tests. Apart from suspend/resume and shutdown tests, I even tried mixing a few regular CPU hotplug operations (echo 0/1 to sysfs online files), but nothing stood out. Sorry, I forgot to document this in the patch. Either the patch is wrong or something else is silently fixing this up. Not sure what is the exact situation. OK, so I'm not going to queue [2-8/8] up until we find out what's going on here (and until Toralf tells me that it doesn't break his system any more). Ok, that sounds good. I've queued up [1/8] for 3.11 already. Thank you! Regards, Srivatsa S. Bhat -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH 7/8] cpufreq: Preserve policy structure across suspend/resume
To perform light-weight cpu-init and teardown in the cpufreq subsystem during suspend/resume, we need to separate out the 2 main functionalities of the cpufreq CPU hotplug callbacks, as outlined below: 1. Init/tear-down of core cpufreq and CPU-specific components, which are critical to the correct functioning of the cpufreq subsystem. 2. Init/tear-down of cpufreq sysfs files during suspend/resume. The first part requires accurate updates to the policy structure such as its ->cpus and ->related_cpus masks, whereas the second part requires that the policy->kobj structure is not released or re-initialized during suspend/resume. To handle both these requirements, we need to allow updates to the policy structure throughout suspend/resume, but prevent the structure from getting freed up. Also, we must have a mechanism by which the cpu-up callbacks can restore the policy structure, without allocating things afresh. (That also helps avoid memory leaks). To achieve this, we use 2 schemes: a. Use a fallback per-cpu storage area for preserving the policy structures during suspend, so that they can be restored during resume appropriately. b. Use the 'frozen' flag to determine when to free or allocate the policy structure vs when to restore the policy from the saved fallback storage. Thus we can successfully preserve the structure across suspend/resume. Effectively, this helps us complete the separation of the 'light-weight' and the 'full' init/tear-down sequences in the cpufreq subsystem, so that this can be made use of in the suspend/resume scenario. Signed-off-by: Srivatsa S. Bhat --- drivers/cpufreq/cpufreq.c | 69 ++--- 1 file changed, 52 insertions(+), 17 deletions(-) diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c index 1128753..15ced5f 100644 --- a/drivers/cpufreq/cpufreq.c +++ b/drivers/cpufreq/cpufreq.c @@ -44,6 +44,7 @@ */ static struct cpufreq_driver *cpufreq_driver; static DEFINE_PER_CPU(struct cpufreq_policy *, cpufreq_cpu_data); +static DEFINE_PER_CPU(struct cpufreq_policy *, cpufreq_cpu_data_fallback); static DEFINE_RWLOCK(cpufreq_driver_lock); static DEFINE_MUTEX(cpufreq_governor_lock); @@ -942,6 +943,20 @@ static int cpufreq_add_policy_cpu(unsigned int cpu, unsigned int sibling, } #endif +static struct cpufreq_policy *cpufreq_policy_restore(unsigned int cpu) +{ + struct cpufreq_policy *policy; + unsigned long flags; + + write_lock_irqsave(_driver_lock, flags); + + policy = per_cpu(cpufreq_cpu_data_fallback, cpu); + + write_unlock_irqrestore(_driver_lock, flags); + + return policy; +} + static struct cpufreq_policy *cpufreq_policy_alloc(void) { struct cpufreq_policy *policy; @@ -1019,7 +1034,12 @@ static int __cpufreq_add_dev(struct device *dev, struct subsys_interface *sif, goto module_out; } - policy = cpufreq_policy_alloc(); + if (frozen) + /* Restore the saved policy when doing light-weight init */ + policy = cpufreq_policy_restore(cpu); + else + policy = cpufreq_policy_alloc(); + if (!policy) goto nomem_out; @@ -1199,6 +1219,10 @@ static int __cpufreq_remove_dev(struct device *dev, data = per_cpu(cpufreq_cpu_data, cpu); per_cpu(cpufreq_cpu_data, cpu) = NULL; + /* Save the policy somewhere when doing a light-weight tear-down */ + if (frozen) + per_cpu(cpufreq_cpu_data_fallback, cpu) = data; + write_unlock_irqrestore(_driver_lock, flags); if (!data) { @@ -1239,29 +1263,40 @@ static int __cpufreq_remove_dev(struct device *dev, if ((cpus == 1) && (cpufreq_driver->target)) __cpufreq_governor(data, CPUFREQ_GOV_POLICY_EXIT); - pr_debug("%s: removing link, cpu: %d\n", __func__, cpu); - cpufreq_cpu_put(data); + if (!frozen) { + pr_debug("%s: removing link, cpu: %d\n", __func__, cpu); + cpufreq_cpu_put(data); + } /* If cpu is last user of policy, free policy */ if (cpus == 1) { - lock_policy_rwsem_read(cpu); - kobj = >kobj; - cmp = >kobj_unregister; - unlock_policy_rwsem_read(cpu); - kobject_put(kobj); - - /* we need to make sure that the underlying kobj is actually -* not referenced anymore by anybody before we proceed with -* unloading. -*/ - pr_debug("waiting for dropping of refcount\n"); - wait_for_completion(cmp); - pr_debug("wait complete\n"); + if (!frozen) { + lock_policy_rwsem_read(cpu); + kobj = >kobj; + cmp = >kobj_unregister; + unlock_policy_rwsem_read(cpu); + kobject_put(kobj); + +
[PATCH 7/8] cpufreq: Preserve policy structure across suspend/resume
To perform light-weight cpu-init and teardown in the cpufreq subsystem during suspend/resume, we need to separate out the 2 main functionalities of the cpufreq CPU hotplug callbacks, as outlined below: 1. Init/tear-down of core cpufreq and CPU-specific components, which are critical to the correct functioning of the cpufreq subsystem. 2. Init/tear-down of cpufreq sysfs files during suspend/resume. The first part requires accurate updates to the policy structure such as its -cpus and -related_cpus masks, whereas the second part requires that the policy-kobj structure is not released or re-initialized during suspend/resume. To handle both these requirements, we need to allow updates to the policy structure throughout suspend/resume, but prevent the structure from getting freed up. Also, we must have a mechanism by which the cpu-up callbacks can restore the policy structure, without allocating things afresh. (That also helps avoid memory leaks). To achieve this, we use 2 schemes: a. Use a fallback per-cpu storage area for preserving the policy structures during suspend, so that they can be restored during resume appropriately. b. Use the 'frozen' flag to determine when to free or allocate the policy structure vs when to restore the policy from the saved fallback storage. Thus we can successfully preserve the structure across suspend/resume. Effectively, this helps us complete the separation of the 'light-weight' and the 'full' init/tear-down sequences in the cpufreq subsystem, so that this can be made use of in the suspend/resume scenario. Signed-off-by: Srivatsa S. Bhat srivatsa.b...@linux.vnet.ibm.com --- drivers/cpufreq/cpufreq.c | 69 ++--- 1 file changed, 52 insertions(+), 17 deletions(-) diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c index 1128753..15ced5f 100644 --- a/drivers/cpufreq/cpufreq.c +++ b/drivers/cpufreq/cpufreq.c @@ -44,6 +44,7 @@ */ static struct cpufreq_driver *cpufreq_driver; static DEFINE_PER_CPU(struct cpufreq_policy *, cpufreq_cpu_data); +static DEFINE_PER_CPU(struct cpufreq_policy *, cpufreq_cpu_data_fallback); static DEFINE_RWLOCK(cpufreq_driver_lock); static DEFINE_MUTEX(cpufreq_governor_lock); @@ -942,6 +943,20 @@ static int cpufreq_add_policy_cpu(unsigned int cpu, unsigned int sibling, } #endif +static struct cpufreq_policy *cpufreq_policy_restore(unsigned int cpu) +{ + struct cpufreq_policy *policy; + unsigned long flags; + + write_lock_irqsave(cpufreq_driver_lock, flags); + + policy = per_cpu(cpufreq_cpu_data_fallback, cpu); + + write_unlock_irqrestore(cpufreq_driver_lock, flags); + + return policy; +} + static struct cpufreq_policy *cpufreq_policy_alloc(void) { struct cpufreq_policy *policy; @@ -1019,7 +1034,12 @@ static int __cpufreq_add_dev(struct device *dev, struct subsys_interface *sif, goto module_out; } - policy = cpufreq_policy_alloc(); + if (frozen) + /* Restore the saved policy when doing light-weight init */ + policy = cpufreq_policy_restore(cpu); + else + policy = cpufreq_policy_alloc(); + if (!policy) goto nomem_out; @@ -1199,6 +1219,10 @@ static int __cpufreq_remove_dev(struct device *dev, data = per_cpu(cpufreq_cpu_data, cpu); per_cpu(cpufreq_cpu_data, cpu) = NULL; + /* Save the policy somewhere when doing a light-weight tear-down */ + if (frozen) + per_cpu(cpufreq_cpu_data_fallback, cpu) = data; + write_unlock_irqrestore(cpufreq_driver_lock, flags); if (!data) { @@ -1239,29 +1263,40 @@ static int __cpufreq_remove_dev(struct device *dev, if ((cpus == 1) (cpufreq_driver-target)) __cpufreq_governor(data, CPUFREQ_GOV_POLICY_EXIT); - pr_debug(%s: removing link, cpu: %d\n, __func__, cpu); - cpufreq_cpu_put(data); + if (!frozen) { + pr_debug(%s: removing link, cpu: %d\n, __func__, cpu); + cpufreq_cpu_put(data); + } /* If cpu is last user of policy, free policy */ if (cpus == 1) { - lock_policy_rwsem_read(cpu); - kobj = data-kobj; - cmp = data-kobj_unregister; - unlock_policy_rwsem_read(cpu); - kobject_put(kobj); - - /* we need to make sure that the underlying kobj is actually -* not referenced anymore by anybody before we proceed with -* unloading. -*/ - pr_debug(waiting for dropping of refcount\n); - wait_for_completion(cmp); - pr_debug(wait complete\n); + if (!frozen) { + lock_policy_rwsem_read(cpu); + kobj = data-kobj; + cmp = data-kobj_unregister; + unlock_policy_rwsem_read(cpu); +