Re: Regression with suspend resume 5a87182aa21d6d5d306840feab9321818dd3e2a3
On 16 December 2013 17:49, Bjørn Mork wrote: > Viresh Kumar writes: >> As a summary, after my patch to suspend/resume governors we can't >> allow policies to be freed and allocated back. > > How do you deal with errors on suspend/resume then? Are you always able > to keep the policies, for all error cases? We fixed that with http://www.spinics.net/lists/cpufreq/msg08720.html, isn't it? > In any case: Splitting the suspend code between a cpu hotplug hook with > special "frozen" logic and a cpufreq_suspend() called from > dpm_suspend_noirq() confuses me, and I believe many others. This is the > reason such a bug could be caused by two "obviously fine" patches. So > please, at least keep the suspend logic in *one* place. This fixes it I believe. https://lkml.org/lkml/2013/12/2/26 Specially patch 3/3 >> Its not really a war between my patch versus yours :), but I believe the >> right thing to do at this point is to get two patches in for 3.13 as well: >> >> 5a87182 cpufreq: suspend governors on system suspend/hibernate >> and patch discussed here: >> http://www.spinics.net/lists/cpufreq/msg08720.html > > Yes, that would probably work fine, at least as long as nothing goes > wrong. I must admit that I'm in no way able to play out all the > different error scenarios in my head, but won't there still be cases > where you end up freeing policies on suspend/resume? No, we aren't supposed to free policies at all in suspend/resume.. >> To finish this problem as early as possible I tested above two >> patches and didn't saw any regressions with suspend/resume or >> Hibernation.. And obviously this fixes your issues as well :) >> >> @Rafael: I understand that it would be difficult for you to take these >> now for 3.13 but they fix some serious problems reported earlier. >> Specially the first patch which everybody thought is the culprit :) >> >> Please see if we can manage to get them in :) > > I think it needs serious testing with simulated errors first. All error > labels should be executed at least once for all combinations of inputs. That can be best done by the people who reported these issues. And you are one of those :) I will ask others also to give these patches a try. .That would be helpful. > Simply trying it out and verifying that it works in the no-error case is > not enough. Which should be quite obvious now. Sure.. But I already tested it earlier as well, when I sent that patch to you. I tested it on top of my branch where the reverted patch was present. So, for me they should work fine.. -- viresh -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Regression with suspend resume 5a87182aa21d6d5d306840feab9321818dd3e2a3
Viresh Kumar writes: > Adding Rafael, I though he was there on this mail :) > > On 16 December 2013 16:32, Bjørn Mork wrote: >> It's both patches in combination. Interesting case, really :-) > > Completed my testing now and yes this happened after your patch > came in. I read your chats somewhere else as well (there are so > many mails, bug reports on this topic, can't give link now.. :)).. > > I think I know why we are facing issues with these two patches in > at the same time. > > - My patch is actually disabling governors at suspend/resume or > hibernation.. > - Which makes __cpufreq_governor() a nops routine, which simply > returns zero > - Now with your patch you are disabling the frozen feature which > actually makes cpufreq core to free/allocate policies again on suspend > resume. And so the calls like: > > __cpufreq_governor(policy, CPUFREQ_GOV_POLICY_INIT) > > end up doing nothing at resume when CPUs start coming up. And so > cpufreq_resume()'s calls to __cpufreq_governor(policy, > CPUFREQ_GOV_START) would start behaving badly.. Which is quite > obvious.. > > As a summary, after my patch to suspend/resume governors we can't > allow policies to be freed and allocated back. How do you deal with errors on suspend/resume then? Are you always able to keep the policies, for all error cases? In any case: Splitting the suspend code between a cpu hotplug hook with special "frozen" logic and a cpufreq_suspend() called from dpm_suspend_noirq() confuses me, and I believe many others. This is the reason such a bug could be caused by two "obviously fine" patches. So please, at least keep the suspend logic in *one* place. Or add a BIG comment both places, explaining the dependencies. This is never going to become obvious, and it's not going to be easier when you have more than 2 simple patches to look at. > Its not really a war between my patch versus yours :), but I believe the > right thing to do at this point is to get two patches in for 3.13 as well: > > 5a87182 cpufreq: suspend governors on system suspend/hibernate > and patch discussed here: > http://www.spinics.net/lists/cpufreq/msg08720.html Yes, that would probably work fine, at least as long as nothing goes wrong. I must admit that I'm in no way able to play out all the different error scenarios in my head, but won't there still be cases where you end up freeing policies on suspend/resume? > To finish this problem as early as possible I tested above two > patches and didn't saw any regressions with suspend/resume or > Hibernation.. And obviously this fixes your issues as well :) > > @Rafael: I understand that it would be difficult for you to take these > now for 3.13 but they fix some serious problems reported earlier. > Specially the first patch which everybody thought is the culprit :) > > Please see if we can manage to get them in :) I think it needs serious testing with simulated errors first. All error labels should be executed at least once for all combinations of inputs. Simply trying it out and verifying that it works in the no-error case is not enough. Which should be quite obvious now. Bjørn -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Regression with suspend resume 5a87182aa21d6d5d306840feab9321818dd3e2a3
On 12/16/2013 04:32 PM, Bjørn Mork wrote: > Viresh Kumar writes: >> On 8 December 2013 15:54, Zdenek Kabelac wrote: >>> I'm testing recent 3.13-rc kernel - and I've notice my Lenovo T61 is not >>> able to resume. After bisect I've found commit: >>> >>> 5a87182aa21d6d5d306840feab9321818dd3e2a3 >> >> That's my patch, sorry for all the trouble.. Just came back after vacations >> and multiple threads are there with this patch as culprit.. >> >> I just tested this patch again on Rafael's linux-next branch and >> suspend/resume >> and Hibernation are working fine for me on my Thinkpad T420. I don't see any >> issues at all.. >> >> scaling_governor: ondemand >> scaling_driver: acpi-cpufreq >> >>> When I just revert this commit with 374b105797c3d4f29c685f3be535c35f5689b30e >>> (3.13-rc3) suspend/resume works again. >>> (I'm using ondemand CPU governor) >> >>> Here is bisect log: >> >>> # bad: [2167e2399dc5e69c62db56d933e9c8cbe107620a] cpufreq: fix garbage >>> kobjects on errors during suspend/resume >>> git bisect bad 2167e2399dc5e69c62db56d933e9c8cbe107620a >> >> I don't read bisect logs well but doesn't you log say that the culprit is >> 2167e2399dc5e69c62db56d933e9c8cbe107620a instead? >> >> I am trying with this patch now, but will take some time for results to be >> out. > > It's both patches in combination. Interesting case, really :-) > True... Basically, what happens is that commit 5a87182 prevents POLICY_EXIT of governors during suspend - ie., it just does a CPUFREQ_GOV_STOP and prevents any further operations on governors. Later, commit 2167e23 removes the special suspend/resume handling for CPU hotplug and thus in this path, CPU offline sends out POLICY_EXIT and also frees up the memory allocated to the policy structure. Since POLICY_EXIT is prevented by the cpufreq_suspend() code, this tear-down only gets half-completed and thus goes into an inconsistent state. So, upon resume, the CPU online operation tries POLICY_INIT, GOV_START etc, and again none of these get executed because of the cpufreq_suspend() code. Eventually cpufreq_resume() will execute GOV_START, but by then the underlying policy-structure is a newly allocated one, and things have already been messed up so much that it just can't resume properly. Overall, the inconsistency in handling governor code during suspend/resume is the root-cause of the suspend/resume regression. And as Bjorn mentioned, this is caused by the interaction of the 2 commits with one another, and cannot be reproduced by either of the commits independently. IIUC, mainline should be fine now, since Rafael reverted both the commits. We need to be more careful in bringing back either functionality in the future. As Bjorn and Rafael rightly pointed out in the other email threads, we need to fundamentally rethink the design and come up with elegant interfaces/mechanisms in the cpufreq subsystem that will allow us to write code that works as expected and avoid such subtle regressions. Regards, Srivatsa S. Bhat -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Regression with suspend resume 5a87182aa21d6d5d306840feab9321818dd3e2a3
Adding Rafael, I though he was there on this mail :) On 16 December 2013 16:32, Bjørn Mork wrote: > It's both patches in combination. Interesting case, really :-) Completed my testing now and yes this happened after your patch came in. I read your chats somewhere else as well (there are so many mails, bug reports on this topic, can't give link now.. :)).. I think I know why we are facing issues with these two patches in at the same time. - My patch is actually disabling governors at suspend/resume or hibernation.. - Which makes __cpufreq_governor() a nops routine, which simply returns zero - Now with your patch you are disabling the frozen feature which actually makes cpufreq core to free/allocate policies again on suspend resume. And so the calls like: __cpufreq_governor(policy, CPUFREQ_GOV_POLICY_INIT) end up doing nothing at resume when CPUs start coming up. And so cpufreq_resume()'s calls to __cpufreq_governor(policy, CPUFREQ_GOV_START) would start behaving badly.. Which is quite obvious.. As a summary, after my patch to suspend/resume governors we can't allow policies to be freed and allocated back. Its not really a war between my patch versus yours :), but I believe the right thing to do at this point is to get two patches in for 3.13 as well: 5a87182 cpufreq: suspend governors on system suspend/hibernate and patch discussed here: http://www.spinics.net/lists/cpufreq/msg08720.html To finish this problem as early as possible I tested above two patches and didn't saw any regressions with suspend/resume or Hibernation.. And obviously this fixes your issues as well :) @Rafael: I understand that it would be difficult for you to take these now for 3.13 but they fix some serious problems reported earlier. Specially the first patch which everybody thought is the culprit :) Please see if we can manage to get them in :) -- viresh -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Regression with suspend resume 5a87182aa21d6d5d306840feab9321818dd3e2a3
Viresh Kumar writes: > On 8 December 2013 15:54, Zdenek Kabelac wrote: >> I'm testing recent 3.13-rc kernel - and I've notice my Lenovo T61 is not >> able to resume. After bisect I've found commit: >> >> 5a87182aa21d6d5d306840feab9321818dd3e2a3 > > That's my patch, sorry for all the trouble.. Just came back after vacations > and multiple threads are there with this patch as culprit.. > > I just tested this patch again on Rafael's linux-next branch and > suspend/resume > and Hibernation are working fine for me on my Thinkpad T420. I don't see any > issues at all.. > > scaling_governor: ondemand > scaling_driver: acpi-cpufreq > >> When I just revert this commit with 374b105797c3d4f29c685f3be535c35f5689b30e >> (3.13-rc3) suspend/resume works again. >> (I'm using ondemand CPU governor) > >> Here is bisect log: > >> # bad: [2167e2399dc5e69c62db56d933e9c8cbe107620a] cpufreq: fix garbage >> kobjects on errors during suspend/resume >> git bisect bad 2167e2399dc5e69c62db56d933e9c8cbe107620a > > I don't read bisect logs well but doesn't you log say that the culprit is > 2167e2399dc5e69c62db56d933e9c8cbe107620a instead? > > I am trying with this patch now, but will take some time for results to be > out. It's both patches in combination. Interesting case, really :-) Bjørn -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Regression with suspend resume 5a87182aa21d6d5d306840feab9321818dd3e2a3
On 8 December 2013 15:54, Zdenek Kabelac wrote: > I'm testing recent 3.13-rc kernel - and I've notice my Lenovo T61 is not > able to resume. After bisect I've found commit: > > 5a87182aa21d6d5d306840feab9321818dd3e2a3 That's my patch, sorry for all the trouble.. Just came back after vacations and multiple threads are there with this patch as culprit.. I just tested this patch again on Rafael's linux-next branch and suspend/resume and Hibernation are working fine for me on my Thinkpad T420. I don't see any issues at all.. scaling_governor: ondemand scaling_driver: acpi-cpufreq > When I just revert this commit with 374b105797c3d4f29c685f3be535c35f5689b30e > (3.13-rc3) suspend/resume works again. > (I'm using ondemand CPU governor) > Here is bisect log: > # bad: [2167e2399dc5e69c62db56d933e9c8cbe107620a] cpufreq: fix garbage > kobjects on errors during suspend/resume > git bisect bad 2167e2399dc5e69c62db56d933e9c8cbe107620a I don't read bisect logs well but doesn't you log say that the culprit is 2167e2399dc5e69c62db56d933e9c8cbe107620a instead? I am trying with this patch now, but will take some time for results to be out. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Regression with suspend resume 5a87182aa21d6d5d306840feab9321818dd3e2a3
On 8 December 2013 15:54, Zdenek Kabelac zkabe...@redhat.com wrote: I'm testing recent 3.13-rc kernel - and I've notice my Lenovo T61 is not able to resume. After bisect I've found commit: 5a87182aa21d6d5d306840feab9321818dd3e2a3 That's my patch, sorry for all the trouble.. Just came back after vacations and multiple threads are there with this patch as culprit.. I just tested this patch again on Rafael's linux-next branch and suspend/resume and Hibernation are working fine for me on my Thinkpad T420. I don't see any issues at all.. scaling_governor: ondemand scaling_driver: acpi-cpufreq When I just revert this commit with 374b105797c3d4f29c685f3be535c35f5689b30e (3.13-rc3) suspend/resume works again. (I'm using ondemand CPU governor) Here is bisect log: # bad: [2167e2399dc5e69c62db56d933e9c8cbe107620a] cpufreq: fix garbage kobjects on errors during suspend/resume git bisect bad 2167e2399dc5e69c62db56d933e9c8cbe107620a I don't read bisect logs well but doesn't you log say that the culprit is 2167e2399dc5e69c62db56d933e9c8cbe107620a instead? I am trying with this patch now, but will take some time for results to be out. -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Regression with suspend resume 5a87182aa21d6d5d306840feab9321818dd3e2a3
Viresh Kumar viresh.ku...@linaro.org writes: On 8 December 2013 15:54, Zdenek Kabelac zkabe...@redhat.com wrote: I'm testing recent 3.13-rc kernel - and I've notice my Lenovo T61 is not able to resume. After bisect I've found commit: 5a87182aa21d6d5d306840feab9321818dd3e2a3 That's my patch, sorry for all the trouble.. Just came back after vacations and multiple threads are there with this patch as culprit.. I just tested this patch again on Rafael's linux-next branch and suspend/resume and Hibernation are working fine for me on my Thinkpad T420. I don't see any issues at all.. scaling_governor: ondemand scaling_driver: acpi-cpufreq When I just revert this commit with 374b105797c3d4f29c685f3be535c35f5689b30e (3.13-rc3) suspend/resume works again. (I'm using ondemand CPU governor) Here is bisect log: # bad: [2167e2399dc5e69c62db56d933e9c8cbe107620a] cpufreq: fix garbage kobjects on errors during suspend/resume git bisect bad 2167e2399dc5e69c62db56d933e9c8cbe107620a I don't read bisect logs well but doesn't you log say that the culprit is 2167e2399dc5e69c62db56d933e9c8cbe107620a instead? I am trying with this patch now, but will take some time for results to be out. It's both patches in combination. Interesting case, really :-) Bjørn -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Regression with suspend resume 5a87182aa21d6d5d306840feab9321818dd3e2a3
Adding Rafael, I though he was there on this mail :) On 16 December 2013 16:32, Bjørn Mork bj...@mork.no wrote: It's both patches in combination. Interesting case, really :-) Completed my testing now and yes this happened after your patch came in. I read your chats somewhere else as well (there are so many mails, bug reports on this topic, can't give link now.. :)).. I think I know why we are facing issues with these two patches in at the same time. - My patch is actually disabling governors at suspend/resume or hibernation.. - Which makes __cpufreq_governor() a nops routine, which simply returns zero - Now with your patch you are disabling the frozen feature which actually makes cpufreq core to free/allocate policies again on suspend resume. And so the calls like: __cpufreq_governor(policy, CPUFREQ_GOV_POLICY_INIT) end up doing nothing at resume when CPUs start coming up. And so cpufreq_resume()'s calls to __cpufreq_governor(policy, CPUFREQ_GOV_START) would start behaving badly.. Which is quite obvious.. As a summary, after my patch to suspend/resume governors we can't allow policies to be freed and allocated back. Its not really a war between my patch versus yours :), but I believe the right thing to do at this point is to get two patches in for 3.13 as well: 5a87182 cpufreq: suspend governors on system suspend/hibernate and patch discussed here: http://www.spinics.net/lists/cpufreq/msg08720.html To finish this problem as early as possible I tested above two patches and didn't saw any regressions with suspend/resume or Hibernation.. And obviously this fixes your issues as well :) @Rafael: I understand that it would be difficult for you to take these now for 3.13 but they fix some serious problems reported earlier. Specially the first patch which everybody thought is the culprit :) Please see if we can manage to get them in :) -- viresh -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Regression with suspend resume 5a87182aa21d6d5d306840feab9321818dd3e2a3
On 12/16/2013 04:32 PM, Bjørn Mork wrote: Viresh Kumar viresh.ku...@linaro.org writes: On 8 December 2013 15:54, Zdenek Kabelac zkabe...@redhat.com wrote: I'm testing recent 3.13-rc kernel - and I've notice my Lenovo T61 is not able to resume. After bisect I've found commit: 5a87182aa21d6d5d306840feab9321818dd3e2a3 That's my patch, sorry for all the trouble.. Just came back after vacations and multiple threads are there with this patch as culprit.. I just tested this patch again on Rafael's linux-next branch and suspend/resume and Hibernation are working fine for me on my Thinkpad T420. I don't see any issues at all.. scaling_governor: ondemand scaling_driver: acpi-cpufreq When I just revert this commit with 374b105797c3d4f29c685f3be535c35f5689b30e (3.13-rc3) suspend/resume works again. (I'm using ondemand CPU governor) Here is bisect log: # bad: [2167e2399dc5e69c62db56d933e9c8cbe107620a] cpufreq: fix garbage kobjects on errors during suspend/resume git bisect bad 2167e2399dc5e69c62db56d933e9c8cbe107620a I don't read bisect logs well but doesn't you log say that the culprit is 2167e2399dc5e69c62db56d933e9c8cbe107620a instead? I am trying with this patch now, but will take some time for results to be out. It's both patches in combination. Interesting case, really :-) True... Basically, what happens is that commit 5a87182 prevents POLICY_EXIT of governors during suspend - ie., it just does a CPUFREQ_GOV_STOP and prevents any further operations on governors. Later, commit 2167e23 removes the special suspend/resume handling for CPU hotplug and thus in this path, CPU offline sends out POLICY_EXIT and also frees up the memory allocated to the policy structure. Since POLICY_EXIT is prevented by the cpufreq_suspend() code, this tear-down only gets half-completed and thus goes into an inconsistent state. So, upon resume, the CPU online operation tries POLICY_INIT, GOV_START etc, and again none of these get executed because of the cpufreq_suspend() code. Eventually cpufreq_resume() will execute GOV_START, but by then the underlying policy-structure is a newly allocated one, and things have already been messed up so much that it just can't resume properly. Overall, the inconsistency in handling governor code during suspend/resume is the root-cause of the suspend/resume regression. And as Bjorn mentioned, this is caused by the interaction of the 2 commits with one another, and cannot be reproduced by either of the commits independently. IIUC, mainline should be fine now, since Rafael reverted both the commits. We need to be more careful in bringing back either functionality in the future. As Bjorn and Rafael rightly pointed out in the other email threads, we need to fundamentally rethink the design and come up with elegant interfaces/mechanisms in the cpufreq subsystem that will allow us to write code that works as expected and avoid such subtle regressions. Regards, Srivatsa S. Bhat -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Regression with suspend resume 5a87182aa21d6d5d306840feab9321818dd3e2a3
Viresh Kumar viresh.ku...@linaro.org writes: Adding Rafael, I though he was there on this mail :) On 16 December 2013 16:32, Bjørn Mork bj...@mork.no wrote: It's both patches in combination. Interesting case, really :-) Completed my testing now and yes this happened after your patch came in. I read your chats somewhere else as well (there are so many mails, bug reports on this topic, can't give link now.. :)).. I think I know why we are facing issues with these two patches in at the same time. - My patch is actually disabling governors at suspend/resume or hibernation.. - Which makes __cpufreq_governor() a nops routine, which simply returns zero - Now with your patch you are disabling the frozen feature which actually makes cpufreq core to free/allocate policies again on suspend resume. And so the calls like: __cpufreq_governor(policy, CPUFREQ_GOV_POLICY_INIT) end up doing nothing at resume when CPUs start coming up. And so cpufreq_resume()'s calls to __cpufreq_governor(policy, CPUFREQ_GOV_START) would start behaving badly.. Which is quite obvious.. As a summary, after my patch to suspend/resume governors we can't allow policies to be freed and allocated back. How do you deal with errors on suspend/resume then? Are you always able to keep the policies, for all error cases? In any case: Splitting the suspend code between a cpu hotplug hook with special frozen logic and a cpufreq_suspend() called from dpm_suspend_noirq() confuses me, and I believe many others. This is the reason such a bug could be caused by two obviously fine patches. So please, at least keep the suspend logic in *one* place. Or add a BIG comment both places, explaining the dependencies. This is never going to become obvious, and it's not going to be easier when you have more than 2 simple patches to look at. Its not really a war between my patch versus yours :), but I believe the right thing to do at this point is to get two patches in for 3.13 as well: 5a87182 cpufreq: suspend governors on system suspend/hibernate and patch discussed here: http://www.spinics.net/lists/cpufreq/msg08720.html Yes, that would probably work fine, at least as long as nothing goes wrong. I must admit that I'm in no way able to play out all the different error scenarios in my head, but won't there still be cases where you end up freeing policies on suspend/resume? To finish this problem as early as possible I tested above two patches and didn't saw any regressions with suspend/resume or Hibernation.. And obviously this fixes your issues as well :) @Rafael: I understand that it would be difficult for you to take these now for 3.13 but they fix some serious problems reported earlier. Specially the first patch which everybody thought is the culprit :) Please see if we can manage to get them in :) I think it needs serious testing with simulated errors first. All error labels should be executed at least once for all combinations of inputs. Simply trying it out and verifying that it works in the no-error case is not enough. Which should be quite obvious now. Bjørn -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Regression with suspend resume 5a87182aa21d6d5d306840feab9321818dd3e2a3
On 16 December 2013 17:49, Bjørn Mork bj...@mork.no wrote: Viresh Kumar viresh.ku...@linaro.org writes: As a summary, after my patch to suspend/resume governors we can't allow policies to be freed and allocated back. How do you deal with errors on suspend/resume then? Are you always able to keep the policies, for all error cases? We fixed that with http://www.spinics.net/lists/cpufreq/msg08720.html, isn't it? In any case: Splitting the suspend code between a cpu hotplug hook with special frozen logic and a cpufreq_suspend() called from dpm_suspend_noirq() confuses me, and I believe many others. This is the reason such a bug could be caused by two obviously fine patches. So please, at least keep the suspend logic in *one* place. This fixes it I believe. https://lkml.org/lkml/2013/12/2/26 Specially patch 3/3 Its not really a war between my patch versus yours :), but I believe the right thing to do at this point is to get two patches in for 3.13 as well: 5a87182 cpufreq: suspend governors on system suspend/hibernate and patch discussed here: http://www.spinics.net/lists/cpufreq/msg08720.html Yes, that would probably work fine, at least as long as nothing goes wrong. I must admit that I'm in no way able to play out all the different error scenarios in my head, but won't there still be cases where you end up freeing policies on suspend/resume? No, we aren't supposed to free policies at all in suspend/resume.. To finish this problem as early as possible I tested above two patches and didn't saw any regressions with suspend/resume or Hibernation.. And obviously this fixes your issues as well :) @Rafael: I understand that it would be difficult for you to take these now for 3.13 but they fix some serious problems reported earlier. Specially the first patch which everybody thought is the culprit :) Please see if we can manage to get them in :) I think it needs serious testing with simulated errors first. All error labels should be executed at least once for all combinations of inputs. That can be best done by the people who reported these issues. And you are one of those :) I will ask others also to give these patches a try. .That would be helpful. Simply trying it out and verifying that it works in the no-error case is not enough. Which should be quite obvious now. Sure.. But I already tested it earlier as well, when I sent that patch to you. I tested it on top of my branch where the reverted patch was present. So, for me they should work fine.. -- viresh -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Regression with suspend resume 5a87182aa21d6d5d306840feab9321818dd3e2a3
On Sun, 2013-12-08 at 20:21 +0100, Zdenek Kabelac wrote: > Dne 8.12.2013 12:13, Paul Bolle napsal(a): > > Rafael just asked Linus to pull a revert of that commit (see > > https://lkml.org/lkml/2013/12/7/126 ). > > > I'm not sure - but it also could be related with another warning I'm getting > during resume: > > But possibly this is result in i915 update ? I don't think that's likely. See a recent LKML thread ( https://lkml.org/lkml/2013/11/30/148 ). There are two patches mentioned in that thread. I had to apply both patches to make that WARNING go away (on a 32 bit ThinkPad). (These two patches are now in my local stash, on top of v3.13-rc3. I'm not sure what their status is and whether these will end up in mainline.) > PM: Syncing filesystems ... done. > Freezing user space processes ... (elapsed 0.005 seconds) done. > Freezing remaining freezable tasks ... (elapsed 0.001 seconds) done. > Suspending console(s) (use no_console_suspend to debug) > sd 0:0:0:0: [sda] Synchronizing SCSI cache > sd 0:0:0:0: [sda] Stopping disk > [ cut here ] > WARNING: CPU: 0 PID: 2042 at drivers/gpu/drm/i915/intel_display.c:9948 > intel_get_pipe_from_connector+0x49/0x50 [i915]() > Modules linked in: i915 i2c_algo_bit drm_kms_helper drm ctr ccm > ipt_MASQUERADE > iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack > nf_conntrack ipt_REJECT xt_CHECKSUM iptable_mangle xt_tcpudp tun bridge stp > llc ipv6 ip6_tables iptable_filter ip_tables ebtable_nat ebtables x_tables > bnep btusb bluetooth snd_hda_codec_analog iTCO_wdt iTCO_vendor_support arc4 > coretemp kvm_intel kvm microcode iwl3945 iwlegacy psmouse serio_raw mac80211 > i2c_i801 i2c_core sdhci_pci r852 sm_common sdhci nand cfg80211 nand_ecc > snd_hda_intel nand_ids mmc_core mtd snd_hda_codec r592 snd_hwdep snd_seq > memstick snd_seq_device snd_pcm lpc_ich mfd_core e1000e snd_page_alloc > snd_timer ptp pps_core thinkpad_acpi wmi nvram snd soundcore evdev > binfmt_misc > nfsd auth_rpcgss oid_registry exportfs loop nfs_acl lockd sunrpc pcmcia > sr_mod > yenta_socket cdrom ehci_pci ehci_hcd uhci_hcd usbcore usb_common video > backlight autofs4 > CPU: 0 PID: 2042 Comm: kworker/u4:3 Tainted: GW > 3.13.0-rc3-6-gee4645c #186 > Hardware name: LENOVO 6464CTO/6464CTO, BIOS 7LETC9WW (2.29 ) 03/18/2011 > Workqueue: events_unbound async_run_entry_fn > 0009 88009ad21af8 815a3bfe > 88009ad21b30 8104924d 8800b5f4fe00 8800aa408000 > 8800b5f4fe00 8800b5f4fe00 88009ad21b40 > Call Trace: > [] dump_stack+0x4e/0x7a > [] warn_slowpath_common+0x7d/0xa0 > [] warn_slowpath_null+0x1a/0x20 > [] intel_get_pipe_from_connector+0x49/0x50 [i915] > [] intel_panel_disable_backlight+0x25/0x180 [i915] > [] intel_disable_lvds+0x4d/0x180 [i915] > [] i9xx_crtc_disable+0x28a/0x380 [i915] > [] i915_drm_freeze+0xca/0x150 [i915] > [] i915_pm_suspend+0x3d/0x90 [i915] > [] pci_pm_suspend+0x6c/0x150 > [] ? pci_pm_freeze+0xe0/0xe0 > [] dpm_run_callback+0x49/0xa0 > [] __device_suspend+0xdd/0x280 > [] async_suspend+0x1f/0xa0 > [] async_run_entry_fn+0x37/0x130 > [] process_one_work+0x1fd/0x6d0 > [] ? process_one_work+0x19b/0x6d0 > [] worker_thread+0x121/0x3a0 > [] ? process_one_work+0x6d0/0x6d0 > [] kthread+0xf0/0x110 > [] ? insert_kthread_work+0x70/0x70 > [] ret_from_fork+0x7c/0xb0 > [] ? insert_kthread_work+0x70/0x70 > ---[ end trace 7517bdfd550790cc ]--- > PM: suspend of devices complete after 797.599 msecs > PM: late suspend of devices complete after 3.960 msecs Paul Bolle -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Regression with suspend resume 5a87182aa21d6d5d306840feab9321818dd3e2a3
Dne 8.12.2013 12:13, Paul Bolle napsal(a): On Sun, 2013-12-08 at 11:24 +0100, Zdenek Kabelac wrote: I'm testing recent 3.13-rc kernel - and I've notice my Lenovo T61 is not able to resume. After bisect I've found commit: 5a87182aa21d6d5d306840feab9321818dd3e2a3 That's "cpufreq: suspend governors on system suspend/hibernate". When I just revert this commit with 374b105797c3d4f29c685f3be535c35f5689b30e (3.13-rc3) suspend/resume works again. (I'm using ondemand CPU governor) Rafael just asked Linus to pull a revert of that commit (see https://lkml.org/lkml/2013/12/7/126 ). I'm not sure - but it also could be related with another warning I'm getting during resume: But possibly this is result in i915 update ? Zdenek PM: Syncing filesystems ... done. Freezing user space processes ... (elapsed 0.005 seconds) done. Freezing remaining freezable tasks ... (elapsed 0.001 seconds) done. Suspending console(s) (use no_console_suspend to debug) sd 0:0:0:0: [sda] Synchronizing SCSI cache sd 0:0:0:0: [sda] Stopping disk [ cut here ] WARNING: CPU: 0 PID: 2042 at drivers/gpu/drm/i915/intel_display.c:9948 intel_get_pipe_from_connector+0x49/0x50 [i915]() Modules linked in: i915 i2c_algo_bit drm_kms_helper drm ctr ccm ipt_MASQUERADE iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ipt_REJECT xt_CHECKSUM iptable_mangle xt_tcpudp tun bridge stp llc ipv6 ip6_tables iptable_filter ip_tables ebtable_nat ebtables x_tables bnep btusb bluetooth snd_hda_codec_analog iTCO_wdt iTCO_vendor_support arc4 coretemp kvm_intel kvm microcode iwl3945 iwlegacy psmouse serio_raw mac80211 i2c_i801 i2c_core sdhci_pci r852 sm_common sdhci nand cfg80211 nand_ecc snd_hda_intel nand_ids mmc_core mtd snd_hda_codec r592 snd_hwdep snd_seq memstick snd_seq_device snd_pcm lpc_ich mfd_core e1000e snd_page_alloc snd_timer ptp pps_core thinkpad_acpi wmi nvram snd soundcore evdev binfmt_misc nfsd auth_rpcgss oid_registry exportfs loop nfs_acl lockd sunrpc pcmcia sr_mod yenta_socket cdrom ehci_pci ehci_hcd uhci_hcd usbcore usb_common video backlight autofs4 CPU: 0 PID: 2042 Comm: kworker/u4:3 Tainted: GW 3.13.0-rc3-6-gee4645c #186 Hardware name: LENOVO 6464CTO/6464CTO, BIOS 7LETC9WW (2.29 ) 03/18/2011 Workqueue: events_unbound async_run_entry_fn 0009 88009ad21af8 815a3bfe 88009ad21b30 8104924d 8800b5f4fe00 8800aa408000 8800b5f4fe00 8800b5f4fe00 88009ad21b40 Call Trace: [] dump_stack+0x4e/0x7a [] warn_slowpath_common+0x7d/0xa0 [] warn_slowpath_null+0x1a/0x20 [] intel_get_pipe_from_connector+0x49/0x50 [i915] [] intel_panel_disable_backlight+0x25/0x180 [i915] [] intel_disable_lvds+0x4d/0x180 [i915] [] i9xx_crtc_disable+0x28a/0x380 [i915] [] i915_drm_freeze+0xca/0x150 [i915] [] i915_pm_suspend+0x3d/0x90 [i915] [] pci_pm_suspend+0x6c/0x150 [] ? pci_pm_freeze+0xe0/0xe0 [] dpm_run_callback+0x49/0xa0 [] __device_suspend+0xdd/0x280 [] async_suspend+0x1f/0xa0 [] async_run_entry_fn+0x37/0x130 [] process_one_work+0x1fd/0x6d0 [] ? process_one_work+0x19b/0x6d0 [] worker_thread+0x121/0x3a0 [] ? process_one_work+0x6d0/0x6d0 [] kthread+0xf0/0x110 [] ? insert_kthread_work+0x70/0x70 [] ret_from_fork+0x7c/0xb0 [] ? insert_kthread_work+0x70/0x70 ---[ end trace 7517bdfd550790cc ]--- PM: suspend of devices complete after 797.599 msecs PM: late suspend of devices complete after 3.960 msecs -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Regression with suspend resume 5a87182aa21d6d5d306840feab9321818dd3e2a3
On Sun, 2013-12-08 at 11:24 +0100, Zdenek Kabelac wrote: > I'm testing recent 3.13-rc kernel - and I've notice my Lenovo T61 is not able > to resume. After bisect I've found commit: > > 5a87182aa21d6d5d306840feab9321818dd3e2a3 That's "cpufreq: suspend governors on system suspend/hibernate". > When I just revert this commit with 374b105797c3d4f29c685f3be535c35f5689b30e > (3.13-rc3) suspend/resume works again. > (I'm using ondemand CPU governor) Rafael just asked Linus to pull a revert of that commit (see https://lkml.org/lkml/2013/12/7/126 ). Paul Bolle -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Regression with suspend resume 5a87182aa21d6d5d306840feab9321818dd3e2a3
On Sun, 2013-12-08 at 11:24 +0100, Zdenek Kabelac wrote: I'm testing recent 3.13-rc kernel - and I've notice my Lenovo T61 is not able to resume. After bisect I've found commit: 5a87182aa21d6d5d306840feab9321818dd3e2a3 That's cpufreq: suspend governors on system suspend/hibernate. When I just revert this commit with 374b105797c3d4f29c685f3be535c35f5689b30e (3.13-rc3) suspend/resume works again. (I'm using ondemand CPU governor) Rafael just asked Linus to pull a revert of that commit (see https://lkml.org/lkml/2013/12/7/126 ). Paul Bolle -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Regression with suspend resume 5a87182aa21d6d5d306840feab9321818dd3e2a3
Dne 8.12.2013 12:13, Paul Bolle napsal(a): On Sun, 2013-12-08 at 11:24 +0100, Zdenek Kabelac wrote: I'm testing recent 3.13-rc kernel - and I've notice my Lenovo T61 is not able to resume. After bisect I've found commit: 5a87182aa21d6d5d306840feab9321818dd3e2a3 That's cpufreq: suspend governors on system suspend/hibernate. When I just revert this commit with 374b105797c3d4f29c685f3be535c35f5689b30e (3.13-rc3) suspend/resume works again. (I'm using ondemand CPU governor) Rafael just asked Linus to pull a revert of that commit (see https://lkml.org/lkml/2013/12/7/126 ). I'm not sure - but it also could be related with another warning I'm getting during resume: But possibly this is result in i915 update ? Zdenek PM: Syncing filesystems ... done. Freezing user space processes ... (elapsed 0.005 seconds) done. Freezing remaining freezable tasks ... (elapsed 0.001 seconds) done. Suspending console(s) (use no_console_suspend to debug) sd 0:0:0:0: [sda] Synchronizing SCSI cache sd 0:0:0:0: [sda] Stopping disk [ cut here ] WARNING: CPU: 0 PID: 2042 at drivers/gpu/drm/i915/intel_display.c:9948 intel_get_pipe_from_connector+0x49/0x50 [i915]() Modules linked in: i915 i2c_algo_bit drm_kms_helper drm ctr ccm ipt_MASQUERADE iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ipt_REJECT xt_CHECKSUM iptable_mangle xt_tcpudp tun bridge stp llc ipv6 ip6_tables iptable_filter ip_tables ebtable_nat ebtables x_tables bnep btusb bluetooth snd_hda_codec_analog iTCO_wdt iTCO_vendor_support arc4 coretemp kvm_intel kvm microcode iwl3945 iwlegacy psmouse serio_raw mac80211 i2c_i801 i2c_core sdhci_pci r852 sm_common sdhci nand cfg80211 nand_ecc snd_hda_intel nand_ids mmc_core mtd snd_hda_codec r592 snd_hwdep snd_seq memstick snd_seq_device snd_pcm lpc_ich mfd_core e1000e snd_page_alloc snd_timer ptp pps_core thinkpad_acpi wmi nvram snd soundcore evdev binfmt_misc nfsd auth_rpcgss oid_registry exportfs loop nfs_acl lockd sunrpc pcmcia sr_mod yenta_socket cdrom ehci_pci ehci_hcd uhci_hcd usbcore usb_common video backlight autofs4 CPU: 0 PID: 2042 Comm: kworker/u4:3 Tainted: GW 3.13.0-rc3-6-gee4645c #186 Hardware name: LENOVO 6464CTO/6464CTO, BIOS 7LETC9WW (2.29 ) 03/18/2011 Workqueue: events_unbound async_run_entry_fn 0009 88009ad21af8 815a3bfe 88009ad21b30 8104924d 8800b5f4fe00 8800aa408000 8800b5f4fe00 8800b5f4fe00 88009ad21b40 Call Trace: [815a3bfe] dump_stack+0x4e/0x7a [8104924d] warn_slowpath_common+0x7d/0xa0 [8104932a] warn_slowpath_null+0x1a/0x20 [a0c27f59] intel_get_pipe_from_connector+0x49/0x50 [i915] [a0c41e05] intel_panel_disable_backlight+0x25/0x180 [i915] [a0c2cfad] intel_disable_lvds+0x4d/0x180 [i915] [a0c1ceda] i9xx_crtc_disable+0x28a/0x380 [i915] [a0be50ca] i915_drm_freeze+0xca/0x150 [i915] [a0be54cd] i915_pm_suspend+0x3d/0x90 [i915] [813668bc] pci_pm_suspend+0x6c/0x150 [81366850] ? pci_pm_freeze+0xe0/0xe0 [81433e89] dpm_run_callback+0x49/0xa0 [8143428d] __device_suspend+0xdd/0x280 [8143444f] async_suspend+0x1f/0xa0 [81079e87] async_run_entry_fn+0x37/0x130 [8106a8ed] process_one_work+0x1fd/0x6d0 [8106a88b] ? process_one_work+0x19b/0x6d0 [8106aee1] worker_thread+0x121/0x3a0 [8106adc0] ? process_one_work+0x6d0/0x6d0 [81072fc0] kthread+0xf0/0x110 [81072ed0] ? insert_kthread_work+0x70/0x70 [815b402c] ret_from_fork+0x7c/0xb0 [81072ed0] ? insert_kthread_work+0x70/0x70 ---[ end trace 7517bdfd550790cc ]--- PM: suspend of devices complete after 797.599 msecs PM: late suspend of devices complete after 3.960 msecs -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Regression with suspend resume 5a87182aa21d6d5d306840feab9321818dd3e2a3
On Sun, 2013-12-08 at 20:21 +0100, Zdenek Kabelac wrote: Dne 8.12.2013 12:13, Paul Bolle napsal(a): Rafael just asked Linus to pull a revert of that commit (see https://lkml.org/lkml/2013/12/7/126 ). I'm not sure - but it also could be related with another warning I'm getting during resume: But possibly this is result in i915 update ? I don't think that's likely. See a recent LKML thread ( https://lkml.org/lkml/2013/11/30/148 ). There are two patches mentioned in that thread. I had to apply both patches to make that WARNING go away (on a 32 bit ThinkPad). (These two patches are now in my local stash, on top of v3.13-rc3. I'm not sure what their status is and whether these will end up in mainline.) PM: Syncing filesystems ... done. Freezing user space processes ... (elapsed 0.005 seconds) done. Freezing remaining freezable tasks ... (elapsed 0.001 seconds) done. Suspending console(s) (use no_console_suspend to debug) sd 0:0:0:0: [sda] Synchronizing SCSI cache sd 0:0:0:0: [sda] Stopping disk [ cut here ] WARNING: CPU: 0 PID: 2042 at drivers/gpu/drm/i915/intel_display.c:9948 intel_get_pipe_from_connector+0x49/0x50 [i915]() Modules linked in: i915 i2c_algo_bit drm_kms_helper drm ctr ccm ipt_MASQUERADE iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ipt_REJECT xt_CHECKSUM iptable_mangle xt_tcpudp tun bridge stp llc ipv6 ip6_tables iptable_filter ip_tables ebtable_nat ebtables x_tables bnep btusb bluetooth snd_hda_codec_analog iTCO_wdt iTCO_vendor_support arc4 coretemp kvm_intel kvm microcode iwl3945 iwlegacy psmouse serio_raw mac80211 i2c_i801 i2c_core sdhci_pci r852 sm_common sdhci nand cfg80211 nand_ecc snd_hda_intel nand_ids mmc_core mtd snd_hda_codec r592 snd_hwdep snd_seq memstick snd_seq_device snd_pcm lpc_ich mfd_core e1000e snd_page_alloc snd_timer ptp pps_core thinkpad_acpi wmi nvram snd soundcore evdev binfmt_misc nfsd auth_rpcgss oid_registry exportfs loop nfs_acl lockd sunrpc pcmcia sr_mod yenta_socket cdrom ehci_pci ehci_hcd uhci_hcd usbcore usb_common video backlight autofs4 CPU: 0 PID: 2042 Comm: kworker/u4:3 Tainted: GW 3.13.0-rc3-6-gee4645c #186 Hardware name: LENOVO 6464CTO/6464CTO, BIOS 7LETC9WW (2.29 ) 03/18/2011 Workqueue: events_unbound async_run_entry_fn 0009 88009ad21af8 815a3bfe 88009ad21b30 8104924d 8800b5f4fe00 8800aa408000 8800b5f4fe00 8800b5f4fe00 88009ad21b40 Call Trace: [815a3bfe] dump_stack+0x4e/0x7a [8104924d] warn_slowpath_common+0x7d/0xa0 [8104932a] warn_slowpath_null+0x1a/0x20 [a0c27f59] intel_get_pipe_from_connector+0x49/0x50 [i915] [a0c41e05] intel_panel_disable_backlight+0x25/0x180 [i915] [a0c2cfad] intel_disable_lvds+0x4d/0x180 [i915] [a0c1ceda] i9xx_crtc_disable+0x28a/0x380 [i915] [a0be50ca] i915_drm_freeze+0xca/0x150 [i915] [a0be54cd] i915_pm_suspend+0x3d/0x90 [i915] [813668bc] pci_pm_suspend+0x6c/0x150 [81366850] ? pci_pm_freeze+0xe0/0xe0 [81433e89] dpm_run_callback+0x49/0xa0 [8143428d] __device_suspend+0xdd/0x280 [8143444f] async_suspend+0x1f/0xa0 [81079e87] async_run_entry_fn+0x37/0x130 [8106a8ed] process_one_work+0x1fd/0x6d0 [8106a88b] ? process_one_work+0x19b/0x6d0 [8106aee1] worker_thread+0x121/0x3a0 [8106adc0] ? process_one_work+0x6d0/0x6d0 [81072fc0] kthread+0xf0/0x110 [81072ed0] ? insert_kthread_work+0x70/0x70 [815b402c] ret_from_fork+0x7c/0xb0 [81072ed0] ? insert_kthread_work+0x70/0x70 ---[ end trace 7517bdfd550790cc ]--- PM: suspend of devices complete after 797.599 msecs PM: late suspend of devices complete after 3.960 msecs Paul Bolle -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/