Re: [PATCH v3] PM / devfreq: Restart previous governor if new governor fails to start
On 11/09/2016 04:10 PM, Chanwoo Choi wrote: Hi, On 2016년 11월 10일 05:34, Saravana Kannan wrote: On 11/08/2016 06:38 PM, Chanwoo Choi wrote: On 2016년 11월 09일 11:36, Chanwoo Choi wrote: Hi, On 2016년 11월 09일 10:33, Chanwoo Choi wrote: On 2016년 11월 09일 05:52, Saravana Kannan wrote: On 11/08/2016 01:02 AM, Chanwoo Choi wrote: Hi, On 2016년 11월 08일 03:57, Saravana Kannan wrote: On 10/26/2016 05:06 PM, Chanwoo Choi wrote: Hi, On 2016년 10월 27일 04:17, Saravana Kannan wrote: If the new governor fails to start, switch back to old governor so that the devfreq state is not left in some weird limbo. Signed-off-by: Saravana Kannan <skan...@codeaurora.org> --- * Fix NULL deref for real this time. * Addressed some style preferences. drivers/devfreq/devfreq.c | 13 +++-- 1 file changed, 11 insertions(+), 2 deletions(-) diff --git a/drivers/devfreq/devfreq.c b/drivers/devfreq/devfreq.c index bf3ea76..77c41a5 100644 --- a/drivers/devfreq/devfreq.c +++ b/drivers/devfreq/devfreq.c @@ -919,7 +919,7 @@ static ssize_t governor_store(struct device *dev, struct device_attribute *attr, struct devfreq *df = to_devfreq(dev); int ret; char str_governor[DEVFREQ_NAME_LEN + 1]; -struct devfreq_governor *governor; +const struct devfreq_governor *governor, *prev_governor; ret = sscanf(buf, "%" __stringify(DEVFREQ_NAME_LEN) "s", str_governor); if (ret != 1) @@ -944,12 +944,21 @@ static ssize_t governor_store(struct device *dev, struct device_attribute *attr, goto out; } } +prev_governor = df->governor; df->governor = governor; strncpy(df->governor_name, governor->name, DEVFREQ_NAME_LEN); ret = df->governor->event_handler(df, DEVFREQ_GOV_START, NULL); -if (ret) +if (ret) { dev_warn(dev, "%s: Governor %s not started(%d)\n", __func__, df->governor->name, ret); +if (prev_governor) { I think that prev_governor is always set. You don't need to check it. Why do check it? Not true. Even higher up in this function, we check if df->governor != NULL. Simple example would be that the default governor passed in while adding the device isn't compiled in. I don't understand. If device is not registered by devfreq_add_device(), you don't change the governor by using governor_store(). If you can change the governor through governor_store(), it means that the devfreq instance was added without any problem. The added devfreq instance must have the sepecific governor on df->governor variable. So, I think that df->governor is always set and then prev_governor is always set. Read the code more closely. add_device doesn't (and shouldn't) fail if the governor isn't present. After that, the governor could be changed from user space. If governor is not present during devfreq_add_device(), the devfreq instance is not able to register the devfreq framework. So, the user never touch the sysfs[1] to change the governor because the sysfs[1] is not created by devfreq framework. [1] /sys/class/devfreq/[device name]/governor In summary, if governor is not present during devfreq_add_device(), the devfreq framework doesn't create tye sysfs[1] for governor. The user-space never change the governor through sysfs[1] because there is no any sysfs entry[1]. I checked the code of devfreq_add_device(). As you mentioned, if there is no governor, devfreq_add_device() don't pass wihtout calling the devfreq->governor->even_handler(). Wrong expression / don't pass -> pass I think it's correct as is. Just because the governor isn't there (or hasn't registered yet) doesn't mean the device add should fail. That aside, do you care to ack this patch for the way the code is currently? Reviewed-by: Chanwoo Choi <cw00.c...@samsung.com> Thanks After fixing the bug of devfreq_add_device(), I'll remove the unneeded 'if statement' of prev_governor in governor_store. As mentioned earlier, I don't think it's a bug. It's fine as is. -- Qualcomm Innovation Center, Inc. The Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project
Re: [PATCH v3] PM / devfreq: Restart previous governor if new governor fails to start
On 11/09/2016 04:10 PM, Chanwoo Choi wrote: Hi, On 2016년 11월 10일 05:34, Saravana Kannan wrote: On 11/08/2016 06:38 PM, Chanwoo Choi wrote: On 2016년 11월 09일 11:36, Chanwoo Choi wrote: Hi, On 2016년 11월 09일 10:33, Chanwoo Choi wrote: On 2016년 11월 09일 05:52, Saravana Kannan wrote: On 11/08/2016 01:02 AM, Chanwoo Choi wrote: Hi, On 2016년 11월 08일 03:57, Saravana Kannan wrote: On 10/26/2016 05:06 PM, Chanwoo Choi wrote: Hi, On 2016년 10월 27일 04:17, Saravana Kannan wrote: If the new governor fails to start, switch back to old governor so that the devfreq state is not left in some weird limbo. Signed-off-by: Saravana Kannan --- * Fix NULL deref for real this time. * Addressed some style preferences. drivers/devfreq/devfreq.c | 13 +++-- 1 file changed, 11 insertions(+), 2 deletions(-) diff --git a/drivers/devfreq/devfreq.c b/drivers/devfreq/devfreq.c index bf3ea76..77c41a5 100644 --- a/drivers/devfreq/devfreq.c +++ b/drivers/devfreq/devfreq.c @@ -919,7 +919,7 @@ static ssize_t governor_store(struct device *dev, struct device_attribute *attr, struct devfreq *df = to_devfreq(dev); int ret; char str_governor[DEVFREQ_NAME_LEN + 1]; -struct devfreq_governor *governor; +const struct devfreq_governor *governor, *prev_governor; ret = sscanf(buf, "%" __stringify(DEVFREQ_NAME_LEN) "s", str_governor); if (ret != 1) @@ -944,12 +944,21 @@ static ssize_t governor_store(struct device *dev, struct device_attribute *attr, goto out; } } +prev_governor = df->governor; df->governor = governor; strncpy(df->governor_name, governor->name, DEVFREQ_NAME_LEN); ret = df->governor->event_handler(df, DEVFREQ_GOV_START, NULL); -if (ret) +if (ret) { dev_warn(dev, "%s: Governor %s not started(%d)\n", __func__, df->governor->name, ret); +if (prev_governor) { I think that prev_governor is always set. You don't need to check it. Why do check it? Not true. Even higher up in this function, we check if df->governor != NULL. Simple example would be that the default governor passed in while adding the device isn't compiled in. I don't understand. If device is not registered by devfreq_add_device(), you don't change the governor by using governor_store(). If you can change the governor through governor_store(), it means that the devfreq instance was added without any problem. The added devfreq instance must have the sepecific governor on df->governor variable. So, I think that df->governor is always set and then prev_governor is always set. Read the code more closely. add_device doesn't (and shouldn't) fail if the governor isn't present. After that, the governor could be changed from user space. If governor is not present during devfreq_add_device(), the devfreq instance is not able to register the devfreq framework. So, the user never touch the sysfs[1] to change the governor because the sysfs[1] is not created by devfreq framework. [1] /sys/class/devfreq/[device name]/governor In summary, if governor is not present during devfreq_add_device(), the devfreq framework doesn't create tye sysfs[1] for governor. The user-space never change the governor through sysfs[1] because there is no any sysfs entry[1]. I checked the code of devfreq_add_device(). As you mentioned, if there is no governor, devfreq_add_device() don't pass wihtout calling the devfreq->governor->even_handler(). Wrong expression / don't pass -> pass I think it's correct as is. Just because the governor isn't there (or hasn't registered yet) doesn't mean the device add should fail. That aside, do you care to ack this patch for the way the code is currently? Reviewed-by: Chanwoo Choi Thanks After fixing the bug of devfreq_add_device(), I'll remove the unneeded 'if statement' of prev_governor in governor_store. As mentioned earlier, I don't think it's a bug. It's fine as is. -- Qualcomm Innovation Center, Inc. The Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project
Re: [PATCH v3] PM / devfreq: Restart previous governor if new governor fails to start
On 11/08/2016 06:38 PM, Chanwoo Choi wrote: On 2016년 11월 09일 11:36, Chanwoo Choi wrote: Hi, On 2016년 11월 09일 10:33, Chanwoo Choi wrote: On 2016년 11월 09일 05:52, Saravana Kannan wrote: On 11/08/2016 01:02 AM, Chanwoo Choi wrote: Hi, On 2016년 11월 08일 03:57, Saravana Kannan wrote: On 10/26/2016 05:06 PM, Chanwoo Choi wrote: Hi, On 2016년 10월 27일 04:17, Saravana Kannan wrote: If the new governor fails to start, switch back to old governor so that the devfreq state is not left in some weird limbo. Signed-off-by: Saravana Kannan <skan...@codeaurora.org> --- * Fix NULL deref for real this time. * Addressed some style preferences. drivers/devfreq/devfreq.c | 13 +++-- 1 file changed, 11 insertions(+), 2 deletions(-) diff --git a/drivers/devfreq/devfreq.c b/drivers/devfreq/devfreq.c index bf3ea76..77c41a5 100644 --- a/drivers/devfreq/devfreq.c +++ b/drivers/devfreq/devfreq.c @@ -919,7 +919,7 @@ static ssize_t governor_store(struct device *dev, struct device_attribute *attr, struct devfreq *df = to_devfreq(dev); int ret; char str_governor[DEVFREQ_NAME_LEN + 1]; -struct devfreq_governor *governor; +const struct devfreq_governor *governor, *prev_governor; ret = sscanf(buf, "%" __stringify(DEVFREQ_NAME_LEN) "s", str_governor); if (ret != 1) @@ -944,12 +944,21 @@ static ssize_t governor_store(struct device *dev, struct device_attribute *attr, goto out; } } +prev_governor = df->governor; df->governor = governor; strncpy(df->governor_name, governor->name, DEVFREQ_NAME_LEN); ret = df->governor->event_handler(df, DEVFREQ_GOV_START, NULL); -if (ret) +if (ret) { dev_warn(dev, "%s: Governor %s not started(%d)\n", __func__, df->governor->name, ret); +if (prev_governor) { I think that prev_governor is always set. You don't need to check it. Why do check it? Not true. Even higher up in this function, we check if df->governor != NULL. Simple example would be that the default governor passed in while adding the device isn't compiled in. I don't understand. If device is not registered by devfreq_add_device(), you don't change the governor by using governor_store(). If you can change the governor through governor_store(), it means that the devfreq instance was added without any problem. The added devfreq instance must have the sepecific governor on df->governor variable. So, I think that df->governor is always set and then prev_governor is always set. Read the code more closely. add_device doesn't (and shouldn't) fail if the governor isn't present. After that, the governor could be changed from user space. If governor is not present during devfreq_add_device(), the devfreq instance is not able to register the devfreq framework. So, the user never touch the sysfs[1] to change the governor because the sysfs[1] is not created by devfreq framework. [1] /sys/class/devfreq/[device name]/governor In summary, if governor is not present during devfreq_add_device(), the devfreq framework doesn't create tye sysfs[1] for governor. The user-space never change the governor through sysfs[1] because there is no any sysfs entry[1]. I checked the code of devfreq_add_device(). As you mentioned, if there is no governor, devfreq_add_device() don't pass wihtout calling the devfreq->governor->even_handler(). Wrong expression / don't pass -> pass I think it's correct as is. Just because the governor isn't there (or hasn't registered yet) doesn't mean the device add should fail. That aside, do you care to ack this patch for the way the code is currently? -Saravana -- Qualcomm Innovation Center, Inc. The Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project
Re: [PATCH v3] PM / devfreq: Restart previous governor if new governor fails to start
On 11/08/2016 06:38 PM, Chanwoo Choi wrote: On 2016년 11월 09일 11:36, Chanwoo Choi wrote: Hi, On 2016년 11월 09일 10:33, Chanwoo Choi wrote: On 2016년 11월 09일 05:52, Saravana Kannan wrote: On 11/08/2016 01:02 AM, Chanwoo Choi wrote: Hi, On 2016년 11월 08일 03:57, Saravana Kannan wrote: On 10/26/2016 05:06 PM, Chanwoo Choi wrote: Hi, On 2016년 10월 27일 04:17, Saravana Kannan wrote: If the new governor fails to start, switch back to old governor so that the devfreq state is not left in some weird limbo. Signed-off-by: Saravana Kannan --- * Fix NULL deref for real this time. * Addressed some style preferences. drivers/devfreq/devfreq.c | 13 +++-- 1 file changed, 11 insertions(+), 2 deletions(-) diff --git a/drivers/devfreq/devfreq.c b/drivers/devfreq/devfreq.c index bf3ea76..77c41a5 100644 --- a/drivers/devfreq/devfreq.c +++ b/drivers/devfreq/devfreq.c @@ -919,7 +919,7 @@ static ssize_t governor_store(struct device *dev, struct device_attribute *attr, struct devfreq *df = to_devfreq(dev); int ret; char str_governor[DEVFREQ_NAME_LEN + 1]; -struct devfreq_governor *governor; +const struct devfreq_governor *governor, *prev_governor; ret = sscanf(buf, "%" __stringify(DEVFREQ_NAME_LEN) "s", str_governor); if (ret != 1) @@ -944,12 +944,21 @@ static ssize_t governor_store(struct device *dev, struct device_attribute *attr, goto out; } } +prev_governor = df->governor; df->governor = governor; strncpy(df->governor_name, governor->name, DEVFREQ_NAME_LEN); ret = df->governor->event_handler(df, DEVFREQ_GOV_START, NULL); -if (ret) +if (ret) { dev_warn(dev, "%s: Governor %s not started(%d)\n", __func__, df->governor->name, ret); +if (prev_governor) { I think that prev_governor is always set. You don't need to check it. Why do check it? Not true. Even higher up in this function, we check if df->governor != NULL. Simple example would be that the default governor passed in while adding the device isn't compiled in. I don't understand. If device is not registered by devfreq_add_device(), you don't change the governor by using governor_store(). If you can change the governor through governor_store(), it means that the devfreq instance was added without any problem. The added devfreq instance must have the sepecific governor on df->governor variable. So, I think that df->governor is always set and then prev_governor is always set. Read the code more closely. add_device doesn't (and shouldn't) fail if the governor isn't present. After that, the governor could be changed from user space. If governor is not present during devfreq_add_device(), the devfreq instance is not able to register the devfreq framework. So, the user never touch the sysfs[1] to change the governor because the sysfs[1] is not created by devfreq framework. [1] /sys/class/devfreq/[device name]/governor In summary, if governor is not present during devfreq_add_device(), the devfreq framework doesn't create tye sysfs[1] for governor. The user-space never change the governor through sysfs[1] because there is no any sysfs entry[1]. I checked the code of devfreq_add_device(). As you mentioned, if there is no governor, devfreq_add_device() don't pass wihtout calling the devfreq->governor->even_handler(). Wrong expression / don't pass -> pass I think it's correct as is. Just because the governor isn't there (or hasn't registered yet) doesn't mean the device add should fail. That aside, do you care to ack this patch for the way the code is currently? -Saravana -- Qualcomm Innovation Center, Inc. The Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project
Re: [PATCH v3] PM / devfreq: Restart previous governor if new governor fails to start
On 11/08/2016 01:02 AM, Chanwoo Choi wrote: Hi, On 2016년 11월 08일 03:57, Saravana Kannan wrote: On 10/26/2016 05:06 PM, Chanwoo Choi wrote: Hi, On 2016년 10월 27일 04:17, Saravana Kannan wrote: If the new governor fails to start, switch back to old governor so that the devfreq state is not left in some weird limbo. Signed-off-by: Saravana Kannan <skan...@codeaurora.org> --- * Fix NULL deref for real this time. * Addressed some style preferences. drivers/devfreq/devfreq.c | 13 +++-- 1 file changed, 11 insertions(+), 2 deletions(-) diff --git a/drivers/devfreq/devfreq.c b/drivers/devfreq/devfreq.c index bf3ea76..77c41a5 100644 --- a/drivers/devfreq/devfreq.c +++ b/drivers/devfreq/devfreq.c @@ -919,7 +919,7 @@ static ssize_t governor_store(struct device *dev, struct device_attribute *attr, struct devfreq *df = to_devfreq(dev); int ret; char str_governor[DEVFREQ_NAME_LEN + 1]; -struct devfreq_governor *governor; +const struct devfreq_governor *governor, *prev_governor; ret = sscanf(buf, "%" __stringify(DEVFREQ_NAME_LEN) "s", str_governor); if (ret != 1) @@ -944,12 +944,21 @@ static ssize_t governor_store(struct device *dev, struct device_attribute *attr, goto out; } } +prev_governor = df->governor; df->governor = governor; strncpy(df->governor_name, governor->name, DEVFREQ_NAME_LEN); ret = df->governor->event_handler(df, DEVFREQ_GOV_START, NULL); -if (ret) +if (ret) { dev_warn(dev, "%s: Governor %s not started(%d)\n", __func__, df->governor->name, ret); +if (prev_governor) { I think that prev_governor is always set. You don't need to check it. Why do check it? Not true. Even higher up in this function, we check if df->governor != NULL. Simple example would be that the default governor passed in while adding the device isn't compiled in. I don't understand. If device is not registered by devfreq_add_device(), you don't change the governor by using governor_store(). If you can change the governor through governor_store(), it means that the devfreq instance was added without any problem. The added devfreq instance must have the sepecific governor on df->governor variable. So, I think that df->governor is always set and then prev_governor is always set. Read the code more closely. add_device doesn't (and shouldn't) fail if the governor isn't present. After that, the governor could be changed from user space. -Saravana -- Qualcomm Innovation Center, Inc. The Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project
Re: [PATCH v3] PM / devfreq: Restart previous governor if new governor fails to start
On 11/08/2016 01:02 AM, Chanwoo Choi wrote: Hi, On 2016년 11월 08일 03:57, Saravana Kannan wrote: On 10/26/2016 05:06 PM, Chanwoo Choi wrote: Hi, On 2016년 10월 27일 04:17, Saravana Kannan wrote: If the new governor fails to start, switch back to old governor so that the devfreq state is not left in some weird limbo. Signed-off-by: Saravana Kannan --- * Fix NULL deref for real this time. * Addressed some style preferences. drivers/devfreq/devfreq.c | 13 +++-- 1 file changed, 11 insertions(+), 2 deletions(-) diff --git a/drivers/devfreq/devfreq.c b/drivers/devfreq/devfreq.c index bf3ea76..77c41a5 100644 --- a/drivers/devfreq/devfreq.c +++ b/drivers/devfreq/devfreq.c @@ -919,7 +919,7 @@ static ssize_t governor_store(struct device *dev, struct device_attribute *attr, struct devfreq *df = to_devfreq(dev); int ret; char str_governor[DEVFREQ_NAME_LEN + 1]; -struct devfreq_governor *governor; +const struct devfreq_governor *governor, *prev_governor; ret = sscanf(buf, "%" __stringify(DEVFREQ_NAME_LEN) "s", str_governor); if (ret != 1) @@ -944,12 +944,21 @@ static ssize_t governor_store(struct device *dev, struct device_attribute *attr, goto out; } } +prev_governor = df->governor; df->governor = governor; strncpy(df->governor_name, governor->name, DEVFREQ_NAME_LEN); ret = df->governor->event_handler(df, DEVFREQ_GOV_START, NULL); -if (ret) +if (ret) { dev_warn(dev, "%s: Governor %s not started(%d)\n", __func__, df->governor->name, ret); +if (prev_governor) { I think that prev_governor is always set. You don't need to check it. Why do check it? Not true. Even higher up in this function, we check if df->governor != NULL. Simple example would be that the default governor passed in while adding the device isn't compiled in. I don't understand. If device is not registered by devfreq_add_device(), you don't change the governor by using governor_store(). If you can change the governor through governor_store(), it means that the devfreq instance was added without any problem. The added devfreq instance must have the sepecific governor on df->governor variable. So, I think that df->governor is always set and then prev_governor is always set. Read the code more closely. add_device doesn't (and shouldn't) fail if the governor isn't present. After that, the governor could be changed from user space. -Saravana -- Qualcomm Innovation Center, Inc. The Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project
Re: [PATCH v3] PM / devfreq: Restart previous governor if new governor fails to start
On 10/26/2016 05:06 PM, Chanwoo Choi wrote: Hi, On 2016년 10월 27일 04:17, Saravana Kannan wrote: If the new governor fails to start, switch back to old governor so that the devfreq state is not left in some weird limbo. Signed-off-by: Saravana Kannan <skan...@codeaurora.org> --- * Fix NULL deref for real this time. * Addressed some style preferences. drivers/devfreq/devfreq.c | 13 +++-- 1 file changed, 11 insertions(+), 2 deletions(-) diff --git a/drivers/devfreq/devfreq.c b/drivers/devfreq/devfreq.c index bf3ea76..77c41a5 100644 --- a/drivers/devfreq/devfreq.c +++ b/drivers/devfreq/devfreq.c @@ -919,7 +919,7 @@ static ssize_t governor_store(struct device *dev, struct device_attribute *attr, struct devfreq *df = to_devfreq(dev); int ret; char str_governor[DEVFREQ_NAME_LEN + 1]; - struct devfreq_governor *governor; + const struct devfreq_governor *governor, *prev_governor; ret = sscanf(buf, "%" __stringify(DEVFREQ_NAME_LEN) "s", str_governor); if (ret != 1) @@ -944,12 +944,21 @@ static ssize_t governor_store(struct device *dev, struct device_attribute *attr, goto out; } } + prev_governor = df->governor; df->governor = governor; strncpy(df->governor_name, governor->name, DEVFREQ_NAME_LEN); ret = df->governor->event_handler(df, DEVFREQ_GOV_START, NULL); - if (ret) + if (ret) { dev_warn(dev, "%s: Governor %s not started(%d)\n", __func__, df->governor->name, ret); + if (prev_governor) { I think that prev_governor is always set. You don't need to check it. Why do check it? Not true. Even higher up in this function, we check if df->governor != NULL. Simple example would be that the default governor passed in while adding the device isn't compiled in. Thanks, Saravana -- Qualcomm Innovation Center, Inc. The Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project
Re: [PATCH v3] PM / devfreq: Restart previous governor if new governor fails to start
On 10/26/2016 05:06 PM, Chanwoo Choi wrote: Hi, On 2016년 10월 27일 04:17, Saravana Kannan wrote: If the new governor fails to start, switch back to old governor so that the devfreq state is not left in some weird limbo. Signed-off-by: Saravana Kannan --- * Fix NULL deref for real this time. * Addressed some style preferences. drivers/devfreq/devfreq.c | 13 +++-- 1 file changed, 11 insertions(+), 2 deletions(-) diff --git a/drivers/devfreq/devfreq.c b/drivers/devfreq/devfreq.c index bf3ea76..77c41a5 100644 --- a/drivers/devfreq/devfreq.c +++ b/drivers/devfreq/devfreq.c @@ -919,7 +919,7 @@ static ssize_t governor_store(struct device *dev, struct device_attribute *attr, struct devfreq *df = to_devfreq(dev); int ret; char str_governor[DEVFREQ_NAME_LEN + 1]; - struct devfreq_governor *governor; + const struct devfreq_governor *governor, *prev_governor; ret = sscanf(buf, "%" __stringify(DEVFREQ_NAME_LEN) "s", str_governor); if (ret != 1) @@ -944,12 +944,21 @@ static ssize_t governor_store(struct device *dev, struct device_attribute *attr, goto out; } } + prev_governor = df->governor; df->governor = governor; strncpy(df->governor_name, governor->name, DEVFREQ_NAME_LEN); ret = df->governor->event_handler(df, DEVFREQ_GOV_START, NULL); - if (ret) + if (ret) { dev_warn(dev, "%s: Governor %s not started(%d)\n", __func__, df->governor->name, ret); + if (prev_governor) { I think that prev_governor is always set. You don't need to check it. Why do check it? Not true. Even higher up in this function, we check if df->governor != NULL. Simple example would be that the default governor passed in while adding the device isn't compiled in. Thanks, Saravana -- Qualcomm Innovation Center, Inc. The Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project
Re: [PATCH v2] PM / devfreq: Restart previous governor if new governor fails to start
On 10/26/2016 12:22 AM, Chanwoo Choi wrote: Hi, On 2016년 10월 26일 10:25, Saravana Kannan wrote: If the new governor fails to start, switch back to old governor so that the devfreq state is not left in some weird limbo. Signed-off-by: Saravana Kannan <skan...@codeaurora.org> --- * Fixed typo in commit text * Fixed potential NULL deref drivers/devfreq/devfreq.c | 9 +++-- 1 file changed, 7 insertions(+), 2 deletions(-) diff --git a/drivers/devfreq/devfreq.c b/drivers/devfreq/devfreq.c index bf3ea76..243253e 100644 --- a/drivers/devfreq/devfreq.c +++ b/drivers/devfreq/devfreq.c @@ -919,7 +919,7 @@ static ssize_t governor_store(struct device *dev, struct device_attribute *attr, struct devfreq *df = to_devfreq(dev); int ret; char str_governor[DEVFREQ_NAME_LEN + 1]; - struct devfreq_governor *governor; + const struct devfreq_governor *governor, *prev_gov; ret = sscanf(buf, "%" __stringify(DEVFREQ_NAME_LEN) "s", str_governor); if (ret != 1) @@ -944,12 +944,17 @@ static ssize_t governor_store(struct device *dev, struct device_attribute *attr, goto out; } } + prev_gov = df->governor; df->governor = governor; strncpy(df->governor_name, governor->name, DEVFREQ_NAME_LEN); ret = df->governor->event_handler(df, DEVFREQ_GOV_START, NULL); - if (ret) + if (ret) { dev_warn(dev, "%s: Governor %s not started(%d)\n", __func__, df->governor->name, ret); + df->governor = prev_gov; + strncpy(df->governor_name, prev_gov->name, DEVFREQ_NAME_LEN); + df->governor->event_handler(df, DEVFREQ_GOV_START, NULL); + } Sigh... looks like I added the changes to the index but forgot to commit it before I sent the patch. Will send v3. out: mutex_unlock(_list_lock); Looks good to me. But, how about using the 'prev_governor' instead of 'prev_gov'? because the variable name of current governor is 'governor' instead of 'gov'. I was trying to avoid wrapping lines too much. But this doesn't cause any additional wraps. Done. Reviewed-by: Chanwoo Choi <cw00.c...@samsung.com> Can you please review v3? Thanks, Saravana -- Qualcomm Innovation Center, Inc. The Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project
Re: [PATCH v2] PM / devfreq: Restart previous governor if new governor fails to start
On 10/26/2016 12:22 AM, Chanwoo Choi wrote: Hi, On 2016년 10월 26일 10:25, Saravana Kannan wrote: If the new governor fails to start, switch back to old governor so that the devfreq state is not left in some weird limbo. Signed-off-by: Saravana Kannan --- * Fixed typo in commit text * Fixed potential NULL deref drivers/devfreq/devfreq.c | 9 +++-- 1 file changed, 7 insertions(+), 2 deletions(-) diff --git a/drivers/devfreq/devfreq.c b/drivers/devfreq/devfreq.c index bf3ea76..243253e 100644 --- a/drivers/devfreq/devfreq.c +++ b/drivers/devfreq/devfreq.c @@ -919,7 +919,7 @@ static ssize_t governor_store(struct device *dev, struct device_attribute *attr, struct devfreq *df = to_devfreq(dev); int ret; char str_governor[DEVFREQ_NAME_LEN + 1]; - struct devfreq_governor *governor; + const struct devfreq_governor *governor, *prev_gov; ret = sscanf(buf, "%" __stringify(DEVFREQ_NAME_LEN) "s", str_governor); if (ret != 1) @@ -944,12 +944,17 @@ static ssize_t governor_store(struct device *dev, struct device_attribute *attr, goto out; } } + prev_gov = df->governor; df->governor = governor; strncpy(df->governor_name, governor->name, DEVFREQ_NAME_LEN); ret = df->governor->event_handler(df, DEVFREQ_GOV_START, NULL); - if (ret) + if (ret) { dev_warn(dev, "%s: Governor %s not started(%d)\n", __func__, df->governor->name, ret); + df->governor = prev_gov; + strncpy(df->governor_name, prev_gov->name, DEVFREQ_NAME_LEN); + df->governor->event_handler(df, DEVFREQ_GOV_START, NULL); + } Sigh... looks like I added the changes to the index but forgot to commit it before I sent the patch. Will send v3. out: mutex_unlock(_list_lock); Looks good to me. But, how about using the 'prev_governor' instead of 'prev_gov'? because the variable name of current governor is 'governor' instead of 'gov'. I was trying to avoid wrapping lines too much. But this doesn't cause any additional wraps. Done. Reviewed-by: Chanwoo Choi Can you please review v3? Thanks, Saravana -- Qualcomm Innovation Center, Inc. The Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project
[PATCH v3] PM / devfreq: Restart previous governor if new governor fails to start
If the new governor fails to start, switch back to old governor so that the devfreq state is not left in some weird limbo. Signed-off-by: Saravana Kannan <skan...@codeaurora.org> --- * Fix NULL deref for real this time. * Addressed some style preferences. drivers/devfreq/devfreq.c | 13 +++-- 1 file changed, 11 insertions(+), 2 deletions(-) diff --git a/drivers/devfreq/devfreq.c b/drivers/devfreq/devfreq.c index bf3ea76..77c41a5 100644 --- a/drivers/devfreq/devfreq.c +++ b/drivers/devfreq/devfreq.c @@ -919,7 +919,7 @@ static ssize_t governor_store(struct device *dev, struct device_attribute *attr, struct devfreq *df = to_devfreq(dev); int ret; char str_governor[DEVFREQ_NAME_LEN + 1]; - struct devfreq_governor *governor; + const struct devfreq_governor *governor, *prev_governor; ret = sscanf(buf, "%" __stringify(DEVFREQ_NAME_LEN) "s", str_governor); if (ret != 1) @@ -944,12 +944,21 @@ static ssize_t governor_store(struct device *dev, struct device_attribute *attr, goto out; } } + prev_governor = df->governor; df->governor = governor; strncpy(df->governor_name, governor->name, DEVFREQ_NAME_LEN); ret = df->governor->event_handler(df, DEVFREQ_GOV_START, NULL); - if (ret) + if (ret) { dev_warn(dev, "%s: Governor %s not started(%d)\n", __func__, df->governor->name, ret); + if (prev_governor) { + df->governor = prev_governor; + strncpy(df->governor_name, prev_governor->name, + DEVFREQ_NAME_LEN); + df->governor->event_handler(df, DEVFREQ_GOV_START, + NULL); + } + } out: mutex_unlock(_list_lock); -- Qualcomm Innovation Center, Inc. The Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project
[PATCH v3] PM / devfreq: Restart previous governor if new governor fails to start
If the new governor fails to start, switch back to old governor so that the devfreq state is not left in some weird limbo. Signed-off-by: Saravana Kannan --- * Fix NULL deref for real this time. * Addressed some style preferences. drivers/devfreq/devfreq.c | 13 +++-- 1 file changed, 11 insertions(+), 2 deletions(-) diff --git a/drivers/devfreq/devfreq.c b/drivers/devfreq/devfreq.c index bf3ea76..77c41a5 100644 --- a/drivers/devfreq/devfreq.c +++ b/drivers/devfreq/devfreq.c @@ -919,7 +919,7 @@ static ssize_t governor_store(struct device *dev, struct device_attribute *attr, struct devfreq *df = to_devfreq(dev); int ret; char str_governor[DEVFREQ_NAME_LEN + 1]; - struct devfreq_governor *governor; + const struct devfreq_governor *governor, *prev_governor; ret = sscanf(buf, "%" __stringify(DEVFREQ_NAME_LEN) "s", str_governor); if (ret != 1) @@ -944,12 +944,21 @@ static ssize_t governor_store(struct device *dev, struct device_attribute *attr, goto out; } } + prev_governor = df->governor; df->governor = governor; strncpy(df->governor_name, governor->name, DEVFREQ_NAME_LEN); ret = df->governor->event_handler(df, DEVFREQ_GOV_START, NULL); - if (ret) + if (ret) { dev_warn(dev, "%s: Governor %s not started(%d)\n", __func__, df->governor->name, ret); + if (prev_governor) { + df->governor = prev_governor; + strncpy(df->governor_name, prev_governor->name, + DEVFREQ_NAME_LEN); + df->governor->event_handler(df, DEVFREQ_GOV_START, + NULL); + } + } out: mutex_unlock(_list_lock); -- Qualcomm Innovation Center, Inc. The Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project
[PATCH v2] PM / devfreq: Restart previous governor if new governor fails to start
If the new governor fails to start, switch back to old governor so that the devfreq state is not left in some weird limbo. Signed-off-by: Saravana Kannan <skan...@codeaurora.org> --- * Fixed typo in commit text * Fixed potential NULL deref drivers/devfreq/devfreq.c | 9 +++-- 1 file changed, 7 insertions(+), 2 deletions(-) diff --git a/drivers/devfreq/devfreq.c b/drivers/devfreq/devfreq.c index bf3ea76..243253e 100644 --- a/drivers/devfreq/devfreq.c +++ b/drivers/devfreq/devfreq.c @@ -919,7 +919,7 @@ static ssize_t governor_store(struct device *dev, struct device_attribute *attr, struct devfreq *df = to_devfreq(dev); int ret; char str_governor[DEVFREQ_NAME_LEN + 1]; - struct devfreq_governor *governor; + const struct devfreq_governor *governor, *prev_gov; ret = sscanf(buf, "%" __stringify(DEVFREQ_NAME_LEN) "s", str_governor); if (ret != 1) @@ -944,12 +944,17 @@ static ssize_t governor_store(struct device *dev, struct device_attribute *attr, goto out; } } + prev_gov = df->governor; df->governor = governor; strncpy(df->governor_name, governor->name, DEVFREQ_NAME_LEN); ret = df->governor->event_handler(df, DEVFREQ_GOV_START, NULL); - if (ret) + if (ret) { dev_warn(dev, "%s: Governor %s not started(%d)\n", __func__, df->governor->name, ret); + df->governor = prev_gov; + strncpy(df->governor_name, prev_gov->name, DEVFREQ_NAME_LEN); + df->governor->event_handler(df, DEVFREQ_GOV_START, NULL); + } out: mutex_unlock(_list_lock); -- Qualcomm Innovation Center, Inc. The Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project
[PATCH v2] PM / devfreq: Restart previous governor if new governor fails to start
If the new governor fails to start, switch back to old governor so that the devfreq state is not left in some weird limbo. Signed-off-by: Saravana Kannan --- * Fixed typo in commit text * Fixed potential NULL deref drivers/devfreq/devfreq.c | 9 +++-- 1 file changed, 7 insertions(+), 2 deletions(-) diff --git a/drivers/devfreq/devfreq.c b/drivers/devfreq/devfreq.c index bf3ea76..243253e 100644 --- a/drivers/devfreq/devfreq.c +++ b/drivers/devfreq/devfreq.c @@ -919,7 +919,7 @@ static ssize_t governor_store(struct device *dev, struct device_attribute *attr, struct devfreq *df = to_devfreq(dev); int ret; char str_governor[DEVFREQ_NAME_LEN + 1]; - struct devfreq_governor *governor; + const struct devfreq_governor *governor, *prev_gov; ret = sscanf(buf, "%" __stringify(DEVFREQ_NAME_LEN) "s", str_governor); if (ret != 1) @@ -944,12 +944,17 @@ static ssize_t governor_store(struct device *dev, struct device_attribute *attr, goto out; } } + prev_gov = df->governor; df->governor = governor; strncpy(df->governor_name, governor->name, DEVFREQ_NAME_LEN); ret = df->governor->event_handler(df, DEVFREQ_GOV_START, NULL); - if (ret) + if (ret) { dev_warn(dev, "%s: Governor %s not started(%d)\n", __func__, df->governor->name, ret); + df->governor = prev_gov; + strncpy(df->governor_name, prev_gov->name, DEVFREQ_NAME_LEN); + df->governor->event_handler(df, DEVFREQ_GOV_START, NULL); + } out: mutex_unlock(_list_lock); -- Qualcomm Innovation Center, Inc. The Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project
[PATCH] PM / devfreq: Restart previous governor if new governor fails to start
If the new governor fails to start, switch back to old governor so that the devfreq state is not left if some weird limbo. Signed-off-by: Saravana Kannan <skan...@codeaurora.org> --- drivers/devfreq/devfreq.c | 9 +++-- 1 file changed, 7 insertions(+), 2 deletions(-) diff --git a/drivers/devfreq/devfreq.c b/drivers/devfreq/devfreq.c index bf3ea76..243253e 100644 --- a/drivers/devfreq/devfreq.c +++ b/drivers/devfreq/devfreq.c @@ -919,7 +919,7 @@ static ssize_t governor_store(struct device *dev, struct device_attribute *attr, struct devfreq *df = to_devfreq(dev); int ret; char str_governor[DEVFREQ_NAME_LEN + 1]; - struct devfreq_governor *governor; + const struct devfreq_governor *governor, *prev_gov; ret = sscanf(buf, "%" __stringify(DEVFREQ_NAME_LEN) "s", str_governor); if (ret != 1) @@ -944,12 +944,17 @@ static ssize_t governor_store(struct device *dev, struct device_attribute *attr, goto out; } } + prev_gov = df->governor; df->governor = governor; strncpy(df->governor_name, governor->name, DEVFREQ_NAME_LEN); ret = df->governor->event_handler(df, DEVFREQ_GOV_START, NULL); - if (ret) + if (ret) { dev_warn(dev, "%s: Governor %s not started(%d)\n", __func__, df->governor->name, ret); + df->governor = prev_gov; + strncpy(df->governor_name, prev_gov->name, DEVFREQ_NAME_LEN); + df->governor->event_handler(df, DEVFREQ_GOV_START, NULL); + } out: mutex_unlock(_list_lock); -- Qualcomm Innovation Center, Inc. The Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project
[PATCH] PM / devfreq: Restart previous governor if new governor fails to start
If the new governor fails to start, switch back to old governor so that the devfreq state is not left if some weird limbo. Signed-off-by: Saravana Kannan --- drivers/devfreq/devfreq.c | 9 +++-- 1 file changed, 7 insertions(+), 2 deletions(-) diff --git a/drivers/devfreq/devfreq.c b/drivers/devfreq/devfreq.c index bf3ea76..243253e 100644 --- a/drivers/devfreq/devfreq.c +++ b/drivers/devfreq/devfreq.c @@ -919,7 +919,7 @@ static ssize_t governor_store(struct device *dev, struct device_attribute *attr, struct devfreq *df = to_devfreq(dev); int ret; char str_governor[DEVFREQ_NAME_LEN + 1]; - struct devfreq_governor *governor; + const struct devfreq_governor *governor, *prev_gov; ret = sscanf(buf, "%" __stringify(DEVFREQ_NAME_LEN) "s", str_governor); if (ret != 1) @@ -944,12 +944,17 @@ static ssize_t governor_store(struct device *dev, struct device_attribute *attr, goto out; } } + prev_gov = df->governor; df->governor = governor; strncpy(df->governor_name, governor->name, DEVFREQ_NAME_LEN); ret = df->governor->event_handler(df, DEVFREQ_GOV_START, NULL); - if (ret) + if (ret) { dev_warn(dev, "%s: Governor %s not started(%d)\n", __func__, df->governor->name, ret); + df->governor = prev_gov; + strncpy(df->governor_name, prev_gov->name, DEVFREQ_NAME_LEN); + df->governor->event_handler(df, DEVFREQ_GOV_START, NULL); + } out: mutex_unlock(_list_lock); -- Qualcomm Innovation Center, Inc. The Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project
Re: [PATCH 1/7] cpufreq: remove redundant CPUFREQ_INCOMPATIBLE notifier event
On 04/06/2016 02:45 PM, Rafael J. Wysocki wrote: On Wed, Apr 6, 2016 at 11:29 PM, Saravana Kannan <skan...@codeaurora.org> wrote: On 04/06/2016 02:21 PM, Rafael J. Wysocki wrote: On Wed, Apr 6, 2016 at 10:30 PM, Saravana Kannan <skan...@codeaurora.org> wrote: On 09/09/2015 05:53 PM, Rafael J. Wysocki wrote: [cut] Well, nobody was using that event. True, but that's more of a bug in drivers/thermal/cpu-cooling.c and drivers/acpi/processor_thermal.c. We should revert this patch and fix those drivers. Does that seem acceptable to you? I'd rather see a patch series adding the event back along with some users. One user at least. Ok, I'll make those two drivers use them and send it out. It's very clearly a bug in those drivers. -Saravana` -- Qualcomm Innovation Center, Inc. The Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project
Re: [PATCH 1/7] cpufreq: remove redundant CPUFREQ_INCOMPATIBLE notifier event
On 04/06/2016 02:45 PM, Rafael J. Wysocki wrote: On Wed, Apr 6, 2016 at 11:29 PM, Saravana Kannan wrote: On 04/06/2016 02:21 PM, Rafael J. Wysocki wrote: On Wed, Apr 6, 2016 at 10:30 PM, Saravana Kannan wrote: On 09/09/2015 05:53 PM, Rafael J. Wysocki wrote: [cut] Well, nobody was using that event. True, but that's more of a bug in drivers/thermal/cpu-cooling.c and drivers/acpi/processor_thermal.c. We should revert this patch and fix those drivers. Does that seem acceptable to you? I'd rather see a patch series adding the event back along with some users. One user at least. Ok, I'll make those two drivers use them and send it out. It's very clearly a bug in those drivers. -Saravana` -- Qualcomm Innovation Center, Inc. The Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project
Re: [PATCH 1/7] cpufreq: remove redundant CPUFREQ_INCOMPATIBLE notifier event
On 04/06/2016 02:21 PM, Rafael J. Wysocki wrote: On Wed, Apr 6, 2016 at 10:30 PM, Saravana Kannan <skan...@codeaurora.org> wrote: On 09/09/2015 05:53 PM, Rafael J. Wysocki wrote: Hi, On Thu, Sep 10, 2015 at 2:39 AM, Viresh Kumar <viresh.ku...@linaro.org> wrote: On 10-09-15, 01:26, Rafael J. Wysocki wrote: On Monday, August 03, 2015 08:36:14 AM Viresh Kumar wrote: What's being done from CPUFREQ_INCOMPATIBLE, can also be done with CPUFREQ_ADJUST. There is nothing special with CPUFREQ_INCOMPATIBLE notifier. The above part of the changelog is a disaster to me. :-( It not only doesn't explain what really goes on, but it's actively confusing. What really happens is that the core sends CPUFREQ_INCOMPATIBLE notifications unconditionally right after sending the CPUFREQ_ADJUST ones, so the former is just redundant and it's more efficient to merge the two into one. Undoubtedly this looks far better :) But, isn't this series already applied some time back ? Right, never mind. For some reason that patch was left in the "New" state. The code is OK. I guess I didn't notice this change when it was sent out. The comment that was deleted in this patch clearly states why the INCOMPATIBLE notifier is needed. Some client might want to boost the CPU min freq for performance or other reasons, but thermal might want to limit it. So, by having thermal register for INCOMPATIBLE notifiers to enforce the limits, we provide a way to guarantee it gets the final say. The real fix should have been to change drivers/thermal/cpu_cooling.c to use CPUFREQ_INCOMPATIBLE instead of CPUFREQ_ADJUST. Is there something I'm missing? If not, can we please revert this patch? Well, nobody was using that event. True, but that's more of a bug in drivers/thermal/cpu-cooling.c and drivers/acpi/processor_thermal.c. We should revert this patch and fix those drivers. Does that seem acceptable to you? -Saravana -- Qualcomm Innovation Center, Inc. The Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project
Re: [PATCH 1/7] cpufreq: remove redundant CPUFREQ_INCOMPATIBLE notifier event
On 04/06/2016 02:21 PM, Rafael J. Wysocki wrote: On Wed, Apr 6, 2016 at 10:30 PM, Saravana Kannan wrote: On 09/09/2015 05:53 PM, Rafael J. Wysocki wrote: Hi, On Thu, Sep 10, 2015 at 2:39 AM, Viresh Kumar wrote: On 10-09-15, 01:26, Rafael J. Wysocki wrote: On Monday, August 03, 2015 08:36:14 AM Viresh Kumar wrote: What's being done from CPUFREQ_INCOMPATIBLE, can also be done with CPUFREQ_ADJUST. There is nothing special with CPUFREQ_INCOMPATIBLE notifier. The above part of the changelog is a disaster to me. :-( It not only doesn't explain what really goes on, but it's actively confusing. What really happens is that the core sends CPUFREQ_INCOMPATIBLE notifications unconditionally right after sending the CPUFREQ_ADJUST ones, so the former is just redundant and it's more efficient to merge the two into one. Undoubtedly this looks far better :) But, isn't this series already applied some time back ? Right, never mind. For some reason that patch was left in the "New" state. The code is OK. I guess I didn't notice this change when it was sent out. The comment that was deleted in this patch clearly states why the INCOMPATIBLE notifier is needed. Some client might want to boost the CPU min freq for performance or other reasons, but thermal might want to limit it. So, by having thermal register for INCOMPATIBLE notifiers to enforce the limits, we provide a way to guarantee it gets the final say. The real fix should have been to change drivers/thermal/cpu_cooling.c to use CPUFREQ_INCOMPATIBLE instead of CPUFREQ_ADJUST. Is there something I'm missing? If not, can we please revert this patch? Well, nobody was using that event. True, but that's more of a bug in drivers/thermal/cpu-cooling.c and drivers/acpi/processor_thermal.c. We should revert this patch and fix those drivers. Does that seem acceptable to you? -Saravana -- Qualcomm Innovation Center, Inc. The Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project
Re: [PATCH V2 0/7] cpufreq: governors: Fix ABBA lockups
On 02/07/2016 06:28 PM, Rafael J. Wysocki wrote: On Friday, February 05, 2016 06:22:35 PM Saravana Kannan wrote: On 02/04/2016 07:54 PM, Rafael J. Wysocki wrote: On Thursday, February 04, 2016 07:18:32 PM Rafael J. Wysocki wrote: On Thu, Feb 4, 2016 at 6:44 PM, Saravana Kannan wrote: On 02/04/2016 09:43 AM, Saravana Kannan wrote: On 02/04/2016 03:09 AM, Viresh Kumar wrote: On 04-02-16, 00:50, Rafael J. Wysocki wrote: This is exactly right. We've avoided one deadlock only to trip into another one. This happens because update_sampling_rate() acquires od_dbs_cdata.mutex which is held around cpufreq_governor_exit() by cpufreq_governor_dbs(). Worse yet, a deadlock can still happen without (the new) dbs_data->mutex, just between s_active and od_dbs_cdata.mutex if update_sampling_rate() runs in parallel with cpufreq_governor_dbs()->cpufreq_governor_exit() and the latter wins the race. It looks like we need to drop the governor mutex before putting the kobject in cpufreq_governor_exit(). [cut] No no no no! Let's not open up this can of worms of queuing up the work to handle a write to a sysfs file. It *MIGHT* work for this specific tunable (I haven't bothered to analyze), but this makes it impossible to return a useful/proper error value. Sent too soon. Not only that, but it can also cause the writes to the sysfs files to get processed in a different order and I don't know what other issues/races THAT will open up. Well, I don't like this too. I actually do have an idea about how to fix these deadlocks, but it is on top of my cleanup series. I'll write more about it later today. Having actually posted that series again after cleaning it up I can say what I'm thinking about hopefully without confusing anyone too much. So please bear in mind that I'm going to refer to this series below: http://marc.info/?l=linux-pm=145463901630950=4 Also this is more of a brain dump rather than actual design description, so there may be holes etc in it. Please let me know if you can see any. The problem at hand is that policy->rwsem needs to be held around *all* operations in cpufreq_set_policy(). In particular, it cannot be dropped around invocations of __cpufreq_governor() with the event arg equal to _EXIT as that leads to interesting races. Unfortunately, we know that holding policy->rwsem in those places leads to a deadlock with governor sysfs attributes removal in cpufreq_governor_exit(). Viresh attempted to fix this by avoiding to acquire policy->rwsem for governor attributes access (as holding it is not necessary for them in principle). That was a nice try, but it turned out to be insufficient because of another deadlock scenario uncovered by it. Namely, since the ondemand governor's update_sampling_rate() acquires the governor mutex (called dbs_data_mutex after my patches mentioned above), it may deadlock with exactly the same piece of code in cpufreq_governor_exit() in almost exactly the same way. To avoid that other deadlock, we'd either need to drop dbs_data_mutex from update_sampling_rate(), or drop it for the removal of the governor sysfs attributes in cpufreq_governor_exit(). I don't think the former is an option at least at this point, so it looks like we pretty much have to do the latter. With that in mind, I'd start with the changes made by Viresh (maybe without the first patch which really isn't essential here). That is, introduce a separate kobject type for the governor attributes kobject and register that in cpufreq_governor_init(). The show/store callbacks for that kobject type won't acquire policy->rwsem so the first deadlock will be avoided. But in addition to that, I'd drop dbs_data_mutex before the removal of governor sysfs attributes. That actually happens in two places, in cpufreq_governor_exit() and in the error path of cpufreq_governor_init(). To that end, I'd move the locking from cpufreq_governor_dbs() to the functions called by it. That should be readily doable and they can do all of the necessary checks themselves. cpufreq_governor_dbs() would become a pure mux then, but that's not such a big deal. With that, cpufreq_governor_exit() may just drop the lock before it does the final kobject_put(). The danger here is that the sysfs show/store callbacks of the governor attributes kobject may see invalid dbs_data for a while, after the lock has been dropped and before the kobject is deleted. That may be addressed by checking, for example, the presence of the dbs_data's "tuners" pointer in those callbacks. If it is NULL, they can simply return -EAGAIN or similar. Now, that means, though, that they need to acquire the same lock as cpufreq_governor_exit(), or they may see things go away while they are running. The simplest approach here would be to take dbs_data_mutex in them too, although that's a bit of a sledgehammer. It might be better to have a per-policy lock in struct policy_dbs_info for that, for example, but
Re: [PATCH V2 0/7] cpufreq: governors: Fix ABBA lockups
On 02/07/2016 06:28 PM, Rafael J. Wysocki wrote: On Friday, February 05, 2016 06:22:35 PM Saravana Kannan wrote: On 02/04/2016 07:54 PM, Rafael J. Wysocki wrote: On Thursday, February 04, 2016 07:18:32 PM Rafael J. Wysocki wrote: On Thu, Feb 4, 2016 at 6:44 PM, Saravana Kannan <skan...@codeaurora.org> wrote: On 02/04/2016 09:43 AM, Saravana Kannan wrote: On 02/04/2016 03:09 AM, Viresh Kumar wrote: On 04-02-16, 00:50, Rafael J. Wysocki wrote: This is exactly right. We've avoided one deadlock only to trip into another one. This happens because update_sampling_rate() acquires od_dbs_cdata.mutex which is held around cpufreq_governor_exit() by cpufreq_governor_dbs(). Worse yet, a deadlock can still happen without (the new) dbs_data->mutex, just between s_active and od_dbs_cdata.mutex if update_sampling_rate() runs in parallel with cpufreq_governor_dbs()->cpufreq_governor_exit() and the latter wins the race. It looks like we need to drop the governor mutex before putting the kobject in cpufreq_governor_exit(). [cut] No no no no! Let's not open up this can of worms of queuing up the work to handle a write to a sysfs file. It *MIGHT* work for this specific tunable (I haven't bothered to analyze), but this makes it impossible to return a useful/proper error value. Sent too soon. Not only that, but it can also cause the writes to the sysfs files to get processed in a different order and I don't know what other issues/races THAT will open up. Well, I don't like this too. I actually do have an idea about how to fix these deadlocks, but it is on top of my cleanup series. I'll write more about it later today. Having actually posted that series again after cleaning it up I can say what I'm thinking about hopefully without confusing anyone too much. So please bear in mind that I'm going to refer to this series below: http://marc.info/?l=linux-pm=145463901630950=4 Also this is more of a brain dump rather than actual design description, so there may be holes etc in it. Please let me know if you can see any. The problem at hand is that policy->rwsem needs to be held around *all* operations in cpufreq_set_policy(). In particular, it cannot be dropped around invocations of __cpufreq_governor() with the event arg equal to _EXIT as that leads to interesting races. Unfortunately, we know that holding policy->rwsem in those places leads to a deadlock with governor sysfs attributes removal in cpufreq_governor_exit(). Viresh attempted to fix this by avoiding to acquire policy->rwsem for governor attributes access (as holding it is not necessary for them in principle). That was a nice try, but it turned out to be insufficient because of another deadlock scenario uncovered by it. Namely, since the ondemand governor's update_sampling_rate() acquires the governor mutex (called dbs_data_mutex after my patches mentioned above), it may deadlock with exactly the same piece of code in cpufreq_governor_exit() in almost exactly the same way. To avoid that other deadlock, we'd either need to drop dbs_data_mutex from update_sampling_rate(), or drop it for the removal of the governor sysfs attributes in cpufreq_governor_exit(). I don't think the former is an option at least at this point, so it looks like we pretty much have to do the latter. With that in mind, I'd start with the changes made by Viresh (maybe without the first patch which really isn't essential here). That is, introduce a separate kobject type for the governor attributes kobject and register that in cpufreq_governor_init(). The show/store callbacks for that kobject type won't acquire policy->rwsem so the first deadlock will be avoided. But in addition to that, I'd drop dbs_data_mutex before the removal of governor sysfs attributes. That actually happens in two places, in cpufreq_governor_exit() and in the error path of cpufreq_governor_init(). To that end, I'd move the locking from cpufreq_governor_dbs() to the functions called by it. That should be readily doable and they can do all of the necessary checks themselves. cpufreq_governor_dbs() would become a pure mux then, but that's not such a big deal. With that, cpufreq_governor_exit() may just drop the lock before it does the final kobject_put(). The danger here is that the sysfs show/store callbacks of the governor attributes kobject may see invalid dbs_data for a while, after the lock has been dropped and before the kobject is deleted. That may be addressed by checking, for example, the presence of the dbs_data's "tuners" pointer in those callbacks. If it is NULL, they can simply return -EAGAIN or similar. Now, that means, though, that they need to acquire the same lock as cpufreq_governor_exit(), or they may see things go away while they are running. The simplest approach here would be to take dbs_data_mutex in them too, although that's a bit of a sledgehammer. It might be better to have a per-policy lock in struct pol
Re: [PATCH V2 0/7] cpufreq: governors: Fix ABBA lockups
On 02/04/2016 07:54 PM, Rafael J. Wysocki wrote: On Thursday, February 04, 2016 07:18:32 PM Rafael J. Wysocki wrote: On Thu, Feb 4, 2016 at 6:44 PM, Saravana Kannan wrote: On 02/04/2016 09:43 AM, Saravana Kannan wrote: On 02/04/2016 03:09 AM, Viresh Kumar wrote: On 04-02-16, 00:50, Rafael J. Wysocki wrote: This is exactly right. We've avoided one deadlock only to trip into another one. This happens because update_sampling_rate() acquires od_dbs_cdata.mutex which is held around cpufreq_governor_exit() by cpufreq_governor_dbs(). Worse yet, a deadlock can still happen without (the new) dbs_data->mutex, just between s_active and od_dbs_cdata.mutex if update_sampling_rate() runs in parallel with cpufreq_governor_dbs()->cpufreq_governor_exit() and the latter wins the race. It looks like we need to drop the governor mutex before putting the kobject in cpufreq_governor_exit(). [cut] No no no no! Let's not open up this can of worms of queuing up the work to handle a write to a sysfs file. It *MIGHT* work for this specific tunable (I haven't bothered to analyze), but this makes it impossible to return a useful/proper error value. Sent too soon. Not only that, but it can also cause the writes to the sysfs files to get processed in a different order and I don't know what other issues/races THAT will open up. Well, I don't like this too. I actually do have an idea about how to fix these deadlocks, but it is on top of my cleanup series. I'll write more about it later today. Having actually posted that series again after cleaning it up I can say what I'm thinking about hopefully without confusing anyone too much. So please bear in mind that I'm going to refer to this series below: http://marc.info/?l=linux-pm=145463901630950=4 Also this is more of a brain dump rather than actual design description, so there may be holes etc in it. Please let me know if you can see any. The problem at hand is that policy->rwsem needs to be held around *all* operations in cpufreq_set_policy(). In particular, it cannot be dropped around invocations of __cpufreq_governor() with the event arg equal to _EXIT as that leads to interesting races. Unfortunately, we know that holding policy->rwsem in those places leads to a deadlock with governor sysfs attributes removal in cpufreq_governor_exit(). Viresh attempted to fix this by avoiding to acquire policy->rwsem for governor attributes access (as holding it is not necessary for them in principle). That was a nice try, but it turned out to be insufficient because of another deadlock scenario uncovered by it. Namely, since the ondemand governor's update_sampling_rate() acquires the governor mutex (called dbs_data_mutex after my patches mentioned above), it may deadlock with exactly the same piece of code in cpufreq_governor_exit() in almost exactly the same way. To avoid that other deadlock, we'd either need to drop dbs_data_mutex from update_sampling_rate(), or drop it for the removal of the governor sysfs attributes in cpufreq_governor_exit(). I don't think the former is an option at least at this point, so it looks like we pretty much have to do the latter. With that in mind, I'd start with the changes made by Viresh (maybe without the first patch which really isn't essential here). That is, introduce a separate kobject type for the governor attributes kobject and register that in cpufreq_governor_init(). The show/store callbacks for that kobject type won't acquire policy->rwsem so the first deadlock will be avoided. But in addition to that, I'd drop dbs_data_mutex before the removal of governor sysfs attributes. That actually happens in two places, in cpufreq_governor_exit() and in the error path of cpufreq_governor_init(). To that end, I'd move the locking from cpufreq_governor_dbs() to the functions called by it. That should be readily doable and they can do all of the necessary checks themselves. cpufreq_governor_dbs() would become a pure mux then, but that's not such a big deal. With that, cpufreq_governor_exit() may just drop the lock before it does the final kobject_put(). The danger here is that the sysfs show/store callbacks of the governor attributes kobject may see invalid dbs_data for a while, after the lock has been dropped and before the kobject is deleted. That may be addressed by checking, for example, the presence of the dbs_data's "tuners" pointer in those callbacks. If it is NULL, they can simply return -EAGAIN or similar. Now, that means, though, that they need to acquire the same lock as cpufreq_governor_exit(), or they may see things go away while they are running. The simplest approach here would be to take dbs_data_mutex in them too, although that's a bit of a sledgehammer. It might be better to have a per-policy lock in struct policy_dbs_info for that, for example, but then the governor attribute sysfs callbacks would need to get that object instead of dbs_data. On the fli
Re: [PATCH V2 0/7] cpufreq: governors: Fix ABBA lockups
On 02/04/2016 07:54 PM, Rafael J. Wysocki wrote: On Thursday, February 04, 2016 07:18:32 PM Rafael J. Wysocki wrote: On Thu, Feb 4, 2016 at 6:44 PM, Saravana Kannan <skan...@codeaurora.org> wrote: On 02/04/2016 09:43 AM, Saravana Kannan wrote: On 02/04/2016 03:09 AM, Viresh Kumar wrote: On 04-02-16, 00:50, Rafael J. Wysocki wrote: This is exactly right. We've avoided one deadlock only to trip into another one. This happens because update_sampling_rate() acquires od_dbs_cdata.mutex which is held around cpufreq_governor_exit() by cpufreq_governor_dbs(). Worse yet, a deadlock can still happen without (the new) dbs_data->mutex, just between s_active and od_dbs_cdata.mutex if update_sampling_rate() runs in parallel with cpufreq_governor_dbs()->cpufreq_governor_exit() and the latter wins the race. It looks like we need to drop the governor mutex before putting the kobject in cpufreq_governor_exit(). [cut] No no no no! Let's not open up this can of worms of queuing up the work to handle a write to a sysfs file. It *MIGHT* work for this specific tunable (I haven't bothered to analyze), but this makes it impossible to return a useful/proper error value. Sent too soon. Not only that, but it can also cause the writes to the sysfs files to get processed in a different order and I don't know what other issues/races THAT will open up. Well, I don't like this too. I actually do have an idea about how to fix these deadlocks, but it is on top of my cleanup series. I'll write more about it later today. Having actually posted that series again after cleaning it up I can say what I'm thinking about hopefully without confusing anyone too much. So please bear in mind that I'm going to refer to this series below: http://marc.info/?l=linux-pm=145463901630950=4 Also this is more of a brain dump rather than actual design description, so there may be holes etc in it. Please let me know if you can see any. The problem at hand is that policy->rwsem needs to be held around *all* operations in cpufreq_set_policy(). In particular, it cannot be dropped around invocations of __cpufreq_governor() with the event arg equal to _EXIT as that leads to interesting races. Unfortunately, we know that holding policy->rwsem in those places leads to a deadlock with governor sysfs attributes removal in cpufreq_governor_exit(). Viresh attempted to fix this by avoiding to acquire policy->rwsem for governor attributes access (as holding it is not necessary for them in principle). That was a nice try, but it turned out to be insufficient because of another deadlock scenario uncovered by it. Namely, since the ondemand governor's update_sampling_rate() acquires the governor mutex (called dbs_data_mutex after my patches mentioned above), it may deadlock with exactly the same piece of code in cpufreq_governor_exit() in almost exactly the same way. To avoid that other deadlock, we'd either need to drop dbs_data_mutex from update_sampling_rate(), or drop it for the removal of the governor sysfs attributes in cpufreq_governor_exit(). I don't think the former is an option at least at this point, so it looks like we pretty much have to do the latter. With that in mind, I'd start with the changes made by Viresh (maybe without the first patch which really isn't essential here). That is, introduce a separate kobject type for the governor attributes kobject and register that in cpufreq_governor_init(). The show/store callbacks for that kobject type won't acquire policy->rwsem so the first deadlock will be avoided. But in addition to that, I'd drop dbs_data_mutex before the removal of governor sysfs attributes. That actually happens in two places, in cpufreq_governor_exit() and in the error path of cpufreq_governor_init(). To that end, I'd move the locking from cpufreq_governor_dbs() to the functions called by it. That should be readily doable and they can do all of the necessary checks themselves. cpufreq_governor_dbs() would become a pure mux then, but that's not such a big deal. With that, cpufreq_governor_exit() may just drop the lock before it does the final kobject_put(). The danger here is that the sysfs show/store callbacks of the governor attributes kobject may see invalid dbs_data for a while, after the lock has been dropped and before the kobject is deleted. That may be addressed by checking, for example, the presence of the dbs_data's "tuners" pointer in those callbacks. If it is NULL, they can simply return -EAGAIN or similar. Now, that means, though, that they need to acquire the same lock as cpufreq_governor_exit(), or they may see things go away while they are running. The simplest approach here would be to take dbs_data_mutex in them too, although that's a bit of a sledgehammer. It might be better to have a per-policy lock in struct policy_dbs_info for that, for example, but then the governor attribute sysfs callbacks would need to get that object inst
Re: [PATCH V2 0/7] cpufreq: governors: Fix ABBA lockups
On 02/04/2016 09:43 AM, Saravana Kannan wrote: On 02/04/2016 03:09 AM, Viresh Kumar wrote: On 04-02-16, 00:50, Rafael J. Wysocki wrote: This is exactly right. We've avoided one deadlock only to trip into another one. This happens because update_sampling_rate() acquires od_dbs_cdata.mutex which is held around cpufreq_governor_exit() by cpufreq_governor_dbs(). Worse yet, a deadlock can still happen without (the new) dbs_data->mutex, just between s_active and od_dbs_cdata.mutex if update_sampling_rate() runs in parallel with cpufreq_governor_dbs()->cpufreq_governor_exit() and the latter wins the race. It looks like we need to drop the governor mutex before putting the kobject in cpufreq_governor_exit(). I have tried to explore all possible ways of fixing this, and every other way looked to be racy in some way. Does anyone else have a better idea (untested): -8<- Subject: [PATCH] cpufreq: ondemand: Shoot update_sampling_rate with a separate work Signed-off-by: Viresh Kumar --- drivers/cpufreq/cpufreq_governor.h | 2 ++ drivers/cpufreq/cpufreq_ondemand.c | 39 +- 2 files changed, 32 insertions(+), 9 deletions(-) diff --git a/drivers/cpufreq/cpufreq_governor.h b/drivers/cpufreq/cpufreq_governor.h index 7bed63e14e7d..97e604356b20 100644 --- a/drivers/cpufreq/cpufreq_governor.h +++ b/drivers/cpufreq/cpufreq_governor.h @@ -141,6 +141,8 @@ struct od_dbs_tuners { unsigned int powersave_bias; unsigned int io_is_busy; unsigned int min_sampling_rate; +struct work_struct work; +struct dbs_data *dbs_data; }; struct cs_dbs_tuners { diff --git a/drivers/cpufreq/cpufreq_ondemand.c b/drivers/cpufreq/cpufreq_ondemand.c index 82ed490f7de0..93ad7a226aee 100644 --- a/drivers/cpufreq/cpufreq_ondemand.c +++ b/drivers/cpufreq/cpufreq_ondemand.c @@ -242,20 +242,27 @@ static struct common_dbs_data od_dbs_cdata; * reducing the sampling rate, we need to make the new value effective * immediately. */ -static void update_sampling_rate(struct dbs_data *dbs_data, -unsigned int new_rate) +static void update_sampling_rate(struct work_struct *work) { -struct od_dbs_tuners *od_tuners = dbs_data->tuners; +struct od_dbs_tuners *od_tuners = container_of(work, struct + od_dbs_tuners, work); +unsigned int new_rate = od_tuners->sampling_rate; +struct dbs_data *dbs_data = od_tuners->dbs_data; struct cpumask cpumask; int cpu; -od_tuners->sampling_rate = new_rate = max(new_rate, -od_tuners->min_sampling_rate); - /* * Lock governor so that governor start/stop can't execute in parallel. + * + * We can't do a regular mutex_lock() here, as that may deadlock against + * another thread performing CPUFREQ_GOV_POLICY_EXIT event on the + * governor, which might have already taken od_dbs_cdata.mutex and is + * waiting for this work to finish. */ -mutex_lock(_dbs_cdata.mutex); +if (!mutex_trylock(_dbs_cdata.mutex)) { +queue_work(system_wq, _tuners->work); +return; +} cpumask_copy(, cpu_online_mask); @@ -311,13 +318,22 @@ static void update_sampling_rate(struct dbs_data *dbs_data, static ssize_t store_sampling_rate(struct dbs_data *dbs_data, const char *buf, size_t count) { +struct od_dbs_tuners *od_tuners = dbs_data->tuners; unsigned int input; int ret; ret = sscanf(buf, "%u", ); if (ret != 1) return -EINVAL; -update_sampling_rate(dbs_data, input); +od_tuners->sampling_rate = max(input, od_tuners->min_sampling_rate); + +/* + * update_sampling_rate() requires to hold od_dbs_cdata.mutex, but we + * can't take that from this thread, otherwise it results in ABBA + * lockdep between s_active and od_dbs_cdata.mutex locks. + */ +queue_work(system_wq, _tuners->work); + return count; } @@ -501,6 +517,8 @@ static int od_init(struct dbs_data *dbs_data, bool notify) tuners->ignore_nice_load = 0; tuners->powersave_bias = default_powersave_bias; tuners->io_is_busy = should_io_be_busy(); +INIT_WORK(>work, update_sampling_rate); +tuners->dbs_data = dbs_data; dbs_data->tuners = tuners; return 0; @@ -508,7 +526,10 @@ static int od_init(struct dbs_data *dbs_data, bool notify) static void od_exit(struct dbs_data *dbs_data, bool notify) { -kfree(dbs_data->tuners); +struct od_dbs_tuners *tuners = dbs_data->tuners; + +cancel_work_sync(>work); +kfree(tuners); } define_get_cpu_dbs_routines(od_cpu_dbs_info); No no no no! Let's not open up this can of worms of queuing up the work to handle a write to a sysfs file. It *MIGHT* work for this specific tunable (I haven't bothered to analyze), but this makes it impossible to return a useful/proper e
Re: [PATCH V2 0/7] cpufreq: governors: Fix ABBA lockups
On 02/04/2016 03:09 AM, Viresh Kumar wrote: On 04-02-16, 00:50, Rafael J. Wysocki wrote: This is exactly right. We've avoided one deadlock only to trip into another one. This happens because update_sampling_rate() acquires od_dbs_cdata.mutex which is held around cpufreq_governor_exit() by cpufreq_governor_dbs(). Worse yet, a deadlock can still happen without (the new) dbs_data->mutex, just between s_active and od_dbs_cdata.mutex if update_sampling_rate() runs in parallel with cpufreq_governor_dbs()->cpufreq_governor_exit() and the latter wins the race. It looks like we need to drop the governor mutex before putting the kobject in cpufreq_governor_exit(). I have tried to explore all possible ways of fixing this, and every other way looked to be racy in some way. Does anyone else have a better idea (untested): -8<- Subject: [PATCH] cpufreq: ondemand: Shoot update_sampling_rate with a separate work Signed-off-by: Viresh Kumar --- drivers/cpufreq/cpufreq_governor.h | 2 ++ drivers/cpufreq/cpufreq_ondemand.c | 39 +- 2 files changed, 32 insertions(+), 9 deletions(-) diff --git a/drivers/cpufreq/cpufreq_governor.h b/drivers/cpufreq/cpufreq_governor.h index 7bed63e14e7d..97e604356b20 100644 --- a/drivers/cpufreq/cpufreq_governor.h +++ b/drivers/cpufreq/cpufreq_governor.h @@ -141,6 +141,8 @@ struct od_dbs_tuners { unsigned int powersave_bias; unsigned int io_is_busy; unsigned int min_sampling_rate; + struct work_struct work; + struct dbs_data *dbs_data; }; struct cs_dbs_tuners { diff --git a/drivers/cpufreq/cpufreq_ondemand.c b/drivers/cpufreq/cpufreq_ondemand.c index 82ed490f7de0..93ad7a226aee 100644 --- a/drivers/cpufreq/cpufreq_ondemand.c +++ b/drivers/cpufreq/cpufreq_ondemand.c @@ -242,20 +242,27 @@ static struct common_dbs_data od_dbs_cdata; * reducing the sampling rate, we need to make the new value effective * immediately. */ -static void update_sampling_rate(struct dbs_data *dbs_data, - unsigned int new_rate) +static void update_sampling_rate(struct work_struct *work) { - struct od_dbs_tuners *od_tuners = dbs_data->tuners; + struct od_dbs_tuners *od_tuners = container_of(work, struct + od_dbs_tuners, work); + unsigned int new_rate = od_tuners->sampling_rate; + struct dbs_data *dbs_data = od_tuners->dbs_data; struct cpumask cpumask; int cpu; - od_tuners->sampling_rate = new_rate = max(new_rate, - od_tuners->min_sampling_rate); - /* * Lock governor so that governor start/stop can't execute in parallel. +* +* We can't do a regular mutex_lock() here, as that may deadlock against +* another thread performing CPUFREQ_GOV_POLICY_EXIT event on the +* governor, which might have already taken od_dbs_cdata.mutex and is +* waiting for this work to finish. */ - mutex_lock(_dbs_cdata.mutex); + if (!mutex_trylock(_dbs_cdata.mutex)) { + queue_work(system_wq, _tuners->work); + return; + } cpumask_copy(, cpu_online_mask); @@ -311,13 +318,22 @@ static void update_sampling_rate(struct dbs_data *dbs_data, static ssize_t store_sampling_rate(struct dbs_data *dbs_data, const char *buf, size_t count) { + struct od_dbs_tuners *od_tuners = dbs_data->tuners; unsigned int input; int ret; ret = sscanf(buf, "%u", ); if (ret != 1) return -EINVAL; - update_sampling_rate(dbs_data, input); + od_tuners->sampling_rate = max(input, od_tuners->min_sampling_rate); + + /* +* update_sampling_rate() requires to hold od_dbs_cdata.mutex, but we +* can't take that from this thread, otherwise it results in ABBA +* lockdep between s_active and od_dbs_cdata.mutex locks. +*/ + queue_work(system_wq, _tuners->work); + return count; } @@ -501,6 +517,8 @@ static int od_init(struct dbs_data *dbs_data, bool notify) tuners->ignore_nice_load = 0; tuners->powersave_bias = default_powersave_bias; tuners->io_is_busy = should_io_be_busy(); + INIT_WORK(>work, update_sampling_rate); + tuners->dbs_data = dbs_data; dbs_data->tuners = tuners; return 0; @@ -508,7 +526,10 @@ static int od_init(struct dbs_data *dbs_data, bool notify) static void od_exit(struct dbs_data *dbs_data, bool notify) { - kfree(dbs_data->tuners); + struct od_dbs_tuners *tuners = dbs_data->tuners; + + cancel_work_sync(>work); + kfree(tuners); } define_get_cpu_dbs_routines(od_cpu_dbs_info); No no no no! Let's not open up this can of worms of queuing up the work to handle a write to a sysfs file. It *MIGHT* work for this specific tunable (I haven't bothered to
Re: [PATCH V2 0/7] cpufreq: governors: Fix ABBA lockups
On 02/04/2016 09:43 AM, Saravana Kannan wrote: On 02/04/2016 03:09 AM, Viresh Kumar wrote: On 04-02-16, 00:50, Rafael J. Wysocki wrote: This is exactly right. We've avoided one deadlock only to trip into another one. This happens because update_sampling_rate() acquires od_dbs_cdata.mutex which is held around cpufreq_governor_exit() by cpufreq_governor_dbs(). Worse yet, a deadlock can still happen without (the new) dbs_data->mutex, just between s_active and od_dbs_cdata.mutex if update_sampling_rate() runs in parallel with cpufreq_governor_dbs()->cpufreq_governor_exit() and the latter wins the race. It looks like we need to drop the governor mutex before putting the kobject in cpufreq_governor_exit(). I have tried to explore all possible ways of fixing this, and every other way looked to be racy in some way. Does anyone else have a better idea (untested): -8<- Subject: [PATCH] cpufreq: ondemand: Shoot update_sampling_rate with a separate work Signed-off-by: Viresh Kumar <viresh.ku...@linaro.org> --- drivers/cpufreq/cpufreq_governor.h | 2 ++ drivers/cpufreq/cpufreq_ondemand.c | 39 +- 2 files changed, 32 insertions(+), 9 deletions(-) diff --git a/drivers/cpufreq/cpufreq_governor.h b/drivers/cpufreq/cpufreq_governor.h index 7bed63e14e7d..97e604356b20 100644 --- a/drivers/cpufreq/cpufreq_governor.h +++ b/drivers/cpufreq/cpufreq_governor.h @@ -141,6 +141,8 @@ struct od_dbs_tuners { unsigned int powersave_bias; unsigned int io_is_busy; unsigned int min_sampling_rate; +struct work_struct work; +struct dbs_data *dbs_data; }; struct cs_dbs_tuners { diff --git a/drivers/cpufreq/cpufreq_ondemand.c b/drivers/cpufreq/cpufreq_ondemand.c index 82ed490f7de0..93ad7a226aee 100644 --- a/drivers/cpufreq/cpufreq_ondemand.c +++ b/drivers/cpufreq/cpufreq_ondemand.c @@ -242,20 +242,27 @@ static struct common_dbs_data od_dbs_cdata; * reducing the sampling rate, we need to make the new value effective * immediately. */ -static void update_sampling_rate(struct dbs_data *dbs_data, -unsigned int new_rate) +static void update_sampling_rate(struct work_struct *work) { -struct od_dbs_tuners *od_tuners = dbs_data->tuners; +struct od_dbs_tuners *od_tuners = container_of(work, struct + od_dbs_tuners, work); +unsigned int new_rate = od_tuners->sampling_rate; +struct dbs_data *dbs_data = od_tuners->dbs_data; struct cpumask cpumask; int cpu; -od_tuners->sampling_rate = new_rate = max(new_rate, -od_tuners->min_sampling_rate); - /* * Lock governor so that governor start/stop can't execute in parallel. + * + * We can't do a regular mutex_lock() here, as that may deadlock against + * another thread performing CPUFREQ_GOV_POLICY_EXIT event on the + * governor, which might have already taken od_dbs_cdata.mutex and is + * waiting for this work to finish. */ -mutex_lock(_dbs_cdata.mutex); +if (!mutex_trylock(_dbs_cdata.mutex)) { +queue_work(system_wq, _tuners->work); +return; +} cpumask_copy(, cpu_online_mask); @@ -311,13 +318,22 @@ static void update_sampling_rate(struct dbs_data *dbs_data, static ssize_t store_sampling_rate(struct dbs_data *dbs_data, const char *buf, size_t count) { +struct od_dbs_tuners *od_tuners = dbs_data->tuners; unsigned int input; int ret; ret = sscanf(buf, "%u", ); if (ret != 1) return -EINVAL; -update_sampling_rate(dbs_data, input); +od_tuners->sampling_rate = max(input, od_tuners->min_sampling_rate); + +/* + * update_sampling_rate() requires to hold od_dbs_cdata.mutex, but we + * can't take that from this thread, otherwise it results in ABBA + * lockdep between s_active and od_dbs_cdata.mutex locks. + */ +queue_work(system_wq, _tuners->work); + return count; } @@ -501,6 +517,8 @@ static int od_init(struct dbs_data *dbs_data, bool notify) tuners->ignore_nice_load = 0; tuners->powersave_bias = default_powersave_bias; tuners->io_is_busy = should_io_be_busy(); +INIT_WORK(>work, update_sampling_rate); +tuners->dbs_data = dbs_data; dbs_data->tuners = tuners; return 0; @@ -508,7 +526,10 @@ static int od_init(struct dbs_data *dbs_data, bool notify) static void od_exit(struct dbs_data *dbs_data, bool notify) { -kfree(dbs_data->tuners); +struct od_dbs_tuners *tuners = dbs_data->tuners; + +cancel_work_sync(>work); +kfree(tuners); } define_get_cpu_dbs_routines(od_cpu_dbs_info); No no no no! Let's not open up this can of worms of queuing up the work to handle a write to a sysfs file. It *MIGHT* work for this specific tunable (I haven't bothered to analyze), but this makes it impossible to r
Re: [PATCH V2 0/7] cpufreq: governors: Fix ABBA lockups
On 02/04/2016 03:09 AM, Viresh Kumar wrote: On 04-02-16, 00:50, Rafael J. Wysocki wrote: This is exactly right. We've avoided one deadlock only to trip into another one. This happens because update_sampling_rate() acquires od_dbs_cdata.mutex which is held around cpufreq_governor_exit() by cpufreq_governor_dbs(). Worse yet, a deadlock can still happen without (the new) dbs_data->mutex, just between s_active and od_dbs_cdata.mutex if update_sampling_rate() runs in parallel with cpufreq_governor_dbs()->cpufreq_governor_exit() and the latter wins the race. It looks like we need to drop the governor mutex before putting the kobject in cpufreq_governor_exit(). I have tried to explore all possible ways of fixing this, and every other way looked to be racy in some way. Does anyone else have a better idea (untested): -8<- Subject: [PATCH] cpufreq: ondemand: Shoot update_sampling_rate with a separate work Signed-off-by: Viresh Kumar--- drivers/cpufreq/cpufreq_governor.h | 2 ++ drivers/cpufreq/cpufreq_ondemand.c | 39 +- 2 files changed, 32 insertions(+), 9 deletions(-) diff --git a/drivers/cpufreq/cpufreq_governor.h b/drivers/cpufreq/cpufreq_governor.h index 7bed63e14e7d..97e604356b20 100644 --- a/drivers/cpufreq/cpufreq_governor.h +++ b/drivers/cpufreq/cpufreq_governor.h @@ -141,6 +141,8 @@ struct od_dbs_tuners { unsigned int powersave_bias; unsigned int io_is_busy; unsigned int min_sampling_rate; + struct work_struct work; + struct dbs_data *dbs_data; }; struct cs_dbs_tuners { diff --git a/drivers/cpufreq/cpufreq_ondemand.c b/drivers/cpufreq/cpufreq_ondemand.c index 82ed490f7de0..93ad7a226aee 100644 --- a/drivers/cpufreq/cpufreq_ondemand.c +++ b/drivers/cpufreq/cpufreq_ondemand.c @@ -242,20 +242,27 @@ static struct common_dbs_data od_dbs_cdata; * reducing the sampling rate, we need to make the new value effective * immediately. */ -static void update_sampling_rate(struct dbs_data *dbs_data, - unsigned int new_rate) +static void update_sampling_rate(struct work_struct *work) { - struct od_dbs_tuners *od_tuners = dbs_data->tuners; + struct od_dbs_tuners *od_tuners = container_of(work, struct + od_dbs_tuners, work); + unsigned int new_rate = od_tuners->sampling_rate; + struct dbs_data *dbs_data = od_tuners->dbs_data; struct cpumask cpumask; int cpu; - od_tuners->sampling_rate = new_rate = max(new_rate, - od_tuners->min_sampling_rate); - /* * Lock governor so that governor start/stop can't execute in parallel. +* +* We can't do a regular mutex_lock() here, as that may deadlock against +* another thread performing CPUFREQ_GOV_POLICY_EXIT event on the +* governor, which might have already taken od_dbs_cdata.mutex and is +* waiting for this work to finish. */ - mutex_lock(_dbs_cdata.mutex); + if (!mutex_trylock(_dbs_cdata.mutex)) { + queue_work(system_wq, _tuners->work); + return; + } cpumask_copy(, cpu_online_mask); @@ -311,13 +318,22 @@ static void update_sampling_rate(struct dbs_data *dbs_data, static ssize_t store_sampling_rate(struct dbs_data *dbs_data, const char *buf, size_t count) { + struct od_dbs_tuners *od_tuners = dbs_data->tuners; unsigned int input; int ret; ret = sscanf(buf, "%u", ); if (ret != 1) return -EINVAL; - update_sampling_rate(dbs_data, input); + od_tuners->sampling_rate = max(input, od_tuners->min_sampling_rate); + + /* +* update_sampling_rate() requires to hold od_dbs_cdata.mutex, but we +* can't take that from this thread, otherwise it results in ABBA +* lockdep between s_active and od_dbs_cdata.mutex locks. +*/ + queue_work(system_wq, _tuners->work); + return count; } @@ -501,6 +517,8 @@ static int od_init(struct dbs_data *dbs_data, bool notify) tuners->ignore_nice_load = 0; tuners->powersave_bias = default_powersave_bias; tuners->io_is_busy = should_io_be_busy(); + INIT_WORK(>work, update_sampling_rate); + tuners->dbs_data = dbs_data; dbs_data->tuners = tuners; return 0; @@ -508,7 +526,10 @@ static int od_init(struct dbs_data *dbs_data, bool notify) static void od_exit(struct dbs_data *dbs_data, bool notify) { - kfree(dbs_data->tuners); + struct od_dbs_tuners *tuners = dbs_data->tuners; + + cancel_work_sync(>work); + kfree(tuners); } define_get_cpu_dbs_routines(od_cpu_dbs_info); No no no no! Let's not open up this can of worms of queuing up the work to handle a write to a sysfs file. It *MIGHT* work for this specific tunable
Re: [PATCH 7/11] cpufreq: governor: Rework cpufreq_governor_dbs()
ic struct dbs_governor od_dbs_gov = { .gov = { .name = "ondemand", - .governor = od_cpufreq_governor_dbs, + .governor = cpufreq_governor_dbs, .max_transition_latency = TRANSITION_LATENCY_LIMIT, .owner = THIS_MODULE, }, @@ -563,12 +560,6 @@ static struct dbs_governor od_dbs_gov = #define CPU_FREQ_GOV_ONDEMAND (_dbs_gov.gov) -static int od_cpufreq_governor_dbs(struct cpufreq_policy *policy, - unsigned int event) -{ - return cpufreq_governor_dbs(policy, _dbs_gov, event); -} - static void od_set_powersave_bias(unsigned int powersave_bias) { struct cpufreq_policy *policy; -- To unsubscribe from this list: send the line "unsubscribe linux-pm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Nice! Acked-by: Saravana Kannan -Saravana -- Qualcomm Innovation Center, Inc. The Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project
Re: [PATCH 5/11] cpufreq: governor: Put governor structure into common_dbs_data
cpu_get_raw(freq->cpu); - - if (!policy) - return 0; - - /* policy isn't governed by conservative governor */ - if (policy->governor != _gov_conservative) - return 0; - - /* -* we only care if our internally tracked freq moves outside the 'valid' -* ranges of frequency available to us otherwise we do not change it - */ - if (dbs_info->requested_freq > policy->max - || dbs_info->requested_freq < policy->min) - dbs_info->requested_freq = freq->new; - - return 0; -} + void *data); static struct notifier_block cs_cpufreq_notifier_block = { .notifier_call = dbs_cpufreq_notifier, @@ -358,7 +325,16 @@ static void cs_exit(struct dbs_data *dbs define_get_cpu_dbs_routines(cs_cpu_dbs_info); +static int cs_cpufreq_governor_dbs(struct cpufreq_policy *policy, + unsigned int event); + static struct common_dbs_data cs_dbs_cdata = { + .gov = { + .name = "conservative", + .governor = cs_cpufreq_governor_dbs, + .max_transition_latency = TRANSITION_LATENCY_LIMIT, + .owner = THIS_MODULE, + }, .governor = GOV_CONSERVATIVE, .attr_group_gov_sys = _attr_group_gov_sys, .attr_group_gov_pol = _attr_group_gov_pol, @@ -370,20 +346,48 @@ static struct common_dbs_data cs_dbs_cda .exit = cs_exit, }; +#define CPU_FREQ_GOV_CONSERVATIVE (_dbs_cdata.gov) + static int cs_cpufreq_governor_dbs(struct cpufreq_policy *policy, unsigned int event) { return cpufreq_governor_dbs(policy, _dbs_cdata, event); } +static int dbs_cpufreq_notifier(struct notifier_block *nb, unsigned long val, + void *data) +{ + struct cpufreq_freqs *freq = data; + struct cs_cpu_dbs_info_s *dbs_info = + _cpu(cs_cpu_dbs_info, freq->cpu); + struct cpufreq_policy *policy = cpufreq_cpu_get_raw(freq->cpu); + + if (!policy) + return 0; + + /* policy isn't governed by conservative governor */ + if (policy->governor != CPU_FREQ_GOV_CONSERVATIVE) + return 0; + + /* +* we only care if our internally tracked freq moves outside the 'valid' +* ranges of frequency available to us otherwise we do not change it + */ + if (dbs_info->requested_freq > policy->max + || dbs_info->requested_freq < policy->min) + dbs_info->requested_freq = freq->new; + + return 0; +} + static int __init cpufreq_gov_dbs_init(void) { - return cpufreq_register_governor(_gov_conservative); + return cpufreq_register_governor(CPU_FREQ_GOV_CONSERVATIVE); } static void __exit cpufreq_gov_dbs_exit(void) { - cpufreq_unregister_governor(_gov_conservative); + cpufreq_unregister_governor(CPU_FREQ_GOV_CONSERVATIVE); } MODULE_AUTHOR("Alexander Clouter "); @@ -395,7 +399,7 @@ MODULE_LICENSE("GPL"); #ifdef CONFIG_CPU_FREQ_DEFAULT_GOV_CONSERVATIVE struct cpufreq_governor *cpufreq_default_governor(void) { - return _gov_conservative; + return CPU_FREQ_GOV_CONSERVATIVE; } fs_initcall(cpufreq_gov_dbs_init); I'm not sold on the macros/#defines for the , but not a strong opinion. Acked-by: Saravana Kannan -Saravana -- Qualcomm Innovation Center, Inc. The Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project
Re: [PATCH 3/11] cpufreq: governor: Use common global_dbs_data pointer
On 02/03/2016 05:11 PM, Saravana Kannan wrote: On 02/03/2016 03:22 PM, Rafael J. Wysocki wrote: From: Rafael J. Wysocki If the ondemand and conservative governors cannot use per-policy tunables (CPUFREQ_HAVE_GOVERNOR_PER_POLICY is not set in the cpufreq driver), all policy objects point to the same single dbs_data object. Additionally, that object is pointed to by a global pointer hidden in the governor's data structures. There is no reason for that pointer to be buried in those data structures, though, so make it explicitly global. Signed-off-by: Rafael J. Wysocki --- drivers/cpufreq/cpufreq_governor.c | 20 ++-- drivers/cpufreq/cpufreq_governor.h | 20 ++-- 2 files changed, 20 insertions(+), 20 deletions(-) Index: linux-pm/drivers/cpufreq/cpufreq_governor.h === --- linux-pm.orig/drivers/cpufreq/cpufreq_governor.h +++ linux-pm/drivers/cpufreq/cpufreq_governor.h @@ -78,7 +78,7 @@ __ATTR(_name, 0644, show_##_name##_gov_p static ssize_t show_##file_name##_gov_sys\ (struct kobject *kobj, struct attribute *attr, char *buf)\ {\ -struct _gov##_dbs_tuners *tuners = _gov##_dbs_cdata.gdbs_data->tuners; \ +struct _gov##_dbs_tuners *tuners = global_dbs_data->tuners; \ return sprintf(buf, "%u\n", tuners->file_name);\ }\ \ @@ -94,7 +94,7 @@ static ssize_t show_##file_name##_gov_po static ssize_t store_##file_name##_gov_sys\ (struct kobject *kobj, struct attribute *attr, const char *buf, size_t count) \ {\ -struct dbs_data *dbs_data = _gov##_dbs_cdata.gdbs_data;\ +struct dbs_data *dbs_data = global_dbs_data;\ return store_##file_name(dbs_data, buf, count);\ }\ \ @@ -201,19 +201,14 @@ struct cs_dbs_tuners { /* Common Governor data across policies */ struct dbs_data; struct common_dbs_data { -/* Common across governors */ +struct cpufreq_governor gov; + Actually, this line is completely unrelated to this patch. Should go on Patch 5? Cautiously Acked-by: Saravana Kannan Good call on the "cautiously" I guess! -Saravana -- Qualcomm Innovation Center, Inc. The Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project
Re: [PATCH 3/11] cpufreq: governor: Use common global_dbs_data pointer
On 02/03/2016 05:25 PM, Rafael J. Wysocki wrote: On Thu, Feb 4, 2016 at 2:11 AM, Saravana Kannan wrote: On 02/03/2016 03:22 PM, Rafael J. Wysocki wrote: From: Rafael J. Wysocki If the ondemand and conservative governors cannot use per-policy tunables (CPUFREQ_HAVE_GOVERNOR_PER_POLICY is not set in the cpufreq driver), all policy objects point to the same single dbs_data object. Additionally, that object is pointed to by a global pointer hidden in the governor's data structures. There is no reason for that pointer to be buried in those data structures, though, so make it explicitly global. Signed-off-by: Rafael J. Wysocki --- drivers/cpufreq/cpufreq_governor.c | 20 ++-- drivers/cpufreq/cpufreq_governor.h | 20 ++-- 2 files changed, 20 insertions(+), 20 deletions(-) Index: linux-pm/drivers/cpufreq/cpufreq_governor.h === --- linux-pm.orig/drivers/cpufreq/cpufreq_governor.h +++ linux-pm/drivers/cpufreq/cpufreq_governor.h @@ -78,7 +78,7 @@ __ATTR(_name, 0644, show_##_name##_gov_p static ssize_t show_##file_name##_gov_sys \ (struct kobject *kobj, struct attribute *attr, char *buf) \ { \ - struct _gov##_dbs_tuners *tuners = _gov##_dbs_cdata.gdbs_data->tuners; \ + struct _gov##_dbs_tuners *tuners = global_dbs_data->tuners; \ return sprintf(buf, "%u\n", tuners->file_name); \ } \ \ @@ -94,7 +94,7 @@ static ssize_t show_##file_name##_gov_po static ssize_t store_##file_name##_gov_sys\ (struct kobject *kobj, struct attribute *attr, const char *buf, size_t count) \ { \ - struct dbs_data *dbs_data = _gov##_dbs_cdata.gdbs_data; \ + struct dbs_data *dbs_data = global_dbs_data;\ return store_##file_name(dbs_data, buf, count); \ } \ \ @@ -201,19 +201,14 @@ struct cs_dbs_tuners { /* Common Governor data across policies */ struct dbs_data; struct common_dbs_data { - /* Common across governors */ + struct cpufreq_governor gov; + #define GOV_ONDEMAND0 #define GOV_CONSERVATIVE1 int governor; struct attribute_group *attr_group_gov_sys; /* one governor - system */ struct attribute_group *attr_group_gov_pol; /* one governor - policy */ - /* -* Common data for platforms that don't set -* CPUFREQ_HAVE_GOVERNOR_PER_POLICY -*/ - struct dbs_data *gdbs_data; - struct cpu_dbs_info *(*get_cpu_cdbs)(int cpu); void *(*get_cpu_dbs_info_s)(int cpu); unsigned int (*gov_dbs_timer)(struct cpufreq_policy *policy); @@ -233,6 +228,11 @@ struct dbs_data { void *tuners; }; +/* + * Common governor data for platforms without CPUFREQ_HAVE_GOVERNOR_PER_POLICY. + */ +extern struct dbs_data *global_dbs_data; + /* Governor specific ops, will be passed to dbs_data->gov_ops */ struct od_ops { void (*powersave_bias_init_cpu)(int cpu); @@ -256,7 +256,7 @@ static inline int delay_for_sampling_rat static ssize_t show_sampling_rate_min_gov_sys \ (struct kobject *kobj, struct attribute *attr, char *buf) \ { \ - struct dbs_data *dbs_data = _gov##_dbs_cdata.gdbs_data; \ + struct dbs_data *dbs_data = global_dbs_data;\ return sprintf(buf, "%u\n", dbs_data->min_sampling_rate); \ } \ \ Index: linux-pm/drivers/cpufreq/cpufreq_governor.c === --- linux-pm.orig/drivers/cpufreq/cpufreq_governor.c +++ linux-pm/drivers/cpufreq/cpufreq_governor.c @@ -22,6 +22,9 @@ #include "cpufreq_governor.h" +struct dbs_data *global_dbs_data; +EXPORT_SYMBOL_GPL(global_dbs_data); + DEFINE_MUTEX(dbs_data_mutex); EXPORT_SYMBOL_GPL(dbs_data_mutex); @@ -377,22 +380,19 @@ static int cpufreq_governor_init(struct latency * LATENCY_MULTIPLIER)); if (!have_governor_per_policy()) - cdata->gdbs_data = dbs_data; + global_dbs_data = dbs_data; policy->governor_data = dbs_data; ret =
Re: [PATCH 4/11] cpufreq: governor: Avoid passing dbs_data pointers around unnecessarily
ck governor to block concurrent initialization of governor */ mutex_lock(_data_mutex); - if (have_governor_per_policy()) - dbs_data = policy->governor_data; - else - dbs_data = global_dbs_data; - - if (!dbs_data && (event != CPUFREQ_GOV_POLICY_INIT)) { - ret = -EINVAL; - goto unlock; + if (event == CPUFREQ_GOV_POLICY_INIT) { + ret = cpufreq_governor_init(policy, cdata); + } else if (policy->governor_data) { + switch (event) { + case CPUFREQ_GOV_POLICY_EXIT: + ret = cpufreq_governor_exit(policy); + break; + case CPUFREQ_GOV_START: + ret = cpufreq_governor_start(policy); + break; + case CPUFREQ_GOV_STOP: + ret = cpufreq_governor_stop(policy); + break; + case CPUFREQ_GOV_LIMITS: + ret = cpufreq_governor_limits(policy); + break; + } } - switch (event) { - case CPUFREQ_GOV_POLICY_INIT: - ret = cpufreq_governor_init(policy, dbs_data, cdata); - break; - case CPUFREQ_GOV_POLICY_EXIT: - ret = cpufreq_governor_exit(policy, dbs_data); - break; - case CPUFREQ_GOV_START: - ret = cpufreq_governor_start(policy, dbs_data); - break; - case CPUFREQ_GOV_STOP: - ret = cpufreq_governor_stop(policy, dbs_data); - break; - case CPUFREQ_GOV_LIMITS: - ret = cpufreq_governor_limits(policy, dbs_data); - break; - default: - ret = -EINVAL; - } - -unlock: mutex_unlock(_data_mutex); - Po-tay-to, Po-tah-to. return ret; } EXPORT_SYMBOL_GPL(cpufreq_governor_dbs); Agree with the general idea of the patch though. Conditional on the comment above being resolve amongst the others: Acked-by: Saravana Kannan -Saravana -- Qualcomm Innovation Center, Inc. The Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project
Re: [PATCH 3/11] cpufreq: governor: Use common global_dbs_data pointer
set_gdbs_data; - - return 0; + if (!ret) + return 0; I think the previous method of a handling the error is easier to read and more in line with the typical kernel coding style. The successful path ends in an unconditional return statement and the error paths are handled with a goto. This also doesn't seem relevant to what the patch is trying to do. So, I'd prefer that it be left as is. -reset_gdbs_data: policy->governor_data = NULL; - if (!have_governor_per_policy()) - cdata->gdbs_data = NULL; + global_dbs_data = NULL; + cdata->exit(dbs_data, !policy->governor->initialized); free_common_dbs_info: free_common_dbs_info(policy, cdata); @@ -418,7 +418,7 @@ static int cpufreq_governor_exit(struct policy->governor_data = NULL; if (!have_governor_per_policy()) - cdata->gdbs_data = NULL; + global_dbs_data = NULL; cdata->exit(dbs_data, policy->governor->initialized == 1); kfree(dbs_data); @@ -550,7 +550,7 @@ int cpufreq_governor_dbs(struct cpufreq_ if (have_governor_per_policy()) dbs_data = policy->governor_data; else - dbs_data = cdata->gdbs_data; + dbs_data = global_dbs_data; if (!dbs_data && (event != CPUFREQ_GOV_POLICY_INIT)) { ret = -EINVAL; -- To unsubscribe from this list: send the line "unsubscribe linux-pm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html If the minor comment is addressed, this looks okay to me. Cautiously Acked-by: Saravana Kannan -Saravana -- Qualcomm Innovation Center, Inc. The Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project
Re: [PATCH 2/11] cpufreq: governor: Use common mutex for dbs_data protection
On 02/03/2016 03:16 PM, Rafael J. Wysocki wrote: From: Rafael J. Wysocki Every governor relying on the common code in cpufreq_governor.c has to provide its own mutex in struct common_dbs_data. However, those mutexes are never used at the same time and doing it this way makes it rather difficult to follow the code. Moreover, if two governor modules are loaded we end up with two mutexes used for the same purpose one at a time. Introduce a single common mutex for that instead and drop the mutex field from struct common_dbs_data. That at least will ensure that the mutex is always present and initialized regardless of what the particular governors do. Signed-off-by: Rafael J. Wysocki --- drivers/cpufreq/cpufreq_conservative.c |1 - drivers/cpufreq/cpufreq_governor.c |7 +-- drivers/cpufreq/cpufreq_governor.h |6 +- drivers/cpufreq/cpufreq_ondemand.c |5 ++--- 4 files changed, 8 insertions(+), 11 deletions(-) Index: linux-pm/drivers/cpufreq/cpufreq_governor.c === --- linux-pm.orig/drivers/cpufreq/cpufreq_governor.c +++ linux-pm/drivers/cpufreq/cpufreq_governor.c @@ -22,6 +22,9 @@ #include "cpufreq_governor.h" +DEFINE_MUTEX(dbs_data_mutex); +EXPORT_SYMBOL_GPL(dbs_data_mutex); + static struct attribute_group *get_sysfs_attr(struct dbs_data *dbs_data) { if (have_governor_per_policy()) @@ -542,7 +545,7 @@ int cpufreq_governor_dbs(struct cpufreq_ int ret; /* Lock governor to block concurrent initialization of governor */ - mutex_lock(>mutex); + mutex_lock(_data_mutex); if (have_governor_per_policy()) dbs_data = policy->governor_data; @@ -575,7 +578,7 @@ int cpufreq_governor_dbs(struct cpufreq_ } unlock: - mutex_unlock(>mutex); + mutex_unlock(_data_mutex); return ret; } Index: linux-pm/drivers/cpufreq/cpufreq_governor.h === --- linux-pm.orig/drivers/cpufreq/cpufreq_governor.h +++ linux-pm/drivers/cpufreq/cpufreq_governor.h @@ -223,11 +223,6 @@ struct common_dbs_data { /* Governor specific ops, see below */ void *gov_ops; - - /* -* Protects governor's data (struct dbs_data and struct common_dbs_data) -*/ - struct mutex mutex; }; /* Governor Per policy data */ @@ -272,6 +267,7 @@ static ssize_t show_sampling_rate_min_go return sprintf(buf, "%u\n", dbs_data->min_sampling_rate); \ } +extern struct mutex dbs_data_mutex; extern struct mutex cpufreq_governor_lock; void gov_set_update_util(struct cpu_common_dbs_info *shared, unsigned int delay_us); Index: linux-pm/drivers/cpufreq/cpufreq_ondemand.c === --- linux-pm.orig/drivers/cpufreq/cpufreq_ondemand.c +++ linux-pm/drivers/cpufreq/cpufreq_ondemand.c @@ -251,7 +251,7 @@ static void update_sampling_rate(struct /* * Lock governor so that governor start/stop can't execute in parallel. */ - mutex_lock(_dbs_cdata.mutex); + mutex_lock(_data_mutex); cpumask_copy(, cpu_online_mask); @@ -304,7 +304,7 @@ static void update_sampling_rate(struct } } - mutex_unlock(_dbs_cdata.mutex); + mutex_unlock(_data_mutex); } static ssize_t store_sampling_rate(struct dbs_data *dbs_data, const char *buf, @@ -550,7 +550,6 @@ static struct common_dbs_data od_dbs_cda .gov_ops = _ops, .init = od_init, .exit = od_exit, - .mutex = __MUTEX_INITIALIZER(od_dbs_cdata.mutex), }; static int od_cpufreq_governor_dbs(struct cpufreq_policy *policy, Index: linux-pm/drivers/cpufreq/cpufreq_conservative.c === --- linux-pm.orig/drivers/cpufreq/cpufreq_conservative.c +++ linux-pm/drivers/cpufreq/cpufreq_conservative.c @@ -368,7 +368,6 @@ static struct common_dbs_data cs_dbs_cda .gov_check_cpu = cs_check_cpu, .init = cs_init, .exit = cs_exit, - .mutex = __MUTEX_INITIALIZER(cs_dbs_cdata.mutex), }; static int cs_cpufreq_governor_dbs(struct cpufreq_policy *policy, -- To unsubscribe from this list: send the line "unsubscribe linux-pm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Acked-by: Saravana Kannan -- Qualcomm Innovation Center, Inc. The Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project
Re: [PATCH 1/11] cpufreq: Clean up default and fallback governor setup
_CPU_FREQ_DEFAULT_GOV_POWERSAVE -static -#endif -struct cpufreq_governor cpufreq_gov_powersave = { +static struct cpufreq_governor cpufreq_gov_powersave = { .name = "powersave", .governor = cpufreq_governor_powersave, .owner = THIS_MODULE, @@ -57,6 +54,11 @@ MODULE_DESCRIPTION("CPUfreq policy gover MODULE_LICENSE("GPL"); #ifdef CONFIG_CPU_FREQ_DEFAULT_GOV_POWERSAVE +struct cpufreq_governor *cpufreq_default_governor(void) +{ + return _gov_powersave; +} + fs_initcall(cpufreq_gov_powersave_init); #else module_init(cpufreq_gov_powersave_init); Index: linux-pm/drivers/cpufreq/cpufreq_userspace.c === --- linux-pm.orig/drivers/cpufreq/cpufreq_userspace.c +++ linux-pm/drivers/cpufreq/cpufreq_userspace.c @@ -89,10 +89,7 @@ static int cpufreq_governor_userspace(st return rc; } -#ifndef CONFIG_CPU_FREQ_DEFAULT_GOV_USERSPACE -static -#endif -struct cpufreq_governor cpufreq_gov_userspace = { +static struct cpufreq_governor cpufreq_gov_userspace = { .name = "userspace", .governor = cpufreq_governor_userspace, .store_setspeed = cpufreq_set, @@ -116,6 +113,11 @@ MODULE_DESCRIPTION("CPUfreq policy gover MODULE_LICENSE("GPL"); #ifdef CONFIG_CPU_FREQ_DEFAULT_GOV_USERSPACE +struct cpufreq_governor *cpufreq_default_governor(void) +{ + return _gov_userspace; +} + fs_initcall(cpufreq_gov_userspace_init); #else module_init(cpufreq_gov_userspace_init); -- To unsubscribe from this list: send the line "unsubscribe linux-pm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Ah, that's so much more cleaner! Acked-by: Saravana Kannan -- Qualcomm Innovation Center, Inc. The Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project
Re: [PATCH V2 5/7] cpufreq: Merge cpufreq_offline_prepare/finish routines
On 02/03/2016 06:02 AM, Viresh Kumar wrote: The offline routine was separated into two halves earlier by 'commit 1aee40ac9c86 ("cpufreq: Invoke __cpufreq_remove_dev_finish() after releasing cpu_hotplug.lock");. And the reasons cited were, race issues between accessing policy's sysfs files and policy kobject's cleanup. That race isn't valid anymore, as we don't remove the policy & its kobject completely on hotplugs, but do that from ->remove() callback of subsys framework. These two routines can be merged back now. This is a preparatory step for the next patch, that will enforce policy->rwsem lock around __cpufreq_governor() routines STOP/EXIT sequence. Is this stale text? Seems like this is now done in the *previous* patch? -Saravana -- Qualcomm Innovation Center, Inc. The Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project
Re: [PATCH 2/5] cpufreq: governor: Create separate sysfs-ops
On 02/02/2016 10:54 PM, Viresh Kumar wrote: On 02-02-16, 17:32, Saravana Kannan wrote: But if we are expecting sched dvfs to come in, why make it worse for it. It would be completely pointless to try and shoehorn sched dvfs to use cpufreq_governor.c We can move the common part to cpufreq core and not make sched-dvfs reuse cpufreq_governor.c Let's do this please. -Saravana -- Qualcomm Innovation Center, Inc. The Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project
Re: [PATCH 2/5] cpufreq: governor: Create separate sysfs-ops
On 02/02/2016 10:57 PM, Viresh Kumar wrote: On 02-02-16, 20:03, Saravana Kannan wrote: This is the hotplug case I mentioned. The sysfs file removals will happen only for the last CPU in that policy (we thankfully optimized that part last year). We also know that multiple CPUs can't be hotplugged at the same time. So, in the start of cpufreq_offline_prepare, we just need to check if this is the last CPU in the policy and if that's the case, do the gov sysfs remove and then grab the policy lock and do all our crap. If a read is going on, that's going to finish before the sysfs attr remove can go ahead and it can grab the policy lock if it needs to and that still won't cause a deadlock because we haven't yet grabbed the policy lock in cpufreq_offline_prepare(). If the read comes after the sysfs remove, then the read is obviously going to fail (we can depend on the sysfs framework on doing its job there). IMHO, these are all dirty hacks we should stay away from. Adding such hunks in code is considered a band-aid kind of solution and hurts readability badly. The new solution (new governor show/store) implement this in a very clean and proper way I feel.. Others are free to disagree though :) I think it looks clean since we haven't sorted out the race conditions that Juri pointed out. So, it's early to call this series clean :) Also, I don't see it as a dirty hack at all. What's so hacky about it? We are just identifying conditions when we'll have to remove the sysfs files and removing them before grabbing the policy lock. The unlock/lock that we have now is what is a dirty hack. -Saravana -- Qualcomm Innovation Center, Inc. The Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project
Re: [PATCH V2 5/7] cpufreq: Merge cpufreq_offline_prepare/finish routines
On 02/03/2016 06:02 AM, Viresh Kumar wrote: The offline routine was separated into two halves earlier by 'commit 1aee40ac9c86 ("cpufreq: Invoke __cpufreq_remove_dev_finish() after releasing cpu_hotplug.lock");. And the reasons cited were, race issues between accessing policy's sysfs files and policy kobject's cleanup. That race isn't valid anymore, as we don't remove the policy & its kobject completely on hotplugs, but do that from ->remove() callback of subsys framework. These two routines can be merged back now. This is a preparatory step for the next patch, that will enforce policy->rwsem lock around __cpufreq_governor() routines STOP/EXIT sequence. Is this stale text? Seems like this is now done in the *previous* patch? -Saravana -- Qualcomm Innovation Center, Inc. The Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project
Re: [PATCH 1/11] cpufreq: Clean up default and fallback governor setup
@@ static int cpufreq_governor_powersave(st return 0; } -#ifndef CONFIG_CPU_FREQ_DEFAULT_GOV_POWERSAVE -static -#endif -struct cpufreq_governor cpufreq_gov_powersave = { +static struct cpufreq_governor cpufreq_gov_powersave = { .name = "powersave", .governor = cpufreq_governor_powersave, .owner = THIS_MODULE, @@ -57,6 +54,11 @@ MODULE_DESCRIPTION("CPUfreq policy gover MODULE_LICENSE("GPL"); #ifdef CONFIG_CPU_FREQ_DEFAULT_GOV_POWERSAVE +struct cpufreq_governor *cpufreq_default_governor(void) +{ + return _gov_powersave; +} + fs_initcall(cpufreq_gov_powersave_init); #else module_init(cpufreq_gov_powersave_init); Index: linux-pm/drivers/cpufreq/cpufreq_userspace.c === --- linux-pm.orig/drivers/cpufreq/cpufreq_userspace.c +++ linux-pm/drivers/cpufreq/cpufreq_userspace.c @@ -89,10 +89,7 @@ static int cpufreq_governor_userspace(st return rc; } -#ifndef CONFIG_CPU_FREQ_DEFAULT_GOV_USERSPACE -static -#endif -struct cpufreq_governor cpufreq_gov_userspace = { +static struct cpufreq_governor cpufreq_gov_userspace = { .name = "userspace", .governor = cpufreq_governor_userspace, .store_setspeed = cpufreq_set, @@ -116,6 +113,11 @@ MODULE_DESCRIPTION("CPUfreq policy gover MODULE_LICENSE("GPL"); #ifdef CONFIG_CPU_FREQ_DEFAULT_GOV_USERSPACE +struct cpufreq_governor *cpufreq_default_governor(void) +{ + return _gov_userspace; +} + fs_initcall(cpufreq_gov_userspace_init); #else module_init(cpufreq_gov_userspace_init); -- To unsubscribe from this list: send the line "unsubscribe linux-pm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Ah, that's so much more cleaner! Acked-by: Saravana Kannan <skan...@codeaurora.org> -- Qualcomm Innovation Center, Inc. The Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project
Re: [PATCH 2/11] cpufreq: governor: Use common mutex for dbs_data protection
On 02/03/2016 03:16 PM, Rafael J. Wysocki wrote: From: Rafael J. Wysocki <rafael.j.wyso...@intel.com> Every governor relying on the common code in cpufreq_governor.c has to provide its own mutex in struct common_dbs_data. However, those mutexes are never used at the same time and doing it this way makes it rather difficult to follow the code. Moreover, if two governor modules are loaded we end up with two mutexes used for the same purpose one at a time. Introduce a single common mutex for that instead and drop the mutex field from struct common_dbs_data. That at least will ensure that the mutex is always present and initialized regardless of what the particular governors do. Signed-off-by: Rafael J. Wysocki <rafael.j.wyso...@intel.com> --- drivers/cpufreq/cpufreq_conservative.c |1 - drivers/cpufreq/cpufreq_governor.c |7 +-- drivers/cpufreq/cpufreq_governor.h |6 +- drivers/cpufreq/cpufreq_ondemand.c |5 ++--- 4 files changed, 8 insertions(+), 11 deletions(-) Index: linux-pm/drivers/cpufreq/cpufreq_governor.c === --- linux-pm.orig/drivers/cpufreq/cpufreq_governor.c +++ linux-pm/drivers/cpufreq/cpufreq_governor.c @@ -22,6 +22,9 @@ #include "cpufreq_governor.h" +DEFINE_MUTEX(dbs_data_mutex); +EXPORT_SYMBOL_GPL(dbs_data_mutex); + static struct attribute_group *get_sysfs_attr(struct dbs_data *dbs_data) { if (have_governor_per_policy()) @@ -542,7 +545,7 @@ int cpufreq_governor_dbs(struct cpufreq_ int ret; /* Lock governor to block concurrent initialization of governor */ - mutex_lock(>mutex); + mutex_lock(_data_mutex); if (have_governor_per_policy()) dbs_data = policy->governor_data; @@ -575,7 +578,7 @@ int cpufreq_governor_dbs(struct cpufreq_ } unlock: - mutex_unlock(>mutex); + mutex_unlock(_data_mutex); return ret; } Index: linux-pm/drivers/cpufreq/cpufreq_governor.h === --- linux-pm.orig/drivers/cpufreq/cpufreq_governor.h +++ linux-pm/drivers/cpufreq/cpufreq_governor.h @@ -223,11 +223,6 @@ struct common_dbs_data { /* Governor specific ops, see below */ void *gov_ops; - - /* -* Protects governor's data (struct dbs_data and struct common_dbs_data) -*/ - struct mutex mutex; }; /* Governor Per policy data */ @@ -272,6 +267,7 @@ static ssize_t show_sampling_rate_min_go return sprintf(buf, "%u\n", dbs_data->min_sampling_rate); \ } +extern struct mutex dbs_data_mutex; extern struct mutex cpufreq_governor_lock; void gov_set_update_util(struct cpu_common_dbs_info *shared, unsigned int delay_us); Index: linux-pm/drivers/cpufreq/cpufreq_ondemand.c === --- linux-pm.orig/drivers/cpufreq/cpufreq_ondemand.c +++ linux-pm/drivers/cpufreq/cpufreq_ondemand.c @@ -251,7 +251,7 @@ static void update_sampling_rate(struct /* * Lock governor so that governor start/stop can't execute in parallel. */ - mutex_lock(_dbs_cdata.mutex); + mutex_lock(_data_mutex); cpumask_copy(, cpu_online_mask); @@ -304,7 +304,7 @@ static void update_sampling_rate(struct } } - mutex_unlock(_dbs_cdata.mutex); + mutex_unlock(_data_mutex); } static ssize_t store_sampling_rate(struct dbs_data *dbs_data, const char *buf, @@ -550,7 +550,6 @@ static struct common_dbs_data od_dbs_cda .gov_ops = _ops, .init = od_init, .exit = od_exit, - .mutex = __MUTEX_INITIALIZER(od_dbs_cdata.mutex), }; static int od_cpufreq_governor_dbs(struct cpufreq_policy *policy, Index: linux-pm/drivers/cpufreq/cpufreq_conservative.c === --- linux-pm.orig/drivers/cpufreq/cpufreq_conservative.c +++ linux-pm/drivers/cpufreq/cpufreq_conservative.c @@ -368,7 +368,6 @@ static struct common_dbs_data cs_dbs_cda .gov_check_cpu = cs_check_cpu, .init = cs_init, .exit = cs_exit, - .mutex = __MUTEX_INITIALIZER(cs_dbs_cdata.mutex), }; static int cs_cpufreq_governor_dbs(struct cpufreq_policy *policy, -- To unsubscribe from this list: send the line "unsubscribe linux-pm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Acked-by: Saravana Kannan <skan...@codeaurora.org> -- Qualcomm Innovation Center, Inc. The Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project
Re: [PATCH 4/11] cpufreq: governor: Avoid passing dbs_data pointers around unnecessarily
ata *dbs_data; - int ret; + int ret = -EINVAL; /* Lock governor to block concurrent initialization of governor */ mutex_lock(_data_mutex); - if (have_governor_per_policy()) - dbs_data = policy->governor_data; - else - dbs_data = global_dbs_data; - - if (!dbs_data && (event != CPUFREQ_GOV_POLICY_INIT)) { - ret = -EINVAL; - goto unlock; + if (event == CPUFREQ_GOV_POLICY_INIT) { + ret = cpufreq_governor_init(policy, cdata); + } else if (policy->governor_data) { + switch (event) { + case CPUFREQ_GOV_POLICY_EXIT: + ret = cpufreq_governor_exit(policy); + break; + case CPUFREQ_GOV_START: + ret = cpufreq_governor_start(policy); + break; + case CPUFREQ_GOV_STOP: + ret = cpufreq_governor_stop(policy); + break; + case CPUFREQ_GOV_LIMITS: + ret = cpufreq_governor_limits(policy); + break; + } } - switch (event) { - case CPUFREQ_GOV_POLICY_INIT: - ret = cpufreq_governor_init(policy, dbs_data, cdata); - break; - case CPUFREQ_GOV_POLICY_EXIT: - ret = cpufreq_governor_exit(policy, dbs_data); - break; - case CPUFREQ_GOV_START: - ret = cpufreq_governor_start(policy, dbs_data); - break; - case CPUFREQ_GOV_STOP: - ret = cpufreq_governor_stop(policy, dbs_data); - break; - case CPUFREQ_GOV_LIMITS: - ret = cpufreq_governor_limits(policy, dbs_data); - break; - default: - ret = -EINVAL; - } - -unlock: mutex_unlock(_data_mutex); - Po-tay-to, Po-tah-to. return ret; } EXPORT_SYMBOL_GPL(cpufreq_governor_dbs); Agree with the general idea of the patch though. Conditional on the comment above being resolve amongst the others: Acked-by: Saravana Kannan <skan...@codeaurora.org> -Saravana -- Qualcomm Innovation Center, Inc. The Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project
Re: [PATCH 3/11] cpufreq: governor: Use common global_dbs_data pointer
On 02/03/2016 05:11 PM, Saravana Kannan wrote: On 02/03/2016 03:22 PM, Rafael J. Wysocki wrote: From: Rafael J. Wysocki <rafael.j.wyso...@intel.com> If the ondemand and conservative governors cannot use per-policy tunables (CPUFREQ_HAVE_GOVERNOR_PER_POLICY is not set in the cpufreq driver), all policy objects point to the same single dbs_data object. Additionally, that object is pointed to by a global pointer hidden in the governor's data structures. There is no reason for that pointer to be buried in those data structures, though, so make it explicitly global. Signed-off-by: Rafael J. Wysocki <rafael.j.wyso...@intel.com> --- drivers/cpufreq/cpufreq_governor.c | 20 ++-- drivers/cpufreq/cpufreq_governor.h | 20 ++-- 2 files changed, 20 insertions(+), 20 deletions(-) Index: linux-pm/drivers/cpufreq/cpufreq_governor.h === --- linux-pm.orig/drivers/cpufreq/cpufreq_governor.h +++ linux-pm/drivers/cpufreq/cpufreq_governor.h @@ -78,7 +78,7 @@ __ATTR(_name, 0644, show_##_name##_gov_p static ssize_t show_##file_name##_gov_sys\ (struct kobject *kobj, struct attribute *attr, char *buf)\ {\ -struct _gov##_dbs_tuners *tuners = _gov##_dbs_cdata.gdbs_data->tuners; \ +struct _gov##_dbs_tuners *tuners = global_dbs_data->tuners; \ return sprintf(buf, "%u\n", tuners->file_name);\ }\ \ @@ -94,7 +94,7 @@ static ssize_t show_##file_name##_gov_po static ssize_t store_##file_name##_gov_sys\ (struct kobject *kobj, struct attribute *attr, const char *buf, size_t count) \ {\ -struct dbs_data *dbs_data = _gov##_dbs_cdata.gdbs_data;\ +struct dbs_data *dbs_data = global_dbs_data;\ return store_##file_name(dbs_data, buf, count);\ }\ \ @@ -201,19 +201,14 @@ struct cs_dbs_tuners { /* Common Governor data across policies */ struct dbs_data; struct common_dbs_data { -/* Common across governors */ +struct cpufreq_governor gov; + Actually, this line is completely unrelated to this patch. Should go on Patch 5? Cautiously Acked-by: Saravana Kannan <skan...@codeaurora.org> Good call on the "cautiously" I guess! -Saravana -- Qualcomm Innovation Center, Inc. The Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project
Re: [PATCH 3/11] cpufreq: governor: Use common global_dbs_data pointer
et_sysfs_attr(dbs_data)); - if (ret) - goto reset_gdbs_data; - - return 0; + if (!ret) + return 0; I think the previous method of a handling the error is easier to read and more in line with the typical kernel coding style. The successful path ends in an unconditional return statement and the error paths are handled with a goto. This also doesn't seem relevant to what the patch is trying to do. So, I'd prefer that it be left as is. -reset_gdbs_data: policy->governor_data = NULL; - if (!have_governor_per_policy()) - cdata->gdbs_data = NULL; + global_dbs_data = NULL; + cdata->exit(dbs_data, !policy->governor->initialized); free_common_dbs_info: free_common_dbs_info(policy, cdata); @@ -418,7 +418,7 @@ static int cpufreq_governor_exit(struct policy->governor_data = NULL; if (!have_governor_per_policy()) - cdata->gdbs_data = NULL; + global_dbs_data = NULL; cdata->exit(dbs_data, policy->governor->initialized == 1); kfree(dbs_data); @@ -550,7 +550,7 @@ int cpufreq_governor_dbs(struct cpufreq_ if (have_governor_per_policy()) dbs_data = policy->governor_data; else - dbs_data = cdata->gdbs_data; + dbs_data = global_dbs_data; if (!dbs_data && (event != CPUFREQ_GOV_POLICY_INIT)) { ret = -EINVAL; -- To unsubscribe from this list: send the line "unsubscribe linux-pm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html If the minor comment is addressed, this looks okay to me. Cautiously Acked-by: Saravana Kannan <skan...@codeaurora.org> -Saravana -- Qualcomm Innovation Center, Inc. The Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project
Re: [PATCH 3/11] cpufreq: governor: Use common global_dbs_data pointer
On 02/03/2016 05:25 PM, Rafael J. Wysocki wrote: On Thu, Feb 4, 2016 at 2:11 AM, Saravana Kannan <skan...@codeaurora.org> wrote: On 02/03/2016 03:22 PM, Rafael J. Wysocki wrote: From: Rafael J. Wysocki <rafael.j.wyso...@intel.com> If the ondemand and conservative governors cannot use per-policy tunables (CPUFREQ_HAVE_GOVERNOR_PER_POLICY is not set in the cpufreq driver), all policy objects point to the same single dbs_data object. Additionally, that object is pointed to by a global pointer hidden in the governor's data structures. There is no reason for that pointer to be buried in those data structures, though, so make it explicitly global. Signed-off-by: Rafael J. Wysocki <rafael.j.wyso...@intel.com> --- drivers/cpufreq/cpufreq_governor.c | 20 ++-- drivers/cpufreq/cpufreq_governor.h | 20 ++-- 2 files changed, 20 insertions(+), 20 deletions(-) Index: linux-pm/drivers/cpufreq/cpufreq_governor.h === --- linux-pm.orig/drivers/cpufreq/cpufreq_governor.h +++ linux-pm/drivers/cpufreq/cpufreq_governor.h @@ -78,7 +78,7 @@ __ATTR(_name, 0644, show_##_name##_gov_p static ssize_t show_##file_name##_gov_sys \ (struct kobject *kobj, struct attribute *attr, char *buf) \ { \ - struct _gov##_dbs_tuners *tuners = _gov##_dbs_cdata.gdbs_data->tuners; \ + struct _gov##_dbs_tuners *tuners = global_dbs_data->tuners; \ return sprintf(buf, "%u\n", tuners->file_name); \ } \ \ @@ -94,7 +94,7 @@ static ssize_t show_##file_name##_gov_po static ssize_t store_##file_name##_gov_sys\ (struct kobject *kobj, struct attribute *attr, const char *buf, size_t count) \ { \ - struct dbs_data *dbs_data = _gov##_dbs_cdata.gdbs_data; \ + struct dbs_data *dbs_data = global_dbs_data;\ return store_##file_name(dbs_data, buf, count); \ } \ \ @@ -201,19 +201,14 @@ struct cs_dbs_tuners { /* Common Governor data across policies */ struct dbs_data; struct common_dbs_data { - /* Common across governors */ + struct cpufreq_governor gov; + #define GOV_ONDEMAND0 #define GOV_CONSERVATIVE1 int governor; struct attribute_group *attr_group_gov_sys; /* one governor - system */ struct attribute_group *attr_group_gov_pol; /* one governor - policy */ - /* -* Common data for platforms that don't set -* CPUFREQ_HAVE_GOVERNOR_PER_POLICY -*/ - struct dbs_data *gdbs_data; - struct cpu_dbs_info *(*get_cpu_cdbs)(int cpu); void *(*get_cpu_dbs_info_s)(int cpu); unsigned int (*gov_dbs_timer)(struct cpufreq_policy *policy); @@ -233,6 +228,11 @@ struct dbs_data { void *tuners; }; +/* + * Common governor data for platforms without CPUFREQ_HAVE_GOVERNOR_PER_POLICY. + */ +extern struct dbs_data *global_dbs_data; + /* Governor specific ops, will be passed to dbs_data->gov_ops */ struct od_ops { void (*powersave_bias_init_cpu)(int cpu); @@ -256,7 +256,7 @@ static inline int delay_for_sampling_rat static ssize_t show_sampling_rate_min_gov_sys \ (struct kobject *kobj, struct attribute *attr, char *buf) \ { \ - struct dbs_data *dbs_data = _gov##_dbs_cdata.gdbs_data; \ + struct dbs_data *dbs_data = global_dbs_data;\ return sprintf(buf, "%u\n", dbs_data->min_sampling_rate); \ } \ \ Index: linux-pm/drivers/cpufreq/cpufreq_governor.c === --- linux-pm.orig/drivers/cpufreq/cpufreq_governor.c +++ linux-pm/drivers/cpufreq/cpufreq_governor.c @@ -22,6 +22,9 @@ #include "cpufreq_governor.h" +struct dbs_data *global_dbs_data; +EXPORT_SYMBOL_GPL(global_dbs_data); + DEFINE_MUTEX(dbs_data_mutex); EXPORT_SYMBOL_GPL(dbs_data_mutex); @@ -377,22 +380,19 @@ static int cpufreq_governor_init(struct latency * LATENCY_MULTIPLIER)); if (!have_governor_per_policy()) - cdata->gdbs_data = dbs_data; + glob
Re: [PATCH 7/11] cpufreq: governor: Rework cpufreq_governor_dbs()
_dbs(struct cpufreq_policy *policy, - unsigned int event); - static struct dbs_governor od_dbs_gov = { .gov = { .name = "ondemand", - .governor = od_cpufreq_governor_dbs, + .governor = cpufreq_governor_dbs, .max_transition_latency = TRANSITION_LATENCY_LIMIT, .owner = THIS_MODULE, }, @@ -563,12 +560,6 @@ static struct dbs_governor od_dbs_gov = #define CPU_FREQ_GOV_ONDEMAND (_dbs_gov.gov) -static int od_cpufreq_governor_dbs(struct cpufreq_policy *policy, - unsigned int event) -{ - return cpufreq_governor_dbs(policy, _dbs_gov, event); -} - static void od_set_powersave_bias(unsigned int powersave_bias) { struct cpufreq_policy *policy; -- To unsubscribe from this list: send the line "unsubscribe linux-pm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Nice! Acked-by: Saravana Kannan <skan...@codeaurora.org> -Saravana -- Qualcomm Innovation Center, Inc. The Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project
Re: [PATCH 5/11] cpufreq: governor: Put governor structure into common_dbs_data
_cpu(cs_cpu_dbs_info, freq->cpu); - struct cpufreq_policy *policy = cpufreq_cpu_get_raw(freq->cpu); - - if (!policy) - return 0; - - /* policy isn't governed by conservative governor */ - if (policy->governor != _gov_conservative) - return 0; - - /* -* we only care if our internally tracked freq moves outside the 'valid' -* ranges of frequency available to us otherwise we do not change it - */ - if (dbs_info->requested_freq > policy->max - || dbs_info->requested_freq < policy->min) - dbs_info->requested_freq = freq->new; - - return 0; -} + void *data); static struct notifier_block cs_cpufreq_notifier_block = { .notifier_call = dbs_cpufreq_notifier, @@ -358,7 +325,16 @@ static void cs_exit(struct dbs_data *dbs define_get_cpu_dbs_routines(cs_cpu_dbs_info); +static int cs_cpufreq_governor_dbs(struct cpufreq_policy *policy, + unsigned int event); + static struct common_dbs_data cs_dbs_cdata = { + .gov = { + .name = "conservative", + .governor = cs_cpufreq_governor_dbs, + .max_transition_latency = TRANSITION_LATENCY_LIMIT, + .owner = THIS_MODULE, + }, .governor = GOV_CONSERVATIVE, .attr_group_gov_sys = _attr_group_gov_sys, .attr_group_gov_pol = _attr_group_gov_pol, @@ -370,20 +346,48 @@ static struct common_dbs_data cs_dbs_cda .exit = cs_exit, }; +#define CPU_FREQ_GOV_CONSERVATIVE (_dbs_cdata.gov) + static int cs_cpufreq_governor_dbs(struct cpufreq_policy *policy, unsigned int event) { return cpufreq_governor_dbs(policy, _dbs_cdata, event); } +static int dbs_cpufreq_notifier(struct notifier_block *nb, unsigned long val, + void *data) +{ + struct cpufreq_freqs *freq = data; + struct cs_cpu_dbs_info_s *dbs_info = + _cpu(cs_cpu_dbs_info, freq->cpu); + struct cpufreq_policy *policy = cpufreq_cpu_get_raw(freq->cpu); + + if (!policy) + return 0; + + /* policy isn't governed by conservative governor */ + if (policy->governor != CPU_FREQ_GOV_CONSERVATIVE) + return 0; + + /* +* we only care if our internally tracked freq moves outside the 'valid' +* ranges of frequency available to us otherwise we do not change it + */ + if (dbs_info->requested_freq > policy->max + || dbs_info->requested_freq < policy->min) + dbs_info->requested_freq = freq->new; + + return 0; +} + static int __init cpufreq_gov_dbs_init(void) { - return cpufreq_register_governor(_gov_conservative); + return cpufreq_register_governor(CPU_FREQ_GOV_CONSERVATIVE); } static void __exit cpufreq_gov_dbs_exit(void) { - cpufreq_unregister_governor(_gov_conservative); + cpufreq_unregister_governor(CPU_FREQ_GOV_CONSERVATIVE); } MODULE_AUTHOR("Alexander Clouter <a...@digriz.org.uk>"); @@ -395,7 +399,7 @@ MODULE_LICENSE("GPL"); #ifdef CONFIG_CPU_FREQ_DEFAULT_GOV_CONSERVATIVE struct cpufreq_governor *cpufreq_default_governor(void) { - return _gov_conservative; + return CPU_FREQ_GOV_CONSERVATIVE; } fs_initcall(cpufreq_gov_dbs_init); I'm not sold on the macros/#defines for the , but not a strong opinion. Acked-by: Saravana Kannan <skan...@codeaurora.org> -Saravana -- Qualcomm Innovation Center, Inc. The Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project
Re: [PATCH 2/5] cpufreq: governor: Create separate sysfs-ops
On 02/02/2016 10:57 PM, Viresh Kumar wrote: On 02-02-16, 20:03, Saravana Kannan wrote: This is the hotplug case I mentioned. The sysfs file removals will happen only for the last CPU in that policy (we thankfully optimized that part last year). We also know that multiple CPUs can't be hotplugged at the same time. So, in the start of cpufreq_offline_prepare, we just need to check if this is the last CPU in the policy and if that's the case, do the gov sysfs remove and then grab the policy lock and do all our crap. If a read is going on, that's going to finish before the sysfs attr remove can go ahead and it can grab the policy lock if it needs to and that still won't cause a deadlock because we haven't yet grabbed the policy lock in cpufreq_offline_prepare(). If the read comes after the sysfs remove, then the read is obviously going to fail (we can depend on the sysfs framework on doing its job there). IMHO, these are all dirty hacks we should stay away from. Adding such hunks in code is considered a band-aid kind of solution and hurts readability badly. The new solution (new governor show/store) implement this in a very clean and proper way I feel.. Others are free to disagree though :) I think it looks clean since we haven't sorted out the race conditions that Juri pointed out. So, it's early to call this series clean :) Also, I don't see it as a dirty hack at all. What's so hacky about it? We are just identifying conditions when we'll have to remove the sysfs files and removing them before grabbing the policy lock. The unlock/lock that we have now is what is a dirty hack. -Saravana -- Qualcomm Innovation Center, Inc. The Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project
Re: [PATCH 2/5] cpufreq: governor: Create separate sysfs-ops
On 02/02/2016 10:54 PM, Viresh Kumar wrote: On 02-02-16, 17:32, Saravana Kannan wrote: But if we are expecting sched dvfs to come in, why make it worse for it. It would be completely pointless to try and shoehorn sched dvfs to use cpufreq_governor.c We can move the common part to cpufreq core and not make sched-dvfs reuse cpufreq_governor.c Let's do this please. -Saravana -- Qualcomm Innovation Center, Inc. The Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project
Re: [RFC PATCH 11/19] cpufreq: assert policy->rwsem is held in __cpufreq_governor
On 02/02/2016 09:02 PM, Viresh Kumar wrote: On 02-02-16, 20:04, Saravana Kannan wrote: What's the s_active lock in CPU1 coming from? That's taken by sysfs core while removing the files. The only reason it's there today is because of the sysfs dir remove. If you move it before the policy->rwsem, you won't have it after the policy->rwsem too. So, I think it will fix the issue. Its complex and we will end up making ugly.. I disagree. I think it's way better and simpler than this patch set. It also doesn't tie into cpufreq_governor.* which is a good thing IMO since it keeps things simpler for sched-dvfs too. For example, EXIT can be called while switching governors. The policy->rwsem is taken at the beginning cpufreq_set_policy(). To decide if we should remove the governor sysfs directory so early (i.e. before taking rwsem) in the call, is going to be difficult. Just check if the governor is changing. And if it is, you just need to remove the policy specific stuff. Over that the same directory might be shared across multiple policies, and all that information is present only with the governor-core. That's why I said the gov needs to register the per pol and system wide attrs list separately. This will also remove the need for ever governor to do this crap. -Saravana -- Qualcomm Innovation Center, Inc. The Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project
Re: [RFC PATCH 11/19] cpufreq: assert policy->rwsem is held in __cpufreq_governor
On 02/02/2016 06:13 PM, Viresh Kumar wrote: On 02-02-16, 13:37, Saravana Kannan wrote: On 02/01/2016 10:34 PM, Viresh Kumar wrote: What will that solve? It will stay exactly same then as well, as we would be adding/removing these attributes from within the same policy->rwsem .. The problem isn't that you are holding the policy rwsem. The problem is that we are trying to grab the same locks in different order. This is trying to fix that. That's exactly what I was trying to say, sorry for not being very clear. Even if you would move the sysfs file creation thing into the cpufreq core, instead of governor, we will have locks this way: CPU0CPU1 (sysfs read)(sysfs dir remove) s_active lock policy->rwsem policy->rwsem s_active lock (hang) And so I said, nothing will change. What's the s_active lock in CPU1 coming from? The only reason it's there today is because of the sysfs dir remove. If you move it before the policy->rwsem, you won't have it after the policy->rwsem too. So, I think it will fix the issue. -Saravana -- Qualcomm Innovation Center, Inc. The Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project
Re: [PATCH 2/5] cpufreq: governor: Create separate sysfs-ops
On 02/02/2016 05:52 PM, Rafael J. Wysocki wrote: On Wed, Feb 3, 2016 at 2:32 AM, Saravana Kannan wrote: On 02/02/2016 05:07 PM, Rafael J. Wysocki wrote: On Wed, Feb 3, 2016 at 12:42 AM, Rafael J. Wysocki wrote: On Tue, Feb 2, 2016 at 11:21 PM, Saravana Kannan wrote: On 02/02/2016 11:40 AM, Rafael J. Wysocki wrote: On Tue, Feb 2, 2016 at 6:01 PM, Juri Lelli wrote: [cut] I also don't like this patch because it forces governors to either implement their own macros and management of their attributes or force them to use the governor structs that come with cpufreq_governor.h. cpufreq_governor.h IMHO is very ondemand and conservative governor specific and is very irrelevant for sched-dvfs or any other governors (hint hint). The only time this ABBA locking is an issue is when governor are changing and trying to add/remove attributes. That can easily be checked in store_governor and dealt with without holding the policy rwsem if the governors can provide their per sys and per policy attribute arrays as part of registering themselves. I'm sorry that I just keep talking about the idea and not sending out the patches. I think you have a point, though. The deadlock really is specific to the governors using the code in cpufreq_governor.c. That said no other governors in the tree use any sysfs attributes for tunables AFAICS and the out-of-the tree ones are out of interest here. But if we are expecting sched dvfs to come in, why make it worse for it. It would be completely pointless to try and shoehorn sched dvfs to use cpufreq_governor.c Well, do you honestly think that using the existing stuff in it would be a good idea? If not, then why it matters at all? Also the deadlock happens if one of the tunable attributes is accessed while we're trying to remove it which very well may happen on read access too. Isn't this THE deadlock we are talking about? The removal of the attributes only happen when governors are changes and we send a POLICY_EXIT and or all the cores are hotplugged out. It generally happens when the "old" governor is going away, whatever the reason. And my suggestion would work just as well there. Why are you prefixing your sentence with "Also"? Is there some other case I'm not considering? Say someone is reading sampling_rate for a policy with 1 CPU in it and someone else is taking the CPU offline. The governor EXIT code path (that will trigger as a result) will try to remove the sampling_rate attribute and (if it does that under policy->rwsem) it'll wait for the read access to finish. Where exactly would you put the deadlock prevention in this case? This is the hotplug case I mentioned. The sysfs file removals will happen only for the last CPU in that policy (we thankfully optimized that part last year). We also know that multiple CPUs can't be hotplugged at the same time. So, in the start of cpufreq_offline_prepare, we just need to check if this is the last CPU in the policy and if that's the case, do the gov sysfs remove and then grab the policy lock and do all our crap. If a read is going on, that's going to finish before the sysfs attr remove can go ahead and it can grab the policy lock if it needs to and that still won't cause a deadlock because we haven't yet grabbed the policy lock in cpufreq_offline_prepare(). If the read comes after the sysfs remove, then the read is obviously going to fail (we can depend on the sysfs framework on doing its job there). Will that still leave any race conditions in? -Saravana -- Qualcomm Innovation Center, Inc. The Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project
Re: [PATCH 2/5] cpufreq: governor: Create separate sysfs-ops
On 02/02/2016 05:07 PM, Rafael J. Wysocki wrote: On Wed, Feb 3, 2016 at 12:42 AM, Rafael J. Wysocki wrote: On Tue, Feb 2, 2016 at 11:21 PM, Saravana Kannan wrote: On 02/02/2016 11:40 AM, Rafael J. Wysocki wrote: On Tue, Feb 2, 2016 at 6:01 PM, Juri Lelli wrote: [cut] I also don't like this patch because it forces governors to either implement their own macros and management of their attributes or force them to use the governor structs that come with cpufreq_governor.h. cpufreq_governor.h IMHO is very ondemand and conservative governor specific and is very irrelevant for sched-dvfs or any other governors (hint hint). The only time this ABBA locking is an issue is when governor are changing and trying to add/remove attributes. That can easily be checked in store_governor and dealt with without holding the policy rwsem if the governors can provide their per sys and per policy attribute arrays as part of registering themselves. I'm sorry that I just keep talking about the idea and not sending out the patches. I think you have a point, though. The deadlock really is specific to the governors using the code in cpufreq_governor.c. That said no other governors in the tree use any sysfs attributes for tunables AFAICS and the out-of-the tree ones are out of interest here. But if we are expecting sched dvfs to come in, why make it worse for it. It would be completely pointless to try and shoehorn sched dvfs to use cpufreq_governor.c Also the deadlock happens if one of the tunable attributes is accessed while we're trying to remove it which very well may happen on read access too. Isn't this THE deadlock we are talking about? The removal of the attributes only happen when governors are changes and we send a POLICY_EXIT and or all the cores are hotplugged out. And my suggestion would work just as well there. Why are you prefixing your sentence with "Also"? Is there some other case I'm not considering? -Saravana -- Qualcomm Innovation Center, Inc. The Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project
Re: [PATCH 2/5] cpufreq: governor: Create separate sysfs-ops
On 02/02/2016 11:40 AM, Rafael J. Wysocki wrote: On Tue, Feb 2, 2016 at 6:01 PM, Juri Lelli wrote: Hi Rafael, On 02/02/16 17:35, Rafael J. Wysocki wrote: On Tue, Feb 2, 2016 at 4:47 PM, Juri Lelli wrote: Hi Viresh, On 02/02/16 16:27, Viresh Kumar wrote: Until now, governors (ondemand/conservative) were using the 'global-attr' or 'freq-attr', depending on the sysfs location where we want to create governor's directory. The problem is that, in case of 'freq-attr', we are forced to use show()/store() present in cpufreq.c, which always take policy->rwsem. And because of that we were facing some ABBA lockups during governor callback event CPUFREQ_GOV_POLICY_EXIT. And so we were dropping the rwsem right before calling governor callback for CPUFREQ_GOV_POLICY_EXIT event. That caused further problems and it never worked perfectly. This patch attempts to fix that by creating separate sysfs-ops for cpufreq governors. Because things got much simplified now, we don't need separate show/store callbacks for governor-for-system and governor-per-policy cases. Signed-off-by: Viresh Kumar This patch cleans things up a lot, that's good. One thing I'm still concerned about, though: don't we need some locking in place for some of the store operations on governors attributes? Are store_{ignore_nice_load, sampling_down_fact, etc} safe without locking? That would require some investigation I suppose. It seems that we can call them from different cpus concurrently. Yes, we can. One quick-and-dirty way of dealing with that might be to introduce a "sysfs lock" into struct dbs_data and hold that around the invocation of gattr->store() in the sysfs_ops's ->store callback. There is value in trying to solve this issue by using some of the existing locks, IMHO. Some value - maybe. I'm not sure how much of it, though. Finer-grained locking is generally easier to follow, because the locks tend to be used for specific purposes only. Can't we actually try to use the policy->rwsem (or one of the core locks) + wait_for_completion approach as we do in cpufreq core? No. Too many things depend on that lock already and some of them work by accident rather than by design. Also, wait_for_completion() and complete() is just another way to implement a lock. So, it won't necessarily solve any deadlock issues. I also don't like this patch because it forces governors to either implement their own macros and management of their attributes or force them to use the governor structs that come with cpufreq_governor.h. cpufreq_governor.h IMHO is very ondemand and conservative governor specific and is very irrelevant for sched-dvfs or any other governors (hint hint). The only time this ABBA locking is an issue is when governor are changing and trying to add/remove attributes. That can easily be checked in store_governor and dealt with without holding the policy rwsem if the governors can provide their per sys and per policy attribute arrays as part of registering themselves. I'm sorry that I just keep talking about the idea and not sending out the patches. -Saravana -- Qualcomm Innovation Center, Inc. The Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project
Re: [RFC PATCH 11/19] cpufreq: assert policy->rwsem is held in __cpufreq_governor
On 02/01/2016 10:36 PM, Viresh Kumar wrote: On 01-02-16, 22:00, Rafael J. Wysocki wrote: I'm not sure what you mean by "the sysfs lock" here? The policy rwsem or something else? He perhaps referred to the s_active.lock that we see in traces. Yeah, that's what I mean. I generally don't use the exact name of the lock in emails (lazy to look it up) if there isn't a lot of chance for mistaking it for another lock. -Saravana -- Qualcomm Innovation Center, Inc. The Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project
Re: [RFC PATCH 11/19] cpufreq: assert policy->rwsem is held in __cpufreq_governor
On 02/01/2016 10:34 PM, Viresh Kumar wrote: On 01-02-16, 12:24, Saravana Kannan wrote: On 02/01/2016 02:22 AM, Rafael J. Wysocki wrote: I'm not sure whose idea you are referring to. Viresh's (I don't think I saw his proposal) or mine. http://git.linaro.org/people/viresh.kumar/linux.git/commit/57714d5b1778f2f610bcc5c74d85b29ba1cc1995 Anyway, to explain my suggestion better, I'm proposing to make it so that we don't have a need for the AB BA locking. The only reason the governor needs to even grab the sysfs lock is to add/remove the sysfs attribute files. That can be easily achieved if the policy struct has some "gov_attrs" field(s) that each governor populates. Then the framework just has to create them after POLICY_INIT is processed by the governor and remove them before POILICY_EXIT is sent to the governor. What will that solve? It will stay exactly same then as well, as we would be adding/removing these attributes from within the same policy->rwsem .. The problem isn't that you are holding the policy rwsem. The problem is that we are trying to grab the same locks in different order. This is trying to fix that. That way, we also avoid having to worry about the gov attributes accessed by the show/store disappearing while the files are being accessed. It can't happen. S_active lock should be taking care of that, isn't it? You are right. That can't happen because we have the s_active lock. I meant to say that in general we don't have to worry about the races between a show/store needing some policy specific data within the governor to be valid but racing with governor change where it ends up being invalid. The releasing of the policy rwsem across POLICY_EXIT allows this to happen today. -Saravana -- Qualcomm Innovation Center, Inc. The Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project
Re: [RFC PATCH 11/19] cpufreq: assert policy->rwsem is held in __cpufreq_governor
On 02/01/2016 10:36 PM, Viresh Kumar wrote: On 01-02-16, 22:00, Rafael J. Wysocki wrote: I'm not sure what you mean by "the sysfs lock" here? The policy rwsem or something else? He perhaps referred to the s_active.lock that we see in traces. Yeah, that's what I mean. I generally don't use the exact name of the lock in emails (lazy to look it up) if there isn't a lot of chance for mistaking it for another lock. -Saravana -- Qualcomm Innovation Center, Inc. The Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project
Re: [RFC PATCH 11/19] cpufreq: assert policy->rwsem is held in __cpufreq_governor
On 02/01/2016 10:34 PM, Viresh Kumar wrote: On 01-02-16, 12:24, Saravana Kannan wrote: On 02/01/2016 02:22 AM, Rafael J. Wysocki wrote: I'm not sure whose idea you are referring to. Viresh's (I don't think I saw his proposal) or mine. http://git.linaro.org/people/viresh.kumar/linux.git/commit/57714d5b1778f2f610bcc5c74d85b29ba1cc1995 Anyway, to explain my suggestion better, I'm proposing to make it so that we don't have a need for the AB BA locking. The only reason the governor needs to even grab the sysfs lock is to add/remove the sysfs attribute files. That can be easily achieved if the policy struct has some "gov_attrs" field(s) that each governor populates. Then the framework just has to create them after POLICY_INIT is processed by the governor and remove them before POILICY_EXIT is sent to the governor. What will that solve? It will stay exactly same then as well, as we would be adding/removing these attributes from within the same policy->rwsem .. The problem isn't that you are holding the policy rwsem. The problem is that we are trying to grab the same locks in different order. This is trying to fix that. That way, we also avoid having to worry about the gov attributes accessed by the show/store disappearing while the files are being accessed. It can't happen. S_active lock should be taking care of that, isn't it? You are right. That can't happen because we have the s_active lock. I meant to say that in general we don't have to worry about the races between a show/store needing some policy specific data within the governor to be valid but racing with governor change where it ends up being invalid. The releasing of the policy rwsem across POLICY_EXIT allows this to happen today. -Saravana -- Qualcomm Innovation Center, Inc. The Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project
Re: [PATCH 2/5] cpufreq: governor: Create separate sysfs-ops
On 02/02/2016 11:40 AM, Rafael J. Wysocki wrote: On Tue, Feb 2, 2016 at 6:01 PM, Juri Lelliwrote: Hi Rafael, On 02/02/16 17:35, Rafael J. Wysocki wrote: On Tue, Feb 2, 2016 at 4:47 PM, Juri Lelli wrote: Hi Viresh, On 02/02/16 16:27, Viresh Kumar wrote: Until now, governors (ondemand/conservative) were using the 'global-attr' or 'freq-attr', depending on the sysfs location where we want to create governor's directory. The problem is that, in case of 'freq-attr', we are forced to use show()/store() present in cpufreq.c, which always take policy->rwsem. And because of that we were facing some ABBA lockups during governor callback event CPUFREQ_GOV_POLICY_EXIT. And so we were dropping the rwsem right before calling governor callback for CPUFREQ_GOV_POLICY_EXIT event. That caused further problems and it never worked perfectly. This patch attempts to fix that by creating separate sysfs-ops for cpufreq governors. Because things got much simplified now, we don't need separate show/store callbacks for governor-for-system and governor-per-policy cases. Signed-off-by: Viresh Kumar This patch cleans things up a lot, that's good. One thing I'm still concerned about, though: don't we need some locking in place for some of the store operations on governors attributes? Are store_{ignore_nice_load, sampling_down_fact, etc} safe without locking? That would require some investigation I suppose. It seems that we can call them from different cpus concurrently. Yes, we can. One quick-and-dirty way of dealing with that might be to introduce a "sysfs lock" into struct dbs_data and hold that around the invocation of gattr->store() in the sysfs_ops's ->store callback. There is value in trying to solve this issue by using some of the existing locks, IMHO. Some value - maybe. I'm not sure how much of it, though. Finer-grained locking is generally easier to follow, because the locks tend to be used for specific purposes only. Can't we actually try to use the policy->rwsem (or one of the core locks) + wait_for_completion approach as we do in cpufreq core? No. Too many things depend on that lock already and some of them work by accident rather than by design. Also, wait_for_completion() and complete() is just another way to implement a lock. So, it won't necessarily solve any deadlock issues. I also don't like this patch because it forces governors to either implement their own macros and management of their attributes or force them to use the governor structs that come with cpufreq_governor.h. cpufreq_governor.h IMHO is very ondemand and conservative governor specific and is very irrelevant for sched-dvfs or any other governors (hint hint). The only time this ABBA locking is an issue is when governor are changing and trying to add/remove attributes. That can easily be checked in store_governor and dealt with without holding the policy rwsem if the governors can provide their per sys and per policy attribute arrays as part of registering themselves. I'm sorry that I just keep talking about the idea and not sending out the patches. -Saravana -- Qualcomm Innovation Center, Inc. The Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project
Re: [PATCH 2/5] cpufreq: governor: Create separate sysfs-ops
On 02/02/2016 05:07 PM, Rafael J. Wysocki wrote: On Wed, Feb 3, 2016 at 12:42 AM, Rafael J. Wysocki <raf...@kernel.org> wrote: On Tue, Feb 2, 2016 at 11:21 PM, Saravana Kannan <skan...@codeaurora.org> wrote: On 02/02/2016 11:40 AM, Rafael J. Wysocki wrote: On Tue, Feb 2, 2016 at 6:01 PM, Juri Lelli <juri.le...@arm.com> wrote: [cut] I also don't like this patch because it forces governors to either implement their own macros and management of their attributes or force them to use the governor structs that come with cpufreq_governor.h. cpufreq_governor.h IMHO is very ondemand and conservative governor specific and is very irrelevant for sched-dvfs or any other governors (hint hint). The only time this ABBA locking is an issue is when governor are changing and trying to add/remove attributes. That can easily be checked in store_governor and dealt with without holding the policy rwsem if the governors can provide their per sys and per policy attribute arrays as part of registering themselves. I'm sorry that I just keep talking about the idea and not sending out the patches. I think you have a point, though. The deadlock really is specific to the governors using the code in cpufreq_governor.c. That said no other governors in the tree use any sysfs attributes for tunables AFAICS and the out-of-the tree ones are out of interest here. But if we are expecting sched dvfs to come in, why make it worse for it. It would be completely pointless to try and shoehorn sched dvfs to use cpufreq_governor.c Also the deadlock happens if one of the tunable attributes is accessed while we're trying to remove it which very well may happen on read access too. Isn't this THE deadlock we are talking about? The removal of the attributes only happen when governors are changes and we send a POLICY_EXIT and or all the cores are hotplugged out. And my suggestion would work just as well there. Why are you prefixing your sentence with "Also"? Is there some other case I'm not considering? -Saravana -- Qualcomm Innovation Center, Inc. The Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project
Re: [RFC PATCH 11/19] cpufreq: assert policy->rwsem is held in __cpufreq_governor
On 02/02/2016 09:02 PM, Viresh Kumar wrote: On 02-02-16, 20:04, Saravana Kannan wrote: What's the s_active lock in CPU1 coming from? That's taken by sysfs core while removing the files. The only reason it's there today is because of the sysfs dir remove. If you move it before the policy->rwsem, you won't have it after the policy->rwsem too. So, I think it will fix the issue. Its complex and we will end up making ugly.. I disagree. I think it's way better and simpler than this patch set. It also doesn't tie into cpufreq_governor.* which is a good thing IMO since it keeps things simpler for sched-dvfs too. For example, EXIT can be called while switching governors. The policy->rwsem is taken at the beginning cpufreq_set_policy(). To decide if we should remove the governor sysfs directory so early (i.e. before taking rwsem) in the call, is going to be difficult. Just check if the governor is changing. And if it is, you just need to remove the policy specific stuff. Over that the same directory might be shared across multiple policies, and all that information is present only with the governor-core. That's why I said the gov needs to register the per pol and system wide attrs list separately. This will also remove the need for ever governor to do this crap. -Saravana -- Qualcomm Innovation Center, Inc. The Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project
Re: [PATCH 2/5] cpufreq: governor: Create separate sysfs-ops
On 02/02/2016 05:52 PM, Rafael J. Wysocki wrote: On Wed, Feb 3, 2016 at 2:32 AM, Saravana Kannan <skan...@codeaurora.org> wrote: On 02/02/2016 05:07 PM, Rafael J. Wysocki wrote: On Wed, Feb 3, 2016 at 12:42 AM, Rafael J. Wysocki <raf...@kernel.org> wrote: On Tue, Feb 2, 2016 at 11:21 PM, Saravana Kannan <skan...@codeaurora.org> wrote: On 02/02/2016 11:40 AM, Rafael J. Wysocki wrote: On Tue, Feb 2, 2016 at 6:01 PM, Juri Lelli <juri.le...@arm.com> wrote: [cut] I also don't like this patch because it forces governors to either implement their own macros and management of their attributes or force them to use the governor structs that come with cpufreq_governor.h. cpufreq_governor.h IMHO is very ondemand and conservative governor specific and is very irrelevant for sched-dvfs or any other governors (hint hint). The only time this ABBA locking is an issue is when governor are changing and trying to add/remove attributes. That can easily be checked in store_governor and dealt with without holding the policy rwsem if the governors can provide their per sys and per policy attribute arrays as part of registering themselves. I'm sorry that I just keep talking about the idea and not sending out the patches. I think you have a point, though. The deadlock really is specific to the governors using the code in cpufreq_governor.c. That said no other governors in the tree use any sysfs attributes for tunables AFAICS and the out-of-the tree ones are out of interest here. But if we are expecting sched dvfs to come in, why make it worse for it. It would be completely pointless to try and shoehorn sched dvfs to use cpufreq_governor.c Well, do you honestly think that using the existing stuff in it would be a good idea? If not, then why it matters at all? Also the deadlock happens if one of the tunable attributes is accessed while we're trying to remove it which very well may happen on read access too. Isn't this THE deadlock we are talking about? The removal of the attributes only happen when governors are changes and we send a POLICY_EXIT and or all the cores are hotplugged out. It generally happens when the "old" governor is going away, whatever the reason. And my suggestion would work just as well there. Why are you prefixing your sentence with "Also"? Is there some other case I'm not considering? Say someone is reading sampling_rate for a policy with 1 CPU in it and someone else is taking the CPU offline. The governor EXIT code path (that will trigger as a result) will try to remove the sampling_rate attribute and (if it does that under policy->rwsem) it'll wait for the read access to finish. Where exactly would you put the deadlock prevention in this case? This is the hotplug case I mentioned. The sysfs file removals will happen only for the last CPU in that policy (we thankfully optimized that part last year). We also know that multiple CPUs can't be hotplugged at the same time. So, in the start of cpufreq_offline_prepare, we just need to check if this is the last CPU in the policy and if that's the case, do the gov sysfs remove and then grab the policy lock and do all our crap. If a read is going on, that's going to finish before the sysfs attr remove can go ahead and it can grab the policy lock if it needs to and that still won't cause a deadlock because we haven't yet grabbed the policy lock in cpufreq_offline_prepare(). If the read comes after the sysfs remove, then the read is obviously going to fail (we can depend on the sysfs framework on doing its job there). Will that still leave any race conditions in? -Saravana -- Qualcomm Innovation Center, Inc. The Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project
Re: [RFC PATCH 11/19] cpufreq: assert policy->rwsem is held in __cpufreq_governor
On 02/02/2016 06:13 PM, Viresh Kumar wrote: On 02-02-16, 13:37, Saravana Kannan wrote: On 02/01/2016 10:34 PM, Viresh Kumar wrote: What will that solve? It will stay exactly same then as well, as we would be adding/removing these attributes from within the same policy->rwsem .. The problem isn't that you are holding the policy rwsem. The problem is that we are trying to grab the same locks in different order. This is trying to fix that. That's exactly what I was trying to say, sorry for not being very clear. Even if you would move the sysfs file creation thing into the cpufreq core, instead of governor, we will have locks this way: CPU0CPU1 (sysfs read)(sysfs dir remove) s_active lock policy->rwsem policy->rwsem s_active lock (hang) And so I said, nothing will change. What's the s_active lock in CPU1 coming from? The only reason it's there today is because of the sysfs dir remove. If you move it before the policy->rwsem, you won't have it after the policy->rwsem too. So, I think it will fix the issue. -Saravana -- Qualcomm Innovation Center, Inc. The Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project
Re: [RFC PATCH 11/19] cpufreq: assert policy->rwsem is held in __cpufreq_governor
On 02/01/2016 02:22 AM, Rafael J. Wysocki wrote: On Mon, Feb 1, 2016 at 7:09 AM, Viresh Kumar wrote: On 30-01-16, 12:49, Rafael J. Wysocki wrote: On Friday, January 29, 2016 04:33:39 PM Saravana Kannan wrote: AFAIR, the ABBA issue was between the sysfs lock and the policy lock. Yeah, to be precise here it is: CPU0 (sysfs read) CPU1 (exit governor) sysfs-read set_policy()-> lock policy->rwsem sysfs-active lock Remove sysfs files lock policy->rwsem sysfs-active lock Actual read The fix for that issue should not be dropping the lock around POLICY_EXIT. Right. Dropping the lock is a mistake (which I overlooked, sadly). I joined the party at around time of 3.10, and we had this problem and hacky solution then as well. We tried to get rid of it multiple times, but sadly failed. I kind of like your idea of accessing governor attributes without holding the policy rwsem. I'm not sure whose idea you are referring to. Viresh's (I don't think I saw his proposal) or mine. I looked at that code and it seems doable to me. The problem to solve there would be to ensure that the dbs_data pointer is valid when show/store runs for those attributes. The fact that we make the distinction between global and policy governors in there doesn't really help, but it looks like getting rid of that bit wouldn't be too much effort. Let me take a deeper look at that. Anyway, to explain my suggestion better, I'm proposing to make it so that we don't have a need for the AB BA locking. The only reason the governor needs to even grab the sysfs lock is to add/remove the sysfs attribute files. That can be easily achieved if the policy struct has some "gov_attrs" field(s) that each governor populates. Then the framework just has to create them after POLICY_INIT is processed by the governor and remove them before POILICY_EXIT is sent to the governor. That way, we also avoid having to worry about the gov attributes accessed by the show/store disappearing while the files are being accessed. Since we remove those files before we even ask the gov to clean up, that situation can never happen. The current problem is that there is no good place for the governor to populate this "gov_attrs" field(s). Maybe the governor register might be one place for it to provide the data to the framework and the framework can later fill it up itself when switching governors. Thanks, Saravana -- Qualcomm Innovation Center, Inc. The Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project
Re: [RFC PATCH 11/19] cpufreq: assert policy->rwsem is held in __cpufreq_governor
On 02/01/2016 02:22 AM, Rafael J. Wysocki wrote: On Mon, Feb 1, 2016 at 7:09 AM, Viresh Kumar <viresh.ku...@linaro.org> wrote: On 30-01-16, 12:49, Rafael J. Wysocki wrote: On Friday, January 29, 2016 04:33:39 PM Saravana Kannan wrote: AFAIR, the ABBA issue was between the sysfs lock and the policy lock. Yeah, to be precise here it is: CPU0 (sysfs read) CPU1 (exit governor) sysfs-read set_policy()-> lock policy->rwsem sysfs-active lock Remove sysfs files lock policy->rwsem sysfs-active lock Actual read The fix for that issue should not be dropping the lock around POLICY_EXIT. Right. Dropping the lock is a mistake (which I overlooked, sadly). I joined the party at around time of 3.10, and we had this problem and hacky solution then as well. We tried to get rid of it multiple times, but sadly failed. I kind of like your idea of accessing governor attributes without holding the policy rwsem. I'm not sure whose idea you are referring to. Viresh's (I don't think I saw his proposal) or mine. I looked at that code and it seems doable to me. The problem to solve there would be to ensure that the dbs_data pointer is valid when show/store runs for those attributes. The fact that we make the distinction between global and policy governors in there doesn't really help, but it looks like getting rid of that bit wouldn't be too much effort. Let me take a deeper look at that. Anyway, to explain my suggestion better, I'm proposing to make it so that we don't have a need for the AB BA locking. The only reason the governor needs to even grab the sysfs lock is to add/remove the sysfs attribute files. That can be easily achieved if the policy struct has some "gov_attrs" field(s) that each governor populates. Then the framework just has to create them after POLICY_INIT is processed by the governor and remove them before POILICY_EXIT is sent to the governor. That way, we also avoid having to worry about the gov attributes accessed by the show/store disappearing while the files are being accessed. Since we remove those files before we even ask the gov to clean up, that situation can never happen. The current problem is that there is no good place for the governor to populate this "gov_attrs" field(s). Maybe the governor register might be one place for it to provide the data to the framework and the framework can later fill it up itself when switching governors. Thanks, Saravana -- Qualcomm Innovation Center, Inc. The Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project
Re: [RFC PATCH 00/19] cpufreq locking cleanups and documentation
On 01/11/2016 09:35 AM, Juri Lelli wrote: Hi all, In the context of the ongoing discussion about introducing a simple platform energy model to guide scheduling decisions (Energy Aware Scheduling [1]) concerns have been expressed by Peter about the component in charge of driving clock frequency selection (Steve recently posted an update of such component [2]): https://lkml.org/lkml/2015/8/15/141. The problem is that, with this new approach, cpufreq core functions need to be accessed from scheduler hot-paths and the overhead associated with the current locking scheme might result to be unsustainable. Peter's proposed approach of using RCU logic to reduce locking overhead seems reasonable, but things may not be so straightforward as originally thought. The very first thing I actually realized when I started looking into this is that it was hard for me to understand which locking mechanism was protecting which data structure. As mostly a way to build a better understanding of the current cpufreq locking scheme and also as preparatory work for implementing RCU logic, I came up with this set of patches. In fact, at this stage, I would like each patch to be considered as a question I'm asking rather than a proposed change, thus the RFC tag for the series; with the intent of documenting current locking scheme and modifying it a bit in order to make RCU logic implementation easier. Actually, as you'll soon notice, I didn't really start from scratch. Mike shared with me some patches he has been developing while looking at the same problem. I've given Mike attribution for the patches that I took unchanged from him, with thanks for sharing his findings with me. High level description of patches: o [01-04] cleanup and move code around to make things (hopefully) cleaner o [05-14] insert lockdep assertions and fix uncovered erroneous situations o [15-18] remove overkill usage of locking mechanism o 19 adds documentation for the cleaned up locking scheme With Viresh' tests [3] on both arm TC2 and arm64 Juno boards I'm not seeing anything bad happening. However, coverage is really small (as is my personal confidence of not breaking things for other confs :-)). This set is based on top of linux-pm/linux-next as of today and it is also available from here: git://linux-arm.org/linux-jl.git upstream/cpufreq_cleanups Comments, concerns and rants are the primary goal of this posting; I'm thus looking forward to them. Best, - Juri [1] https://lkml.org/lkml/2015/7/7/754 [2] https://lkml.org/lkml/2015/12/9/35 [3] https://git.linaro.org/people/viresh.kumar/cpufreq-tests.git Juri Lelli (16): cpufreq: kill for_each_policy cpufreq: bring data structures close to their locks cpufreq: assert locking when accessing cpufreq_policy_list cpufreq: always access cpufreq_policy_list while holding cpufreq_driver_lock cpufreq: assert locking when accessing cpufreq_governor_list cpufreq: fix warning for cpufreq_init_policy unlocked access to cpufreq_governor_list cpufreq: fix warning for show_scaling_available_governors unlocked access to cpufreq_governor_list cpufreq: assert policy->rwsem is held in cpufreq_set_policy cpufreq: assert policy->rwsem is held in __cpufreq_governor cpufreq: fix locking of policy->rwsem in cpufreq_init_policy cpufreq: fix locking of policy->rwsem in cpufreq_offline_prepare cpufreq: fix locking of policy->rwsem in cpufreq_offline_finish cpufreq: remove useless usage of cpufreq_governor_mutex in __cpufreq_governor cpufreq: hold policy->rwsem across CPUFREQ_GOV_POLICY_EXIT cpufreq: stop checking for cpufreq_driver being present in cpufreq_cpu_get cpufreq: documentation: document locking scheme Michael Turquette (3): cpufreq: do not expose cpufreq_governor_lock cpufreq: merge governor lock and mutex cpufreq: remove transition_lock Documentation/cpu-freq/core.txt| 44 + drivers/cpufreq/cpufreq.c | 132 +++-- drivers/cpufreq/cpufreq_governor.h | 2 - include/linux/cpufreq.h| 5 -- 4 files changed, 125 insertions(+), 58 deletions(-) Juri, I haven't looked at the cpufreq-tests, but I doubt they do hotplug testing where they remove all the CPUs of a policy (to trigger a policy exit). Can you please add that to your testing? I wouldn't be surprised if some of your clean ups would cause a dead lock. This clean up series is definitely appreciated, but I think the patch series might still be missing some patches that are needed to make things work without deadlocking. I'll try to do a deeper analysis/review/testing, but kinda hard pressed on time here. Thanks, Saravana -- Qualcomm Innovation Center, Inc. The Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project
Re: [RFC PATCH 11/19] cpufreq: assert policy->rwsem is held in __cpufreq_governor
On 01/12/2016 02:20 AM, Viresh Kumar wrote: On 11-01-16, 17:35, Juri Lelli wrote: __cpufreq_governor works on policy, so policy->rwsem has to be held. Add assertion for such condition. Cc: "Rafael J. Wysocki" Cc: Viresh Kumar Signed-off-by: Juri Lelli --- drivers/cpufreq/cpufreq.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c index f1f9fbc..e7fc5c9 100644 --- a/drivers/cpufreq/cpufreq.c +++ b/drivers/cpufreq/cpufreq.c @@ -1950,6 +1950,9 @@ static int __cpufreq_governor(struct cpufreq_policy *policy, /* Don't start any governor operations if we are entering suspend */ if (cpufreq_suspended) return 0; + + lockdep_assert_held(>rwsem); + We had an ABBA problem with the EXIT governor callback and so this rwsem is dropped just before that from set_policy().. commit 955ef4833574 ("cpufreq: Drop rwsem lock around CPUFREQ_GOV_POLICY_EXIT") AFAIR, the ABBA issue was between the sysfs lock and the policy lock. The fix for that issue should not be dropping the lock around POLICY_EXIT. The proper fix is to have the governor "export" the attributes it wants to add/remove and have the cpufreq framework do the adding/removing of the attributes from sysfs for the governor. Thanks, Saravana -- Qualcomm Innovation Center, Inc. The Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project
Re: [RFC PATCH 00/19] cpufreq locking cleanups and documentation
On 01/11/2016 09:35 AM, Juri Lelli wrote: Hi all, In the context of the ongoing discussion about introducing a simple platform energy model to guide scheduling decisions (Energy Aware Scheduling [1]) concerns have been expressed by Peter about the component in charge of driving clock frequency selection (Steve recently posted an update of such component [2]): https://lkml.org/lkml/2015/8/15/141. The problem is that, with this new approach, cpufreq core functions need to be accessed from scheduler hot-paths and the overhead associated with the current locking scheme might result to be unsustainable. Peter's proposed approach of using RCU logic to reduce locking overhead seems reasonable, but things may not be so straightforward as originally thought. The very first thing I actually realized when I started looking into this is that it was hard for me to understand which locking mechanism was protecting which data structure. As mostly a way to build a better understanding of the current cpufreq locking scheme and also as preparatory work for implementing RCU logic, I came up with this set of patches. In fact, at this stage, I would like each patch to be considered as a question I'm asking rather than a proposed change, thus the RFC tag for the series; with the intent of documenting current locking scheme and modifying it a bit in order to make RCU logic implementation easier. Actually, as you'll soon notice, I didn't really start from scratch. Mike shared with me some patches he has been developing while looking at the same problem. I've given Mike attribution for the patches that I took unchanged from him, with thanks for sharing his findings with me. High level description of patches: o [01-04] cleanup and move code around to make things (hopefully) cleaner o [05-14] insert lockdep assertions and fix uncovered erroneous situations o [15-18] remove overkill usage of locking mechanism o 19 adds documentation for the cleaned up locking scheme With Viresh' tests [3] on both arm TC2 and arm64 Juno boards I'm not seeing anything bad happening. However, coverage is really small (as is my personal confidence of not breaking things for other confs :-)). This set is based on top of linux-pm/linux-next as of today and it is also available from here: git://linux-arm.org/linux-jl.git upstream/cpufreq_cleanups Comments, concerns and rants are the primary goal of this posting; I'm thus looking forward to them. Best, - Juri [1] https://lkml.org/lkml/2015/7/7/754 [2] https://lkml.org/lkml/2015/12/9/35 [3] https://git.linaro.org/people/viresh.kumar/cpufreq-tests.git Juri Lelli (16): cpufreq: kill for_each_policy cpufreq: bring data structures close to their locks cpufreq: assert locking when accessing cpufreq_policy_list cpufreq: always access cpufreq_policy_list while holding cpufreq_driver_lock cpufreq: assert locking when accessing cpufreq_governor_list cpufreq: fix warning for cpufreq_init_policy unlocked access to cpufreq_governor_list cpufreq: fix warning for show_scaling_available_governors unlocked access to cpufreq_governor_list cpufreq: assert policy->rwsem is held in cpufreq_set_policy cpufreq: assert policy->rwsem is held in __cpufreq_governor cpufreq: fix locking of policy->rwsem in cpufreq_init_policy cpufreq: fix locking of policy->rwsem in cpufreq_offline_prepare cpufreq: fix locking of policy->rwsem in cpufreq_offline_finish cpufreq: remove useless usage of cpufreq_governor_mutex in __cpufreq_governor cpufreq: hold policy->rwsem across CPUFREQ_GOV_POLICY_EXIT cpufreq: stop checking for cpufreq_driver being present in cpufreq_cpu_get cpufreq: documentation: document locking scheme Michael Turquette (3): cpufreq: do not expose cpufreq_governor_lock cpufreq: merge governor lock and mutex cpufreq: remove transition_lock Documentation/cpu-freq/core.txt| 44 + drivers/cpufreq/cpufreq.c | 132 +++-- drivers/cpufreq/cpufreq_governor.h | 2 - include/linux/cpufreq.h| 5 -- 4 files changed, 125 insertions(+), 58 deletions(-) Juri, I haven't looked at the cpufreq-tests, but I doubt they do hotplug testing where they remove all the CPUs of a policy (to trigger a policy exit). Can you please add that to your testing? I wouldn't be surprised if some of your clean ups would cause a dead lock. This clean up series is definitely appreciated, but I think the patch series might still be missing some patches that are needed to make things work without deadlocking. I'll try to do a deeper analysis/review/testing, but kinda hard pressed on time here. Thanks, Saravana -- Qualcomm Innovation Center, Inc. The Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project
Re: [RFC PATCH 11/19] cpufreq: assert policy->rwsem is held in __cpufreq_governor
On 01/12/2016 02:20 AM, Viresh Kumar wrote: On 11-01-16, 17:35, Juri Lelli wrote: __cpufreq_governor works on policy, so policy->rwsem has to be held. Add assertion for such condition. Cc: "Rafael J. Wysocki"Cc: Viresh Kumar Signed-off-by: Juri Lelli --- drivers/cpufreq/cpufreq.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c index f1f9fbc..e7fc5c9 100644 --- a/drivers/cpufreq/cpufreq.c +++ b/drivers/cpufreq/cpufreq.c @@ -1950,6 +1950,9 @@ static int __cpufreq_governor(struct cpufreq_policy *policy, /* Don't start any governor operations if we are entering suspend */ if (cpufreq_suspended) return 0; + + lockdep_assert_held(>rwsem); + We had an ABBA problem with the EXIT governor callback and so this rwsem is dropped just before that from set_policy().. commit 955ef4833574 ("cpufreq: Drop rwsem lock around CPUFREQ_GOV_POLICY_EXIT") AFAIR, the ABBA issue was between the sysfs lock and the policy lock. The fix for that issue should not be dropping the lock around POLICY_EXIT. The proper fix is to have the governor "export" the attributes it wants to add/remove and have the cpufreq framework do the adding/removing of the attributes from sysfs for the governor. Thanks, Saravana -- Qualcomm Innovation Center, Inc. The Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project
Re: [PATCH V3 5/5] cpufreq: postfix policy directory with the first CPU in related_cpus
On 10/16/2015 12:11 AM, Viresh Kumar wrote: The sysfs policy directory is postfixed currently with the CPU number for which the policy was created, which isn't necessarily the first CPU in related_cpus mask. To make it more consistent and predictable, lets postfix the policy with the first cpu in related-cpus mask. Suggested-by: Saravana Kannan Signed-off-by: Viresh Kumar --- V2->V3: - Fix error path where we may try to put an uninitialized kobject. - Break kobject_init_and_add() to kobject_init() and kobject_add(). drivers/cpufreq/cpufreq.c | 21 +++-- 1 file changed, 11 insertions(+), 10 deletions(-) diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c index 4fa2215cc6ec..7c48e7316d91 100644 --- a/drivers/cpufreq/cpufreq.c +++ b/drivers/cpufreq/cpufreq.c @@ -1022,7 +1022,6 @@ static struct cpufreq_policy *cpufreq_policy_alloc(unsigned int cpu) { struct device *dev = get_cpu_device(cpu); struct cpufreq_policy *policy; - int ret; if (WARN_ON(!dev)) return NULL; @@ -1040,13 +1039,7 @@ static struct cpufreq_policy *cpufreq_policy_alloc(unsigned int cpu) if (!zalloc_cpumask_var(>real_cpus, GFP_KERNEL)) goto err_free_rcpumask; - ret = kobject_init_and_add(>kobj, _cpufreq, - cpufreq_global_kobject, "policy%u", cpu); - if (ret) { - pr_err("%s: failed to init policy->kobj: %d\n", __func__, ret); - goto err_free_real_cpus; - } - + kobject_init(>kobj, _cpufreq); Oh yeah, this works better. I forgot kobject has a separate init and add fuctions. INIT_LIST_HEAD(>policy_list); init_rwsem(>rwsem); spin_lock_init(>transition_lock); @@ -1057,8 +1050,6 @@ static struct cpufreq_policy *cpufreq_policy_alloc(unsigned int cpu) policy->cpu = cpu; return policy; -err_free_real_cpus: - free_cpumask_var(policy->real_cpus); err_free_rcpumask: free_cpumask_var(policy->related_cpus); err_free_cpumask: @@ -1163,6 +1154,16 @@ static int cpufreq_online(unsigned int cpu) cpumask_copy(policy->related_cpus, policy->cpus); /* Remember CPUs present at the policy creation time. */ cpumask_and(policy->real_cpus, policy->cpus, cpu_present_mask); + + /* Name and add the kobject */ + ret = kobject_add(>kobj, cpufreq_global_kobject, + "policy%u", + cpumask_first(policy->related_cpus)); + if (ret) { + pr_err("%s: failed to add policy->kobj: %d\n", __func__, + ret); + goto out_exit_policy; + } Another out of patch issue that I see while reviewing this patch: I think the existing error handling gotos aren't really cleaning things up well. In the lines that follow this code we set the per_cpu(cpufreq_cpu_data) to point to the new policy. But if the subsequent cpu->get() fails, we goto out_exit_policy. But that label doesn't clean up the per_cpu(cpufreq_cpu_data). So, I think we need another label to jump to if ->get() fails } /* Reviewed-by: Saravana Kannan -Saravana -- Qualcomm Innovation Center, Inc. The Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH V3 5/5] cpufreq: postfix policy directory with the first CPU in related_cpus
On 10/16/2015 12:11 AM, Viresh Kumar wrote: The sysfs policy directory is postfixed currently with the CPU number for which the policy was created, which isn't necessarily the first CPU in related_cpus mask. To make it more consistent and predictable, lets postfix the policy with the first cpu in related-cpus mask. Suggested-by: Saravana Kannan <skan...@codeaurora.org> Signed-off-by: Viresh Kumar <viresh.ku...@linaro.org> --- V2->V3: - Fix error path where we may try to put an uninitialized kobject. - Break kobject_init_and_add() to kobject_init() and kobject_add(). drivers/cpufreq/cpufreq.c | 21 +++-- 1 file changed, 11 insertions(+), 10 deletions(-) diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c index 4fa2215cc6ec..7c48e7316d91 100644 --- a/drivers/cpufreq/cpufreq.c +++ b/drivers/cpufreq/cpufreq.c @@ -1022,7 +1022,6 @@ static struct cpufreq_policy *cpufreq_policy_alloc(unsigned int cpu) { struct device *dev = get_cpu_device(cpu); struct cpufreq_policy *policy; - int ret; if (WARN_ON(!dev)) return NULL; @@ -1040,13 +1039,7 @@ static struct cpufreq_policy *cpufreq_policy_alloc(unsigned int cpu) if (!zalloc_cpumask_var(>real_cpus, GFP_KERNEL)) goto err_free_rcpumask; - ret = kobject_init_and_add(>kobj, _cpufreq, - cpufreq_global_kobject, "policy%u", cpu); - if (ret) { - pr_err("%s: failed to init policy->kobj: %d\n", __func__, ret); - goto err_free_real_cpus; - } - + kobject_init(>kobj, _cpufreq); Oh yeah, this works better. I forgot kobject has a separate init and add fuctions. INIT_LIST_HEAD(>policy_list); init_rwsem(>rwsem); spin_lock_init(>transition_lock); @@ -1057,8 +1050,6 @@ static struct cpufreq_policy *cpufreq_policy_alloc(unsigned int cpu) policy->cpu = cpu; return policy; -err_free_real_cpus: - free_cpumask_var(policy->real_cpus); err_free_rcpumask: free_cpumask_var(policy->related_cpus); err_free_cpumask: @@ -1163,6 +1154,16 @@ static int cpufreq_online(unsigned int cpu) cpumask_copy(policy->related_cpus, policy->cpus); /* Remember CPUs present at the policy creation time. */ cpumask_and(policy->real_cpus, policy->cpus, cpu_present_mask); + + /* Name and add the kobject */ + ret = kobject_add(>kobj, cpufreq_global_kobject, + "policy%u", + cpumask_first(policy->related_cpus)); + if (ret) { + pr_err("%s: failed to add policy->kobj: %d\n", __func__, + ret); + goto out_exit_policy; + } Another out of patch issue that I see while reviewing this patch: I think the existing error handling gotos aren't really cleaning things up well. In the lines that follow this code we set the per_cpu(cpufreq_cpu_data) to point to the new policy. But if the subsequent cpu->get() fails, we goto out_exit_policy. But that label doesn't clean up the per_cpu(cpufreq_cpu_data). So, I think we need another label to jump to if ->get() fails } /* Reviewed-by: Saravana Kannan <skan...@codeaurora.org> -Saravana -- Qualcomm Innovation Center, Inc. The Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 4/5] cpufreq: create cpu/cpufreq/policyX directories
On 10/14/2015 11:55 PM, Viresh Kumar wrote: On 13-10-15, 12:29, Saravana Kannan wrote: But we don't need to track track of "present-cpus" separately though. We could do the for_each_cpu_and() when we create the symlinks for the first time. And after that, we can just use the subsystem interface callbacks (cpufreq_add_dev() and cpufreq_remove_dev()) to keep the symlinks updated. I don't see any place where keeping track of this separately is more efficient. This would save some memory savings when the number of CPUs is large and also simplify the code because we won't have to keep another field up to date. It is still required to track when can we free the policy. Ok, I'm not very familiar with this new field's uses. I'll take a closer look later and respond if I have other thoughts. Thanks, Saravana -- Qualcomm Innovation Center, Inc. The Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH V2 4/5] cpufreq: create cpu/cpufreq/policyX directories
On 10/15/2015 09:05 AM, Viresh Kumar wrote: The cpufreq sysfs interface had been a bit inconsistent as one of the CPUs for a policy had a real directory within its sysfs 'cpuX' directory and all other CPUs had links to it. That also made the code a bit complex as we need to take care of moving the sysfs directory if the CPU containing the real directory is getting physically hot-unplugged. Solve this by creating 'policyX' directories (per-policy) in /sys/devices/system/cpu/cpufreq/ directory, where X is the CPU for which the policy was first created. This also removes the need of keeping kobj_cpu and we can remove it now. Suggested-by: Saravana Kannan Signed-off-by: Viresh Kumar Since you've added a separate patch for making policyX more consistent: Reviewed-by: Saravana Kannan Btw, does a Review-by have an implicit Acked-by? --- drivers/cpufreq/cpufreq.c | 34 -- include/linux/cpufreq.h | 1 - 2 files changed, 4 insertions(+), 31 deletions(-) diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c index 04222e7bbc73..4fa2215cc6ec 100644 --- a/drivers/cpufreq/cpufreq.c +++ b/drivers/cpufreq/cpufreq.c @@ -910,9 +910,6 @@ static int cpufreq_add_dev_symlink(struct cpufreq_policy *policy) /* Some related CPUs might not be present (physically hotplugged) */ for_each_cpu(j, policy->real_cpus) { - if (j == policy->kobj_cpu) - continue; - ret = add_cpu_dev_symlink(policy, j); if (ret) break; Kinda unrelated to this patch, but shouldn't this function undo the symlinks is has created so far before returning? Otherwise, we'd be leaving around broken symlinks. -Saravana -- Qualcomm Innovation Center, Inc. The Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH V2 5/5] cpufreq: postfix policy directory with the first CPU in related_cpus
On 10/15/2015 09:05 AM, Viresh Kumar wrote: The sysfs policy directory is postfixed currently with the CPU number for which the policy was created, which isn't necessarily the first CPU in related_cpus mask. To make it more consistent and predictable, lets postfix the policy with the first cpu in related-cpus mask. Suggested-by: Saravana Kannan Signed-off-by: Viresh Kumar --- drivers/cpufreq/cpufreq.c | 19 ++- 1 file changed, 10 insertions(+), 9 deletions(-) diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c index 4fa2215cc6ec..3fe13875565d 100644 --- a/drivers/cpufreq/cpufreq.c +++ b/drivers/cpufreq/cpufreq.c @@ -1022,7 +1022,6 @@ static struct cpufreq_policy *cpufreq_policy_alloc(unsigned int cpu) { struct device *dev = get_cpu_device(cpu); struct cpufreq_policy *policy; - int ret; if (WARN_ON(!dev)) return NULL; @@ -1040,13 +1039,6 @@ static struct cpufreq_policy *cpufreq_policy_alloc(unsigned int cpu) if (!zalloc_cpumask_var(>real_cpus, GFP_KERNEL)) goto err_free_rcpumask; - ret = kobject_init_and_add(>kobj, _cpufreq, - cpufreq_global_kobject, "policy%u", cpu); - if (ret) { - pr_err("%s: failed to init policy->kobj: %d\n", __func__, ret); - goto err_free_real_cpus; - } - INIT_LIST_HEAD(>policy_list); init_rwsem(>rwsem); spin_lock_init(>transition_lock); @@ -1057,7 +1049,6 @@ static struct cpufreq_policy *cpufreq_policy_alloc(unsigned int cpu) policy->cpu = cpu; return policy; -err_free_real_cpus: free_cpumask_var(policy->real_cpus); Delete this line too? Does GCC not complain about unreachable code? err_free_rcpumask: free_cpumask_var(policy->related_cpus); @@ -1163,6 +1154,16 @@ static int cpufreq_online(unsigned int cpu) cpumask_copy(policy->related_cpus, policy->cpus); /* Remember CPUs present at the policy creation time. */ cpumask_and(policy->real_cpus, policy->cpus, cpu_present_mask); + + /* Initialize the kobject */ + ret = kobject_init_and_add(>kobj, _cpufreq, + cpufreq_global_kobject, "policy%u", + cpumask_first(policy->related_cpus)); + if (ret) { + pr_err("%s: failed to init policy->kobj: %d\n", + __func__, ret); + goto out_exit_policy; out_exit_policy label includes a call to cpufreq_policy_free(). That function needs to be changed to not call cpufreq_policy_put_kobj() in this case so that we don't try to kobject_put() an unallocated kobj. Maybe you an call cpufreq_policy_put_kobj() in the error handling path of this function? Basically split out kojb alloc and free from policy alloc and free and alloc/free them around the same time (cpufreq_remove_Dev() will have to also call cpufreq_policy_put_kobj() when real_cpus is empty(). The refactor is just a suggestion. I'm looking at the latest code in a gitweb and making comments. So, I might have missed some corner cases in the refactor. Also, it might be better to move the notifier from within cpufreq_policy_put_kobj() to cpufreq_policy_free()? Seems more appropriate. Thanks, Saravana -- Qualcomm Innovation Center, Inc. The Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH V2 4/5] cpufreq: create cpu/cpufreq/policyX directories
On 10/15/2015 09:05 AM, Viresh Kumar wrote: The cpufreq sysfs interface had been a bit inconsistent as one of the CPUs for a policy had a real directory within its sysfs 'cpuX' directory and all other CPUs had links to it. That also made the code a bit complex as we need to take care of moving the sysfs directory if the CPU containing the real directory is getting physically hot-unplugged. Solve this by creating 'policyX' directories (per-policy) in /sys/devices/system/cpu/cpufreq/ directory, where X is the CPU for which the policy was first created. This also removes the need of keeping kobj_cpu and we can remove it now. Suggested-by: Saravana Kannan <skan...@codeaurora.org> Signed-off-by: Viresh Kumar <viresh.ku...@linaro.org> Since you've added a separate patch for making policyX more consistent: Reviewed-by: Saravana Kannan <skan...@codeaurora.org> Btw, does a Review-by have an implicit Acked-by? --- drivers/cpufreq/cpufreq.c | 34 -- include/linux/cpufreq.h | 1 - 2 files changed, 4 insertions(+), 31 deletions(-) diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c index 04222e7bbc73..4fa2215cc6ec 100644 --- a/drivers/cpufreq/cpufreq.c +++ b/drivers/cpufreq/cpufreq.c @@ -910,9 +910,6 @@ static int cpufreq_add_dev_symlink(struct cpufreq_policy *policy) /* Some related CPUs might not be present (physically hotplugged) */ for_each_cpu(j, policy->real_cpus) { - if (j == policy->kobj_cpu) - continue; - ret = add_cpu_dev_symlink(policy, j); if (ret) break; Kinda unrelated to this patch, but shouldn't this function undo the symlinks is has created so far before returning? Otherwise, we'd be leaving around broken symlinks. -Saravana -- Qualcomm Innovation Center, Inc. The Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 4/5] cpufreq: create cpu/cpufreq/policyX directories
On 10/14/2015 11:55 PM, Viresh Kumar wrote: On 13-10-15, 12:29, Saravana Kannan wrote: But we don't need to track track of "present-cpus" separately though. We could do the for_each_cpu_and() when we create the symlinks for the first time. And after that, we can just use the subsystem interface callbacks (cpufreq_add_dev() and cpufreq_remove_dev()) to keep the symlinks updated. I don't see any place where keeping track of this separately is more efficient. This would save some memory savings when the number of CPUs is large and also simplify the code because we won't have to keep another field up to date. It is still required to track when can we free the policy. Ok, I'm not very familiar with this new field's uses. I'll take a closer look later and respond if I have other thoughts. Thanks, Saravana -- Qualcomm Innovation Center, Inc. The Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH V2 5/5] cpufreq: postfix policy directory with the first CPU in related_cpus
On 10/15/2015 09:05 AM, Viresh Kumar wrote: The sysfs policy directory is postfixed currently with the CPU number for which the policy was created, which isn't necessarily the first CPU in related_cpus mask. To make it more consistent and predictable, lets postfix the policy with the first cpu in related-cpus mask. Suggested-by: Saravana Kannan <skan...@codeaurora.org> Signed-off-by: Viresh Kumar <viresh.ku...@linaro.org> --- drivers/cpufreq/cpufreq.c | 19 ++- 1 file changed, 10 insertions(+), 9 deletions(-) diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c index 4fa2215cc6ec..3fe13875565d 100644 --- a/drivers/cpufreq/cpufreq.c +++ b/drivers/cpufreq/cpufreq.c @@ -1022,7 +1022,6 @@ static struct cpufreq_policy *cpufreq_policy_alloc(unsigned int cpu) { struct device *dev = get_cpu_device(cpu); struct cpufreq_policy *policy; - int ret; if (WARN_ON(!dev)) return NULL; @@ -1040,13 +1039,6 @@ static struct cpufreq_policy *cpufreq_policy_alloc(unsigned int cpu) if (!zalloc_cpumask_var(>real_cpus, GFP_KERNEL)) goto err_free_rcpumask; - ret = kobject_init_and_add(>kobj, _cpufreq, - cpufreq_global_kobject, "policy%u", cpu); - if (ret) { - pr_err("%s: failed to init policy->kobj: %d\n", __func__, ret); - goto err_free_real_cpus; - } - INIT_LIST_HEAD(>policy_list); init_rwsem(>rwsem); spin_lock_init(>transition_lock); @@ -1057,7 +1049,6 @@ static struct cpufreq_policy *cpufreq_policy_alloc(unsigned int cpu) policy->cpu = cpu; return policy; -err_free_real_cpus: free_cpumask_var(policy->real_cpus); Delete this line too? Does GCC not complain about unreachable code? err_free_rcpumask: free_cpumask_var(policy->related_cpus); @@ -1163,6 +1154,16 @@ static int cpufreq_online(unsigned int cpu) cpumask_copy(policy->related_cpus, policy->cpus); /* Remember CPUs present at the policy creation time. */ cpumask_and(policy->real_cpus, policy->cpus, cpu_present_mask); + + /* Initialize the kobject */ + ret = kobject_init_and_add(>kobj, _cpufreq, + cpufreq_global_kobject, "policy%u", + cpumask_first(policy->related_cpus)); + if (ret) { + pr_err("%s: failed to init policy->kobj: %d\n", + __func__, ret); + goto out_exit_policy; out_exit_policy label includes a call to cpufreq_policy_free(). That function needs to be changed to not call cpufreq_policy_put_kobj() in this case so that we don't try to kobject_put() an unallocated kobj. Maybe you an call cpufreq_policy_put_kobj() in the error handling path of this function? Basically split out kojb alloc and free from policy alloc and free and alloc/free them around the same time (cpufreq_remove_Dev() will have to also call cpufreq_policy_put_kobj() when real_cpus is empty(). The refactor is just a suggestion. I'm looking at the latest code in a gitweb and making comments. So, I might have missed some corner cases in the refactor. Also, it might be better to move the notifier from within cpufreq_policy_put_kobj() to cpufreq_policy_free()? Seems more appropriate. Thanks, Saravana -- Qualcomm Innovation Center, Inc. The Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH Resend] cpufreq: Drop redundant check for inactive policies
On 10/14/2015 05:35 PM, Rafael J. Wysocki wrote: On Tuesday, October 13, 2015 10:57:13 AM Viresh Kumar wrote: We just made sure policy->cpu is online and this check will always fail as the policy is active. Drop it. Signed-off-by: Viresh Kumar Acked-by: Saravana Kannan Applied, thanks! Rafael I didn't give a clear ack/review for the series. So, to clarify my ack/review For all patches except 4/5, I'm okay with either/all of this: Reviewed-by: Saravana Kannan Acked-by: Saravana Kannan For 4/5, I would still like us to move the sysfs creating after init. That part shouldn't be too hard. We don't need to create the sysfs file before init. Once that's done, I wouldn't mind giving an Ack. Thanks, Saravana -- Qualcomm Innovation Center, Inc. The Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH Resend] cpufreq: Drop redundant check for inactive policies
On 10/14/2015 05:35 PM, Rafael J. Wysocki wrote: On Tuesday, October 13, 2015 10:57:13 AM Viresh Kumar wrote: We just made sure policy->cpu is online and this check will always fail as the policy is active. Drop it. Signed-off-by: Viresh Kumar <viresh.ku...@linaro.org> Acked-by: Saravana Kannan <skan...@codeaurora.org> Applied, thanks! Rafael I didn't give a clear ack/review for the series. So, to clarify my ack/review For all patches except 4/5, I'm okay with either/all of this: Reviewed-by: Saravana Kannan <skan...@codeaurora.org> Acked-by: Saravana Kannan <skan...@codeaurora.org> For 4/5, I would still like us to move the sysfs creating after init. That part shouldn't be too hard. We don't need to create the sysfs file before init. Once that's done, I wouldn't mind giving an Ack. Thanks, Saravana -- Qualcomm Innovation Center, Inc. The Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 4/5] cpufreq: create cpu/cpufreq/policyX directories
On 10/12/2015 08:39 PM, Viresh Kumar wrote: On 12-10-15, 12:31, Saravana Kannan wrote: Can we use the first CPU in the related CPUs mask? Instead of the first CPU that the policy got created on? The policyX numbering would be a bit more consistent that way. Okay.. Suggested-by: ? Will add. Though me/Rafael thought about it long back, but then dropped the idea :) Didn't notice when this got added. Do we really need this anymore if we don't care about moving the directory and creating symlinks? I don't think we need it anymore. And if we really need to know related - offline, we can use for_each_cpu_and(related, online/present mask) Its about tracking present-cpus, for which the link is present. So, it is still required. But we don't need to track track of "present-cpus" separately though. We could do the for_each_cpu_and() when we create the symlinks for the first time. And after that, we can just use the subsystem interface callbacks (cpufreq_add_dev() and cpufreq_remove_dev()) to keep the symlinks updated. I don't see any place where keeping track of this separately is more efficient. This would save some memory savings when the number of CPUs is large and also simplify the code because we won't have to keep another field up to date. -Saravana -- Qualcomm Innovation Center, Inc. The Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 4/5] cpufreq: create cpu/cpufreq/policyX directories
On 10/12/2015 11:15 PM, Viresh Kumar wrote: On 12-10-15, 12:31, Saravana Kannan wrote: Can we use the first CPU in the related CPUs mask? Instead of the first CPU that the policy got created on? The policyX numbering would be a bit more consistent that way. Okay, checked this again. The problem is that ->init() isn't called yet and we are very early in the initialization sequence. So, we can't really know related_cpus yet. So I will keep it unchanged for now. Can we move the sysfs add to the end so that by the time we add sysfs, we'll have all the details? -Saravana -- Qualcomm Innovation Center, Inc. The Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 1/5] cpufreq: Use cpumask_copy instead of cpumask_or to copy a mask
On 10/12/2015 08:23 PM, Viresh Kumar wrote: On 12-10-15, 12:12, Saravana Kannan wrote: if (new_policy) { /* related_cpus should at least include policy->cpus. */ - cpumask_or(policy->related_cpus, policy->related_cpus, policy->cpus); + cpumask_copy(policy->related_cpus, policy->cpus); Again, why? It actually seems wrong. A 4 core cluster could come up with just 2 cores when the policy is added. But the related CPUs would be 4 CPUs. Firstly, the patch hasn't changed anything at all. related_cpus was empty until this point, and orring or setting it with ->cpus will result in the same output. I was under the impression that the CPUfreq drivers were expected to fill in related_cpus and the or-ing was a safety net. If that's not the case, then this change is fine. Secondly, this is what we always wanted. related_cpus should contain the mask of all possible CPUs for that cluster. I think the confusion was that I thought the drivers are supposed to do this. If this doesn't mess up other CPUfreq drivers that I'm not familiar with, then I don't have concerns. Can you still explain the why in the commit text though? If it's just that related_cpus is always empty and copying is more efficient than or-ing, then mention that? Thanks, Saravana -- Qualcomm Innovation Center, Inc. The Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 4/5] cpufreq: create cpu/cpufreq/policyX directories
On 10/12/2015 08:39 PM, Viresh Kumar wrote: On 12-10-15, 12:31, Saravana Kannan wrote: Can we use the first CPU in the related CPUs mask? Instead of the first CPU that the policy got created on? The policyX numbering would be a bit more consistent that way. Okay.. Suggested-by: ? Will add. Though me/Rafael thought about it long back, but then dropped the idea :) Didn't notice when this got added. Do we really need this anymore if we don't care about moving the directory and creating symlinks? I don't think we need it anymore. And if we really need to know related - offline, we can use for_each_cpu_and(related, online/present mask) Its about tracking present-cpus, for which the link is present. So, it is still required. But we don't need to track track of "present-cpus" separately though. We could do the for_each_cpu_and() when we create the symlinks for the first time. And after that, we can just use the subsystem interface callbacks (cpufreq_add_dev() and cpufreq_remove_dev()) to keep the symlinks updated. I don't see any place where keeping track of this separately is more efficient. This would save some memory savings when the number of CPUs is large and also simplify the code because we won't have to keep another field up to date. -Saravana -- Qualcomm Innovation Center, Inc. The Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 4/5] cpufreq: create cpu/cpufreq/policyX directories
On 10/12/2015 11:15 PM, Viresh Kumar wrote: On 12-10-15, 12:31, Saravana Kannan wrote: Can we use the first CPU in the related CPUs mask? Instead of the first CPU that the policy got created on? The policyX numbering would be a bit more consistent that way. Okay, checked this again. The problem is that ->init() isn't called yet and we are very early in the initialization sequence. So, we can't really know related_cpus yet. So I will keep it unchanged for now. Can we move the sysfs add to the end so that by the time we add sysfs, we'll have all the details? -Saravana -- Qualcomm Innovation Center, Inc. The Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 1/5] cpufreq: Use cpumask_copy instead of cpumask_or to copy a mask
On 10/12/2015 08:23 PM, Viresh Kumar wrote: On 12-10-15, 12:12, Saravana Kannan wrote: if (new_policy) { /* related_cpus should at least include policy->cpus. */ - cpumask_or(policy->related_cpus, policy->related_cpus, policy->cpus); + cpumask_copy(policy->related_cpus, policy->cpus); Again, why? It actually seems wrong. A 4 core cluster could come up with just 2 cores when the policy is added. But the related CPUs would be 4 CPUs. Firstly, the patch hasn't changed anything at all. related_cpus was empty until this point, and orring or setting it with ->cpus will result in the same output. I was under the impression that the CPUfreq drivers were expected to fill in related_cpus and the or-ing was a safety net. If that's not the case, then this change is fine. Secondly, this is what we always wanted. related_cpus should contain the mask of all possible CPUs for that cluster. I think the confusion was that I thought the drivers are supposed to do this. If this doesn't mess up other CPUfreq drivers that I'm not familiar with, then I don't have concerns. Can you still explain the why in the commit text though? If it's just that related_cpus is always empty and copying is more efficient than or-ing, then mention that? Thanks, Saravana -- Qualcomm Innovation Center, Inc. The Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 5/5] cpufreq: Drop redundant check for inactive policies
On 10/11/2015 10:21 AM, Viresh Kumar wrote: We just made sure policy->cpu is online and this check will always fail as the policy is active. Drop it. Signed-off-by: Viresh Kumar --- drivers/cpufreq/cpufreq.c | 7 --- 1 file changed, 7 deletions(-) diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c index 58aabe0f2d2c..4fa2215cc6ec 100644 --- a/drivers/cpufreq/cpufreq.c +++ b/drivers/cpufreq/cpufreq.c @@ -843,18 +843,11 @@ static ssize_t store(struct kobject *kobj, struct attribute *attr, down_write(>rwsem); - /* Updating inactive policies is invalid, so avoid doing that. */ - if (unlikely(policy_is_inactive(policy))) { - ret = -EBUSY; - goto unlock_policy_rwsem; - } - if (fattr->store) ret = fattr->store(policy, buf, count); else ret = -EIO; -unlock_policy_rwsem: up_write(>rwsem); unlock: put_online_cpus(); Doesn't really seem related to the sysfs reorg/clean up. Should it be a separate patch outside of this series? Acked-by: Saravana Kannan -Saravana -- Qualcomm Innovation Center, Inc. The Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 4/5] cpufreq: create cpu/cpufreq/policyX directories
On 10/11/2015 10:21 AM, Viresh Kumar wrote: The cpufreq sysfs interface had been a bit inconsistent as one of the CPUs for a policy had a real directory within its sysfs 'cpuX' directory and all other CPUs had links to it. That also made the code a bit complex as we need to take care of moving the sysfs directory if the CPU containing the real directory is getting physically hot-unplugged. This should actually make hotplug a bit faster too. Solve this by creating 'policyX' directories (per-policy) in /sys/devices/system/cpu/cpufreq/ directory, where X is the CPU for which the policy was first created. Can we use the first CPU in the related CPUs mask? Instead of the first CPU that the policy got created on? The policyX numbering would be a bit more consistent that way. This also removes the need of keeping kobj_cpu and we can remove it now. Signed-off-by: Viresh Kumar Suggested-by: ? --- drivers/cpufreq/cpufreq.c | 34 -- include/linux/cpufreq.h | 1 - 2 files changed, 4 insertions(+), 31 deletions(-) diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c index 2cb0e3b8170e..58aabe0f2d2c 100644 --- a/drivers/cpufreq/cpufreq.c +++ b/drivers/cpufreq/cpufreq.c @@ -917,9 +917,6 @@ static int cpufreq_add_dev_symlink(struct cpufreq_policy *policy) /* Some related CPUs might not be present (physically hotplugged) */ for_each_cpu(j, policy->real_cpus) { Didn't notice when this got added. Do we really need this anymore if we don't care about moving the directory and creating symlinks? I don't think we need it anymore. And if we really need to know related - offline, we can use for_each_cpu_and(related, online/present mask) - if (j == policy->kobj_cpu) - continue; - ret = add_cpu_dev_symlink(policy, j); if (ret) break; @@ -933,12 +930,8 @@ static void cpufreq_remove_dev_symlink(struct cpufreq_policy *policy) unsigned int j; /* Some related CPUs might not be present (physically hotplugged) */ - for_each_cpu(j, policy->real_cpus) { - if (j == policy->kobj_cpu) - continue; - + for_each_cpu(j, policy->real_cpus) remove_cpu_dev_symlink(policy, j); - } } static int cpufreq_add_dev_interface(struct cpufreq_policy *policy) @@ -1054,8 +1047,8 @@ static struct cpufreq_policy *cpufreq_policy_alloc(unsigned int cpu) if (!zalloc_cpumask_var(>real_cpus, GFP_KERNEL)) goto err_free_rcpumask; - ret = kobject_init_and_add(>kobj, _cpufreq, >kobj, - "cpufreq"); + ret = kobject_init_and_add(>kobj, _cpufreq, + cpufreq_global_kobject, "policy%u", cpu); if (ret) { pr_err("%s: failed to init policy->kobj: %d\n", __func__, ret); goto err_free_real_cpus; @@ -1069,10 +1062,6 @@ static struct cpufreq_policy *cpufreq_policy_alloc(unsigned int cpu) INIT_WORK(>update, handle_update); policy->cpu = cpu; - - /* Set this once on allocation */ - policy->kobj_cpu = cpu; - return policy; err_free_real_cpus: @@ -1424,22 +1413,7 @@ static void cpufreq_remove_dev(struct device *dev, struct subsys_interface *sif) return; } - if (cpu != policy->kobj_cpu) { - remove_cpu_dev_symlink(policy, cpu); - } else { - /* -* The CPU owning the policy object is going away. Move it to -* another suitable CPU. -*/ - unsigned int new_cpu = cpumask_first(policy->real_cpus); - struct device *new_dev = get_cpu_device(new_cpu); - - dev_dbg(dev, "%s: Moving policy object to CPU%u\n", __func__, new_cpu); - - sysfs_remove_link(_dev->kobj, "cpufreq"); - policy->kobj_cpu = new_cpu; - WARN_ON(kobject_move(>kobj, _dev->kobj)); - } + remove_cpu_dev_symlink(policy, cpu); } static void handle_update(struct work_struct *work) diff --git a/include/linux/cpufreq.h b/include/linux/cpufreq.h index 9623218d996a..ef4c5b1a860f 100644 --- a/include/linux/cpufreq.h +++ b/include/linux/cpufreq.h @@ -65,7 +65,6 @@ struct cpufreq_policy { unsigned intshared_type; /* ACPI: ANY or ALL affected CPUs should set cpufreq */ unsigned intcpu;/* cpu managing this policy, must be online */ - unsigned intkobj_cpu; /* cpu managing sysfs files, can be offline */ struct clk *clk; struct cpufreq_cpuinfo cpuinfo;/* see above */ -Saravana -- Qualcomm Innovation Center, Inc. The Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project -- To unsubscribe
Re: [PATCH 1/5] cpufreq: Use cpumask_copy instead of cpumask_or to copy a mask
On 10/11/2015 10:21 AM, Viresh Kumar wrote: Signed-off-by: Viresh Kumar The commit text should explain the why you are doing this. --- drivers/cpufreq/cpufreq.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c index 25c4c15103a0..b32521432db4 100644 --- a/drivers/cpufreq/cpufreq.c +++ b/drivers/cpufreq/cpufreq.c @@ -1221,7 +1221,7 @@ static int cpufreq_online(unsigned int cpu) if (new_policy) { /* related_cpus should at least include policy->cpus. */ - cpumask_or(policy->related_cpus, policy->related_cpus, policy->cpus); + cpumask_copy(policy->related_cpus, policy->cpus); Again, why? It actually seems wrong. A 4 core cluster could come up with just 2 cores when the policy is added. But the related CPUs would be 4 CPUs. /* Remember CPUs present at the policy creation time. */ cpumask_and(policy->real_cpus, policy->cpus, cpu_present_mask); } -Saravana -- Qualcomm Innovation Center, Inc. The Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 1/5] cpufreq: Use cpumask_copy instead of cpumask_or to copy a mask
On 10/11/2015 10:21 AM, Viresh Kumar wrote: Signed-off-by: Viresh KumarThe commit text should explain the why you are doing this. --- drivers/cpufreq/cpufreq.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c index 25c4c15103a0..b32521432db4 100644 --- a/drivers/cpufreq/cpufreq.c +++ b/drivers/cpufreq/cpufreq.c @@ -1221,7 +1221,7 @@ static int cpufreq_online(unsigned int cpu) if (new_policy) { /* related_cpus should at least include policy->cpus. */ - cpumask_or(policy->related_cpus, policy->related_cpus, policy->cpus); + cpumask_copy(policy->related_cpus, policy->cpus); Again, why? It actually seems wrong. A 4 core cluster could come up with just 2 cores when the policy is added. But the related CPUs would be 4 CPUs. /* Remember CPUs present at the policy creation time. */ cpumask_and(policy->real_cpus, policy->cpus, cpu_present_mask); } -Saravana -- Qualcomm Innovation Center, Inc. The Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 5/5] cpufreq: Drop redundant check for inactive policies
On 10/11/2015 10:21 AM, Viresh Kumar wrote: We just made sure policy->cpu is online and this check will always fail as the policy is active. Drop it. Signed-off-by: Viresh Kumar <viresh.ku...@linaro.org> --- drivers/cpufreq/cpufreq.c | 7 --- 1 file changed, 7 deletions(-) diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c index 58aabe0f2d2c..4fa2215cc6ec 100644 --- a/drivers/cpufreq/cpufreq.c +++ b/drivers/cpufreq/cpufreq.c @@ -843,18 +843,11 @@ static ssize_t store(struct kobject *kobj, struct attribute *attr, down_write(>rwsem); - /* Updating inactive policies is invalid, so avoid doing that. */ - if (unlikely(policy_is_inactive(policy))) { - ret = -EBUSY; - goto unlock_policy_rwsem; - } - if (fattr->store) ret = fattr->store(policy, buf, count); else ret = -EIO; -unlock_policy_rwsem: up_write(>rwsem); unlock: put_online_cpus(); Doesn't really seem related to the sysfs reorg/clean up. Should it be a separate patch outside of this series? Acked-by: Saravana Kannan <skan...@codeaurora.org> -Saravana -- Qualcomm Innovation Center, Inc. The Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 4/5] cpufreq: create cpu/cpufreq/policyX directories
On 10/11/2015 10:21 AM, Viresh Kumar wrote: The cpufreq sysfs interface had been a bit inconsistent as one of the CPUs for a policy had a real directory within its sysfs 'cpuX' directory and all other CPUs had links to it. That also made the code a bit complex as we need to take care of moving the sysfs directory if the CPU containing the real directory is getting physically hot-unplugged. This should actually make hotplug a bit faster too. Solve this by creating 'policyX' directories (per-policy) in /sys/devices/system/cpu/cpufreq/ directory, where X is the CPU for which the policy was first created. Can we use the first CPU in the related CPUs mask? Instead of the first CPU that the policy got created on? The policyX numbering would be a bit more consistent that way. This also removes the need of keeping kobj_cpu and we can remove it now. Signed-off-by: Viresh KumarSuggested-by: ? --- drivers/cpufreq/cpufreq.c | 34 -- include/linux/cpufreq.h | 1 - 2 files changed, 4 insertions(+), 31 deletions(-) diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c index 2cb0e3b8170e..58aabe0f2d2c 100644 --- a/drivers/cpufreq/cpufreq.c +++ b/drivers/cpufreq/cpufreq.c @@ -917,9 +917,6 @@ static int cpufreq_add_dev_symlink(struct cpufreq_policy *policy) /* Some related CPUs might not be present (physically hotplugged) */ for_each_cpu(j, policy->real_cpus) { Didn't notice when this got added. Do we really need this anymore if we don't care about moving the directory and creating symlinks? I don't think we need it anymore. And if we really need to know related - offline, we can use for_each_cpu_and(related, online/present mask) - if (j == policy->kobj_cpu) - continue; - ret = add_cpu_dev_symlink(policy, j); if (ret) break; @@ -933,12 +930,8 @@ static void cpufreq_remove_dev_symlink(struct cpufreq_policy *policy) unsigned int j; /* Some related CPUs might not be present (physically hotplugged) */ - for_each_cpu(j, policy->real_cpus) { - if (j == policy->kobj_cpu) - continue; - + for_each_cpu(j, policy->real_cpus) remove_cpu_dev_symlink(policy, j); - } } static int cpufreq_add_dev_interface(struct cpufreq_policy *policy) @@ -1054,8 +1047,8 @@ static struct cpufreq_policy *cpufreq_policy_alloc(unsigned int cpu) if (!zalloc_cpumask_var(>real_cpus, GFP_KERNEL)) goto err_free_rcpumask; - ret = kobject_init_and_add(>kobj, _cpufreq, >kobj, - "cpufreq"); + ret = kobject_init_and_add(>kobj, _cpufreq, + cpufreq_global_kobject, "policy%u", cpu); if (ret) { pr_err("%s: failed to init policy->kobj: %d\n", __func__, ret); goto err_free_real_cpus; @@ -1069,10 +1062,6 @@ static struct cpufreq_policy *cpufreq_policy_alloc(unsigned int cpu) INIT_WORK(>update, handle_update); policy->cpu = cpu; - - /* Set this once on allocation */ - policy->kobj_cpu = cpu; - return policy; err_free_real_cpus: @@ -1424,22 +1413,7 @@ static void cpufreq_remove_dev(struct device *dev, struct subsys_interface *sif) return; } - if (cpu != policy->kobj_cpu) { - remove_cpu_dev_symlink(policy, cpu); - } else { - /* -* The CPU owning the policy object is going away. Move it to -* another suitable CPU. -*/ - unsigned int new_cpu = cpumask_first(policy->real_cpus); - struct device *new_dev = get_cpu_device(new_cpu); - - dev_dbg(dev, "%s: Moving policy object to CPU%u\n", __func__, new_cpu); - - sysfs_remove_link(_dev->kobj, "cpufreq"); - policy->kobj_cpu = new_cpu; - WARN_ON(kobject_move(>kobj, _dev->kobj)); - } + remove_cpu_dev_symlink(policy, cpu); } static void handle_update(struct work_struct *work) diff --git a/include/linux/cpufreq.h b/include/linux/cpufreq.h index 9623218d996a..ef4c5b1a860f 100644 --- a/include/linux/cpufreq.h +++ b/include/linux/cpufreq.h @@ -65,7 +65,6 @@ struct cpufreq_policy { unsigned intshared_type; /* ACPI: ANY or ALL affected CPUs should set cpufreq */ unsigned intcpu;/* cpu managing this policy, must be online */ - unsigned intkobj_cpu; /* cpu managing sysfs files, can be offline */ struct clk *clk; struct cpufreq_cpuinfo cpuinfo;/* see above */ -Saravana -- Qualcomm Innovation Center, Inc. The Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative
Re: [PATCH v2 RESEND/RFC] timer: make deferrable cpu unbound timers really not bound to a cpu
On 03/19/2015 05:10 PM, Saravana Kannan wrote: On 09/23/2014 11:33 AM, Thomas Gleixner wrote: On Mon, 15 Sep 2014, Joonwoo Park wrote: +#ifdef CONFIG_SMP +static struct tvec_base *tvec_base_deferral = _tvec_bases; +#endif In principle I like the idea of a deferrable wheel, but this implementation is going to go nowhere. Hi Thomas, To give some more context: This bug is a serious pain in the a** for anyone using deferrable timers or deferrable workqueues today for some periodic work and don't care for which CPU the code runs in. Couple of examples of such issues in existing code: 1) In a SMP system, CPUfreq governors (ondemand and conservative) end up queueing a deferrable work on every single CPU and the first one to run the deferrable workqueue cancels the work on all other CPUS, runs the work and then sets up a workqueue on all the CPUs again for the next sampling point. 2) Devfreq actually doesn't work well today because it doesn't do this nasty hack like CPUfreq. So, if the devfreq work happens to run on a CPU that's typically idle, then it doesn't run for a long time. I've actually seen logs where the devfreq polling interval is set to 20ms, but ends up running 800ms later. Don't know how many other drivers are suffering from this bug. Yes, IMHO, this is a bug. When the timer is CPU unbound, anyone who hasn't looked at the actual timer code would assume that it'll run as long as any CPU is busy. First of all making it SMP only is silly. The deferrable stuff is a pain in other places as well. But whats way worse is: +static inline void __run_timers(struct tvec_base *base, bool try) { struct timer_list *timer; -spin_lock_irq(>lock); +if (!try) +spin_lock_irq(>lock); +else if (!spin_trylock_irq(>lock)) +return; Yuck. All cpus fighting about a single spinlock roughly at the same time? You just created a proper thundering herd cacheline bouncing issue. I agree, This is not good. I think a simple way to fix this is to have only the CPU that's the jiffy owner to run through this deferrable timer list. That should address any unnecessary cache bouncing issues. Would that be an acceptable solution to this? No way. We have already mechanisms in place to deal with such problems, you just have to use them. I'm not sure which problem you are referring to here. Or what the already existing solutions are. I don't think you were referring to the "deferrable timer getting delayed for long periods despite CPUs being active" problem, because I don't think we have a better solution than this patch (with the jiffy owner CPU fix). Asking every driver that doesn't care which CPU the work runs on to queue a work or set up a timer on every CPU is definitely not a good solution -- that's spreading the complexity to every other driver instead of fixing it in one place. And that causes unnecessary cache pollution too. Thoughts? I would really like to see a fix for this go in soon so that we can simplify cpufreq and have devfreq and other drivers work correctly. Thanks, Saravana Thomas, Bump. -Saravana -- The Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v2 RESEND/RFC] timer: make deferrable cpu unbound timers really not bound to a cpu
On 03/19/2015 05:10 PM, Saravana Kannan wrote: On 09/23/2014 11:33 AM, Thomas Gleixner wrote: On Mon, 15 Sep 2014, Joonwoo Park wrote: +#ifdef CONFIG_SMP +static struct tvec_base *tvec_base_deferral = boot_tvec_bases; +#endif In principle I like the idea of a deferrable wheel, but this implementation is going to go nowhere. Hi Thomas, To give some more context: This bug is a serious pain in the a** for anyone using deferrable timers or deferrable workqueues today for some periodic work and don't care for which CPU the code runs in. Couple of examples of such issues in existing code: 1) In a SMP system, CPUfreq governors (ondemand and conservative) end up queueing a deferrable work on every single CPU and the first one to run the deferrable workqueue cancels the work on all other CPUS, runs the work and then sets up a workqueue on all the CPUs again for the next sampling point. 2) Devfreq actually doesn't work well today because it doesn't do this nasty hack like CPUfreq. So, if the devfreq work happens to run on a CPU that's typically idle, then it doesn't run for a long time. I've actually seen logs where the devfreq polling interval is set to 20ms, but ends up running 800ms later. Don't know how many other drivers are suffering from this bug. Yes, IMHO, this is a bug. When the timer is CPU unbound, anyone who hasn't looked at the actual timer code would assume that it'll run as long as any CPU is busy. First of all making it SMP only is silly. The deferrable stuff is a pain in other places as well. But whats way worse is: +static inline void __run_timers(struct tvec_base *base, bool try) { struct timer_list *timer; -spin_lock_irq(base-lock); +if (!try) +spin_lock_irq(base-lock); +else if (!spin_trylock_irq(base-lock)) +return; Yuck. All cpus fighting about a single spinlock roughly at the same time? You just created a proper thundering herd cacheline bouncing issue. I agree, This is not good. I think a simple way to fix this is to have only the CPU that's the jiffy owner to run through this deferrable timer list. That should address any unnecessary cache bouncing issues. Would that be an acceptable solution to this? No way. We have already mechanisms in place to deal with such problems, you just have to use them. I'm not sure which problem you are referring to here. Or what the already existing solutions are. I don't think you were referring to the deferrable timer getting delayed for long periods despite CPUs being active problem, because I don't think we have a better solution than this patch (with the jiffy owner CPU fix). Asking every driver that doesn't care which CPU the work runs on to queue a work or set up a timer on every CPU is definitely not a good solution -- that's spreading the complexity to every other driver instead of fixing it in one place. And that causes unnecessary cache pollution too. Thoughts? I would really like to see a fix for this go in soon so that we can simplify cpufreq and have devfreq and other drivers work correctly. Thanks, Saravana Thomas, Bump. -Saravana -- The Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Re: [PATCH v2 RESEND/RFC] timer: make deferrable cpu unbound timers really not bound to a cpu
On 09/23/2014 11:33 AM, Thomas Gleixner wrote: On Mon, 15 Sep 2014, Joonwoo Park wrote: +#ifdef CONFIG_SMP +static struct tvec_base *tvec_base_deferral = _tvec_bases; +#endif In principle I like the idea of a deferrable wheel, but this implementation is going to go nowhere. Hi Thomas, To give some more context: This bug is a serious pain in the a** for anyone using deferrable timers or deferrable workqueues today for some periodic work and don't care for which CPU the code runs in. Couple of examples of such issues in existing code: 1) In a SMP system, CPUfreq governors (ondemand and conservative) end up queueing a deferrable work on every single CPU and the first one to run the deferrable workqueue cancels the work on all other CPUS, runs the work and then sets up a workqueue on all the CPUs again for the next sampling point. 2) Devfreq actually doesn't work well today because it doesn't do this nasty hack like CPUfreq. So, if the devfreq work happens to run on a CPU that's typically idle, then it doesn't run for a long time. I've actually seen logs where the devfreq polling interval is set to 20ms, but ends up running 800ms later. Don't know how many other drivers are suffering from this bug. Yes, IMHO, this is a bug. When the timer is CPU unbound, anyone who hasn't looked at the actual timer code would assume that it'll run as long as any CPU is busy. First of all making it SMP only is silly. The deferrable stuff is a pain in other places as well. But whats way worse is: +static inline void __run_timers(struct tvec_base *base, bool try) { struct timer_list *timer; - spin_lock_irq(>lock); + if (!try) + spin_lock_irq(>lock); + else if (!spin_trylock_irq(>lock)) + return; Yuck. All cpus fighting about a single spinlock roughly at the same time? You just created a proper thundering herd cacheline bouncing issue. I agree, This is not good. I think a simple way to fix this is to have only the CPU that's the jiffy owner to run through this deferrable timer list. That should address any unnecessary cache bouncing issues. Would that be an acceptable solution to this? No way. We have already mechanisms in place to deal with such problems, you just have to use them. I'm not sure which problem you are referring to here. Or what the already existing solutions are. I don't think you were referring to the "deferrable timer getting delayed for long periods despite CPUs being active" problem, because I don't think we have a better solution than this patch (with the jiffy owner CPU fix). Asking every driver that doesn't care which CPU the work runs on to queue a work or set up a timer on every CPU is definitely not a good solution -- that's spreading the complexity to every other driver instead of fixing it in one place. And that causes unnecessary cache pollution too. Thoughts? I would really like to see a fix for this go in soon so that we can simplify cpufreq and have devfreq and other drivers work correctly. Thanks, Saravana -- The Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Re: [PATCH v2 RESEND/RFC] timer: make deferrable cpu unbound timers really not bound to a cpu
On 09/23/2014 11:33 AM, Thomas Gleixner wrote: On Mon, 15 Sep 2014, Joonwoo Park wrote: +#ifdef CONFIG_SMP +static struct tvec_base *tvec_base_deferral = boot_tvec_bases; +#endif In principle I like the idea of a deferrable wheel, but this implementation is going to go nowhere. Hi Thomas, To give some more context: This bug is a serious pain in the a** for anyone using deferrable timers or deferrable workqueues today for some periodic work and don't care for which CPU the code runs in. Couple of examples of such issues in existing code: 1) In a SMP system, CPUfreq governors (ondemand and conservative) end up queueing a deferrable work on every single CPU and the first one to run the deferrable workqueue cancels the work on all other CPUS, runs the work and then sets up a workqueue on all the CPUs again for the next sampling point. 2) Devfreq actually doesn't work well today because it doesn't do this nasty hack like CPUfreq. So, if the devfreq work happens to run on a CPU that's typically idle, then it doesn't run for a long time. I've actually seen logs where the devfreq polling interval is set to 20ms, but ends up running 800ms later. Don't know how many other drivers are suffering from this bug. Yes, IMHO, this is a bug. When the timer is CPU unbound, anyone who hasn't looked at the actual timer code would assume that it'll run as long as any CPU is busy. First of all making it SMP only is silly. The deferrable stuff is a pain in other places as well. But whats way worse is: +static inline void __run_timers(struct tvec_base *base, bool try) { struct timer_list *timer; - spin_lock_irq(base-lock); + if (!try) + spin_lock_irq(base-lock); + else if (!spin_trylock_irq(base-lock)) + return; Yuck. All cpus fighting about a single spinlock roughly at the same time? You just created a proper thundering herd cacheline bouncing issue. I agree, This is not good. I think a simple way to fix this is to have only the CPU that's the jiffy owner to run through this deferrable timer list. That should address any unnecessary cache bouncing issues. Would that be an acceptable solution to this? No way. We have already mechanisms in place to deal with such problems, you just have to use them. I'm not sure which problem you are referring to here. Or what the already existing solutions are. I don't think you were referring to the deferrable timer getting delayed for long periods despite CPUs being active problem, because I don't think we have a better solution than this patch (with the jiffy owner CPU fix). Asking every driver that doesn't care which CPU the work runs on to queue a work or set up a timer on every CPU is definitely not a good solution -- that's spreading the complexity to every other driver instead of fixing it in one place. And that causes unnecessary cache pollution too. Thoughts? I would really like to see a fix for this go in soon so that we can simplify cpufreq and have devfreq and other drivers work correctly. Thanks, Saravana -- The Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 2/5] cpufreq, fix locking around CPUFREQ_GOV_POLICY_EXIT calls
On 11/11/2014 05:07 AM, Viresh Kumar wrote: On 11 November 2014 17:45, Prarit Bhargava wrote: the deadlock in commit 955ef4833574636819cd269cfbae12f79cbde63a [ 75.471265]CPU0CPU1 [ 75.476327] [ 75.481385] lock(>rwsem); [ 75.485307]lock(s_active#219); [ 75.491857]lock(>rwsem); [ 75.498592] lock(s_active#219); [ 75.502331] [ 75.502331] *** DEADLOCK *** I wanted to understand how this deadlock is prevented by a simple change to trylock.. And also your changelog talks about accessing invalid pointers without the trylock change, how can that be possible? After the read lock is taken, all the pointers should be valid. consider the following very simple case: the governor is ondemand. cpu 0 reads cpuinfo_cur_freq. cpu0 expects to get the current cpu freq for the ondemand governor. Name it A. simultaneously, cpu1 changes the governor from ondemand to userspace. Name it B. the two threads will race for the policy->mutex suppose cpu0 gets it first. then there is no problem. the userspace program for cpu0 gets exactly the data it is expecting. Now suppose cpu1 gets the lock and starts to write ... cpu0 is blocked. cpu1 completes the governor change, and cpu0 gets the mutex ... and returns bogus data at this point. What do you mean by bogus here? That userspace wouldn't be able to know if the value is for which governor? If that's the case than it can still happen. Issue both above commands at almost the same time. You will never be able to differentiate if the sequence is: - A followed by B - B followed by A - A waited for B and so returned -EBUSY (Only this will be clear) And the value read can still be bogus. So, we haven't solved the problem at all. Ah, we are on this topic again I see. I didn't read the patch/thread fully, but I can guess where this is going by reading the partial set of patches. Prarit, You can't just try lock to avoid the deadlock. If you do, then the userspace API becomes a mess. Writes to scaling_governor (or anything else) will no longer by guaranteed to work. Userspace will have to read back, check and retry. That would break a ton of existing userpace scripts. Viresh, The deadlock scenario is read. That's why the code is what it is today. All, IMO, the right way to fix this is to have the governor have over it's list of attributes it want to expose thru sysfs to the cpufreq framework. Then the framework can add/remove this in the right order when the governors are changed. The framework can do this outside of the policy lock being held when the governors are switched. This would allow avoid the original deadlock between sysfs locks and the policy lock without just ever having to fail userspace writes to scaling_governor. -Saravana -- The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum, hosted by The Linux Foundation -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 2/5] cpufreq, fix locking around CPUFREQ_GOV_POLICY_EXIT calls
On 11/11/2014 05:07 AM, Viresh Kumar wrote: On 11 November 2014 17:45, Prarit Bhargava pra...@redhat.com wrote: the deadlock in commit 955ef4833574636819cd269cfbae12f79cbde63a [ 75.471265]CPU0CPU1 [ 75.476327] [ 75.481385] lock(policy-rwsem); [ 75.485307]lock(s_active#219); [ 75.491857]lock(policy-rwsem); [ 75.498592] lock(s_active#219); [ 75.502331] [ 75.502331] *** DEADLOCK *** I wanted to understand how this deadlock is prevented by a simple change to trylock.. And also your changelog talks about accessing invalid pointers without the trylock change, how can that be possible? After the read lock is taken, all the pointers should be valid. consider the following very simple case: the governor is ondemand. cpu 0 reads cpuinfo_cur_freq. cpu0 expects to get the current cpu freq for the ondemand governor. Name it A. simultaneously, cpu1 changes the governor from ondemand to userspace. Name it B. the two threads will race for the policy-mutex suppose cpu0 gets it first. then there is no problem. the userspace program for cpu0 gets exactly the data it is expecting. Now suppose cpu1 gets the lock and starts to write ... cpu0 is blocked. cpu1 completes the governor change, and cpu0 gets the mutex ... and returns bogus data at this point. What do you mean by bogus here? That userspace wouldn't be able to know if the value is for which governor? If that's the case than it can still happen. Issue both above commands at almost the same time. You will never be able to differentiate if the sequence is: - A followed by B - B followed by A - A waited for B and so returned -EBUSY (Only this will be clear) And the value read can still be bogus. So, we haven't solved the problem at all. Ah, we are on this topic again I see. I didn't read the patch/thread fully, but I can guess where this is going by reading the partial set of patches. Prarit, You can't just try lock to avoid the deadlock. If you do, then the userspace API becomes a mess. Writes to scaling_governor (or anything else) will no longer by guaranteed to work. Userspace will have to read back, check and retry. That would break a ton of existing userpace scripts. Viresh, The deadlock scenario is read. That's why the code is what it is today. All, IMO, the right way to fix this is to have the governor have over it's list of attributes it want to expose thru sysfs to the cpufreq framework. Then the framework can add/remove this in the right order when the governors are changed. The framework can do this outside of the policy lock being held when the governors are switched. This would allow avoid the original deadlock between sysfs locks and the policy lock without just ever having to fail userspace writes to scaling_governor. -Saravana -- The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum, hosted by The Linux Foundation -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v4 0/5] Simplify hotplug/suspend handling
On 10/16/2014 01:53 AM, Viresh Kumar wrote: On 25 July 2014 06:37, Saravana Kannan wrote: Series of patchs to simplify policy/sysfs/kobj/locking handling across suspend/resume The following have been tested so far on a 2x2 cluster environment: - Boot with 2 cpus and no cpufreq driver. - mod probe driver and see cpufreq sysfs files show up only for the 1st cluster. - Online the rest of the 2 CPUs and have files show up correctly. - rmmod the driver and see the files go away. - modprobe again (or back and forth multiples times) and see it work. - suspend/resume works as expected. - When a cluster is offline, all read/writes to its sysfs files return an error v4 - Split it up into smaller patches - Will handle physical CPU removal correctly - Fixed earlier mistake of deleting code under !recover_policy - Dropped some code refactor that reuses a lot of code between add/remove - Dropped fix for exiting hotplug race with cpufreq driver probe/rmmod - Dropped changes will come later once this series is acked. Hi Saravana, Any updates on this? We might need some of this soon or should somebody else start working on this ? Hey, Sorry for the delay. Got side tracked with some commercial stuff. I'm still invested in finishing this up. I'll try to send out something within a week. I did notice (didn't read mych) the "Locking issues with cpufreq and sysfs" thread. I think my patches should side step most of it. Thanks, Saravana -- The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum, hosted by The Linux Foundation -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v4 0/5] Simplify hotplug/suspend handling
On 10/16/2014 01:53 AM, Viresh Kumar wrote: On 25 July 2014 06:37, Saravana Kannan skan...@codeaurora.org wrote: Series of patchs to simplify policy/sysfs/kobj/locking handling across suspend/resume The following have been tested so far on a 2x2 cluster environment: - Boot with 2 cpus and no cpufreq driver. - mod probe driver and see cpufreq sysfs files show up only for the 1st cluster. - Online the rest of the 2 CPUs and have files show up correctly. - rmmod the driver and see the files go away. - modprobe again (or back and forth multiples times) and see it work. - suspend/resume works as expected. - When a cluster is offline, all read/writes to its sysfs files return an error v4 - Split it up into smaller patches - Will handle physical CPU removal correctly - Fixed earlier mistake of deleting code under !recover_policy - Dropped some code refactor that reuses a lot of code between add/remove - Dropped fix for exiting hotplug race with cpufreq driver probe/rmmod - Dropped changes will come later once this series is acked. Hi Saravana, Any updates on this? We might need some of this soon or should somebody else start working on this ? Hey, Sorry for the delay. Got side tracked with some commercial stuff. I'm still invested in finishing this up. I'll try to send out something within a week. I did notice (didn't read mych) the Locking issues with cpufreq and sysfs thread. I think my patches should side step most of it. Thanks, Saravana -- The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum, hosted by The Linux Foundation -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/