Re: [PATCH 0/2] incorrect cpumask behavior with CPUMASK_OFFSTACK
Oleg Drokin writes: >>> The second patch that I am not sure if we wnat, but it seems to be useful >>> until struct cpumask is fully dynamic is to convert what looks like >>> whole-set operations e.g. copies, namely: >>> cpumask_setall, cpumask_clear, cpumask_copy to always operate on NR_CPUS >>> bits to ensure there's no stale garbage left in the mask should the >>> cpu count increases later. >> You can't do this, because dynamically allocated cpumasks don't have >> NR_CPUS bits. > > Well, right now they certainly have. As in, it's a static define and we > allocate > a bitmap to fit the (in my case) up to 8192 bits into such off-stack masks. You're right, we should fix that properly. Right now, cpumask_size() has: /* FIXME: Once all cpumask assignments are eliminated, this * can be nr_cpumask_bits */ return BITS_TO_LONGS(NR_CPUS) * sizeof(long); >> Let's just kill all the cpus_ functions. This wasn't done originally >> because archs which didn't care about offline cpumasks didn't want the >> churn. In particular, they must not copy struct cpumask by assignment, >> and fixing those is a fair bit of churn. > > Ah, copy masks by assignment, I see. > Well, I guess we can eliminate the users outside of the affected arch trees > (I assume in x86 there would be no objections?) and perhaps add a warning to > checkpatch.pl? OK... I have done a sweep. It's not *that* bad with spatch. I'm going to remove all the old functions. Expect a series RSN. Cheers, Rusty. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 0/2] incorrect cpumask behavior with CPUMASK_OFFSTACK
Oleg Drokin gr...@linuxhacker.ru writes: The second patch that I am not sure if we wnat, but it seems to be useful until struct cpumask is fully dynamic is to convert what looks like whole-set operations e.g. copies, namely: cpumask_setall, cpumask_clear, cpumask_copy to always operate on NR_CPUS bits to ensure there's no stale garbage left in the mask should the cpu count increases later. You can't do this, because dynamically allocated cpumasks don't have NR_CPUS bits. Well, right now they certainly have. As in, it's a static define and we allocate a bitmap to fit the (in my case) up to 8192 bits into such off-stack masks. You're right, we should fix that properly. Right now, cpumask_size() has: /* FIXME: Once all cpumask assignments are eliminated, this * can be nr_cpumask_bits */ return BITS_TO_LONGS(NR_CPUS) * sizeof(long); Let's just kill all the cpus_ functions. This wasn't done originally because archs which didn't care about offline cpumasks didn't want the churn. In particular, they must not copy struct cpumask by assignment, and fixing those is a fair bit of churn. Ah, copy masks by assignment, I see. Well, I guess we can eliminate the users outside of the affected arch trees (I assume in x86 there would be no objections?) and perhaps add a warning to checkpatch.pl? OK... I have done a sweep. It's not *that* bad with spatch. I'm going to remove all the old functions. Expect a series RSN. Cheers, Rusty. -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 0/2] incorrect cpumask behavior with CPUMASK_OFFSTACK
Hello! On Feb 27, 2015, at 6:46 AM, Rusty Russell wrote: > gr...@linuxhacker.ru writes: >> From: Oleg Drokin >> >> I just got a report today from Tyson Whitehead >> that Lustre crashes when CPUMASK_OFFSTACK is enabled. >> >> A little investigation revealed that this code: >>cpumask_t mask; >> ... >>cpumask_copy(, topology_thread_cpumask(0)); >>weight = cpus_weight(mask); > Yes. cpumask_weight should have been used here. The old cpus_* are > deprecated. Oh! I guess we missed the announcement. I'll convert it over. Should I also do a patch converting all other users and removing the deprecated functions while I am at it? >> The second patch that I am not sure if we wnat, but it seems to be useful >> until struct cpumask is fully dynamic is to convert what looks like >> whole-set operations e.g. copies, namely: >> cpumask_setall, cpumask_clear, cpumask_copy to always operate on NR_CPUS >> bits to ensure there's no stale garbage left in the mask should the >> cpu count increases later. > You can't do this, because dynamically allocated cpumasks don't have > NR_CPUS bits. Well, right now they certainly have. As in, it's a static define and we allocate a bitmap to fit the (in my case) up to 8192 bits into such off-stack masks. The concern is since number of cpus is not really a fixed variable, when you do cpumask initialization, and then number of cpus grows, the end of the mask could be garbage? Am I overthinking this and it's not really a problem? > Let's just kill all the cpus_ functions. This wasn't done originally > because archs which didn't care about offline cpumasks didn't want the > churn. In particular, they must not copy struct cpumask by assignment, > and fixing those is a fair bit of churn. Ah, copy masks by assignment, I see. Well, I guess we can eliminate the users outside of the affected arch trees (I assume in x86 there would be no objections?) and perhaps add a warning to checkpatch.pl? Thanks! Bye, Oleg-- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 0/2] incorrect cpumask behavior with CPUMASK_OFFSTACK
gr...@linuxhacker.ru writes: > From: Oleg Drokin > > I just got a report today from Tyson Whitehead > that Lustre crashes when CPUMASK_OFFSTACK is enabled. > > A little investigation revealed that this code: > cpumask_t mask; > ... > cpumask_copy(, topology_thread_cpumask(0)); > weight = cpus_weight(mask); Yes. cpumask_weight should have been used here. The old cpus_* are deprecated. > The second patch that I am not sure if we wnat, but it seems to be useful > until struct cpumask is fully dynamic is to convert what looks like > whole-set operations e.g. copies, namely: > cpumask_setall, cpumask_clear, cpumask_copy to always operate on NR_CPUS > bits to ensure there's no stale garbage left in the mask should the > cpu count increases later. You can't do this, because dynamically allocated cpumasks don't have NR_CPUS bits. Let's just kill all the cpus_ functions. This wasn't done originally because archs which didn't care about offline cpumasks didn't want the churn. In particular, they must not copy struct cpumask by assignment, and fixing those is a fair bit of churn. The following is the minimal fix: Cheers, Rusty. CONFIG_DISABLE_OBSOLETE_CPUMASK_FUNCTIONS: set if CPUMASK_OFFSTACK. Using these functions with offstack cpus is unsafe. They use all NR_CPUS bits, unstead of nr_cpumask_bits. Signed-off-by: Rusty Russell diff --git a/lib/Kconfig b/lib/Kconfig index 87da53bb1fef..51b4210f3da9 100644 --- a/lib/Kconfig +++ b/lib/Kconfig @@ -398,8 +398,7 @@ config CPUMASK_OFFSTACK stack overflow. config DISABLE_OBSOLETE_CPUMASK_FUNCTIONS - bool "Disable obsolete cpumask functions" if DEBUG_PER_CPU_MAPS - depends on BROKEN + bool "Disable obsolete cpumask functions" if CPUMASK_OFFSTACK config CPU_RMAP bool -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 0/2] incorrect cpumask behavior with CPUMASK_OFFSTACK
Hello! On Feb 27, 2015, at 6:46 AM, Rusty Russell wrote: gr...@linuxhacker.ru writes: From: Oleg Drokin gr...@linuxhacker.ru I just got a report today from Tyson Whitehead twhiteh...@gmail.com that Lustre crashes when CPUMASK_OFFSTACK is enabled. A little investigation revealed that this code: cpumask_t mask; ... cpumask_copy(mask, topology_thread_cpumask(0)); weight = cpus_weight(mask); Yes. cpumask_weight should have been used here. The old cpus_* are deprecated. Oh! I guess we missed the announcement. I'll convert it over. Should I also do a patch converting all other users and removing the deprecated functions while I am at it? The second patch that I am not sure if we wnat, but it seems to be useful until struct cpumask is fully dynamic is to convert what looks like whole-set operations e.g. copies, namely: cpumask_setall, cpumask_clear, cpumask_copy to always operate on NR_CPUS bits to ensure there's no stale garbage left in the mask should the cpu count increases later. You can't do this, because dynamically allocated cpumasks don't have NR_CPUS bits. Well, right now they certainly have. As in, it's a static define and we allocate a bitmap to fit the (in my case) up to 8192 bits into such off-stack masks. The concern is since number of cpus is not really a fixed variable, when you do cpumask initialization, and then number of cpus grows, the end of the mask could be garbage? Am I overthinking this and it's not really a problem? Let's just kill all the cpus_ functions. This wasn't done originally because archs which didn't care about offline cpumasks didn't want the churn. In particular, they must not copy struct cpumask by assignment, and fixing those is a fair bit of churn. Ah, copy masks by assignment, I see. Well, I guess we can eliminate the users outside of the affected arch trees (I assume in x86 there would be no objections?) and perhaps add a warning to checkpatch.pl? Thanks! Bye, Oleg-- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 0/2] incorrect cpumask behavior with CPUMASK_OFFSTACK
gr...@linuxhacker.ru writes: From: Oleg Drokin gr...@linuxhacker.ru I just got a report today from Tyson Whitehead twhiteh...@gmail.com that Lustre crashes when CPUMASK_OFFSTACK is enabled. A little investigation revealed that this code: cpumask_t mask; ... cpumask_copy(mask, topology_thread_cpumask(0)); weight = cpus_weight(mask); Yes. cpumask_weight should have been used here. The old cpus_* are deprecated. The second patch that I am not sure if we wnat, but it seems to be useful until struct cpumask is fully dynamic is to convert what looks like whole-set operations e.g. copies, namely: cpumask_setall, cpumask_clear, cpumask_copy to always operate on NR_CPUS bits to ensure there's no stale garbage left in the mask should the cpu count increases later. You can't do this, because dynamically allocated cpumasks don't have NR_CPUS bits. Let's just kill all the cpus_ functions. This wasn't done originally because archs which didn't care about offline cpumasks didn't want the churn. In particular, they must not copy struct cpumask by assignment, and fixing those is a fair bit of churn. The following is the minimal fix: Cheers, Rusty. CONFIG_DISABLE_OBSOLETE_CPUMASK_FUNCTIONS: set if CPUMASK_OFFSTACK. Using these functions with offstack cpus is unsafe. They use all NR_CPUS bits, unstead of nr_cpumask_bits. Signed-off-by: Rusty Russell ru...@rustcorp.com.au diff --git a/lib/Kconfig b/lib/Kconfig index 87da53bb1fef..51b4210f3da9 100644 --- a/lib/Kconfig +++ b/lib/Kconfig @@ -398,8 +398,7 @@ config CPUMASK_OFFSTACK stack overflow. config DISABLE_OBSOLETE_CPUMASK_FUNCTIONS - bool Disable obsolete cpumask functions if DEBUG_PER_CPU_MAPS - depends on BROKEN + bool Disable obsolete cpumask functions if CPUMASK_OFFSTACK config CPU_RMAP bool -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH 0/2] incorrect cpumask behavior with CPUMASK_OFFSTACK
From: Oleg Drokin I just got a report today from Tyson Whitehead that Lustre crashes when CPUMASK_OFFSTACK is enabled. A little investigation revealed that this code: cpumask_t mask; ... cpumask_copy(, topology_thread_cpumask(0)); weight = cpus_weight(mask); that was supposed to calculate number of cpu siblings/partitions returns a crazy high number over 3000 which is impossible as I only have 8 cpu cores on my system. So after a bit of digging I found out that: cpumask_copy only copies up to nr_cpumask_bits (actual number of cpus I have in the system), where as cpumask_weight actully tries to count bits up to NR_CPUS. Not only calculating up to NR_CPUS is wasteful in this case, and since we know how many cpus we have in the system - it only makes sense to calculate only this much anyway, it's wrong because the copy only copied 8 bits to our variable and the rest of it is some random stack garbage. So I propose two patches here, the first one I am more certain about - operations that operate on current cpuset like cpus_weight, but also cpus_empty, cpus_$LOGICALop cpus_$BINARYop are converted from NR_CPUS to nr_cpumask_bits (this is ok when CONFIG_CPUMASK_OFFSTACK is not set as it's then defined to NR_CPUS anyway). I am leaving __cpus_setall __cpus_clear out of it as these two look like they deal with entire set and it would be useful for them to operate on all NR_CPUS bits for the case if more cpus are added later and such. The second patch that I am not sure if we wnat, but it seems to be useful until struct cpumask is fully dynamic is to convert what looks like whole-set operations e.g. copies, namely: cpumask_setall, cpumask_clear, cpumask_copy to always operate on NR_CPUS bits to ensure there's no stale garbage left in the mask should the cpu count increases later. I checked the code and allocating cpumasks on stack is not all that uncommon in the code, so this should be a worthwhile fix. Please consider. Oleg Drokin (2): cpumask: Properly calculate cpumask values cpumask: make whole cpumask operations like copy to work with NR_CPUS bits include/linux/cpumask.h | 37 + 1 file changed, 21 insertions(+), 16 deletions(-) -- 2.1.0 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH 0/2] incorrect cpumask behavior with CPUMASK_OFFSTACK
From: Oleg Drokin gr...@linuxhacker.ru I just got a report today from Tyson Whitehead twhiteh...@gmail.com that Lustre crashes when CPUMASK_OFFSTACK is enabled. A little investigation revealed that this code: cpumask_t mask; ... cpumask_copy(mask, topology_thread_cpumask(0)); weight = cpus_weight(mask); that was supposed to calculate number of cpu siblings/partitions returns a crazy high number over 3000 which is impossible as I only have 8 cpu cores on my system. So after a bit of digging I found out that: cpumask_copy only copies up to nr_cpumask_bits (actual number of cpus I have in the system), where as cpumask_weight actully tries to count bits up to NR_CPUS. Not only calculating up to NR_CPUS is wasteful in this case, and since we know how many cpus we have in the system - it only makes sense to calculate only this much anyway, it's wrong because the copy only copied 8 bits to our variable and the rest of it is some random stack garbage. So I propose two patches here, the first one I am more certain about - operations that operate on current cpuset like cpus_weight, but also cpus_empty, cpus_$LOGICALop cpus_$BINARYop are converted from NR_CPUS to nr_cpumask_bits (this is ok when CONFIG_CPUMASK_OFFSTACK is not set as it's then defined to NR_CPUS anyway). I am leaving __cpus_setall __cpus_clear out of it as these two look like they deal with entire set and it would be useful for them to operate on all NR_CPUS bits for the case if more cpus are added later and such. The second patch that I am not sure if we wnat, but it seems to be useful until struct cpumask is fully dynamic is to convert what looks like whole-set operations e.g. copies, namely: cpumask_setall, cpumask_clear, cpumask_copy to always operate on NR_CPUS bits to ensure there's no stale garbage left in the mask should the cpu count increases later. I checked the code and allocating cpumasks on stack is not all that uncommon in the code, so this should be a worthwhile fix. Please consider. Oleg Drokin (2): cpumask: Properly calculate cpumask values cpumask: make whole cpumask operations like copy to work with NR_CPUS bits include/linux/cpumask.h | 37 + 1 file changed, 21 insertions(+), 16 deletions(-) -- 2.1.0 -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/