Re: [PATCH 0/8] v5 De-couple sysfs memory directories from memory sections
On 08/31/2010 04:57 PM, Anton Blanchard wrote:
> Hi Nathan,
>
>> This set of patches de-couples the idea that there is a single directory in sysfs for each memory section. The intent of the patches is to reduce the number of sysfs directories created to resolve a boot-time performance issue.
>>
>> On very large systems boot times are getting very long (as seen on powerpc hardware) due to the enormous number of sysfs directories being created. On a system with 1 TB of memory we create ~63,000 directories. For even larger systems boot times are being measured in hours.
>>
>> This set of patches allows for each directory created in sysfs to cover more than one memory section. The default behavior for sysfs directory creation is the same, in that each directory represents a single memory section. A new file 'end_phys_index' in each directory contains the physical_id of the last memory section covered by the directory so that users can easily determine the memory section range of a directory.
>
> I tested this on a POWER7 with 2TB memory and the boot time improved from greater than 6 hours (I gave up), to under 5 minutes. Nice!

Thanks for testing this out. I was able to test this on a 1 TB system and saw memory sysfs creation times go from 10 minutes to a few seconds. It's good to see the difference for a 2 TB system.

-Nathan
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [PATCH 0/8] v5 De-couple sysfs memory directories from memory sections
On Mon, 2010-08-16 at 09:34 -0500, Nathan Fontenot wrote:
>> It's not an unresolvable issue, as this is a must-fix problem. But you should tell us what your proposal is to prevent breakage of existing installations. A Kconfig option would be good, but a boot-time kernel command line option which selects the new format would be much better.
>
> This shouldn't break existing installations, unless an architecture chooses to do so. With my patch only the powerpc/pseries arch is updated such that what is seen in userspace is different.

Even if an arch defines the override for the sysfs dir size, I still don't think this breaks anything (it shouldn't). We move _all_ of the directories over, all at once, to a single, uniform size. The only apparent change to a user moving kernels would be a larger block_size_bytes (which is certainly not changing the ABI) and a new sysfs file for the end of the section. The new sysfs file is _completely_ redundant at this point.

The architecture is only supposed to bump up the directory size when it *KNOWS* that all operations will be done at the larger section size, such as if the specific hardware has physical DIMMs which are much larger than SECTION_SIZE.

Let's say we have a system with 20MB of memory, SECTION_SIZE of 1MB and a sysfs dir size of 4MB. Before the patch, we have 20 directories: one for each section. After this patch, we have 5 directories.

The thing that I think is the next step, but that we _will_ probably need eventually is this, take the 5 sysfs dirs in the above case:

	0-3, 4-7, 8-11, 12-15, 16-19

and turn that into a single one:

	0-19

*That* will require changing the ABI, but we could certainly have some bloated and slow, but backward-compatible mode.

-- Dave
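
The 20MB example above can be worked through mechanically. The following is an illustrative Python sketch, not kernel code: the function name `sysfs_dir_ranges` and its parameters are hypothetical, and it assumes that both the total memory and the directory size are whole multiples of the section size.

```python
def sysfs_dir_ranges(total_mb, section_mb, dir_mb):
    """Return a (first_section, last_section) pair for each sysfs directory.

    Hypothetical helper for illustration only: it models one sysfs
    directory covering dir_mb of memory, split into sections of section_mb.
    """
    if total_mb % section_mb != 0:
        raise ValueError("total memory must be a multiple of the section size")
    if dir_mb % section_mb != 0:
        raise ValueError("directory size must be a multiple of the section size")

    sections = total_mb // section_mb   # number of memory sections
    per_dir = dir_mb // section_mb      # sections covered by one directory

    ranges = []
    for first in range(0, sections, per_dir):
        # The last directory may be short if sections is not a multiple of per_dir.
        last = min(first + per_dir, sections) - 1
        ranges.append((first, last))
    return ranges


before = sysfs_dir_ranges(20, 1, 1)   # one directory per section
after = sysfs_dir_ranges(20, 1, 4)    # 4MB directories
print(len(before), len(after))        # 20 5
print(after)                          # [(0, 3), (4, 7), (8, 11), (12, 15), (16, 19)]
```

Calling `sysfs_dir_ranges(20, 1, 20)` gives the single `0-19` directory described as the eventual step.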
Re: [PATCH 0/8] v5 De-couple sysfs memory directories from memory sections
Hi Nathan,

> This set of patches de-couples the idea that there is a single directory in sysfs for each memory section. The intent of the patches is to reduce the number of sysfs directories created to resolve a boot-time performance issue.
>
> On very large systems boot times are getting very long (as seen on powerpc hardware) due to the enormous number of sysfs directories being created. On a system with 1 TB of memory we create ~63,000 directories. For even larger systems boot times are being measured in hours.
>
> This set of patches allows for each directory created in sysfs to cover more than one memory section. The default behavior for sysfs directory creation is the same, in that each directory represents a single memory section. A new file 'end_phys_index' in each directory contains the physical_id of the last memory section covered by the directory so that users can easily determine the memory section range of a directory.

I tested this on a POWER7 with 2TB memory and the boot time improved from greater than 6 hours (I gave up), to under 5 minutes. Nice!

Tested-by: Anton Blanchard an...@samba.org

Anton
Re: [PATCH 0/8] v5 De-couple sysfs memory directories from memory sections
On 08/12/2010 02:08 PM, Andrew Morton wrote:
> On Mon, 09 Aug 2010 12:53:00 -0500
> Nathan Fontenot nf...@austin.ibm.com wrote:
>
>> This set of patches de-couples the idea that there is a single directory in sysfs for each memory section. The intent of the patches is to reduce the number of sysfs directories created to resolve a boot-time performance issue.
>>
>> On very large systems boot times are getting very long (as seen on powerpc hardware) due to the enormous number of sysfs directories being created. On a system with 1 TB of memory we create ~63,000 directories. For even larger systems boot times are being measured in hours.
>
> And those hours are mainly due to this problem, I assume.

Yes, those hours are spent creating the sysfs directories for each of the memory sections.

>> This set of patches allows for each directory created in sysfs to cover more than one memory section. The default behavior for sysfs directory creation is the same, in that each directory represents a single memory section. A new file 'end_phys_index' in each directory contains the physical_id of the last memory section covered by the directory so that users can easily determine the memory section range of a directory.
>
> What you're proposing appears to be a non-back-compatible userspace-visible change. This is a big issue!
>
> It's not an unresolvable issue, as this is a must-fix problem. But you should tell us what your proposal is to prevent breakage of existing installations. A Kconfig option would be good, but a boot-time kernel command line option which selects the new format would be much better.

This shouldn't break existing installations, unless an architecture chooses to do so. With my patch only the powerpc/pseries arch is updated such that what is seen in userspace is different. The default behavior is maintained for all architectures unless they define their own version of memory_block_size_bytes().

The default definition of this routine (defined as __weak in Patch 5/8) sets the memory block size to the same size it currently is, thus preserving the existing one sysfs directory per memory section. The only change that will be seen is a new property for each memory section, end_phys_index, which will have the same value as the existing 'phys_index' property.

> However you didn't mention this issue at all, and it's the most important one.
>
>> Updates for version 5 of the patchset include the following:
>>
>> Patch 4/8 Add mutex for add/remove of memory blocks
>> - Define the mutex using DEFINE_MUTEX macro.
>>
>> Patch 8/8 Update memory-hotplug documentation
>> - Add information concerning memory holes in phys_index..end_phys_index.
>
> And you forgot to tell us how long those machines boot with the patchset applied, which is the entire point of the patchset!

Yes, I am working on getting more time on our large systems to get performance numbers with this patch. I'll post them when I get them.

-Nathan
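
The default-versus-override behaviour described in this reply can be modelled roughly as follows. This is a Python illustration, not the kernel's C implementation. The 16 MB section size, the 256 MB pseries block size, and the names `default_block_size`, `ARCH_BLOCK_SIZE` and `block_index_range` are all assumptions made up for the sketch; the thread only says that the default block size equals the section size and that pseries supplies its own value.

```python
MB = 1024 * 1024
SECTION_SIZE = 16 * MB  # assumed section size; not stated in the thread


def default_block_size():
    # Models the __weak default: a memory block is exactly one section.
    return SECTION_SIZE


# Hypothetical per-architecture overrides. The 256 MB figure is invented
# for illustration and is not taken from the patch set.
ARCH_BLOCK_SIZE = {"pseries": lambda: 256 * MB}


def memory_block_size_bytes(arch):
    # Use the architecture's override if it has one, else the default.
    return ARCH_BLOCK_SIZE.get(arch, default_block_size)()


def block_index_range(block_nr, arch):
    """Return (first_section, last_section) for memory block block_nr."""
    sections_per_block = memory_block_size_bytes(arch) // SECTION_SIZE
    first = block_nr * sections_per_block
    return first, first + sections_per_block - 1


print(block_index_range(3, "x86"))      # (3, 3): start and end are equal
print(block_index_range(3, "pseries"))  # (48, 63): one directory, 16 sections
```

With no override, the start and end values of every directory are equal, which matches the statement that the only visible change on other architectures is the extra file.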
Re: [PATCH 0/8] v5 De-couple sysfs memory directories from memory sections
On Mon, 09 Aug 2010 12:53:00 -0500
Nathan Fontenot nf...@austin.ibm.com wrote:

> This set of patches de-couples the idea that there is a single directory in sysfs for each memory section. The intent of the patches is to reduce the number of sysfs directories created to resolve a boot-time performance issue.
>
> On very large systems boot times are getting very long (as seen on powerpc hardware) due to the enormous number of sysfs directories being created. On a system with 1 TB of memory we create ~63,000 directories. For even larger systems boot times are being measured in hours.

And those hours are mainly due to this problem, I assume.

> This set of patches allows for each directory created in sysfs to cover more than one memory section. The default behavior for sysfs directory creation is the same, in that each directory represents a single memory section. A new file 'end_phys_index' in each directory contains the physical_id of the last memory section covered by the directory so that users can easily determine the memory section range of a directory.

What you're proposing appears to be a non-back-compatible userspace-visible change. This is a big issue!

It's not an unresolvable issue, as this is a must-fix problem. But you should tell us what your proposal is to prevent breakage of existing installations. A Kconfig option would be good, but a boot-time kernel command line option which selects the new format would be much better.

However you didn't mention this issue at all, and it's the most important one.

> Updates for version 5 of the patchset include the following:
>
> Patch 4/8 Add mutex for add/remove of memory blocks
> - Define the mutex using DEFINE_MUTEX macro.
>
> Patch 8/8 Update memory-hotplug documentation
> - Add information concerning memory holes in phys_index..end_phys_index.

And you forgot to tell us how long those machines boot with the patchset applied, which is the entire point of the patchset!
Re: [PATCH 0/8] v5 De-couple sysfs memory directories from memory sections
On Thu, 2010-08-12 at 12:08 -0700, Andrew Morton wrote:
>> This set of patches allows for each directory created in sysfs to cover more than one memory section. The default behavior for sysfs directory creation is the same, in that each directory represents a single memory section. A new file 'end_phys_index' in each directory contains the physical_id of the last memory section covered by the directory so that users can easily determine the memory section range of a directory.
>
> What you're proposing appears to be a non-back-compatible userspace-visible change. This is a big issue!

Nathan, one thought to get around this at the moment would be to bump up the size that we export in /sys/devices/system/memory/block_size_bytes. I think you have already done most of the hard work to accomplish this.

You can still add the end_phys_index stuff. But, for now, it would always be equal to start_phys_index.

-- Dave
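
From the userspace side, a tool that wants to work with and without the new file could treat a missing end_phys_index as equal to the start index. The sketch below is a hedged Python illustration: the function names are hypothetical, the attributes are passed in as a plain dictionary rather than read from sysfs, and it assumes the index files contain hexadecimal text, which the thread does not state.

```python
def section_range(attrs):
    """Return (first_section, last_section) for one memory directory.

    attrs maps sysfs file names to their text contents, for example
    {"phys_index": "00000010\n"}. Values are assumed to be hexadecimal.
    """
    first = int(attrs["phys_index"].strip(), 16)
    # Without end_phys_index, the directory covers a single section.
    last_text = attrs.get("end_phys_index", attrs["phys_index"])
    last = int(last_text.strip(), 16)
    if last < first:
        raise ValueError("end_phys_index is below phys_index")
    return first, last


def sections_covered(attrs):
    """Number of memory sections covered by the directory."""
    first, last = section_range(attrs)
    return last - first + 1


old_style = {"phys_index": "00000010\n"}
new_style = {"phys_index": "00000010\n", "end_phys_index": "0000001f\n"}
print(section_range(old_style), sections_covered(old_style))  # (16, 16) 1
print(section_range(new_style), sections_covered(new_style))  # (16, 31) 16
```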
Re: [PATCH 0/8] v5 De-couple sysfs memory directories from memory sections
On Mon, 2010-08-09 at 12:53 -0500, Nathan Fontenot wrote:
> This set of patches de-couples the idea that there is a single directory in sysfs for each memory section. The intent of the patches is to reduce the number of sysfs directories created to resolve a boot-time performance issue.
>
> On very large systems boot times are getting very long (as seen on powerpc hardware) due to the enormous number of sysfs directories being created. On a system with 1 TB of memory we create ~63,000 directories. For even larger systems boot times are being measured in hours.

Hi Nathan,

The set is looking pretty good to me. We _might_ want to up the ante in the future and allow it to be even more dynamic than this, but this looks like a good start to me.

BTW, have you taken a look at what the hotplug events look like if only a single section (not filling up a whole block) is added?

Feel free to add my:

Acked-by: Dave Hansen d...@linux.vnet.ibm.com

-- Dave
[PATCH 0/8] v5 De-couple sysfs memory directories from memory sections
This set of patches de-couples the idea that there is a single directory in sysfs for each memory section. The intent of the patches is to reduce the number of sysfs directories created to resolve a boot-time performance issue.

On very large systems boot times are getting very long (as seen on powerpc hardware) due to the enormous number of sysfs directories being created. On a system with 1 TB of memory we create ~63,000 directories. For even larger systems boot times are being measured in hours.

This set of patches allows for each directory created in sysfs to cover more than one memory section. The default behavior for sysfs directory creation is the same, in that each directory represents a single memory section. A new file 'end_phys_index' in each directory contains the physical_id of the last memory section covered by the directory so that users can easily determine the memory section range of a directory.

Updates for version 5 of the patchset include the following:

Patch 4/8 Add mutex for add/remove of memory blocks
- Define the mutex using DEFINE_MUTEX macro.

Patch 8/8 Update memory-hotplug documentation
- Add information concerning memory holes in phys_index..end_phys_index.

Thanks,

Nathan Fontenot