Re: [PATCH 0/8] v2 De-Couple sysfs memory directories from memory sections
On 09/29/2010 02:37 PM, Greg KH wrote:
>>>> Thankfully things like rpm, hald, and other miscellaneous commands
>>>> scan that information.
>>>
>>> Really? Why? Why would rpm care about this? hald is dead now so we
>>> don't need to worry about that anymore,
>>
>> That's not what compatibility means. We can't just support
>> latest-and-greatest userspace on latest-and-greatest kernels.
>
> Oh, I know that, that's not what I was getting at at all here, sorry if
> it came across that way.
>
> I wanted to know so we could go fix programs that are mucking around in
> these files, as odds are, they shouldn't be doing that in the first
> place.
>
> Like rpm, why would it matter what the memory in the system looks like?

I see, thanks.

-- 
I have a truly marvellous patch that fixes the bug which this signature
is too narrow to contain.

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [PATCH 0/8] v2 De-Couple sysfs memory directories from memory sections
On Wed, Sep 29, 2010 at 02:28:30PM -0500, Robin Holt wrote:
> On Tue, Sep 28, 2010 at 01:17:33PM -0500, Nathan Fontenot wrote:
...
> My next task is to implement an x86_64 SGI UV specific chunk of code
> for memory_block_size_bytes(). Would you consider adding that to your
> patch set? I expect to have that either later today or early tomorrow.

The patch is below. I left things at a u32, but I would really like it
if you changed to an unsigned long and adjusted my patch for me.

Thanks,
Robin

Subject: [Patch] Implement memory_block_size_bytes for x86_64 when CONFIG_X86_UV

Nathan Fontenot has implemented a patch set for large memory
configuration systems which will combine drivers/base/memory.c memory
sections together into memory blocks, with the default behavior being
unchanged from the current behavior. In his patch set, he implements a
memory_block_size_bytes() function for PPC. This is the equivalent patch
for x86_64 when it has CONFIG_X86_UV set.

Signed-off-by: Robin Holt
Signed-off-by: Jack Steiner
To: Nathan Fontenot
Cc: Ingo Molnar
Cc: Thomas Gleixner
Cc: "H. Peter Anvin"
Cc: lkml

---
 arch/x86/mm/init_64.c | 15 +++
 1 file changed, 15 insertions(+)

Index: memory_block/arch/x86/mm/init_64.c
===
--- memory_block.orig/arch/x86/mm/init_64.c	2010-09-29 14:46:50.711824616 -0500
+++ memory_block/arch/x86/mm/init_64.c	2010-09-29 14:46:55.683997672 -0500
@@ -50,6 +50,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 

 static unsigned long dma_reserve __initdata;
@@ -928,6 +929,20 @@ const char *arch_vma_name(struct vm_area
 	return NULL;
 }

+#ifdef CONFIG_X86_UV
+#define MIN_MEMORY_BLOCK_SIZE	(1 << SECTION_SIZE_BITS)
+
+u32 memory_block_size_bytes(void)
+{
+	if (is_uv_system()) {
+		printk("UV: memory block size 2GB\n");
+		return 2UL * 1024 * 1024 * 1024;
+	}
+	return MIN_MEMORY_BLOCK_SIZE;
+}
+#endif
+
+
 #ifdef CONFIG_SPARSEMEM_VMEMMAP
 /*
  * Initialise the sparsemem vmemmap using huge-pages at the PMD level.
Re: [PATCH 0/8] v2 De-Couple sysfs memory directories from memory sections
On 09/29/2010 02:28 PM, Robin Holt wrote:
> On Tue, Sep 28, 2010 at 01:17:33PM -0500, Nathan Fontenot wrote:
>> On 09/28/2010 07:38 AM, Robin Holt wrote:
>>> I was tasked with looking at a slowdown in similar sized SGI machines
>>> booting x86_64. Jack Steiner had already looked into memory_dev_init.
>>> I was looking at link_mem_sections().
>>>
>>> I made a dramatic improvement on a 16TB machine in that function by
>>> merely caching the most recent memory section and checking to see if
>>> the next memory section happens to be the subsequent one in the
>>> linked list of kobjects.
>>>
>>> That simple cache reduced the time for link_mem_sections from 1 hour
>>> 27 minutes down to 46 seconds.
>>
>> Nice!
>>
>>> I would like to propose we implement something along those lines
>>> also, but I am currently swamped. I can probably get you a patch
>>> tomorrow afternoon that applies at the end of this set.
>>
>> Should this be done as a separate patch? This patch set concentrates
>> on updates to the memory code, with the node updates only being done
>> due to the memory changes.
>>
>> I think it's a good idea to do the caching and have no problem adding
>> on to this patch set if no one else has any objections.
>
> I am sorry. I had meant to include you on the Cc: list. I just posted a
> set of patches (3 small patches) which implement the cache-most-recent
> bit I alluded to above. Search for a subject of "Speed up
> link_mem_sections during boot" and you will find them. I did add you to
> the Cc: list for the next time I end up sending the set.
>
> My next task is to implement an x86_64 SGI UV specific chunk of code
> for memory_block_size_bytes(). Would you consider adding that to your
> patch set? I expect to have that either later today or early tomorrow.

No problem. I'm putting together a new patch set with updates from all
of the comments now, so go ahead and send it to me when you have it
ready.

-Nathan
Re: [PATCH 0/8] v2 De-Couple sysfs memory directories from memory sections
On Tue, Sep 28, 2010 at 01:17:33PM -0500, Nathan Fontenot wrote:
> On 09/28/2010 07:38 AM, Robin Holt wrote:
>> I was tasked with looking at a slowdown in similar sized SGI machines
>> booting x86_64. Jack Steiner had already looked into memory_dev_init.
>> I was looking at link_mem_sections().
>>
>> I made a dramatic improvement on a 16TB machine in that function by
>> merely caching the most recent memory section and checking to see if
>> the next memory section happens to be the subsequent one in the linked
>> list of kobjects.
>>
>> That simple cache reduced the time for link_mem_sections from 1 hour
>> 27 minutes down to 46 seconds.
>
> Nice!
>
>> I would like to propose we implement something along those lines also,
>> but I am currently swamped. I can probably get you a patch tomorrow
>> afternoon that applies at the end of this set.
>
> Should this be done as a separate patch? This patch set concentrates on
> updates to the memory code, with the node updates only being done due
> to the memory changes.
>
> I think it's a good idea to do the caching and have no problem adding
> on to this patch set if no one else has any objections.

I am sorry. I had meant to include you on the Cc: list. I just posted a
set of patches (3 small patches) which implement the cache-most-recent
bit I alluded to above. Search for a subject of "Speed up
link_mem_sections during boot" and you will find them. I did add you to
the Cc: list for the next time I end up sending the set.

My next task is to implement an x86_64 SGI UV specific chunk of code for
memory_block_size_bytes(). Would you consider adding that to your patch
set? I expect to have that either later today or early tomorrow.

Robin
Re: [PATCH 0/8] v2 De-Couple sysfs memory directories from memory sections
On Wed, Sep 29, 2010 at 14:37, Greg KH wrote:
> On Wed, Sep 29, 2010 at 10:32:34AM +0200, Avi Kivity wrote:
>> On 09/29/2010 04:50 AM, Greg KH wrote:
>>>> Because the old ABI creates 129,000+ entries inside
>>>> /sys/devices/system/memory with their associated links from
>>>> /sys/devices/system/node/node*/ back to those directory entries.
>>>>
>>>> Thankfully things like rpm, hald, and other miscellaneous commands
>>>> scan that information.
>>>
>>> Really? Why? Why would rpm care about this? hald is dead now so we
>>> don't need to worry about that anymore,
>>
>> That's not what compatibility means. We can't just support
>> latest-and-greatest userspace on latest-and-greatest kernels.
>
> Oh, I know that, that's not what I was getting at at all here, sorry if
> it came across that way.
>
> I wanted to know so we could go fix programs that are mucking around in
> these files, as odds are, they shouldn't be doing that in the first
> place.
>
> Like rpm, why would it matter what the memory in the system looks like?

HAL does many inefficient things, but I don't think it's using
/sys/system/, besides that it may check the cpufreq governor state
there.

Kay
Re: [PATCH 0/8] v2 De-Couple sysfs memory directories from memory sections
On Wed, Sep 29, 2010 at 10:32:34AM +0200, Avi Kivity wrote:
> On 09/29/2010 04:50 AM, Greg KH wrote:
>>> Because the old ABI creates 129,000+ entries inside
>>> /sys/devices/system/memory with their associated links from
>>> /sys/devices/system/node/node*/ back to those directory entries.
>>>
>>> Thankfully things like rpm, hald, and other miscellaneous commands
>>> scan that information.
>>
>> Really? Why? Why would rpm care about this? hald is dead now so we
>> don't need to worry about that anymore,
>
> That's not what compatibility means. We can't just support
> latest-and-greatest userspace on latest-and-greatest kernels.

Oh, I know that, that's not what I was getting at at all here, sorry if
it came across that way.

I wanted to know so we could go fix programs that are mucking around in
these files, as odds are, they shouldn't be doing that in the first
place.

Like rpm, why would it matter what the memory in the system looks like?

thanks,

greg k-h
Re: [PATCH 0/8] v2 De-Couple sysfs memory directories from memory sections
On 09/29/2010 04:50 AM, Greg KH wrote:
>> Because the old ABI creates 129,000+ entries inside
>> /sys/devices/system/memory with their associated links from
>> /sys/devices/system/node/node*/ back to those directory entries.
>>
>> Thankfully things like rpm, hald, and other miscellaneous commands
>> scan that information.
>
> Really? Why? Why would rpm care about this? hald is dead now so we
> don't need to worry about that anymore,

That's not what compatibility means. We can't just support
latest-and-greatest userspace on latest-and-greatest kernels.

-- 
I have a truly marvellous patch that fixes the bug which this signature
is too narrow to contain.
Re: [PATCH 0/8] v2 De-Couple sysfs memory directories from memory sections
On Tue, Sep 28, 2010 at 10:12:18AM -0500, Robin Holt wrote:
> On Tue, Sep 28, 2010 at 02:44:40PM +0200, Avi Kivity wrote:
>> On 09/27/2010 09:09 PM, Nathan Fontenot wrote:
>>> This set of patches decouples the concept that a single memory
>>> section corresponds to a single directory in
>>> /sys/devices/system/memory/. On systems with large amounts of memory
>>> (1+ TB) there are performance issues related to creating the large
>>> number of sysfs directories. For a powerpc machine with 1 TB of
>>> memory we are creating 63,000+ directories. This is resulting in boot
>>> times of around 45-50 minutes for systems with 1 TB of memory and 8
>>> hours for systems with 2 TB of memory. With this patch set applied I
>>> am now seeing boot times of 5 minutes or less.
>>>
>>> The root of this issue is in sysfs directory creation. Every time a
>>> directory is created, a string compare is done against all sibling
>>> directories to ensure we do not create duplicates. The list of
>>> directory nodes in sysfs is kept as an unsorted list, which makes
>>> this an increasingly long operation as the number of directories
>>> grows.
>>>
>>> The solution in this patch set is to allow a single directory in
>>> sysfs to span multiple memory sections. This is controlled by an
>>> optional architecturally defined function memory_block_size_bytes().
>>> The default definition of this routine returns a memory block size
>>> equal to the memory section size. This maintains the current layout
>>> of sysfs memory directories, so the view from userspace remains the
>>> same as it is today.
>>
>> Why not update sysfs directory creation to be fast, for example by
>> using an rbtree instead of a linked list. This fixes an implementation
>> problem in the kernel instead of working around it and creating a new
>> ABI.
>
> Because the old ABI creates 129,000+ entries inside
> /sys/devices/system/memory with their associated links from
> /sys/devices/system/node/node*/ back to those directory entries.
>
> Thankfully things like rpm, hald, and other miscellaneous commands scan
> that information.

Really? Why? Why would rpm care about this? hald is dead now so we don't
need to worry about that anymore, but what other commands/programs read
this information?

thanks,

greg k-h
Re: [PATCH 0/8] v2 De-Couple sysfs memory directories from memory sections
On 09/28/2010 07:38 AM, Robin Holt wrote:
> I was tasked with looking at a slowdown in similar sized SGI machines
> booting x86_64. Jack Steiner had already looked into memory_dev_init.
> I was looking at link_mem_sections().
>
> I made a dramatic improvement on a 16TB machine in that function by
> merely caching the most recent memory section and checking to see if
> the next memory section happens to be the subsequent one in the linked
> list of kobjects.
>
> That simple cache reduced the time for link_mem_sections from 1 hour 27
> minutes down to 46 seconds.

Nice!

> I would like to propose we implement something along those lines also,
> but I am currently swamped. I can probably get you a patch tomorrow
> afternoon that applies at the end of this set.

Should this be done as a separate patch? This patch set concentrates on
updates to the memory code, with the node updates only being done due to
the memory changes.

I think it's a good idea to do the caching and have no problem adding on
to this patch set if no one else has any objections.

-Nathan

> Thanks,
> Robin
>
> On Mon, Sep 27, 2010 at 02:09:31PM -0500, Nathan Fontenot wrote:
>> This set of patches decouples the concept that a single memory section
>> corresponds to a single directory in /sys/devices/system/memory/. On
>> systems with large amounts of memory (1+ TB) there are performance
>> issues related to creating the large number of sysfs directories. For
>> a powerpc machine with 1 TB of memory we are creating 63,000+
>> directories. This is resulting in boot times of around 45-50 minutes
>> for systems with 1 TB of memory and 8 hours for systems with 2 TB of
>> memory. With this patch set applied I am now seeing boot times of 5
>> minutes or less.
>>
>> The root of this issue is in sysfs directory creation. Every time a
>> directory is created, a string compare is done against all sibling
>> directories to ensure we do not create duplicates. The list of
>> directory nodes in sysfs is kept as an unsorted list, which makes this
>> an increasingly long operation as the number of directories grows.
>>
>> The solution in this patch set is to allow a single directory in sysfs
>> to span multiple memory sections. This is controlled by an optional
>> architecturally defined function memory_block_size_bytes(). The
>> default definition of this routine returns a memory block size equal
>> to the memory section size. This maintains the current layout of sysfs
>> memory directories, so the view from userspace remains the same as it
>> is today.
>>
>> For architectures that define their own version of this routine, as is
>> done for powerpc in this patch set, the view in userspace changes such
>> that each memoryXXX directory spans multiple memory sections. The
>> number of sections spanned depends on the value reported by
>> memory_block_size_bytes().
>>
>> In both cases a new file, 'end_phys_index', is created in each
>> memoryXXX directory. This file contains the physical id of the last
>> memory section covered by the sysfs directory. For the default case,
>> the value in 'end_phys_index' will be the same as in the existing
>> 'phys_index' file.
>>
>> This version of the patch set includes an update to properly report
>> block_size_bytes, phys_index, and end_phys_index. Additionally, the
>> patch that adds the end_phys_index sysfs file is now patch 5/8 instead
>> of patch 2/8, as in the previous version of the patches.
>>
>> -Nathan Fontenot
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
>> the body of a message to majord...@vger.kernel.org
>> More majordomo info at http://vger.kernel.org/majordomo-info.html
>> Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 0/8] v2 De-Couple sysfs memory directories from memory sections
On 09/28/2010 05:12 PM, Robin Holt wrote:
>> Why not update sysfs directory creation to be fast, for example by
>> using an rbtree instead of a linked list. This fixes an implementation
>> problem in the kernel instead of working around it and creating a new
>> ABI.
>
> Because the old ABI creates 129,000+ entries inside
> /sys/devices/system/memory with their associated links from
> /sys/devices/system/node/node*/ back to those directory entries.
>
> Thankfully things like rpm, hald, and other miscellaneous commands scan
> that information.
>
> On our 8 TB test machine, hald runs continuously following boot for
> nearly an hour, mostly scanning useless information from /sys/

I see - so the problem wasn't just kernel internal; the ABI itself was
unsuitable. Too bad this wasn't considered at the time it was added.

(129k entries / 1 hour = 35 entries/sec; not very impressive)

-- 
error compiling committee.c: too many arguments to function
Re: [PATCH 0/8] v2 De-Couple sysfs memory directories from memory sections
On Tue, 2010-09-28 at 14:44 +0200, Avi Kivity wrote:
> Why not update sysfs directory creation to be fast, for example by
> using an rbtree instead of a linked list. This fixes an implementation
> problem in the kernel instead of working around it and creating a new
> ABI.
>
> New ABIs mean old tools won't work, and new tools need to understand
> both ABIs.

Just to be clear, _these_ patches do not change the existing ABI. They
do add a new ABI: the end_phys_index file. But it is completely
redundant at the moment. It could be taken out of these patches.

That said, fixing the directory creation speed is probably a worthwhile
endeavor too.

-- Dave
Re: [PATCH 0/8] v2 De-Couple sysfs memory directories from memory sections
On Tue, Sep 28, 2010 at 02:44:40PM +0200, Avi Kivity wrote:
> On 09/27/2010 09:09 PM, Nathan Fontenot wrote:
>> This set of patches decouples the concept that a single memory section
>> corresponds to a single directory in /sys/devices/system/memory/. On
>> systems with large amounts of memory (1+ TB) there are performance
>> issues related to creating the large number of sysfs directories. For
>> a powerpc machine with 1 TB of memory we are creating 63,000+
>> directories. This is resulting in boot times of around 45-50 minutes
>> for systems with 1 TB of memory and 8 hours for systems with 2 TB of
>> memory. With this patch set applied I am now seeing boot times of 5
>> minutes or less.
>>
>> The root of this issue is in sysfs directory creation. Every time a
>> directory is created, a string compare is done against all sibling
>> directories to ensure we do not create duplicates. The list of
>> directory nodes in sysfs is kept as an unsorted list, which makes this
>> an increasingly long operation as the number of directories grows.
>>
>> The solution in this patch set is to allow a single directory in sysfs
>> to span multiple memory sections. This is controlled by an optional
>> architecturally defined function memory_block_size_bytes(). The
>> default definition of this routine returns a memory block size equal
>> to the memory section size. This maintains the current layout of sysfs
>> memory directories, so the view from userspace remains the same as it
>> is today.
>
> Why not update sysfs directory creation to be fast, for example by
> using an rbtree instead of a linked list. This fixes an implementation
> problem in the kernel instead of working around it and creating a new
> ABI.

Because the old ABI creates 129,000+ entries inside
/sys/devices/system/memory with their associated links from
/sys/devices/system/node/node*/ back to those directory entries.

Thankfully things like rpm, hald, and other miscellaneous commands scan
that information.

On our 8 TB test machine, hald runs continuously following boot for
nearly an hour, mostly scanning useless information from /sys/

Robin

> New ABIs mean old tools won't work, and new tools need to understand
> both ABIs.
>
> --
> error compiling committee.c: too many arguments to function
Re: [PATCH 0/8] v2 De-Couple sysfs memory directories from memory sections
On 09/27/2010 09:09 PM, Nathan Fontenot wrote:
> This set of patches decouples the concept that a single memory section
> corresponds to a single directory in /sys/devices/system/memory/. On
> systems with large amounts of memory (1+ TB) there are performance
> issues related to creating the large number of sysfs directories. For a
> powerpc machine with 1 TB of memory we are creating 63,000+
> directories. This is resulting in boot times of around 45-50 minutes
> for systems with 1 TB of memory and 8 hours for systems with 2 TB of
> memory. With this patch set applied I am now seeing boot times of 5
> minutes or less.
>
> The root of this issue is in sysfs directory creation. Every time a
> directory is created, a string compare is done against all sibling
> directories to ensure we do not create duplicates. The list of
> directory nodes in sysfs is kept as an unsorted list, which makes this
> an increasingly long operation as the number of directories grows.
>
> The solution in this patch set is to allow a single directory in sysfs
> to span multiple memory sections. This is controlled by an optional
> architecturally defined function memory_block_size_bytes(). The default
> definition of this routine returns a memory block size equal to the
> memory section size. This maintains the current layout of sysfs memory
> directories, so the view from userspace remains the same as it is
> today.

Why not update sysfs directory creation to be fast, for example by using
an rbtree instead of a linked list. This fixes an implementation problem
in the kernel instead of working around it and creating a new ABI.

New ABIs mean old tools won't work, and new tools need to understand
both ABIs.

-- 
error compiling committee.c: too many arguments to function
Re: [PATCH 0/8] v2 De-Couple sysfs memory directories from memory sections
I was tasked with looking at a slowdown in similar sized SGI machines
booting x86_64. Jack Steiner had already looked into memory_dev_init. I
was looking at link_mem_sections().

I made a dramatic improvement on a 16TB machine in that function by
merely caching the most recent memory section and checking to see if the
next memory section happens to be the subsequent one in the linked list
of kobjects.

That simple cache reduced the time for link_mem_sections from 1 hour 27
minutes down to 46 seconds.

I would like to propose we implement something along those lines also,
but I am currently swamped. I can probably get you a patch tomorrow
afternoon that applies at the end of this set.

Thanks,
Robin

On Mon, Sep 27, 2010 at 02:09:31PM -0500, Nathan Fontenot wrote:
> This set of patches decouples the concept that a single memory section
> corresponds to a single directory in /sys/devices/system/memory/. On
> systems with large amounts of memory (1+ TB) there are performance
> issues related to creating the large number of sysfs directories. For a
> powerpc machine with 1 TB of memory we are creating 63,000+
> directories. This is resulting in boot times of around 45-50 minutes
> for systems with 1 TB of memory and 8 hours for systems with 2 TB of
> memory. With this patch set applied I am now seeing boot times of 5
> minutes or less.
>
> The root of this issue is in sysfs directory creation. Every time a
> directory is created, a string compare is done against all sibling
> directories to ensure we do not create duplicates. The list of
> directory nodes in sysfs is kept as an unsorted list, which makes this
> an increasingly long operation as the number of directories grows.
>
> The solution in this patch set is to allow a single directory in sysfs
> to span multiple memory sections. This is controlled by an optional
> architecturally defined function memory_block_size_bytes(). The default
> definition of this routine returns a memory block size equal to the
> memory section size. This maintains the current layout of sysfs memory
> directories, so the view from userspace remains the same as it is
> today.
>
> For architectures that define their own version of this routine, as is
> done for powerpc in this patch set, the view in userspace changes such
> that each memoryXXX directory spans multiple memory sections. The
> number of sections spanned depends on the value reported by
> memory_block_size_bytes().
>
> In both cases a new file, 'end_phys_index', is created in each
> memoryXXX directory. This file contains the physical id of the last
> memory section covered by the sysfs directory. For the default case,
> the value in 'end_phys_index' will be the same as in the existing
> 'phys_index' file.
>
> This version of the patch set includes an update to properly report
> block_size_bytes, phys_index, and end_phys_index. Additionally, the
> patch that adds the end_phys_index sysfs file is now patch 5/8 instead
> of patch 2/8, as in the previous version of the patches.
>
> -Nathan Fontenot
[PATCH 0/8] v2 De-Couple sysfs memory directories from memory sections
This set of patches decouples the concept that a single memory section
corresponds to a single directory in /sys/devices/system/memory/. On
systems with large amounts of memory (1+ TB) there are performance
issues related to creating the large number of sysfs directories. For a
powerpc machine with 1 TB of memory we are creating 63,000+ directories.
This is resulting in boot times of around 45-50 minutes for systems with
1 TB of memory and 8 hours for systems with 2 TB of memory. With this
patch set applied I am now seeing boot times of 5 minutes or less.

The root of this issue is in sysfs directory creation. Every time a
directory is created, a string compare is done against all sibling
directories to ensure we do not create duplicates. The list of directory
nodes in sysfs is kept as an unsorted list, which makes this an
increasingly long operation as the number of directories grows.

The solution in this patch set is to allow a single directory in sysfs
to span multiple memory sections. This is controlled by an optional
architecturally defined function memory_block_size_bytes(). The default
definition of this routine returns a memory block size equal to the
memory section size. This maintains the current layout of sysfs memory
directories, so the view from userspace remains the same as it is today.

For architectures that define their own version of this routine, as is
done for powerpc in this patch set, the view in userspace changes such
that each memoryXXX directory spans multiple memory sections. The number
of sections spanned depends on the value reported by
memory_block_size_bytes().

In both cases a new file, 'end_phys_index', is created in each memoryXXX
directory. This file contains the physical id of the last memory section
covered by the sysfs directory. For the default case, the value in
'end_phys_index' will be the same as in the existing 'phys_index' file.
This version of the patch set includes an update to properly report
block_size_bytes, phys_index, and end_phys_index. Additionally, the
patch that adds the end_phys_index sysfs file is now patch 5/8 instead
of patch 2/8, as in the previous version of the patches.

-Nathan Fontenot