Re: [PATCH] Use 1TB segments
On Wed, 2007-10-03 at 13:13 +1000, Paul Mackerras wrote:
> Will Schmidt writes:
> > I still need to test this code for performance issues, and this
> > version could still use some cosmetic touchups, so I don't think we
> > want this to go into a tree yet.  I am reposting this primarily to
> > indicate the prior version isn't quite right, and so Jon can rebase
> > to this version. :-)
> >
> > The way we scan the ibm,processor-segment-sizes property could be
> > nicer.
>
> Were there any other cosmetic touchups you were thinking of, and if
> so what were they?  I didn't notice any leftover debugging printks or
> anything else that obviously needed cleaning up.

Correct.. nothing in the patch really *needs* to be cleaned up.  This is
mostly me being way more nit-picky than I need to be. :-)  I don't have
any real issues with the patch (being a candidate for) going into a tree.

The only obvious one is that the MMU_SEGSIZE_* #defines in mmu-hash64.h
appear to be duplicated.  The rest I can follow up on later; none of it
affects the code outside of #ifdef DEBUG blocks, and it should be a
separate patch anyway.

Thanks,
-Will

> Paul.
Re: [PATCH] Use 1TB segments
Olof Johansson writes:

> > This makes the kernel use 1TB segments for all kernel mappings and
> > for user addresses of 1TB and above, on machines which support them
> > (currently POWER5+ and POWER6).
>
> PA6T supports them as well :)

In the patch, we don't actually hard-code the CPU_FTR_1T_SEGMENT bit in
the cputable entry for any processor; instead we look at the
ibm,processor-segment-sizes property in the cpu node(s) in the device
tree.  Do you want us to add the CPU_FTR_1T_SEGMENT bit to the cputable
entry for the PA6T, or will your firmware give the
ibm,processor-segment-sizes property in the device tree?

> Wouldn't it be possible to stick with 1TB segments for the low range
> for 64-bit processes as well, and have them allocate their hugepages
> at 1TB?

You mean, forbid hugepages below 1TB?  That would be a user-visible ABI
change.  There are linker scripts for generating executables whose text
and/or data can go in hugepages, and I believe they put the text/data
below 1TB.

Paul.
Re: [PATCH] Use 1TB segments
Will Schmidt writes:

> I still need to test this code for performance issues, and this
> version could still use some cosmetic touchups, so I don't think we
> want this to go into a tree yet.  I am reposting this primarily to
> indicate the prior version isn't quite right, and so Jon can rebase
> to this version. :-)
>
> The way we scan the ibm,processor-segment-sizes property could be
> nicer.

Were there any other cosmetic touchups you were thinking of, and if so
what were they?  I didn't notice any leftover debugging printks or
anything else that obviously needed cleaning up.

Paul.
Re: [PATCH] Use 1TB segments
On Wed, Oct 03, 2007 at 01:07:58PM +1000, Paul Mackerras wrote:
> Olof Johansson writes:
> > > This makes the kernel use 1TB segments for all kernel mappings
> > > and for user addresses of 1TB and above, on machines which
> > > support them (currently POWER5+ and POWER6).
> >
> > PA6T supports them as well :)
>
> In the patch, we don't actually hard-code the CPU_FTR_1T_SEGMENT bit
> in the cputable entry for any processor; instead we look at the
> ibm,processor-segment-sizes property in the cpu node(s) in the device
> tree.

Yep, I see that.  I just wanted to clarify it for the (future) commit
message.

> Do you want us to add the CPU_FTR_1T_SEGMENT bit to the cputable
> entry for the PA6T, or will your firmware give the
> ibm,processor-segment-sizes property in the device tree?

Thanks, but we've already got the properties there, so it just works.

> > Wouldn't it be possible to stick with 1TB segments for the low
> > range for 64-bit processes as well, and have them allocate their
> > hugepages at 1TB?
>
> You mean, forbid hugepages below 1TB?  That would be a user-visible
> ABI change.  There are linker scripts for generating executables
> whose text and/or data can go in hugepages, and I believe they put
> the text/data below 1TB.

Hm, good point.  Bummer.

-Olof
Re: [PATCH] Use 1TB segments
Paul Mackerras wrote:

> diff --git a/arch/powerpc/mm/slb.c b/arch/powerpc/mm/slb.c

A couple of hunks fail in this file when applying to the current tree.

...

> diff --git a/include/asm-powerpc/mmu-hash64.h b/include/asm-powerpc/mmu-hash64.h
> index 695962f..053f86b 100644
> --- a/include/asm-powerpc/mmu-hash64.h
> +++ b/include/asm-powerpc/mmu-hash64.h
> @@ -47,6 +47,8 @@ extern char initial_stab[];
>
>  /* Bits in the SLB VSID word */
>  #define SLB_VSID_SHIFT		12
> +#define SLB_VSID_SHIFT_1T	24
> +#define SLB_VSID_SSIZE_SHIFT	62
>  #define SLB_VSID_B		ASM_CONST(0xc000000000000000)
>  #define SLB_VSID_B_256M		ASM_CONST(0x0000000000000000)
>  #define SLB_VSID_B_1T		ASM_CONST(0x4000000000000000)
> @@ -66,6 +68,7 @@ extern char initial_stab[];
>  #define SLB_VSID_USER		(SLB_VSID_KP|SLB_VSID_KS|SLB_VSID_C)
>
>  #define SLBIE_C			(0x08000000)
> +#define SLBIE_SSIZE_SHIFT	25
>
>  /*
>   * Hash table
> @@ -77,7 +80,7 @@ extern char initial_stab[];
>  #define HPTE_V_AVPN_SHIFT	7
>  #define HPTE_V_AVPN		ASM_CONST(0x3fffffffffffff80)
>  #define HPTE_V_AVPN_VAL(x)	(((x) & HPTE_V_AVPN) >> HPTE_V_AVPN_SHIFT)
> -#define HPTE_V_COMPARE(x,y)	(!(((x) ^ (y)) & HPTE_V_AVPN))
> +#define HPTE_V_COMPARE(x,y)	(!(((x) ^ (y)) & 0xffffffffffffff80))
>  #define HPTE_V_BOLTED		ASM_CONST(0x0000000000000010)
>  #define HPTE_V_LOCK		ASM_CONST(0x0000000000000008)
>  #define HPTE_V_LARGE		ASM_CONST(0x0000000000000004)
> @@ -164,16 +167,25 @@ struct mmu_psize_def
>
>  #define MMU_SEGSIZE_256M	0
>  #define MMU_SEGSIZE_1T		1
>
> +/*
> + * Supported segment sizes
> + */
> +#define MMU_SEGSIZE_256M	0
> +#define MMU_SEGSIZE_1T		1

It looks like this is repeating the definitions just above it.

Jon
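To see how the new constants above fit together: the segment-size
encoding (MMU_SEGSIZE_*) is placed into the SLB's B field via
SLB_VSID_SSIZE_SHIFT, and a 1TB segment shifts the VSID up by 24 bits
instead of 12.  The sketch below is modeled on the patch's slb.c but
simplified to take a precomputed VSID rather than deriving it from the
effective address, so treat it as an illustration, not a quote of the
patch:

	#define MMU_SEGSIZE_256M	0
	#define MMU_SEGSIZE_1T		1
	#define SLB_VSID_SHIFT		12	/* 256MB: VSID starts at bit 12 */
	#define SLB_VSID_SHIFT_1T	24	/* 1TB: VSID starts at bit 24 */
	#define SLB_VSID_SSIZE_SHIFT	62	/* segment-size (B) field */

	/* Build an SLB VSID word; 'vsid' is assumed precomputed here. */
	static inline unsigned long mk_vsid_data(unsigned long vsid, int ssize,
						 unsigned long flags)
	{
		int shift = (ssize == MMU_SEGSIZE_256M) ? SLB_VSID_SHIFT
							: SLB_VSID_SHIFT_1T;

		/* B field (bits 62-63) selects 256MB vs 1TB segments */
		return (vsid << shift) | flags |
		       ((unsigned long)ssize << SLB_VSID_SSIZE_SHIFT);
	}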
Re: [PATCH] Use 1TB segments
Hi Paul,

Just a few questions.

On Wed, 2007-08-01 at 12:04 +1000, Paul Mackerras wrote:
> This makes the kernel use 1TB segments for all kernel mappings and
> for user addresses of 1TB and above, on machines which support them
> (currently POWER5+ and POWER6).  We don't currently use 1TB segments
> for user addresses < 1T, since that would effectively prevent 32-bit
> processes from using huge pages unless we also had a way to revert to
> using 256MB segments.

I think I have a question about user addresses < 1T.. once I think on
it a bit more it'll either click, or I'll have a question
articulated. :-)

> -static inline void __tlbiel(unsigned long va, unsigned int psize)
> +static inline void __tlbiel(unsigned long va, int psize, int ssize)

> -static inline void tlbie(unsigned long va, int psize, int local)
> +static inline void tlbie(unsigned long va, int psize, int ssize, int local)

>  static long native_hpte_insert(unsigned long hpte_group, unsigned long va,
>  			unsigned long pa, unsigned long rflags,
> -			unsigned long vflags, int psize)
> +			unsigned long vflags, int psize, int ssize)

>  static long native_hpte_updatepp(unsigned long slot, unsigned long newpp,
> -				 unsigned long va, int psize, int local)
> +				 unsigned long va, int psize, int ssize,
> +				 int local)

Is there a technical reason why the 'local' variable remains at the end
of the parameter list for these?  In other cases 'ssize' simply gets
added to the end of the parameter list.

> +static int __init htab_dt_scan_seg_sizes(unsigned long node,
> +					 const char *uname, int depth,
> +					 void *data)
> +{
> +	char *type = of_get_flat_dt_prop(node, "device_type", NULL);
> +	u32 *prop;
> +	unsigned long size = 0;
> +
> +	/* We are scanning cpu nodes only */
> +	if (type == NULL || strcmp(type, "cpu") != 0)
> +		return 0;
> +
> +	prop = (u32 *)of_get_flat_dt_prop(node, "ibm,processor-segment-sizes",
> +					  &size);
> +	if (prop != NULL && size >= 8) {
> +		if (prop[0] == 0x1c && prop[1] == 0x28) {

This is 0x1c indicating 2^28 for 256M, and 0x28 indicating 2^40 for 1TB
segments.  Will there ever be a segment size between the two?  Or will
the representation ever vary from this?  i.e. I'm wondering whether
prop[0] will always be for 256M and prop[1] for 1TB.

> +#define slb_vsid_shift(ssize)	\
> +	((ssize) == MMU_SEGSIZE_256M ? SLB_VSID_SHIFT : SLB_VSID_SHIFT_1T)

> @@ -100,12 +106,13 @@ void slb_flush_and_rebolt(void)
>  	vflags = SLB_VSID_KERNEL | vmalloc_llp;
>
>  	ksp_esid_data = mk_esid_data(get_paca()->kstack, 2);
> -	if ((ksp_esid_data & ESID_MASK) == PAGE_OFFSET)
> +	mask = (mmu_kernel_ssize == MMU_SEGSIZE_256M) ? ESID_MASK : ESID_MASK_1T;

Is this one worthy of a #define like the slb_vsid_shift() above?

> +#define VSID_MULTIPLIER_256M	ASM_CONST(200730139)	/* 28-bit prime */
> +#define VSID_MULTIPLIER_1T	ASM_CONST(12538073)	/* 24-bit prime */

Anything special in how this 24-bit prime value was selected?  (The same
question could be asked for the 28-bit prime, though I see that value
was updated at least once a few years back.)

-Will
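On the 0x1c/0x28 question above: each 32-bit cell of the
ibm,processor-segment-sizes property holds the base-2 log of a supported
segment size, so 0x1c (28) stands for 2^28 = 256MB and 0x28 (40) for
2^40 = 1TB.  A standalone user-space sketch of the decoding (the
decode_seg_sizes() helper is invented for illustration, not kernel
code):

	#include <stdio.h>

	/* Each cell is log2 of a supported segment size. */
	static void decode_seg_sizes(const unsigned int *prop,
				     unsigned long bytes)
	{
		unsigned long i;

		for (i = 0; i < bytes / 4; i++)	/* 4 bytes per cell */
			printf("segment size: 2^%u = %llu bytes\n",
			       prop[i], 1ULL << prop[i]);
	}

	int main(void)
	{
		/* cells as reported on a 1TB-capable machine */
		unsigned int prop[] = { 0x1c, 0x28 };

		decode_seg_sizes(prop, sizeof(prop));
		return 0;
	}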
Re: [PATCH] Use 1TB segments
> Is there a technical reason why the 'local' variable remains at the
> end of the parameter list for these?  In other cases 'ssize' simply
> gets added to the end of the parameter list.

Looks nicer to have psize and ssize together :-)

Ben.
Re: [PATCH] Use 1TB segments
On Fri, 2007-08-03 at 08:37 +1000, Benjamin Herrenschmidt wrote:
> > Is there a technical reason why the 'local' variable remains at the
> > end of the parameter list for these?  In other cases 'ssize' simply
> > gets added to the end of the parameter list.
>
> Looks nicer to have psize and ssize together :-)

Aah!  And here I thought there was some obscure register usage
optimization going on.. :-)

-Will
Re: [PATCH] Use 1TB segments
On Thu, Aug 02, 2007 at 03:41:23PM -0500, Will Schmidt wrote:
> Hi Paul,
>
> Just a few questions.
[snip]
> > +#define VSID_MULTIPLIER_256M	ASM_CONST(200730139)	/* 28-bit prime */
> > +#define VSID_MULTIPLIER_1T	ASM_CONST(12538073)	/* 24-bit prime */
>
> Anything special in how this 24-bit prime value was selected?  (The
> same question could be asked for the 28-bit prime, though I see that
> value was updated at least once a few years back.)

I can't speak to the 24-bit value specifically, but I can speak to the
28-bit one: I did the rewrite of the SLB miss handler to use that
multiplicative hash, and changed the prime value when a bug report
showed problems with the original choice.

Afaict, the value of the prime doesn't matter all that much - in fact it
doesn't strictly even need to be prime, just co-prime to (2^36 - 1).

Originally, I picked the largest 28-bit prime, on the basis that a large
multiplier should give better scattering/folding.  That turned out to
cause problems on some iSeries machines (which couldn't do 16M pages) -
we were filling up hash buckets with the linear mapping.  I figured out
that that was because a very large 28-bit number is, in the relevant
modulus, sort of equivalent to a small negative number.  For a big
linear mapping, the hash bucket selected strided gradually backwards
through the table - there were also differences in the high bits of the
hash, but they were lost because of the limited size of the hash table.

So, I changed the multiplier to the median 28-bit prime, in the hope
that it would give better hash scattering across as many bits of the
VSID as possible.  I don't have much in the way of theoretical
justification for that, but it fixed the iSeries problem and I've never
heard of any regressions, so apparently it's not too bad.

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson
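David's account can be checked with a few lines of arithmetic.  The
user-space sketch below uses a made-up 4096-bucket table in place of the
real (much larger) hash table and ignores the page-index term that the
real hash folds in; with those caveats, it shows consecutive ESIDs of a
linear mapping striding backwards by 57 buckets under the old multiplier
(the largest 28-bit prime is 2^28 - 57), while the patch's median prime
scatters them:

	#include <stdio.h>

	#define VSID_MODULUS	((1ULL << 36) - 1)
	#define OLD_MULT	268435399ULL	/* largest 28-bit prime, 2^28 - 57 */
	#define NEW_MULT	200730139ULL	/* median 28-bit prime (the patch) */
	#define HTAB_MASK	0xfffULL	/* toy 4096-bucket hash table */

	/* Bucket chosen for an ESID; the high hash bits are lost in the
	 * final mask, just as a modest real hash table would lose them. */
	static unsigned long long bucket(unsigned long long esid,
					 unsigned long long mult)
	{
		return ((esid * mult) % VSID_MODULUS) & HTAB_MASK;
	}

	int main(void)
	{
		unsigned long long esid;

		for (esid = 1; esid <= 5; esid++)
			printf("esid %llu: old bucket %4llu, new bucket %4llu\n",
			       esid,
			       bucket(esid, OLD_MULT),
			       bucket(esid, NEW_MULT));
		return 0;
	}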