Re: [PATCH 03/25] xen: Add nosegneg capability to the vsyscall page notes
Roland McGrath wrote:
>> I have to admit I still don't really understand all this. Is it
>> documented somewhere?
>
> I have explained it in public more than once, but I don't know off hand
> anywhere that was helpfully recorded.

Thanks very much. I'd been poking about, but the closest I came to an
actual description was various patches fixing bugs, so it was a little
incomplete.

> For example, a Xen-enabled kernel can use a single vDSO image (or a single
> pair of int80/sysenter images), containing the "nosegneg" hwcap note. When
> there is no need for it (native or hvm or 64-bit hv or whatever), it just
> clears the mask word. If you actually do this, you'll want to modify the
> NOTE_KERNELCAP_BEGIN macro to define a global label you can use with VDSO_SYM.

Thanks for the pointer. I'd been getting a bit of heat for enabling the
nosegneg flag unconditionally. If I can make it Xen-specific then that
will be one less source of complaints.

    J
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 03/25] xen: Add nosegneg capability to the vsyscall page notes
> I have to admit I still don't really understand all this. Is it
> documented somewhere?

I have explained it in public more than once, but I don't know off hand
anywhere that was helpfully recorded.

> What does "hwcap 0 nosegneg" actually mean? What does the "0" mean here?

ldconfig is usually run at library install time. It reads ld.so.conf (and
its include files, usually found in /etc/ld.so.conf.d/*.conf). ldconfig
finds libraries on the disk and stores their names in ld.so.cache. For
libraries in "hwcap directories", it records in ld.so.cache a bitmask of
hwcap bits for each library, based on which hwcap names appeared in the
library file name. The hard-wired hwcap names are things such as "mmx" and
"sse2", a chosen subset of the AT_HWCAP bits the kernel provides.

When the dynamic linker is finding a library at runtime, it uses a match
from ld.so.cache unless none was found or LD_LIBRARY_PATH was set. When
there is no cache hit, it searches a directory path for the library's name.
The subset of hwcap names whose AT_HWCAP bits are set yields a list of
subdirectories to try under each directory in the path. To see the list:

	$ LD_LIBRARY_PATH=/lib:/usr/lib LD_DEBUG=libs /bin/true
	     21491:	find library=libc.so.6 [0]; searching
	     21491:	 search path=/lib/tls/i686/sse2:/lib/tls/i686:/lib/tls/sse2:/lib/tls:/lib/i686/sse2:/lib/i686:/lib/sse2:/lib:/usr/lib/tls/i686/sse2:/usr/lib/tls/i686:/usr/lib/tls/sse2:/usr/lib/tls:/usr/lib/i686/sse2:/usr/lib/i686:/usr/lib/sse2:/usr/lib		(system search path)

(Here you can notice that "tls" is used as a pseudo-hwcap; it is in fact
hard-wired into the list if the dynamic linker supports ELF TLS, which all
recent ones do. Also you'll notice "i686" is not a hwcap bit, but is the
AT_PLATFORM string, which is treated similarly.)

The hwcap bitmasks in ld.so.cache are intended to make the single cache
file equivalent to this varying search path depending on runtime hwcap bits
set. A cache entry whose bitmask has bits not set at runtime is ignored.
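
As a rough illustration of how the active names expand into the search path shown above, here is a minimal Python sketch. It only reproduces the ordering visible in the LD_DEBUG=libs output; it is not glibc's code, and the function name hwcap_search_path is made up for this example.

```python
from itertools import product


def hwcap_search_path(base_dirs, active_names):
    """Expand each base directory into its hwcap subdirectory variants.

    `active_names` holds the names enabled at runtime, in the order they
    appear in the path (for example ["tls", "i686", "sse2"]).
    """
    paths = []
    for base in base_dirs:
        # Walk the subsets from "every name present" down to "no names",
        # which is the order seen in the LD_DEBUG=libs output.
        for keep_flags in product((True, False), repeat=len(active_names)):
            parts = [n for n, keep in zip(active_names, keep_flags) if keep]
            paths.append("/".join([base] + parts))
    return paths


if __name__ == "__main__":
    print(":".join(hwcap_search_path(["/lib", "/usr/lib"],
                                     ["tls", "i686", "sse2"])))
```

With three active names each base directory expands to eight candidates, so a name that is not enabled at runtime (for example "nosegneg" on a native boot) never appears in the list at all.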
Running "ldconfig -p | grep hwcap" (read-only, need not be root) will show
you any entries in your ld.so.cache that have an hwcap bitmask set.

The "hwcap" directive in ld.so.conf tells ldconfig to understand a new
hwcap name that is not in the hard-wired set. There is some number of
extra bits available; "hwcap 0" assigns the first extra bit, "hwcap 1" the
second, and so on. The name is what to use as a subdirectory name,
analogous to "sse2" et al. On my system with an ld.so.conf.d file
installed doing "hwcap 0 nosegneg":

	$ ldconfig -p | grep libc.so.6
		libc.so.6 (libc6, hwcap: 0x0018, OS ABI: Linux 2.6.9) => /lib/i686/nosegneg/libc.so.6
		libc.so.6 (libc6, OS ABI: Linux 2.6.9) => /lib/libc.so.6

(There are two bits set because the "tls" pseudo-bit is also set.)
With this in ld.so.cache, the libc.so.6 lookup will find
/lib/i686/nosegneg/libc.so.6 first, but only if the hwcap bit is set.

> In the ELF note, what does the "nosegneg" string mean? How is it used?
> Is it compared to the "nosegneg" in ld.so.conf? How does this relate to
> the bitfields?

Each bit + string element in the note (there's just the one in what we
have) establishes for the dynamic linker at runtime the association between
the "extra" pseudo-hwcap bit number and the name. If that pseudo-hwcap is
enabled, then that string will figure into the directory search path as
"sse2" does in the example above. This string is never consulted when
looking in ld.so.cache.

The mask field in NOTE_KERNELCAP_BEGIN says which "extra" bits are enabled.
If the corresponding bit is not set here, then it's just like a hard-wired
hwcap bit like "sse2" when that bit was not set in AT_HWCAP. That is, a
cache lookup will ignore entries with that hwcap bit in their bitmask, and
that hwcap name will not be used in constructing the directory search path.

I put this bitmask in so that the kernel has the option of using a single
vDSO image for multiple different runtime configurations.
It can simply modify the bitmask in the image at setup time to disable some
entries. For example, a Xen-enabled kernel can use a single vDSO image (or
a single pair of int80/sysenter images), containing the "nosegneg" hwcap
note. When there is no need for it (native or hvm or 64-bit hv or
whatever), it just clears the mask word. If you actually do this, you'll
want to modify the NOTE_KERNELCAP_BEGIN macro to define a global label you
can use with VDSO_SYM.


Thanks,
Roland
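
The mask semantics described in this message can be sketched in a few lines of Python. This is a toy model under stated assumptions, not glibc or kernel code: the function names are invented, and the bit values passed to cache_entry_usable are arbitrary, since the thread does not spell out how note bit numbers map onto the ld.so.cache bitmask.

```python
def enabled_names(note_entries, mask):
    """Return the names from (bit, name) note entries whose bit is set in mask."""
    return [name for bit, name in note_entries if mask & (1 << bit)]


def cache_entry_usable(entry_bits, runtime_bits):
    """A cache entry is ignored when it needs a bit that is not set at runtime."""
    return (entry_bits & ~runtime_bits) == 0


# The single entry from the patch: NOTE_KERNELCAP(1, "nosegneg")
NOTE_ENTRIES = [(1, "nosegneg")]

if __name__ == "__main__":
    print(enabled_names(NOTE_ENTRIES, 1 << 1))  # mask as shipped in the patch
    print(enabled_names(NOTE_ENTRIES, 0))       # mask word cleared at setup time
```

Clearing the mask word leaves the note itself in place; only the set of enabled names changes, which is what would let one vDSO image serve both Xen and non-Xen boots.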
Re: [PATCH 03/25] xen: Add nosegneg capability to the vsyscall page notes
Roland McGrath wrote:
>> + * It should contain:
>> + *	hwcap 0 nosegneg
>> + * to match the mapping of bit to name that we give here.
>
> This needs to be "hwcap 0 nosegneg" to match:
>
>> +NOTE_KERNELCAP_BEGIN(1, 2)
>> +NOTE_KERNELCAP(1, "nosegneg")
>> +NOTE_KERNELCAP_END
>
> The actual bits you are using should be fine. (You're intentionally
> skipping bit 0 to work around old glibc bugs, which you might want to add
> to the comments. Also a comment or perhaps using 1<<1 syntax would make it
> more clear that "2" is the bit mask containing bit 1 and that's why it has
> to be 2, and not because of some other magical property of 2.) But if
> kernel packagers don't write the matching bit number in their ld.so.conf.d
> files, then ld.so.cache lookups won't work right.

I have to admit I still don't really understand all this. Is it
documented somewhere?

What does "hwcap 0 nosegneg" actually mean? What does the "0" mean here?

In the ELF note, what does the "nosegneg" string mean? How is it used?
Is it compared to the "nosegneg" in ld.so.conf? How does this relate to
the bitfields?

Thanks,
    J
Re: [PATCH 03/25] xen: Add nosegneg capability to the vsyscall page notes
> + * It should contain:
> + *	hwcap 0 nosegneg
> + * to match the mapping of bit to name that we give here.

This needs to be "hwcap 0 nosegneg" to match:

> +NOTE_KERNELCAP_BEGIN(1, 2)
> +NOTE_KERNELCAP(1, "nosegneg")
> +NOTE_KERNELCAP_END

The actual bits you are using should be fine. (You're intentionally
skipping bit 0 to work around old glibc bugs, which you might want to add
to the comments. Also a comment or perhaps using 1<<1 syntax would make it
more clear that "2" is the bit mask containing bit 1 and that's why it has
to be 2, and not because of some other magical property of 2.) But if
kernel packagers don't write the matching bit number in their ld.so.conf.d
files, then ld.so.cache lookups won't work right.


Thanks,
Roland
[PATCH 03/25] xen: Add nosegneg capability to the vsyscall page notes
Add the "nosegneg" fake capability to the vsyscall page notes. This is
used by the runtime linker to select a glibc version which then
disables negative-offset accesses to the thread-local segment via
%gs. These accesses require emulation in Xen (because segments are
truncated to protect the hypervisor address space) and avoiding them
provides a measurable performance boost.

Signed-off-by: Ian Pratt <[EMAIL PROTECTED]>
Signed-off-by: Christian Limpach <[EMAIL PROTECTED]>
Signed-off-by: Chris Wright <[EMAIL PROTECTED]>
Signed-off-by: Jeremy Fitzhardinge <[EMAIL PROTECTED]>
Acked-by: Zachary Amsden <[EMAIL PROTECTED]>
Cc: Roland McGrath <[EMAIL PROTECTED]>
Cc: Ulrich Drepper <[EMAIL PROTECTED]>

---
 arch/i386/kernel/vsyscall-note.S |   28 ++++++++++++++++++++++++++++
 1 file changed, 28 insertions(+)

===
--- a/arch/i386/kernel/vsyscall-note.S
+++ b/arch/i386/kernel/vsyscall-note.S
@@ -23,3 +24,31 @@ 3:	.balign 4;				/* pad out section */
 ASM_ELF_NOTE_BEGIN(".note.kernel-version", "a", UTS_SYSNAME, 0)
 	.long LINUX_VERSION_CODE
 ASM_ELF_NOTE_END
+
+#ifdef CONFIG_XEN
+/*
+ * Add a special note telling glibc's dynamic linker a fake hardware
+ * flavor that it will use to choose the search path for libraries in the
+ * same way it uses real hardware capabilities like "mmx".
+ * We supply "nosegneg" as the fake capability, to indicate that we
+ * do not like negative offsets in instructions using segment overrides,
+ * since we implement those inefficiently.  This makes it possible to
+ * install libraries optimized to avoid those access patterns in someplace
+ * like /lib/i686/tls/nosegneg.  Note that an /etc/ld.so.conf.d/file
+ * corresponding to the bits here is needed to make ldconfig work right.
+ * It should contain:
+ *	hwcap 0 nosegneg
+ * to match the mapping of bit to name that we give here.
+ */
+#define NOTE_KERNELCAP_BEGIN(ncaps, mask) \
+	ASM_ELF_NOTE_BEGIN(".note.kernelcap", "a", "GNU", 2) \
+	.long ncaps, mask
+#define NOTE_KERNELCAP(bit, name) \
+	.byte bit; .asciz name
+#define NOTE_KERNELCAP_END ASM_ELF_NOTE_END
+
+NOTE_KERNELCAP_BEGIN(1, 2)
+NOTE_KERNELCAP(1, "nosegneg")
+NOTE_KERNELCAP_END
+#endif
+

--
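
To make the layout produced by these macros concrete, the following Python sketch packs and unpacks the note payload: two 32-bit words (ncaps, mask) followed by one (byte bit, NUL-terminated name) pair per capability. It assumes little-endian 32-bit .long values as on i386, and it covers only the payload, not the ELF note header or trailing padding that ASM_ELF_NOTE_BEGIN/END emit. The function names are made up for this example.

```python
import struct


def build_kernelcap_desc(mask, entries):
    """Pack the payload emitted by the NOTE_KERNELCAP_* macros."""
    desc = struct.pack("<II", len(entries), mask)  # .long ncaps, mask
    for bit, name in entries:
        # .byte bit; .asciz name
        desc += bytes([bit]) + name.encode("ascii") + b"\0"
    return desc


def parse_kernelcap_desc(desc):
    """Recover (mask, [(bit, name), ...]) from a packed payload."""
    ncaps, mask = struct.unpack_from("<II", desc, 0)
    offset = 8
    entries = []
    for _ in range(ncaps):
        bit = desc[offset]
        end = desc.index(b"\0", offset + 1)  # terminator of the .asciz string
        entries.append((bit, desc[offset + 1:end].decode("ascii")))
        offset = end + 1
    return mask, entries


if __name__ == "__main__":
    payload = build_kernelcap_desc(1 << 1, [(1, "nosegneg")])
    print(payload.hex())
    print(parse_kernelcap_desc(payload))
```

Writing the mask as 1 << 1 rather than a bare 2 makes it visible that the mask word and the per-entry bit number refer to the same bit.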