Re: Bug with multiple help messages, the last one is shown
Sorry for the late answer, Yahoo put your mail in its Spam folder, and I didn't check until now.

On Tuesday 22 March 2005 21:00, Roman Zippel wrote:
> Hi,
>
> On Tue, 22 Mar 2005, Blaisorblade wrote:
> > I've verified multiple times that if we have a situation like this
> >
> > bool A
> > depends on TRUE
> > help
> >   Bla bla1
> >
> > and
> >
> > bool A
> > depends on FALSE
> > help
> >   Bla bla2
> >
> > even if the first option is the displayed one, the help text used is the
> > one for the second option (the absence of "prompt" is not relevant here)!
>
> Is this based on a real problem?

Yes, look at the multiple help texts in lib/Kconfig.debug in vanilla 2.6.11, or, in the current bk tree, in lib/Kconfig.debug and arch/um/Kconfig for MAGIC_SYSRQ.

For UML we need different help texts, so I'd like this solved. If you definitely don't want to fix this, we can use the old 2.4 trick of having CONFIG_MAGIC_SYSRQ2, for instance, with the right help text and defining MAGIC_SYSRQ as equal to MAGIC_SYSRQ2.

> I know that there's currently one help text per symbol
> and the behaviour for multiple help texts is basically undefined.

Yes, that's what I saw (actually I guessed, and seem to have verified, that the last help text read is the one used).

-- 
Paolo Giarrusso, aka Blaisorblade
Linux registered user n. 292729
http://www.user-mode-linux.org/~blaisorblade
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
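The duplicate-symbol pattern under discussion reduces to a fragment like the following (hypothetical symbol and dependency names; in the real trees the two definitions sit in different Kconfig files, e.g. lib/Kconfig.debug and arch/um/Kconfig):

```kconfig
config MAGIC_SYSRQ
	bool "Magic SysRq key"
	depends on ARCH_FOO
	help
	  Bla bla1 - the help text intended for this definition.

config MAGIC_SYSRQ
	bool "Magic SysRq key"
	depends on !ARCH_FOO
	help
	  Bla bla2 - with the behaviour described above, this text is the
	  one shown even when the first definition is the visible prompt.
```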
Re: [08/08] uml: va_copy fix
On Tuesday 05 April 2005 20:47, Renate Meijer wrote:
> On Apr 5, 2005, at 6:48 PM, Greg KH wrote:
> > -stable review patch. If anyone has any objections, please let us
> > know.
> >
> > --
> >
> > Uses __va_copy instead of va_copy since some old versions of gcc
> > (2.95.4 for instance) don't accept va_copy.
>
> Are there many kernels still being built with 2.95.4? It's quite
> antiquated, as far as i'm aware.
>
> The use of '__' violates compiler namespace.

Why? The symbol is defined by the compiler itself.

> If 2.95.4 were not easily replaced by a much better version (3.3.x?
> 3.4.x) I would see a reason to disregard this, but a fix merely to
> satisfy an obsolete compiler?

Let's not flame. Linus Torvalds said "we support GCC 2.95.3, because the newer versions are worse compilers in most cases". One user complained, partly because he uses Debian, and I cannot do less than make sure we comply with the requirements we have chosen (compiling with that GCC).

Please, let's not start a flame war over this. Consider me as having no opinion on this except not wanting to break Debian users on purpose. If you want, submit a patch removing GCC 2.95.3 from the supported versions, and get ready to fight for it (and probably lose).

Also, that GCC has discovered some syscall table errors in UML - I sent a separate patch, which was a bit big sadly (in the reduced version, about 70 lines + description).

-- 
Paolo Giarrusso, aka Blaisorblade
Linux registered user n. 292729
http://www.user-mode-linux.org/~blaisorblade
Re: [08/08] uml: va_copy fix
For Jörn Engel and the issue he opened: at the end of this mail I describe another bug caught by 2.95 and not by 3.x.

On Tuesday 05 April 2005 22:18, Renate Meijer wrote:
> On Apr 5, 2005, at 8:53 PM, Blaisorblade wrote:
> > On Tuesday 05 April 2005 20:47, Renate Meijer wrote:
> >> On Apr 5, 2005, at 6:48 PM, Greg KH wrote:
> >> The use of '__' violates compiler namespace.
> >
> > Why? The symbol is defined by the compiler itself.
>
> If a function is prefixed with a double underscore, this implies the
> function is internal to the compiler, and may change at any time, since
> it's not governed by some sort of standard. Hence that code may start
> suffering from bitrot, and complaining to the compiler guys won't help.
> They'll just tell you to RTFM.

Ok, agreed in general. However, the -stable tree is for "current" GCC. Your objections would better apply to the fact that the same patch has already been merged into the main trunk. Also, they have no point in doing this, probably. And the __va_copy name was used in the draft C99 standard, so it's widespread (I've read this in "man 3 va_copy").

> >> If 2.95.4 were not easily replaced by a much better version (3.3.x?
> >> 3.4.x) I would see a reason to disregard this, but a fix merely to
> >> satisfy an obsolete compiler?
> >
> > Let's not flame, Linus Torvalds said "we support GCC 2.95.3, because
> > the newer versions are worse compilers in most cases".
>
> You make it sound as if you were reciting Ye Holy Scribings. When did
> Linus Torvalds say this? In the Redhat-2.96 debacle? Before or after
> 3.3? I have searched for that quote,

Sorry for the quote marks; it was a summary of what he said (and on re-reading, it's still a correct summary).

> but could not find it, and having suffered under 3.1.1, I can well
> understand his weariness for the earlier versions.

I've read the same kerneltrap article you quote.

> See
>
> http://kerneltrap.org/node/4126, halfway down.

Ok, read.
> For the cold, hard facts...
>
> http://www.suse.de/~aj/SPEC/

Linus pointed out in that article that SPEC performance is not a good test case for kernel compilation. Point out a kernel compilation case.

> > Consider me as having no opinion on this except not wanting to break
> > on purpose Debian users.
>
> If Debian users are stuck with a pretty outdated compiler, i'd
> seriously suggest migrating to some other distro which allows more
> freedom.

I guess they can, if they want, upgrade some selected packages from newer trees, maybe by recompiling (at least on Gentoo it's trivial; on a binary distro like Debian it may be harder).

> If linux itself is holding them back, there's a need for some serious
> patching. If there are serious issues in the gcc compiler, which hinder
> migration to a more up-to-date version, our efforts should be directed
> at solving them in that project, not this.

Linus spoke about compiler speed, which isn't such a bad reason. He's probably unfair in saying that GCC 3.x does not optimize better than older releases; I guess that the compilation flags (I refer to -fno-strict-aliasing, which disables some optimizations) make some difference, as do the memory barriers (as pointed out in the comments).

> > If you want, submit a patch removing Gcc 2.95.3 from supported
> > versions, and get ready to fight for it (and probably lose).
>
> I don't fight over things like that, i'm not interested in politics. I
> merely point out the problem. And yes, I do think support for an
> obsolete compiler should be dumped in favor of a more modern version.
> Especially if that compiler requires invasions of compiler-namespace.
> The patch, as presented, is not guaranteed to be portable over
> versions, and may thus introduce another problem with future versions
> of GCC.

When and if that happens, I'll come up with a hack.
UML already needs some GCC-version-specific behaviour (see arch/um/kernel/gmon_syms.c in a recent BitKeeper snapshot; even -rc1-bk5 has this code).

> > Also, that GCC has discovered some syscall table errors in UML - I
> > sent a separate patch, which was a bit big sadly (in the reduced
> > version, about 70 lines + description).
>
> I am not quite sure what is intended here... Please explain.

I'm reattaching the patch, so that you can look at the changelog (I'm also resending it as a separate email so that it is reviewed and possibly merged). Basically, this is an error with GCC 2 and not with GCC 3:

int list[] = { [0] = 1, [0] = 1 };

(I've not tested the above itself.)
Re: [08/08] uml: va_copy fix
On Wednesday 06 April 2005 14:04, Renate Meijer wrote:
> On Apr 6, 2005, at 1:32 PM, Jörn Engel wrote:
> > On Tue, 5 April 2005 22:18:26 +0200, Renate Meijer wrote:
> >
> > You did read include/linux/compiler.h, didn't you?
>
> So instead of applying this patch, simply
>
> #if VERSION_MINOR < WHATEVER
> #define va_copy __va_copy
> #endif
>
> in include/linux/compiler-gcc2.h
>
> Thus solving the problem without having to invade compiler namespace all
> over the place, but doing so in *one* place only.

About this one: thanks for suggesting this and for being constructive; I'll do this ASAP (if I don't forget) for the -bk tree. However, I think that Greg KH, for the -stable tree, would prefer a local, tested patch rather than a global one with possible side effects - right, Greg?

Also, I hope this discussion does not count as a vote for inclusion in the -stable tree (since dropping GCC 2 support in the -stable tree is exactly the purpose of this tree, right ;-) ?).

-- 
Paolo Giarrusso, aka Blaisorblade
Linux registered user n. 292729
http://www.user-mode-linux.org/~blaisorblade
Fwd: [uml-devel] [UML/2.6] -bk7 tree does not run when compiled as SKAS-only
Andrew, could you please put this in your -rc regressions folder? Thanks.

--  Forwarded Message  --

Subject: [uml-devel] [UML/2.6] -bk7 tree does not run when compiled as SKAS-only
Date: Tuesday 22 March 2005 18:32
From: Blaisorblade <[EMAIL PROTECTED]>
To: Jeff Dike <[EMAIL PROTECTED]>, Bodo Stroesser <[EMAIL PROTECTED]>
Cc: user-mode-linux-devel@lists.sourceforge.net

Just verified: with TT mode disabled at compile time, the 2.6.11-bk7 tree compiles (when CONFIG_SYSCALL_DEBUG is disabled) but does not run. I've verified this with a clean compile (I had this doubt), both with static link enabled and disabled.

Sample output:

./vmlinux ubd0=~/Uml/toms.rootfs
Checking for /proc/mm...found
Checking for the skas3 patch in the host...found
Checking PROT_EXEC mmap in /tmp...OK

[end of output]

2.6.11 works in the same situation (both with static link enabled and disabled). I'm investigating, but I'm busy with other stuff; however, there are not many patches which went in for this release. Jeff, any ideas?

-- 
Paolo Giarrusso, aka Blaisorblade
Linux registered user n. 292729
http://www.user-mode-linux.org/~blaisorblade
[patch 1/1] uml: quick fix syscall table [urgent]
CC: <[EMAIL PROTECTED]>

I'm resending this for inclusion in the -stable tree. I've deleted the whitespace cleanups, and hope this can be merged. I've been asked to split the former patch; I don't know if I must split this one again, partly because I don't want to split this correct patch into multiple non-correct ones by mistake.

UML 2.6.11 does not compile with gcc 2.95.4 because some syscall table entries are duplicated, and that GCC does not accept this (unlike gcc 3). Plus there are various other bugs in the syscall table definitions, resulting in probably wrong syscall entries:

*) 223 is a syscall hole (i.e. ni_syscall) only on i386; on x86_64 it's a valid syscall (thus a duplicated one).

*) __NR_vserver must appear only once, as sys_ni_syscall, and not multiple times with different values!

*) syscalls duplicated between the SUBARCHs and the common files (thus assigning twice to the same array entry and causing the gcc 2.95.4 failure mentioned above): sys_utimes, which is common, and sys_fadvise64_64, sys_statfs64, sys_fstatfs64, which exist only on i386.

*) syscalls duplicated in each SUBARCH, to be moved to the common files: sys_remap_file_pages, sys_utimes, sys_fadvise64.

*) 285 is a syscall hole (i.e. ni_syscall) only on i386; on x86_64 the syscall range does not reach that point.

*) on x86_64, the macro name is __NR_kexec_load and not __NR_sys_kexec_load. Use the correct name in either case.

Note: as you can see, part of the syscall table definition in UML is arch-independent (for syscalls defined everywhere), and part is arch-dependent. This has created confusion (some syscalls are listed in both places, some in the wrong one, some are wrong on one arch or another).
Signed-off-by: Paolo 'Blaisorblade' Giarrusso <[EMAIL PROTECTED]>
---

 clean-linux-2.6.11-paolo/arch/um/include/sysdep-i386/syscalls.h   | 12 +-
 clean-linux-2.6.11-paolo/arch/um/include/sysdep-x86_64/syscalls.h |  5 
 clean-linux-2.6.11-paolo/arch/um/kernel/sys_call_table.c          | 11 +++--
 3 files changed, 10 insertions(+), 18 deletions(-)

diff -puN arch/um/include/sysdep-i386/syscalls.h~uml-quick-fix-syscall-table-for-stable arch/um/include/sysdep-i386/syscalls.h
--- clean-linux-2.6.11/arch/um/include/sysdep-i386/syscalls.h~uml-quick-fix-syscall-table-for-stable	2005-04-05 16:56:57.0 +0200
+++ clean-linux-2.6.11-paolo/arch/um/include/sysdep-i386/syscalls.h	2005-04-05 16:56:57.0 +0200
@@ -23,6 +23,9 @@ extern long sys_mmap2(unsigned long addr
 			unsigned long prot, unsigned long flags,
 			unsigned long fd, unsigned long pgoff);
 
+/* On i386 they choose a meaningless naming.*/
+#define __NR_kexec_load __NR_sys_kexec_load
+
 #define ARCH_SYSCALLS \
 	[ __NR_waitpid ] = (syscall_handler_t *) sys_waitpid, \
 	[ __NR_break ] = (syscall_handler_t *) sys_ni_syscall, \
@@ -101,15 +104,12 @@ extern long sys_mmap2(unsigned long addr
 	[ 223 ] = (syscall_handler_t *) sys_ni_syscall, \
 	[ __NR_set_thread_area ] = (syscall_handler_t *) sys_ni_syscall, \
 	[ __NR_get_thread_area ] = (syscall_handler_t *) sys_ni_syscall, \
-	[ __NR_fadvise64 ] = (syscall_handler_t *) sys_fadvise64, \
 	[ 251 ] = (syscall_handler_t *) sys_ni_syscall, \
-[ __NR_remap_file_pages ] = (syscall_handler_t *) sys_remap_file_pages, \
-	[ __NR_utimes ] = (syscall_handler_t *) sys_utimes, \
-	[ __NR_vserver ] = (syscall_handler_t *) sys_ni_syscall,
-
+	[ 285 ] = (syscall_handler_t *) sys_ni_syscall,
+
 /* 222 doesn't yet have a name in include/asm-i386/unistd.h */
 
-#define LAST_ARCH_SYSCALL __NR_vserver
+#define LAST_ARCH_SYSCALL 285
 
 /*
  * Overrides for Emacs so that we follow Linus's tabbing style.
diff -puN arch/um/include/sysdep-x86_64/syscalls.h~uml-quick-fix-syscall-table-for-stable arch/um/include/sysdep-x86_64/syscalls.h
--- clean-linux-2.6.11/arch/um/include/sysdep-x86_64/syscalls.h~uml-quick-fix-syscall-table-for-stable	2005-04-05 16:56:57.0 +0200
+++ clean-linux-2.6.11-paolo/arch/um/include/sysdep-x86_64/syscalls.h	2005-04-05 16:56:57.0 +0200
@@ -71,12 +71,7 @@ extern syscall_handler_t sys_arch_prctl;
 	[ __NR_iopl ] = (syscall_handler_t *) sys_ni_syscall, \
 	[ __NR_set_thread_area ] = (syscall_handler_t *) sys_ni_syscall, \
 	[ __NR_get_thread_area ] = (syscall_handler_t *) sys_ni_syscall, \
-[ __NR_remap_file_pages ] = (syscall_handler_t *) sys_remap_file_pages, \
 	[ __NR_semtimedop ] = (syscall_handler_t *) sys_semtimedop, \
-	[ __NR_fadvise64 ] = (syscall_handler_t *) sys_fadvise64, \
-	[ 223 ] = (syscall_handler_t *) sys_ni_syscall, \
-	[ __NR_utimes ] = (syscall_handler_t *) sys_utimes, \
-	[ __NR_vserver ] = (syscall_handler_t *) sys_ni_syscall, \
 	[ 251 ] = (syscall_handler_t *) sys_ni_syscall,
 
 #define LAST_ARCH_SYSCALL 251

diff -puN a
Re: [uml-devel] [linux-2.6-bk] UML compile broken!
On Wednesday 06 April 2005 15:16, Anton Altaparmakov wrote:
> UML compile is broken in current linus bk 2.6:
>
>   CC      arch/um/kernel/ptrace.o
> arch/um/kernel/ptrace.c: In function `send_sigtrap':
> arch/um/kernel/ptrace.c:324: warning: implicit declaration of function `SC_IP'
> arch/um/kernel/ptrace.c:324: error: union has no member named `tt'
> arch/um/kernel/ptrace.c:324: error: union has no member named `tt'
> arch/um/kernel/ptrace.c:324: error: invalid lvalue in unary `&'
> make[1]: *** [arch/um/kernel/ptrace.o] Error 1
> make: *** [arch/um/kernel] Error 2
>
> My .config is attached. I suspect it is because I am not compiling in
> TT support and only SKAS...

Well, good guess - you're getting more and more used to UML! Yes, the fix is in -mm. Quoting from the -rc2-mm1 announcement:

+uml-fix-compilation-for-__choose_mode-addition.patch

 UML fix

Andrew, can you merge it now, if you want, after Anton verifies it's indeed the correct fix for his problem? I *do* expect his situation to fail without the patch, but just to be more sure.

However, I recall a slightly different problem with 2.6.11-bk7, when compiling only SKAS mode in, and I don't think this has been fixed:

[uml-devel] [UML/2.6] -bk7 tree does not run when compiled as SKAS-only

I'm forwarding that mail to LKML and to you, Andrew - for your -rc regressions mail folder.

-- 
Paolo Giarrusso, aka Blaisorblade
Linux registered user n. 292729
http://www.user-mode-linux.org/~blaisorblade
Re: [stable] [patch 1/1] uml: quick fix syscall table [urgent]
On Wednesday 06 April 2005 22:21, Greg KH wrote:
> On Wed, Apr 06, 2005 at 08:38:00PM +0200, [EMAIL PROTECTED] wrote:
> > CC: <[EMAIL PROTECTED]>
> >
> > I'm resending this for inclusion in the -stable tree. I've deleted
> > whitespace cleanups, and hope this can be merged. I've been asked to
> > split the former patch, I don't know if I must split again this one, even
> > because I don't want to split this correct patch into multiple
> > non-correct ones by mistake.
>
> Is this patch already in 2.6.12-rc2?

Yes, with whitespace cleanups.

-- 
Paolo Giarrusso, aka Blaisorblade
Linux registered user n. 292729
http://www.user-mode-linux.org/~blaisorblade
Re: [08/08] uml: va_copy fix
On Thursday 07 April 2005 11:16, Renate Meijer wrote:
> On Apr 6, 2005, at 9:09 PM, Blaisorblade wrote:
> > Btw: I've not investigated which one of the two behaviours is the
> > buggy one - if you know, maybe you or I can report it.
>
> From a strict ISO-C point of view, both are. It's a gcc-specific
> "feature" which (agreed) does come in handy sometimes.

Well, for "range" assignments GCC mustn't complain, but for the rest, tolerating the double assignment is laziness that is not very useful. Could they at least add a -Wsomething, inside -Wall or -W, for this problem?

> However it makes it quite hard to say which is the buggy version, since
> the "appropriate" behavior is a question of definition (by the
> gcc-folks). They may even argue that, having changed their minds about
> it, neither is buggy, but both conform to the specifications (for that
> specific functionality).
>
> That's pretty much the trouble with relying on gcc-extensions: since
> there's no standard, it's difficult to tell what's wrong and what's
> right. I'll dive into it.
>
> Regards,
>
> Renate Meijer.

-- 
Paolo Giarrusso, aka Blaisorblade
Linux registered user n. 292729
http://www.user-mode-linux.org/~blaisorblade
[patch 1/1] reiserfs: make resize option auto-get new device size
Cc: <[EMAIL PROTECTED]>, , <[EMAIL PROTECTED]>

It's trivial for the resize option to auto-get the underlying device size, while it's harder for the user. I've copied the code from jfs.

Because of the different reiserfs option parser (which does not use the superior match_token parser used by almost every other filesystem), I've had to use a "resize=auto" option, rather than a bare "resize", to specify this behaviour. Changing the option parser to the kernel one wouldn't be bad, but I've no time to do this cleanup at the moment.

Btw, the mount(8) man page should be updated to include this option. Cc the relevant people, please (I hope I cc'ed the right people).

Signed-off-by: Paolo 'Blaisorblade' Giarrusso <[EMAIL PROTECTED]>
---

 linux-2.6.11-paolo/fs/reiserfs/super.c | 21 ++---
 1 files changed, 14 insertions(+), 7 deletions(-)

diff -puN fs/reiserfs/super.c~reiserfs-resize-option-like-jfs-auto-get fs/reiserfs/super.c
--- linux-2.6.11/fs/reiserfs/super.c~reiserfs-resize-option-like-jfs-auto-get	2005-04-07 20:37:58.0 +0200
+++ linux-2.6.11-paolo/fs/reiserfs/super.c	2005-04-08 01:01:18.0 +0200
@@ -889,12 +889,18 @@ static int reiserfs_parse_options (struc
 	    char * p;
 
 	    p = NULL;
-	    /* "resize=NNN" */
-	    *blocks = simple_strtoul (arg, &p, 0);
-	    if (*p != '\0') {
-		/* NNN does not look like a number */
-		reiserfs_warning (s, "reiserfs_parse_options: bad value %s", arg);
-		return 0;
+	    /* "resize=NNN" or "resize=auto" */
+
+	    if (!strcmp(arg, "auto")) {
+		    /* From JFS code, to auto-get the size.*/
+		    *blocks = s->s_bdev->bd_inode->i_size >> s->s_blocksize_bits;
+	    } else {
+		    *blocks = simple_strtoul (arg, &p, 0);
+		    if (*p != '\0') {
+			/* NNN does not look like a number */
+			reiserfs_warning (s, "reiserfs_parse_options: bad value %s", arg);
+			return 0;
+		    }
 	    }
 	}
 
@@ -903,7 +909,8 @@ static int reiserfs_parse_options (struc
 	    unsigned long val = simple_strtoul (arg, &p, 0);
 	    /* commit=NNN (time in seconds) */
 	    if ( *p != '\0' || val >= (unsigned int)-1) {
-		reiserfs_warning (s, "reiserfs_parse_options: bad value %s", arg); return 0;
+		reiserfs_warning (s, "reiserfs_parse_options: bad value %s", arg);
+		return 0;
 	    }
 	    *commit_max_age = (unsigned int)val;
 	}
_
Re: [patch 1/1] reiserfs: make resize option auto-get new device size
On Friday 08 April 2005 10:10, Alex Zarochentsev wrote:
> Hi,
>
> On Fri, Apr 08, 2005 at 06:55:50AM +0200, [EMAIL PROTECTED] wrote:
> > Cc: <[EMAIL PROTECTED]>, , <[EMAIL PROTECTED]>
> >
> > It's trivial for the resize option to auto-get the underlying device
> > size, while it's harder for the user. I've copied the code from jfs.
> >
> > Since of the different reiserfs option parser (which does not use the
> > superior match_token used by almost every other filesystem), I've had to
> > use the "resize=auto" and not "resize" option to specify this behaviour.
> > Changing the option parser to the kernel one wouldn't be bad but I've no
> > time to do this cleanup in this moment.
>
> do people really need it?

Note we are speaking of 2 lines of code, and there's no point in omitting this.

> the user-level utility resize_reiserfs, being called w/o a size
> argument, calculates the device size and uses the resize mount option
> with the correct value.

Yes, I know this. But the old versions (the one shipped on Mdk) didn't work for online resizing (I verified this, with lots of warnings and an Oops in reiserfs code); in fact, this ability is so new that it is not even documented in the manpages.

-- 
Paolo Giarrusso, aka Blaisorblade
Linux registered user n. 292729
http://www.user-mode-linux.org/~blaisorblade
[patch 1/1] uml: add nfsd syscall when nfsd is modular
CC: <[EMAIL PROTECTED]>

This trick is useless, because sys_ni.c will handle this problem by itself, like it does even on UML for other syscalls.

Signed-off-by: Paolo 'Blaisorblade' Giarrusso <[EMAIL PROTECTED]>
---

 clean-linux-2.6.11-paolo/arch/um/kernel/sys_call_table.c |    8 +---
 1 files changed, 1 insertion(+), 7 deletions(-)

diff -puN arch/um/kernel/sys_call_table.c~uml-nfsd-syscall arch/um/kernel/sys_call_table.c
--- clean-linux-2.6.11/arch/um/kernel/sys_call_table.c~uml-nfsd-syscall	2005-04-10 13:50:29.0 +0200
+++ clean-linux-2.6.11-paolo/arch/um/kernel/sys_call_table.c	2005-04-10 13:51:19.0 +0200
@@ -14,12 +14,6 @@
 #include "sysdep/syscalls.h"
 #include "kern_util.h"
 
-#ifdef CONFIG_NFSD
-#define NFSSERVCTL sys_nfsservctl
-#else
-#define NFSSERVCTL sys_ni_syscall
-#endif
-
 #define LAST_GENERIC_SYSCALL __NR_keyctl
 
 #if LAST_GENERIC_SYSCALL > LAST_ARCH_SYSCALL
@@ -190,7 +184,7 @@ syscall_handler_t *sys_call_table[] = {
 	[ __NR_getresuid ] = (syscall_handler_t *) sys_getresuid16,
 	[ __NR_query_module ] = (syscall_handler_t *) sys_ni_syscall,
 	[ __NR_poll ] = (syscall_handler_t *) sys_poll,
-	[ __NR_nfsservctl ] = (syscall_handler_t *) NFSSERVCTL,
+	[ __NR_nfsservctl ] = (syscall_handler_t *) sys_nfsservctl,
 	[ __NR_setresgid ] = (syscall_handler_t *) sys_setresgid16,
 	[ __NR_getresgid ] = (syscall_handler_t *) sys_getresgid16,
 	[ __NR_prctl ] = (syscall_handler_t *) sys_prctl,
_
[patch 1/1] uml: add nfsd syscall when nfsd is modular
CC: <[EMAIL PROTECTED]>

This trick is useless, because sys_ni.c will handle this problem by itself, like it does even on UML for other syscalls. Also, the trick does not provide the NFSD syscall when NFSD is compiled as a module, which is a big problem.

This should be merged into both 2.6.11-stable and the current tree.

Signed-off-by: Paolo 'Blaisorblade' Giarrusso <[EMAIL PROTECTED]>
---

 clean-linux-2.6.11-paolo/arch/um/kernel/sys_call_table.c |    8 +---
 1 files changed, 1 insertion(+), 7 deletions(-)

diff -puN arch/um/kernel/sys_call_table.c~uml-nfsd-syscall arch/um/kernel/sys_call_table.c
--- clean-linux-2.6.11/arch/um/kernel/sys_call_table.c~uml-nfsd-syscall	2005-04-10 13:50:29.0 +0200
+++ clean-linux-2.6.11-paolo/arch/um/kernel/sys_call_table.c	2005-04-10 13:51:19.0 +0200
@@ -14,12 +14,6 @@
 #include "sysdep/syscalls.h"
 #include "kern_util.h"
 
-#ifdef CONFIG_NFSD
-#define NFSSERVCTL sys_nfsservctl
-#else
-#define NFSSERVCTL sys_ni_syscall
-#endif
-
 #define LAST_GENERIC_SYSCALL __NR_keyctl
 
 #if LAST_GENERIC_SYSCALL > LAST_ARCH_SYSCALL
@@ -190,7 +184,7 @@ syscall_handler_t *sys_call_table[] = {
 	[ __NR_getresuid ] = (syscall_handler_t *) sys_getresuid16,
 	[ __NR_query_module ] = (syscall_handler_t *) sys_ni_syscall,
 	[ __NR_poll ] = (syscall_handler_t *) sys_poll,
-	[ __NR_nfsservctl ] = (syscall_handler_t *) NFSSERVCTL,
+	[ __NR_nfsservctl ] = (syscall_handler_t *) sys_nfsservctl,
 	[ __NR_setresgid ] = (syscall_handler_t *) sys_setresgid16,
 	[ __NR_getresgid ] = (syscall_handler_t *) sys_getresgid16,
 	[ __NR_prctl ] = (syscall_handler_t *) sys_prctl,
_
CONFIG_REGPARM - prevent_tail_call doubts (context: SKAS3 bug in detail)
I just diagnosed (and announced) a big bug affecting the SKAS3 patch: namely, syscall parameter values stored in registers may be corrupted on return for some syscalls, when called through int 0x80 and when CONFIG_REGPARM is enabled.

Ok, the diagnosis of the SKAS3 bug I just noticed is that, simply, this construct:

int do_foo(params...)
{
	...
}

asmlinkage int sys_foo(params...)
{
	return do_foo(a_new_param, params...);
}

does not work, because sys_foo() is optimized to reorder the parameters on the stack and tail-call do_foo(). The corrupted parameters on the stack will then be restored (when calling with int $0x80) into the userspace registers.

From entry.S, especially from this comment:

	/* if something modifies registers it must also disable sysexit */

it's clear that when using SYSENTER, registers are not restored (also verified through the sys_iopl() code, which touches EFLAGS).

I've used prevent_tail_call to fix this, and it works (verified with tests and assembly inspection). I even think I've understood why it works... it's clear why it disallows the tail call, but I thought that GCC could still create a normal call reusing some space from the stack frame of sys_foo() to create the stack frame of do_foo()... it's just that it wouldn't improve speed.

This construct is used for four syscalls (sys_mmap2, old_mmap, sys_mprotect, sys_modify_ldt); I verified the bug for sys_mmap2 and sys_mprotect, and I'm sure about modify_ldt because the compiled code is identical to sys_mprotect().

I initially noticed this with the errno-vs-NPTL fix I and Al Viro discussed some time ago: it indeed used mmap2() and triggered the bug. Luckily, strace reads the correct data (since syscall params are read before the syscall is done), so I couldn't do anything else than understand that something bad was happening.

-- 
Inform me of my mistakes, so I can keep imitating Homer Simpson's "Doh!".
Paolo Giarrusso, aka Blaisorblade
(Skype ID "PaoloGiarrusso", ICQ 215621894)
http://www.user-mode-linux.org/~blaisorblade
Re: [uml-devel] Re: [patch 1/1] uml: fix lvalue for gcc4
On Saturday 09 July 2005 13:07, Russell King wrote:
> On Sat, Jul 09, 2005 at 01:01:33PM +0200, [EMAIL PROTECTED] wrote:
> > diff -puN arch/um/sys-x86_64/signal.c~uml-fix-for-gcc4-lvalue arch/um/sys-x86_64/signal.c
> > --- linux-2.6.git/arch/um/sys-x86_64/signal.c~uml-fix-for-gcc4-lvalue	2005-07-09 13:01:03.0 +0200
> > +++ linux-2.6.git-paolo/arch/um/sys-x86_64/signal.c	2005-07-09 13:01:03.0 +0200
> > @@ -168,7 +168,7 @@ int setup_signal_stack_si(unsigned long
> >
> >  	frame = (struct rt_sigframe __user *)
> >  		round_down(stack_top - sizeof(struct rt_sigframe), 16) - 8;
> > -	((unsigned char *) frame) -= 128;
> > +	frame -= 128 / sizeof(frame);
>
> Are you sure these two are identical?

SORRY, I've become crazy - I meant sizeof(*frame)... thanks for noticing.

> The above code fragment looks suspicious anyway, particularly:
>
> 	frame = (struct rt_sigframe __user *)
> 		round_down(stack_top - sizeof(struct rt_sigframe), 16) - 8;
>
> which will put the frame at 8 * sizeof(struct rt_sigframe) below
> the point which round_down() would return (which would be 1 struct
> rt_sigframe below stack_top, rounded down).

You're completely right. The code is copied from arch/x86_64/kernel/signal.c:setup_rt_frame(), so it should make some sense; but in the source, the cast is to (void *). Surely Jeff, seeing that the result is assigned to a struct rt_sigframe __user *, "fixed" it.

The line I'm patching is new from Jeff, and I don't know what it's about (I just remember that

Also, the access_ok() below, called on fp (which is still NULL), is surely completely wrong, though it won't fail (after all, NULL is below TASK_SIZE, right?).

On x86_64 the code path is always the one from arch/um/kernel/signal_kern.c, since the relevant CONFIG_* option is not enabled.

-- 
Inform me of my mistakes, so I can keep imitating Homer Simpson's "Doh!".
Paolo Giarrusso, aka Blaisorblade
(Skype ID "PaoloGiarrusso", ICQ 215621894)
http://www.user-mode-linux.org/~blaisorblade
Re: [uml-devel] unregister_netdevice: waiting for tap24 to become free
On Monday 11 July 2005 21:12, Peter wrote: > Hi. I am hitting a bug that manifests in an unregister_netdevice error > message. After the problem is triggered processes like ifconfig, tunctl > and route refuse to exit, even with killed. Even from the "D" state below, it's clear that there was a deadlock on some semaphore, related to tap24... Could you search your kernel logs for traces of an Oops? > And the only solution I > have found to regaining control of the server is issuing a reboot. > The server is running a number of tap devices. (It is a UML host server > running the skas patches http://www.user-mode-linux.org/~blaisorblade/). > > Regards, Peter > > # uname -r > 2.6.11.7-skas3-v8 > > unregister_netdevice: waiting for tap24 to become free. Usage count = 1 > unregister_netdevice: waiting for tap24 to become free. Usage count = 1 > unregister_netdevice: waiting for tap24 to become free. Usage count = 1 > unregister_netdevice: waiting for tap24 to become free. Usage count = 1 > unregister_netdevice: waiting for tap24 to become free. Usage count = 1 > unregister_netdevice: waiting for tap24 to become free. Usage count = 1 > unregister_netdevice: waiting for tap24 to become free. Usage count = 1 > > > 30684 ?DW 0:45 \_ [tunctl] > 31974 ?S 0:00 /bin/bash ./monitorbw.sh > 31976 ?S 0:00 \_ /bin/bash ./monitorbw.sh > 31978 ?D 0:00 \_ /sbin/ifconfig > 31979 ?S 0:00 \_ grep \(tap\)\|\(RX bytes\) > 32052 ?S 0:00 /bin/bash /opt/uml/umlcontrol.sh start --user > gildersleeve.de > 32112 ?S 0:00 \_ /bin/bash /opt/uml/umlrun.sh --user > gildersleeve.de > 32152 ?S 0:00 \_ /bin/bash ./umlnetworksetup.sh > --check --user gildersleeve.de > 32176 ?D 0:00 \_ tunctl -u gildersleeve.de -t tap24 > > > --- > This SF.Net email is sponsored by the 'Do More With Dual!' webinar > happening July 14 at 8am PDT/11am EDT. We invite you to explore the latest > in dual core and dual graphics technology at this free one hour event > hosted by HP, AMD, and NVIDIA. 
-- Inform me of my mistakes, so I can keep imitating Homer Simpson's "Doh!". Paolo Giarrusso, aka Blaisorblade (Skype ID "PaoloGiarrusso", ICQ 215621894) http://www.user-mode-linux.org/~blaisorblade
Re: unregister_netdevice: waiting for tap24 to become free
On Tuesday 12 July 2005 00:26, Peter wrote: > Nothing in the logs prior to the first error message. > > I've hit this before at different times on other servers. If there are > some commands I can run to gather more diagnostics on the problem, > please let me know and I'll capture more information next time. > > I see the error was reported with older 2.6 kernels and a patch was > floating around. I'm not sure if that is integrated into the current > 2.6.11 kernel. The patch named there has been integrated, as is verifiable at http://linux.bkbits.net:8080/linux-2.6/[EMAIL PROTECTED] However, this time the bug is probably due to something entirely different; the message is not very specific. Have you tried 2.6.12? SKAS has already been updated for it (plus there's an important update for SKAS, from -V8 to -V8.2). > http://www.google.com/search?q=unregister_netdevice%3A+waiting > > Regards, Peter -- Inform me of my mistakes, so I can keep imitating Homer Simpson's "Doh!". Paolo Giarrusso, aka Blaisorblade (Skype ID "PaoloGiarrusso", ICQ 215621894) http://www.user-mode-linux.org/~blaisorblade
[patch 1/1] uml: fix TT mode by reverting "use fork instead of clone"
From: Jeff Dike <[EMAIL PROTECTED]>, Paolo 'Blaisorblade' Giarrusso <[EMAIL PROTECTED]> Revert the following patch, because of miscompilation problems in different environments leading to UML not working *at all* in TT mode; it was merged late in the 2.6 development cycle, a little after being written, and has caused problems for lots of people. I know it's a bit too long, but it shouldn't have been merged in the first place, so I still apply for inclusion in the -stable tree. Anyone using this feature currently is either using some older kernel (some reports even used 2.6.12-rc4-mm2) or using this patch, as included in my -bs patchset. There is not yet a fix for the reverted patch, so for now the best thing is to drop it (which was widely reported to give a working kernel). "Convert the boot-time host ptrace testing from clone to fork. They were essentially doing fork anyway. This cleans up the code a bit, and makes valgrind a bit happier about grinding it." URL: http://www.kernel.org/git/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=98fdffccea6cc3fe9dba32c0fcc310bcb5d71529 Signed-off-by: Jeff Dike <[EMAIL PROTECTED]> Signed-off-by: Paolo 'Blaisorblade' Giarrusso <[EMAIL PROTECTED]> --- vanilla-linux-2.6.12-paolo/arch/um/kernel/process.c | 48 1 files changed, 29 insertions(+), 19 deletions(-) diff -puN arch/um/kernel/process.c~uml-revert-fork-instead-of-clone arch/um/kernel/process.c --- vanilla-linux-2.6.12/arch/um/kernel/process.c~uml-revert-fork-instead-of-clone 2005-07-12 18:22:03.0 +0200 +++ vanilla-linux-2.6.12-paolo/arch/um/kernel/process.c 2005-07-12 18:22:03.0 +0200 @@ -130,7 +130,7 @@ int start_fork_tramp(void *thread_arg, u return(arg.pid); } -static int ptrace_child(void) +static int ptrace_child(void *arg) { int ret; int pid = os_getpid(), ppid = getppid(); @@ -159,16 +159,20 @@ static int ptrace_child(void) _exit(ret); } -static int start_ptraced_child(void) +static int start_ptraced_child(void **stack_out) { + void *stack; + unsigned long sp; int
pid, n, status; - pid = fork(); - if(pid == 0) - ptrace_child(); - + stack = mmap(NULL, PAGE_SIZE, PROT_READ | PROT_WRITE | PROT_EXEC, +MAP_PRIVATE | MAP_ANONYMOUS, -1, 0); + if(stack == MAP_FAILED) + panic("check_ptrace : mmap failed, errno = %d", errno); + sp = (unsigned long) stack + PAGE_SIZE - sizeof(void *); + pid = clone(ptrace_child, (void *) sp, SIGCHLD, NULL); if(pid < 0) - panic("check_ptrace : fork failed, errno = %d", errno); + panic("check_ptrace : clone failed, errno = %d", errno); CATCH_EINTR(n = waitpid(pid, &status, WUNTRACED)); if(n < 0) panic("check_ptrace : wait failed, errno = %d", errno); @@ -176,6 +180,7 @@ static int start_ptraced_child(void) panic("check_ptrace : expected SIGSTOP, got status = %d", status); + *stack_out = stack; return(pid); } @@ -183,12 +188,12 @@ static int start_ptraced_child(void) * just avoid using sysemu, not panic, but only if SYSEMU features are broken. * So only for SYSEMU features we test mustpanic, while normal host features * must work anyway!*/ -static int stop_ptraced_child(int pid, int exitcode, int mustexit) +static int stop_ptraced_child(int pid, void *stack, int exitcode, int mustpanic) { int status, n, ret = 0; if(ptrace(PTRACE_CONT, pid, 0, 0) < 0) - panic("stop_ptraced_child : ptrace failed, errno = %d", errno); + panic("check_ptrace : ptrace failed, errno = %d", errno); CATCH_EINTR(n = waitpid(pid, &status, 0)); if(!WIFEXITED(status) || (WEXITSTATUS(status) != exitcode)) { int exit_with = WEXITSTATUS(status); @@ -199,13 +204,15 @@ static int stop_ptraced_child(int pid, i printk("check_ptrace : child exited with exitcode %d, while " "expecting %d; status 0x%x", exit_with, exitcode, status); - if (mustexit) + if (mustpanic) panic("\n"); else printk("\n"); ret = -1; } + if(munmap(stack, PAGE_SIZE) < 0) + panic("check_ptrace : munmap failed, errno = %d", errno); return ret; } @@ -227,11 +234,12 @@ __uml_setup("nosysemu", nosysemu_cmd_par static void __init check_sysemu(void) { + void *stack; int pid, 
syscall, n, status, count=0; printk("Checking syscall emulation patch for ptrace..."); sysemu_
Re: [stable] [patch 1/1] uml: fix TT mode by reverting "use fork instead of clone"
On Tuesday 12 July 2005 20:50, Chris Wright wrote: > * [EMAIL PROTECTED] ([EMAIL PROTECTED]) wrote: > > Revert the following patch, because of miscompilation problems in > > different environments leading to UML not working *at all* in TT mode; it > > was merged lately in 2.6 development cycle, a little after being written, > > and has caused problems to lots of people; I know it's a bit too long, > > but it shouldn't have been merged in first place, so I still apply for > > inclusion in the -stable tree. Anyone using this feature currently is > > either using some older kernel (some reports even used 2.6.12-rc4-mm2) or > > using this patch, as included in my -bs patchset. > > For now there's not yet a fix for this patch, so for now the best thing > > is to drop it (which was widely reported to give a working kernel). > And upstream will leave this in, working to real fix? Preferably yes, but this depends on whether the fix is found. Otherwise this exact patch will be merged upstream too. -- Inform me of my mistakes, so I can keep imitating Homer Simpson's "Doh!". Paolo Giarrusso, aka Blaisorblade (Skype ID "PaoloGiarrusso", ICQ 215621894) http://www.user-mode-linux.org/~blaisorblade ___ Yahoo! Mail: gratis 1GB per i messaggi e allegati da 10MB http://mail.yahoo.it - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[patch 3/9] uml: consolidate modify_ldt
From: Paolo 'Blaisorblade' Giarrusso <[EMAIL PROTECTED]> *) Reorganize the two cases of sys_modify_ldt to share all the reasonably common code. *) Avoid memory allocation when unneeded (i.e. when we are writing and the passed buffer size is known), thus not returning ENOMEM (which isn't allowed for this syscall, even if there is no strict "specification"). *) Add copy_{from,to}_user to modify_ldt for TT mode. Signed-off-by: Paolo 'Blaisorblade' Giarrusso <[EMAIL PROTECTED]> --- linux-2.6.git-broken-paolo/arch/um/sys-i386/ldt.c | 112 +++--- linux-2.6.git-broken-paolo/include/asm-um/ldt.h |5 2 files changed, 66 insertions(+), 51 deletions(-) diff -puN arch/um/sys-i386/ldt.c~uml-modify-ldt-consolidate arch/um/sys-i386/ldt.c --- linux-2.6.git-broken/arch/um/sys-i386/ldt.c~uml-modify-ldt-consolidate 2005-07-13 19:41:00.0 +0200 +++ linux-2.6.git-broken-paolo/arch/um/sys-i386/ldt.c 2005-07-13 19:41:00.0 +0200 @@ -4,96 +4,106 @@ */ #include "linux/config.h" +#include "linux/sched.h" #include "linux/slab.h" +#include "linux/types.h" #include "asm/uaccess.h" #include "asm/ptrace.h" +#include "asm/smp.h" +#include "asm/ldt.h" #include "choose-mode.h" #include "kern.h" +#include "mode_kern.h" #ifdef CONFIG_MODE_TT -extern int modify_ldt(int func, void *ptr, unsigned long bytecount); -/* XXX this needs copy_to_user and copy_from_user */ +extern int modify_ldt(int func, void *ptr, unsigned long bytecount); -int sys_modify_ldt_tt(int func, void __user *ptr, unsigned long bytecount) +static int do_modify_ldt_tt(int func, void *ptr, unsigned long bytecount) { - if (!access_ok(VERIFY_READ, ptr, bytecount)) - return -EFAULT; - return modify_ldt(func, ptr, bytecount); } + #endif #ifdef CONFIG_MODE_SKAS -extern int userspace_pid[]; +#include "skas.h" #include "skas_ptrace.h" -int sys_modify_ldt_skas(int func, void __user *ptr, unsigned long bytecount) +static int do_modify_ldt_skas(int func, void *ptr, unsigned long bytecount) { struct ptrace_ldt ldt; - void *buf; - int res, n; + u32 
cpu; + int res; + + ldt = ((struct ptrace_ldt) { .func = func, +.ptr = ptr, +.bytecount = bytecount }); - buf = kmalloc(bytecount, GFP_KERNEL); - if(buf == NULL) - return(-ENOMEM); + cpu = get_cpu(); + res = ptrace(PTRACE_LDT, userspace_pid[cpu], 0, (unsigned long) &ldt); + put_cpu(); - res = 0; + return res; +} +#endif + +int sys_modify_ldt(int func, void __user *ptr, unsigned long bytecount) +{ + struct user_desc info; + int res = 0; + void *buf = NULL; + void *p = NULL; /* What we pass to host. */ switch(func){ case 1: - case 0x11: - res = copy_from_user(buf, ptr, bytecount); + case 0x11: /* write_ldt */ + /* Do this check now to avoid overflows. */ + if (bytecount != sizeof(struct user_desc)) { + res = -EINVAL; + goto out; + } + + if(copy_from_user(&info, ptr, sizeof(info))) { + res = -EFAULT; + goto out; + } + + p = &info; break; - } + case 0: + case 2: /* read_ldt */ - if(res != 0){ - res = -EFAULT; + /* The use of info avoids kmalloc on the write case, not on the +* read one. */ + buf = kmalloc(bytecount, GFP_KERNEL); + if (!buf) { + res = -ENOMEM; + goto out; + } + p = buf; + default: + res = -ENOSYS; goto out; } - ldt = ((struct ptrace_ldt) { .func = func, -.ptr = buf, -.bytecount = bytecount }); -#warning Need to look up userspace_pid by cpu - res = ptrace(PTRACE_LDT, userspace_pid[0], 0, (unsigned long) &ldt); + res = CHOOSE_MODE_PROC(do_modify_ldt_tt, do_modify_ldt_skas, func, + p, bytecount); if(res < 0) goto out; switch(func){ case 0: case 2: - n = res; - res = copy_to_user(ptr, buf, n); - if(res != 0) + /* Modify_ldt was for reading and returned the number of read +* bytes.*/ + if(copy_to_user(ptr, p, res)) res = -EFAULT; - else - res = n;
[patch 2/9] uml: workaround host bug in "TT mode vs. NPTL link fix"
A big bug has been diagnosed on hosts running the SKAS patch and built with CONFIG_REGPARM, due to some missing prevent_tail_call() annotations. On these hosts, this workaround is needed to avoid triggering that bug, because "to" is kept by GCC only in EBX, which is corrupted at the return of mmap2(). Since int 0x80 must be used for the call to trigger this bug, it rarely manifests itself, so I'd prefer to get this merged to work around that host bug, since it should cause no functional change. Still, you might prefer to drop it; I'll leave that to you. Signed-off-by: Paolo 'Blaisorblade' Giarrusso <[EMAIL PROTECTED]> --- linux-2.6.git-broken-paolo/arch/um/sys-i386/unmap.c |2 +- linux-2.6.git-broken-paolo/arch/um/sys-x86_64/unmap.c |2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff -puN arch/um/sys-i386/unmap.c~uml-fix-link-tt-mode-against-nptl arch/um/sys-i386/unmap.c --- linux-2.6.git-broken/arch/um/sys-i386/unmap.c~uml-fix-link-tt-mode-against-nptl 2005-07-13 19:37:10.0 +0200 +++ linux-2.6.git-broken-paolo/arch/um/sys-i386/unmap.c 2005-07-13 19:37:32.0 +0200 @@ -15,7 +15,7 @@ int switcheroo(int fd, int prot, void *f if(munmap(to, size) < 0){ return(-1); } - if(mmap2(to, size, prot, MAP_SHARED | MAP_FIXED, fd, 0) != to){ + if(mmap2(to, size, prot, MAP_SHARED | MAP_FIXED, fd, 0) == (void*) -1 ){ return(-1); } if(munmap(from, size) < 0){ diff -puN arch/um/sys-x86_64/unmap.c~uml-fix-link-tt-mode-against-nptl arch/um/sys-x86_64/unmap.c --- linux-2.6.git-broken/arch/um/sys-x86_64/unmap.c~uml-fix-link-tt-mode-against-nptl 2005-07-13 19:37:10.0 +0200 +++ linux-2.6.git-broken-paolo/arch/um/sys-x86_64/unmap.c 2005-07-13 19:37:32.0 +0200 @@ -15,7 +15,7 @@ int switcheroo(int fd, int prot, void *f if(munmap(to, size) < 0){ return(-1); } - if(mmap(to, size, prot, MAP_SHARED | MAP_FIXED, fd, 0) != to){ + if(mmap(to, size, prot, MAP_SHARED | MAP_FIXED, fd, 0) == (void*) -1){ return(-1); } if(munmap(from, size) < 0){ _
"unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[patch 5/9] uml: fix hppfs error path
Fix the error message to refer to the error code, i.e. err, not count, plus a few cosmetic fixes. Signed-off-by: Paolo 'Blaisorblade' Giarrusso <[EMAIL PROTECTED]> --- linux-2.6.git-broken-paolo/fs/hppfs/hppfs_kern.c |6 +++--- 1 files changed, 3 insertions(+), 3 deletions(-) diff -puN fs/hppfs/hppfs_kern.c~uml-hppfs-error-case fs/hppfs/hppfs_kern.c --- linux-2.6.git-broken/fs/hppfs/hppfs_kern.c~uml-hppfs-error-case 2005-07-13 19:41:36.0 +0200 +++ linux-2.6.git-broken-paolo/fs/hppfs/hppfs_kern.c2005-07-13 19:41:36.0 +0200 @@ -233,7 +233,7 @@ static ssize_t read_proc(struct file *fi set_fs(USER_DS); if(ppos) *ppos = file->f_pos; - return(n); + return n; } static ssize_t hppfs_read_file(int fd, char *buf, ssize_t count) @@ -254,7 +254,7 @@ static ssize_t hppfs_read_file(int fd, c err = os_read_file(fd, new_buf, cur); if(err < 0){ printk("hppfs_read : read failed, errno = %d\n", - count); + err); n = err; goto out_free; } @@ -271,7 +271,7 @@ static ssize_t hppfs_read_file(int fd, c out_free: kfree(new_buf); out: - return(n); + return n; } static ssize_t hppfs_read(struct file *file, char *buf, size_t count, _
[patch 8/9] uml - hostfs : unuse ROOT_DEV
From: Paolo 'Blaisorblade' Giarrusso <[EMAIL PROTECTED]> CC: Christoph Hellwig <[EMAIL PROTECTED]> Minimal patch removing the uses of ROOT_DEV; the next patch unexports it. I had opposed this, but I plan to reintroduce the functionality without using ROOT_DEV. Signed-off-by: Paolo 'Blaisorblade' Giarrusso <[EMAIL PROTECTED]> --- linux-2.6.git-broken-paolo/fs/hostfs/hostfs_kern.c |9 - 1 files changed, 9 deletions(-) diff -puN fs/hostfs/hostfs_kern.c~uml-hostfs-remove-root_dev-simple fs/hostfs/hostfs_kern.c --- linux-2.6.git-broken/fs/hostfs/hostfs_kern.c~uml-hostfs-remove-root_dev-simple 2005-07-13 19:58:18.0 +0200 +++ linux-2.6.git-broken-paolo/fs/hostfs/hostfs_kern.c 2005-07-13 19:58:18.0 +0200 @@ -15,7 +15,6 @@ #include #include #include -#include #include #include #include @@ -160,8 +159,6 @@ static int read_name(struct inode *ino, ino->i_size = i_size; ino->i_blksize = i_blksize; ino->i_blocks = i_blocks; - if((ino->i_sb->s_dev == ROOT_DEV) && (ino->i_uid == getuid())) - ino->i_uid = 0; return(0); } @@ -841,16 +838,10 @@ int hostfs_setattr(struct dentry *dentry attrs.ia_mode = attr->ia_mode; } if(attr->ia_valid & ATTR_UID){ - if((dentry->d_inode->i_sb->s_dev == ROOT_DEV) && - (attr->ia_uid == 0)) - attr->ia_uid = getuid(); attrs.ia_valid |= HOSTFS_ATTR_UID; attrs.ia_uid = attr->ia_uid; } if(attr->ia_valid & ATTR_GID){ - if((dentry->d_inode->i_sb->s_dev == ROOT_DEV) && - (attr->ia_gid == 0)) - attr->ia_gid = getgid(); attrs.ia_valid |= HOSTFS_ATTR_GID; attrs.ia_gid = attr->ia_gid; } _
[patch 9/9] remove EXPORT_SYMBOL for root_dev
From: Paolo 'Blaisorblade' Giarrusso <[EMAIL PROTECTED]> CC: Christoph Hellwig <[EMAIL PROTECTED]> Remove the EXPORT_SYMBOL for ROOT_DEV, now that the previous patch removed its last modular user, as requested some time ago by Christoph Hellwig. Signed-off-by: Paolo 'Blaisorblade' Giarrusso <[EMAIL PROTECTED]> --- linux-2.6.git-broken-paolo/init/do_mounts.c |2 -- 1 files changed, 2 deletions(-) diff -puN init/do_mounts.c~remove-export-root_dev init/do_mounts.c --- linux-2.6.git-broken/init/do_mounts.c~remove-export-root_dev 2005-07-13 19:59:50.0 +0200 +++ linux-2.6.git-broken-paolo/init/do_mounts.c 2005-07-13 19:59:50.0 +0200 @@ -25,8 +25,6 @@ static char __initdata saved_root_name[6 /* this is initialized in init/main.c */ dev_t ROOT_DEV; -EXPORT_SYMBOL(ROOT_DEV); - static int __init load_ramdisk(char *str) { rd_doload = simple_strtol(str,NULL,0) & 3; _
[patch 6/9] uml: reintroduce pcap support
The pcap support was not working because of some linking problems (expressing the construct in Kbuild was a bit difficult) and because there was no user demand. Now that a request has come in, here's the support again. This has been tested and works on both 32-bit and 64-bit hosts, even when "cross-"building 32-bit binaries. Signed-off-by: Paolo 'Blaisorblade' Giarrusso <[EMAIL PROTECTED]> --- linux-2.6.git-broken-paolo/arch/um/Kconfig_net |2 +- linux-2.6.git-broken-paolo/arch/um/Makefile | 14 +- linux-2.6.git-broken-paolo/arch/um/drivers/Makefile | 17 ++--- 3 files changed, 24 insertions(+), 9 deletions(-) diff -puN arch/um/drivers/Makefile~uml-reallow-pcap arch/um/drivers/Makefile --- linux-2.6.git-broken/arch/um/drivers/Makefile~uml-reallow-pcap 2005-07-13 19:43:05.0 +0200 +++ linux-2.6.git-broken-paolo/arch/um/drivers/Makefile 2005-07-13 19:43:30.0 +0200 @@ -10,7 +10,6 @@ slip-objs := slip_kern.o slip_user.o slirp-objs := slirp_kern.o slirp_user.o daemon-objs := daemon_kern.o daemon_user.o mcast-objs := mcast_kern.o mcast_user.o -#pcap-objs := pcap_kern.o pcap_user.o $(PCAP) net-objs := net_kern.o net_user.o mconsole-objs := mconsole_kern.o mconsole_user.o hostaudio-objs := hostaudio_kern.o @@ -18,6 +17,17 @@ ubd-objs := ubd_kern.o ubd_user.o port-objs := port_kern.o port_user.o harddog-objs := harddog_kern.o harddog_user.o +LDFLAGS_pcap.o := -r $(shell $(CC) $(CFLAGS) -print-file-name=libpcap.a) + +$(obj)/pcap.o: $(obj)/pcap_kern.o $(obj)/pcap_user.o + $(LD) -r -dp -o $@ $^ $(LDFLAGS) $(LDFLAGS_pcap.o) +#XXX: The call below does not work because the flags are added before the +# object name, so nothing from the library gets linked. +#$(call if_changed,ld) + +# When the above is fixed, don't forget to add this too!
+#targets := $(obj)/pcap.o + obj-y := stdio_console.o fd.o chan_kern.o chan_user.o line.o obj-$(CONFIG_SSL) += ssl.o obj-$(CONFIG_STDERR_CONSOLE) += stderr_console.o @@ -26,7 +36,7 @@ obj-$(CONFIG_UML_NET_SLIP) += slip.o sli obj-$(CONFIG_UML_NET_SLIRP) += slirp.o slip_common.o obj-$(CONFIG_UML_NET_DAEMON) += daemon.o obj-$(CONFIG_UML_NET_MCAST) += mcast.o -#obj-$(CONFIG_UML_NET_PCAP) += pcap.o $(PCAP) +obj-$(CONFIG_UML_NET_PCAP) += pcap.o obj-$(CONFIG_UML_NET) += net.o obj-$(CONFIG_MCONSOLE) += mconsole.o obj-$(CONFIG_MMAPPER) += mmapper_kern.o @@ -41,6 +51,7 @@ obj-$(CONFIG_UML_WATCHDOG) += harddog.o obj-$(CONFIG_BLK_DEV_COW_COMMON) += cow_user.o obj-$(CONFIG_UML_RANDOM) += random.o -USER_OBJS := fd.o null.o pty.o tty.o xterm.o slip_common.o +# pcap_user.o must be added explicitly. +USER_OBJS := fd.o null.o pty.o tty.o xterm.o slip_common.o pcap_user.o include arch/um/scripts/Makefile.rules diff -puN arch/um/Kconfig_net~uml-reallow-pcap arch/um/Kconfig_net --- linux-2.6.git-broken/arch/um/Kconfig_net~uml-reallow-pcap 2005-07-13 19:43:05.0 +0200 +++ linux-2.6.git-broken-paolo/arch/um/Kconfig_net 2005-07-13 19:43:05.0 +0200 @@ -135,7 +135,7 @@ config UML_NET_MCAST config UML_NET_PCAP bool "pcap transport" - depends on UML_NET && BROKEN + depends on UML_NET && EXPERIMENTAL help The pcap transport makes a pcap packet stream on the host look like an ethernet device inside UML. This is useful for making diff -puN arch/um/Makefile~uml-reallow-pcap arch/um/Makefile --- linux-2.6.git-broken/arch/um/Makefile~uml-reallow-pcap 2005-07-13 19:43:05.0 +0200 +++ linux-2.6.git-broken-paolo/arch/um/Makefile 2005-07-13 19:43:05.0 +0200 @@ -56,17 +56,21 @@ include $(srctree)/$(ARCH_DIR)/Makefile- core-y += $(SUBARCH_CORE) libs-y += $(SUBARCH_LIBS) -# -Derrno=kernel_errno - This turns all kernel references to errno into -# kernel_errno to separate them from the libc errno. This allows -fno-common -# in CFLAGS. Otherwise, it would cause ld to complain about the two different -# errnos. 
+# -Dvmap=kernel_vmap affects everything, and prevents anything from +# referencing the libpcap.o symbol so named. CFLAGS += $(CFLAGS-y) -D__arch_um__ -DSUBARCH=\"$(SUBARCH)\" \ - $(ARCH_INCLUDE) $(MODE_INCLUDE) + $(ARCH_INCLUDE) $(MODE_INCLUDE) -Dvmap=kernel_vmap USER_CFLAGS := $(patsubst -I%,,$(CFLAGS)) USER_CFLAGS := $(patsubst -D__KERNEL__,,$(USER_CFLAGS)) $(ARCH_INCLUDE) \ $(MODE_INCLUDE) $(ARCH_USER_CFLAGS) + +# -Derrno=kernel_errno - This turns all kernel references to errno into +# kernel_errno to separate them from the libc errno. This allows -fno-common +# in CFLAGS. Otherwise, it would cause ld to complain about the two different +# errnos. + CFLAGS += -Derrno=kernel_errno -Dsigprocmask=kernel_sigprocmask CFLAGS += $(call cc-option,-fno-unit-at-a-time,) _ - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[patch 7/9] uml: allow building as 32-bit binary on 64bit host
This patch makes the command: make ARCH=um SUBARCH=i386 work on x86_64 hosts (with support for building 32-bit binaries). This is especially needed since 64-bit UMLs currently don't support 32-bit emulation for guest binaries. This has been tested in all possible cases and works. The only exception is that I've built, but not tested, a 64-bit binary, because I didn't have a 64-bit filesystem available. Signed-off-by: Paolo 'Blaisorblade' Giarrusso <[EMAIL PROTECTED]> --- linux-2.6.git-broken-paolo/arch/um/Makefile | 11 + linux-2.6.git-broken-paolo/arch/um/Makefile-i386 | 30 +- linux-2.6.git-broken-paolo/arch/um/Makefile-x86_64|6 +- linux-2.6.git-broken-paolo/arch/um/scripts/Makefile.unmap |4 - 4 files changed, 31 insertions(+), 20 deletions(-) diff -puN arch/um/Makefile-i386~uml-build-on-64bit-host arch/um/Makefile-i386 --- linux-2.6.git-broken/arch/um/Makefile-i386~uml-build-on-64bit-host 2005-07-13 19:46:33.0 +0200 +++ linux-2.6.git-broken-paolo/arch/um/Makefile-i3862005-07-13 19:46:33.0 +0200 @@ -1,4 +1,4 @@ -SUBARCH_CORE := arch/um/sys-i386/ arch/i386/crypto/ +core-y += arch/um/sys-i386/ arch/i386/crypto/ TOP_ADDR := $(CONFIG_TOP_ADDR) @@ -8,21 +8,33 @@ ifeq ($(CONFIG_MODE_SKAS),y) endif endif +LDFLAGS+= -m elf_i386 +ELF_ARCH := $(SUBARCH) +ELF_FORMAT := elf32-$(SUBARCH) +OBJCOPYFLAGS := -O binary -R .note -R .comment -S + +ifeq ("$(origin SUBARCH)", "command line") +ifneq ("$(shell uname -m | sed -e s/i.86/i386/)", "$(SUBARCH)") +CFLAGS += $(call cc-option,-m32) +USER_CFLAGS+= $(call cc-option,-m32) +HOSTCFLAGS += $(call cc-option,-m32) +HOSTLDFLAGS+= $(call cc-option,-m32) +AFLAGS += $(call cc-option,-m32) +LINK-y += $(call cc-option,-m32) +UML_OBJCOPYFLAGS += -F $(ELF_FORMAT) + +export LDFLAGS HOSTCFLAGS HOSTLDFLAGS UML_OBJCOPYFLAGS +endif +endif + CFLAGS += -U__$(SUBARCH)__ -U$(SUBARCH) -ARCH_USER_CFLAGS := ifneq ($(CONFIG_GPROF),y) ARCH_CFLAGS += -DUM_FASTCALL endif -ELF_ARCH := $(SUBARCH) -ELF_FORMAT := elf32-$(SUBARCH) - -OBJCOPYFLAGS := -O binary -R
.comment -S - SYS_UTIL_DIR := $(ARCH_DIR)/sys-i386/util - -SYS_HEADERS := $(SYS_DIR)/sc.h $(SYS_DIR)/thread.h +SYS_HEADERS:= $(SYS_DIR)/sc.h $(SYS_DIR)/thread.h prepare: $(SYS_HEADERS) diff -puN arch/um/Makefile~uml-build-on-64bit-host arch/um/Makefile --- linux-2.6.git-broken/arch/um/Makefile~uml-build-on-64bit-host 2005-07-13 19:46:33.0 +0200 +++ linux-2.6.git-broken-paolo/arch/um/Makefile 2005-07-13 19:46:33.0 +0200 @@ -51,11 +51,6 @@ MRPROPER_DIRS+= $(ARCH_DIR)/include2 endif SYS_DIR:= $(ARCH_DIR)/include/sysdep-$(SUBARCH) -include $(srctree)/$(ARCH_DIR)/Makefile-$(SUBARCH) - -core-y += $(SUBARCH_CORE) -libs-y += $(SUBARCH_LIBS) - # -Dvmap=kernel_vmap affects everything, and prevents anything from # referencing the libpcap.o symbol so named. @@ -64,7 +59,7 @@ CFLAGS += $(CFLAGS-y) -D__arch_um__ -DSU USER_CFLAGS := $(patsubst -I%,,$(CFLAGS)) USER_CFLAGS := $(patsubst -D__KERNEL__,,$(USER_CFLAGS)) $(ARCH_INCLUDE) \ - $(MODE_INCLUDE) $(ARCH_USER_CFLAGS) + $(MODE_INCLUDE) # -Derrno=kernel_errno - This turns all kernel references to errno into # kernel_errno to separate them from the libc errno. This allows -fno-common @@ -74,6 +69,8 @@ USER_CFLAGS := $(patsubst -D__KERNEL__,, CFLAGS += -Derrno=kernel_errno -Dsigprocmask=kernel_sigprocmask CFLAGS += $(call cc-option,-fno-unit-at-a-time,) +include $(srctree)/$(ARCH_DIR)/Makefile-$(SUBARCH) + #This will adjust *FLAGS accordingly to the platform. include $(srctree)/$(ARCH_DIR)/Makefile-os-$(OS) @@ -132,7 +129,7 @@ CPPFLAGS_vmlinux.lds = -U$(SUBARCH) \ #The wrappers will select whether using "malloc" or the kernel allocator. 
LINK_WRAPS = -Wl,--wrap,malloc -Wl,--wrap,free -Wl,--wrap,calloc -CFLAGS_vmlinux = $(LINK-y) $(LINK_WRAPS) +CFLAGS_vmlinux := $(LINK-y) $(LINK_WRAPS) define cmd_vmlinux__ $(CC) $(CFLAGS_vmlinux) -o $@ \ -Wl,-T,$(vmlinux-lds) $(vmlinux-init) \ diff -puN arch/um/Makefile-x86_64~uml-build-on-64bit-host arch/um/Makefile-x86_64 --- linux-2.6.git-broken/arch/um/Makefile-x86_64~uml-build-on-64bit-host 2005-07-13 19:46:33.0 +0200 +++ linux-2.6.git-broken-paolo/arch/um/Makefile-x86_64 2005-07-13 19:46:33.0 +0200 @@ -1,11 +1,13 @@ # Copyright 2003 - 2004 Pathscale, Inc # Released under the GPL -SUBARCH_LIBS := arch/um/sys-x86_64/ +libs-y += arch/um/sys-x86_64/ START := 0x6000 +#We #undef __x86_64__ for kernelspace, not for userspace where +#it's needed for headers to work! CFLAGS += -U__$(SUBARCH)__ -fno-builtin -ARCH_USER_CFLAGS := -D__x86_64__ +USER_CFLAGS += -fno-builtin ELF_ARCH := i386:x8
[patch 1/9] uml: fix lvalue for gcc4
From: Paolo 'Blaisorblade' Giarrusso <[EMAIL PROTECTED]>, Russell King <[EMAIL PROTECTED]> This construct is refused by GCC 4, so here's the (corrected) fix. Thanks to Russell for noticing a stupid mistake I made when first sending this. As he noted, the code is largely suboptimal; however, it currently works and will be fixed shortly. Just look at the access_ok check on fp, which is NULL, or the pointer arithmetic below, which should be done with a cast to void*: frame = (struct rt_sigframe __user *) round_down(stack_top - sizeof(struct rt_sigframe), 16) - 8; The code clearly shows that it has been taken from arch/x86_64/kernel/signal.c:setup_rt_frame(), maybe in a bit of a hurry. Signed-off-by: Paolo 'Blaisorblade' Giarrusso <[EMAIL PROTECTED]> --- linux-2.6.git-broken-paolo/arch/um/sys-x86_64/signal.c |2 +- 1 files changed, 1 insertion(+), 1 deletion(-) diff -puN arch/um/sys-x86_64/signal.c~uml-fix-for-gcc4-lvalue arch/um/sys-x86_64/signal.c --- linux-2.6.git-broken/arch/um/sys-x86_64/signal.c~uml-fix-for-gcc4-lvalue 2005-07-13 19:30:43.0 +0200 +++ linux-2.6.git-broken-paolo/arch/um/sys-x86_64/signal.c 2005-07-13 19:30:44.0 +0200 @@ -168,7 +168,7 @@ int setup_signal_stack_si(unsigned long frame = (struct rt_sigframe __user *) round_down(stack_top - sizeof(struct rt_sigframe), 16) - 8; - ((unsigned char *) frame) -= 128; + frame -= 128 / sizeof(*frame); if (!access_ok(VERIFY_WRITE, fp, sizeof(struct _fpstate))) goto out; _
[patch 4/9] uml: gcc 2.95 fix and Makefile cleanup
From: Paolo 'Blaisorblade' Giarrusso <[EMAIL PROTECTED]> CC: Raphael Bossek <[EMAIL PROTECTED]> 1) Clean up some ugly hyper-nested code in the Makefile (now only the arithmetic expression is passed through the host bash). 2) Fix a problem with GCC 2.95: according to a report from Raphael Bossek, .remap_data : { arch/um/sys-SUBARCH/unmap_fin.o (.data .bss) } is expanded into: .remap_data : { arch/um/sys-i386 /unmap_fin.o (.data .bss) } (because I didn't use ## to join the two tokens), thus breaking linking. Pass the whole path from the Makefile as a simple and nice fix. Signed-off-by: Paolo 'Blaisorblade' Giarrusso <[EMAIL PROTECTED]> --- linux-2.6.git-broken-paolo/arch/um/Makefile |9 + linux-2.6.git-broken-paolo/arch/um/kernel/uml.lds.S |4 ++-- 2 files changed, 7 insertions(+), 6 deletions(-) diff -puN arch/um/Makefile~uml-cleanup-Makefile-a-bit arch/um/Makefile --- linux-2.6.git-broken/arch/um/Makefile~uml-cleanup-Makefile-a-bit 2005-07-13 19:41:17.0 +0200 +++ linux-2.6.git-broken-paolo/arch/um/Makefile 2005-07-13 19:41:17.0 +0200 @@ -116,13 +116,14 @@ CONFIG_KERNEL_STACK_ORDER ?= 2 STACK_SIZE := $(shell echo $$[ 4096 * (1 << $(CONFIG_KERNEL_STACK_ORDER)) ] ) ifndef START - START = $$(($(TOP_ADDR) - $(SIZE))) + START = $(shell echo $$[ $(TOP_ADDR) - $(SIZE) ] ) endif -CPPFLAGS_vmlinux.lds = $(shell echo -U$(SUBARCH) \ +CPPFLAGS_vmlinux.lds = -U$(SUBARCH) \ -DSTART=$(START) -DELF_ARCH=$(ELF_ARCH) \ - -DELF_FORMAT=\"$(ELF_FORMAT)\" $(CPP_MODE-y) \ - -DKERNEL_STACK_SIZE=$(STACK_SIZE) -DSUBARCH=$(SUBARCH)) + -DELF_FORMAT="$(ELF_FORMAT)" $(CPP_MODE-y) \ + -DKERNEL_STACK_SIZE=$(STACK_SIZE) \ + -DUNMAP_PATH=arch/um/sys-$(SUBARCH)/unmap_fin.o #The wrappers will select whether using "malloc" or the kernel allocator.
LINK_WRAPS = -Wl,--wrap,malloc -Wl,--wrap,free -Wl,--wrap,calloc diff -puN arch/um/kernel/uml.lds.S~uml-cleanup-Makefile-a-bit arch/um/kernel/uml.lds.S --- linux-2.6.git-broken/arch/um/kernel/uml.lds.S~uml-cleanup-Makefile-a-bit 2005-07-13 19:41:17.0 +0200 +++ linux-2.6.git-broken-paolo/arch/um/kernel/uml.lds.S 2005-07-13 19:41:17.0 +0200 @@ -16,8 +16,8 @@ SECTIONS __binary_start = .; #ifdef MODE_TT - .remap_data : { arch/um/sys-SUBARCH/unmap_fin.o (.data .bss) } - .remap : { arch/um/sys-SUBARCH/unmap_fin.o (.text) } + .remap_data : { UNMAP_PATH (.data .bss) } + .remap : { UNMAP_PATH (.text) } . = ALIGN(4096); /* Init code and data */ #endif _ - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
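To make the token story concrete: the linker script is run through the C preprocessor, and older cpp versions (gcc 2.95's among them, per Raphael's report) emitted a space after the expanded macro, splitting the path. This sketch (file names invented) reproduces the setup and the merged fix; the exact spacing of the first output depends on your cpp version, so it is shown as a demonstration, not asserted.

```shell
# The problematic fragment: SUBARCH is a separate preprocessing token,
# so it does get expanded, but gcc 2.95's cpp put a space after it.
cat > uml.lds.in <<'EOF'
.remap_data : { arch/um/sys-SUBARCH/unmap_fin.o (.data .bss) }
EOF
cpp -P -DSUBARCH=i386 uml.lds.in
# gcc 2.95 emitted: .remap_data : { arch/um/sys-i386 /unmap_fin.o ... }

# The fix: the Makefile passes the finished path as one macro, so no
# expansion ever happens in the middle of a path.
cat > uml-fixed.lds.in <<'EOF'
.remap_data : { UNMAP_PATH (.data .bss) }
EOF
cpp -P -DUNMAP_PATH=arch/um/sys-i386/unmap_fin.o uml-fixed.lds.in

rm -f uml.lds.in uml-fixed.lds.in
```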
Re: [uml-devel] Re: [patch 1/9] uml: fix lvalue for gcc4
On Wednesday 13 July 2005 23:29, Andrew Morton wrote: > Please identify which of these patches you consider to be 2.6.13 material. All of them are for 2.6.13... except this one: it's still wrong, I overlooked it a bit too much. It must be replaced by this (I'll post it in a mail if needed): http://user-mode-linux.sourceforge.net/work/current/2.6/2.6.12-mm2/patches/x86_64_compile Bye -- Inform me of my mistakes, so I can keep imitating Homer Simpson's "Doh!". Paolo Giarrusso, aka Blaisorblade (Skype ID "PaoloGiarrusso", ICQ 215621894) http://www.user-mode-linux.org/~blaisorblade - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] readd missing define to arch/um/Makefile-i386
On Sunday 17 July 2005 16:52, Olaf Hering wrote: > New in 2.6.13-rc3-git4: > scripts/Makefile.build:13: /Makefile: No such file or directory > scripts/Makefile.build:64: kbuild: Makefile.build is included improperly > the define was removed, but its still required to build some targets. > Signed-off-by: Olaf Hering <[EMAIL PROTECTED]> Yes, this patch is the correct fix, also for -rc3-mm1 (which has the same problem). Andrew, I haven't had time to look at how you fixed up the rejects in the last merge ([PATCH] uml: allow building as 32-bit binary on 64bit host)*; the rejects came from the SKAS0 merge, and while fixing the patch up you deleted by mistake the line which is re-added in this patch. * http://www.kernel.org/git/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=20d0021394c1b070bf04b22c5bc8fdb437edd4c5 -- Inform me of my mistakes, so I can keep imitating Homer Simpson's "Doh!". Paolo Giarrusso, aka Blaisorblade (Skype ID "PaoloGiarrusso", ICQ 215621894) http://www.user-mode-linux.org/~blaisorblade - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Giving developers clue how many testers verified certain kernel version
Adrian Bunk stusta.de> writes: > On Thu, Jul 21, 2005 at 09:40:43PM -0500, Alejandro Bonilla wrote: > > >How do we know that something is OK or wrong? just by the fact that > > it works or not, it doesn't mean like is OK. > > > > There has to be a process for any user to be able to verify and study a > > problem. We don't have that yet. > If the user doesn't notice the difference then there's no problem for > him. Some performance regressions aren't easily noticeable without benchmarks... and we've had people claiming unnoticed regressions since 2.6.2 (http://kerneltrap.org/node/4940) > If there's a problem the user notices, then the process is to send an > email to linux-kernel and/or open a bug in the kernel Bugzilla and > follow the "please send the output of foo" and "please test patch bar" > instructions. > What comes nearest to what you are talking about is that you run LTP > and/or various benchmarks against every -git and every -mm kernel and > report regressions. But this is simply a task someone could do (and I > don't know how much of it is already done e.g. at OSDL), and not > something every user could contribute to. What about driver testing? That is where most of the bugs hide, and where wide user testing is definitely needed, given the various hardware bugs and the different configurations existing in the real world. IMHO, publishing statistics about kernel patch downloads would be a very Good Thing(tm). Peter, what's your opinion? I think this was even talked about at the Kernel Summit (or at least I thought of it there), but I haven't understood whether it is going to happen. -- Inform me of my mistakes, so I can keep imitating Homer Simpson's "Doh!". 
Paolo Giarrusso, aka Blaisorblade (Skype ID "PaoloGiarrusso", ICQ 215621894) http://www.user-mode-linux.org/~blaisorblade - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch 1/3] uml: share page bits handling between 2 and 3 level pagetables
On Saturday 30 July 2005 18:02, Jeff Dike wrote: > On Thu, Jul 28, 2005 at 08:56:53PM +0200, [EMAIL PROTECTED] wrote: > > As obvious, a "core code nice cleanup" is not a "stability-friendly > > patch" so usual care applies. > > These look reasonable, as they are what we discussed in Ottawa. > > I'll put them in my tree and see if I see any problems. I would > suggest sending these in early after 2.6.13 if they seem OK. Just noticed: you can drop them (except the first, which is a nice cleanup). set_pte() handles that, and include/asm-generic/pgtable.h uses set_pte_at() consistently. I've checked UML with "grep pte": either mk_pte or set_pte is used everywhere. Exceptions: fixaddr_user_init (but that should be OK, as we shouldn't actually map it) and pte_modify() (which handles that only for present pages). But pte_modify is used together with set_pte, so we could probably drop that handling as well. Also look, on the "set_pte" theme, at the attached patch. I realized this when I needed those lines to work - I was getting a segfault loop. After using set_pte(), things worked. I now have an almost perfectly working implementation of remap_file_pages with protection support. There will probably be some other things to update, like the swapping locations, but I can't get this kernel to fail (it's easier to find bugs in the test program; it has grown quite complex). And, I'd like to note, Ingo's original version *DID NOT* work properly (it was not safe against swapout, and it didn't allow write-protecting a page successfully). I'm going to clean up the code and write changelogs, then send the patches for -mm (hoping the page fault scalability patches don't get in the way). -- Inform me of my mistakes, so I can keep imitating Homer Simpson's "Doh!". 
Paolo Giarrusso, aka Blaisorblade (Skype ID "PaoloGiarrusso", ICQ 215621894) http://www.user-mode-linux.org/~blaisorblade From: Paolo 'Blaisorblade' Giarrusso <[EMAIL PROTECTED]> The PTE returned from handle_mm_fault is already marked as dirty and accessed if needed. Also, since this is not set with set_pte() (which sets NEWPAGE and NEWPROT as needed), this wouldn't work anyway. Signed-off-by: Paolo 'Blaisorblade' Giarrusso <[EMAIL PROTECTED]> --- linux-2.6.git-paolo/arch/um/kernel/trap_kern.c |3 +-- 1 files changed, 1 insertion(+), 2 deletions(-) diff -puN arch/um/kernel/trap_kern.c~uml-avoid-already-done-dirtying arch/um/kernel/trap_kern.c --- linux-2.6.git/arch/um/kernel/trap_kern.c~uml-avoid-already-done-dirtying 2005-08-10 19:21:13.0 +0200 +++ linux-2.6.git-paolo/arch/um/kernel/trap_kern.c 2005-08-10 19:21:13.0 +0200 @@ -83,8 +83,7 @@ survive: pte = pte_offset_kernel(pmd, address); } while(!pte_present(*pte)); err = 0; - *pte = pte_mkyoung(*pte); - if(pte_write(*pte)) *pte = pte_mkdirty(*pte); + WARN_ON(!pte_young(*pte) || pte_write(*pte) && !pte_dirty(*pte)); flush_tlb_page(vma, address); out: up_read(&mm->mmap_sem); _
Really BAD granularity example in BKCVS output
I've locally downloaded and installed the GIT version of the BitKeeper tree (the first existing upload - I have been away for a while, so I don't know if there are others), and while browsing the history for some work, I found this commit: http://localhost/~paolo/git/?p=old-2.6-bkcvs/.git;a=commit;h=b035f9332ce7e205af43f7cfdf4e1cf3625f7ad5 (the hashes work on the kernel.org copy of that repository, assuming it wasn't re-exported). Well, that is *awfully* big (543 files touched)! Isn't there anything that can be done about that? What is worse, the commit message is truncated! And yes, sorry if this is a stupid question. -- Inform me of my mistakes, so I can keep imitating Homer Simpson's "Doh!". Paolo Giarrusso, aka Blaisorblade (Skype ID "PaoloGiarrusso", ICQ 215621894) http://www.user-mode-linux.org/~blaisorblade - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Feature removal: ACPI S4bios support, ioctl32_conversion
Looking at Documentation/feature-removal-schedule.txt, I noticed an overdue feature removal assigned to you, so I thought I'd drop you a reminder email. Thanks for your attention -- Inform me of my mistakes, so I can keep imitating Homer Simpson's "Doh!". Paolo Giarrusso, aka Blaisorblade (Skype ID "PaoloGiarrusso", ICQ 215621894) http://www.user-mode-linux.org/~blaisorblade - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[patch 07/39] uml: fault handler micro-cleanups
From: Paolo 'Blaisorblade' Giarrusso <[EMAIL PROTECTED]> Avoid chomping low bits of address for functions doing it by themselves, fix whitespace, add a correctness check. I did this for remap-file-pages protection support; it is useful on its own too. Signed-off-by: Paolo 'Blaisorblade' Giarrusso <[EMAIL PROTECTED]> --- linux-2.6.git-paolo/arch/um/kernel/trap_kern.c | 28 +++-- 1 files changed, 13 insertions(+), 15 deletions(-) diff -puN arch/um/kernel/trap_kern.c~uml-fault-handler-changes arch/um/kernel/trap_kern.c --- linux-2.6.git/arch/um/kernel/trap_kern.c~uml-fault-handler-changes 2005-08-11 11:18:03.0 +0200 +++ linux-2.6.git-paolo/arch/um/kernel/trap_kern.c 2005-08-11 11:19:56.0 +0200 @@ -26,6 +26,7 @@ #include "mem.h" #include "mem_kern.h" +/* Note this is constrained to return 0, -EFAULT, -EACCES, -ENOMEM by segv(). */ int handle_page_fault(unsigned long address, unsigned long ip, int is_write, int is_user, int *code_out) { @@ -35,7 +36,6 @@ int handle_page_fault(unsigned long addr pud_t *pud; pmd_t *pmd; pte_t *pte; - unsigned long page; int err = -EFAULT; *code_out = SEGV_MAPERR; @@ -52,7 +52,7 @@ int handle_page_fault(unsigned long addr else if(expand_stack(vma, address)) goto out; - good_area: +good_area: *code_out = SEGV_ACCERR; if(is_write && !(vma->vm_flags & VM_WRITE)) goto out; @@ -60,9 +60,8 @@ int handle_page_fault(unsigned long addr if(!(vma->vm_flags & (VM_READ | VM_EXEC))) goto out; - page = address & PAGE_MASK; do { - survive: +survive: switch (handle_mm_fault(mm, vma, address, is_write)){ case VM_FAULT_MINOR: current->min_flt++; @@ -79,16 +78,16 @@ int handle_page_fault(unsigned long addr default: BUG(); } - pgd = pgd_offset(mm, page); - pud = pud_offset(pgd, page); - pmd = pmd_offset(pud, page); - pte = pte_offset_kernel(pmd, page); + pgd = pgd_offset(mm, address); + pud = pud_offset(pgd, address); + pmd = pmd_offset(pud, address); + pte = pte_offset_kernel(pmd, address); } while(!pte_present(*pte)); err = 0; *pte = pte_mkyoung(*pte); 
if(pte_write(*pte)) *pte = pte_mkdirty(*pte); - flush_tlb_page(vma, page); - out: + flush_tlb_page(vma, address); +out: up_read(&mm->mmap_sem); return(err); @@ -144,19 +143,18 @@ unsigned long segv(struct faultinfo fi, panic("Kernel mode fault at addr 0x%lx, ip 0x%lx", address, ip); - if(err == -EACCES){ + if (err == -EACCES) { si.si_signo = SIGBUS; si.si_errno = 0; si.si_code = BUS_ADRERR; si.si_addr = (void *)address; current->thread.arch.faultinfo = fi; force_sig_info(SIGBUS, &si, current); - } - else if(err == -ENOMEM){ + } else if (err == -ENOMEM) { printk("VM: killing process %s\n", current->comm); do_exit(SIGKILL); - } - else { + } else { + BUG_ON(err != -EFAULT); si.si_signo = SIGSEGV; si.si_addr = (void *) address; current->thread.arch.faultinfo = fi; _ - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[patch 15/39] remap_file_pages protection support: add VM_NONUNIFORM to fix existing usage of mprotect()
From: Paolo 'Blaisorblade' Giarrusso <[EMAIL PROTECTED]> Distinguish between "normal" VMAs and VMAs with non-uniform protection. This will also be useful for fault handling (we must ignore VM_{READ,WRITE,EXEC} in the arch fault handler). As said before, with remap-file-pages-prot we must punt on private VMAs even when we're just changing protections. Also, with the remap_file_pages protection support, we do have a regression of remap_file_pages vs mprotect. mprotect alters the VMA protections and walks each installed PTE. Mprotect'ing a nonlinear VMA used to work, obviously, but now doesn't, because we must now read the protections from the PTEs, which haven't been updated; so, to avoid changing behaviour for old binaries, on uniform VMAs we ignore protections in the PTE, like we did before. On non-uniform VMAs, instead, mprotect is currently broken; however, we've never supported it, so this is acceptable. What it does is to split the VMA if needed, assign the new protection to the VMA and enforce the new protections on all present pages, ignoring all absent ones (including pte_file() ones), which will keep the current protections. So, the application has no reliable way to know which pages would actually be remapped. What is more, there is IMHO no reason to support using mprotect on non-uniform VMAs. The only exception is to change the VMA's default protection (which is used for non-individually-remapped pages), but it should still ignore the page tables. The only need for that is if I want to change protections without changing the indexes, which with remap_file_pages you must do one page at a time, re-specifying the indexes. It is more reasonable to allow remap_file_pages to change protections on a PTE range without changing the offsets. I've not implemented this, but I can if wanted. For sure, UML doesn't need this interface. However, for now I've implemented no change to mprotect(); I'd like to get some feedback first about which way to go. 
Signed-off-by: Paolo 'Blaisorblade' Giarrusso <[EMAIL PROTECTED]> --- linux-2.6.git-paolo/include/linux/mm.h |7 +++ linux-2.6.git-paolo/mm/fremap.c| 13 + linux-2.6.git-paolo/mm/memory.c|2 +- 3 files changed, 21 insertions(+), 1 deletion(-) diff -puN mm/fremap.c~rfp-add-VM_NONUNIFORM mm/fremap.c --- linux-2.6.git/mm/fremap.c~rfp-add-VM_NONUNIFORM 2005-08-11 23:03:51.0 +0200 +++ linux-2.6.git-paolo/mm/fremap.c 2005-08-11 23:03:51.0 +0200 @@ -252,6 +252,19 @@ retry: spin_unlock(&mapping->i_mmap_lock); } } + if (pgprot_val(pgprot) != pgprot_val(vma->vm_page_prot)) { + if (!(vma->vm_flags & VM_SHARED)) + goto out_unlock; + if (!(vma->vm_flags & VM_NONUNIFORM)) { + if (!has_write_lock) { + up_read(&mm->mmap_sem); + down_write(&mm->mmap_sem); + has_write_lock = 1; + goto retry; + } + vma->vm_flags |= VM_NONUNIFORM; + } + } err = vma->vm_ops->populate(vma, start, size, pgprot, pgoff, flags & MAP_NONBLOCK); diff -puN include/linux/mm.h~rfp-add-VM_NONUNIFORM include/linux/mm.h --- linux-2.6.git/include/linux/mm.h~rfp-add-VM_NONUNIFORM 2005-08-11 23:03:51.0 +0200 +++ linux-2.6.git-paolo/include/linux/mm.h 2005-08-11 23:03:51.0 +0200 @@ -160,7 +160,14 @@ extern unsigned int kobjsize(const void #define VM_ACCOUNT 0x0010 /* Is a VM accounted object */ #define VM_HUGETLB 0x0040 /* Huge TLB Page VM */ #define VM_NONLINEAR 0x0080 /* Is non-linear (remap_file_pages) */ + +#ifndef CONFIG_MMU #define VM_MAPPED_COPY 0x0100 /* T if mapped copy of data (nommu mmap) */ +#else +#define VM_NONUNIFORM 0x0100 /* The VM individual pages have + different protections + (remap_file_pages)*/ +#endif #ifndef VM_STACK_DEFAULT_FLAGS /* arch can override this */ #define VM_STACK_DEFAULT_FLAGS VM_DATA_DEFAULT_FLAGS diff -puN mm/memory.c~rfp-add-VM_NONUNIFORM mm/memory.c --- linux-2.6.git/mm/memory.c~rfp-add-VM_NONUNIFORM 2005-08-11 23:03:51.0 +0200 +++ linux-2.6.git-paolo/mm/memory.c 2005-08-11 23:03:51.0 +0200 @@ -1941,7 +1941,7 @@ static int do_file_page(struct mm_struct } pgoff = pte_to_pgoff(*pte); 
- pgprot = pte_to_pgprot(*pte); + pgpr
[patch 06/39] correct _PAGE_FILE comment
_PAGE_FILE does not indicate whether a page is in the page/swap cache; it is set just for non-linear PTEs. Correct the comment for i386, x86_64 and UML. Also clarify _PAGE_PROTNONE. Signed-off-by: Paolo 'Blaisorblade' Giarrusso <[EMAIL PROTECTED]> --- linux-2.6.git-paolo/include/asm-i386/pgtable.h | 10 +- linux-2.6.git-paolo/include/asm-um/pgtable.h |8 +--- linux-2.6.git-paolo/include/asm-x86_64/pgtable.h |2 +- 3 files changed, 11 insertions(+), 9 deletions(-) diff -puN include/asm-i386/pgtable.h~correct-_PAGE_FILE-comment include/asm-i386/pgtable.h --- linux-2.6.git/include/asm-i386/pgtable.h~correct-_PAGE_FILE-comment 2005-08-11 11:17:04.0 +0200 +++ linux-2.6.git-paolo/include/asm-i386/pgtable.h 2005-08-11 11:17:04.0 +0200 @@ -86,9 +86,7 @@ void paging_init(void); #endif /* - * The 4MB page is guessing.. Detailed in the infamous "Chapter H" - * of the Pentium details, but assuming intel did the straightforward - * thing, this bit set in the page directory entry just means that + * _PAGE_PSE set in the page directory entry just means that * the page directory entry points directly to a 4MB-aligned block of * memory. 
*/ @@ -119,8 +117,10 @@ void paging_init(void); #define _PAGE_UNUSED2 0x400 #define _PAGE_UNUSED3 0x800 -#define _PAGE_FILE 0x040 /* set:pagecache unset:swap */ -#define _PAGE_PROTNONE 0x080 /* If not present */ +/* If _PAGE_PRESENT is clear, we use these: */ +#define _PAGE_FILE 0x040 /* nonlinear file mapping, saved PTE; unset:swap */ +#define _PAGE_PROTNONE 0x080 /* if the user mapped it with PROT_NONE; + pte_present gives true */ #ifdef CONFIG_X86_PAE #define _PAGE_NX (1ULL<<_PAGE_BIT_NX) #else diff -puN include/asm-x86_64/pgtable.h~correct-_PAGE_FILE-comment include/asm-x86_64/pgtable.h --- linux-2.6.git/include/asm-x86_64/pgtable.h~correct-_PAGE_FILE-comment 2005-08-11 11:17:04.0 +0200 +++ linux-2.6.git-paolo/include/asm-x86_64/pgtable.h 2005-08-11 11:17:04.0 +0200 @@ -143,7 +143,7 @@ extern inline void pgd_clear (pgd_t * pg #define _PAGE_ACCESSED 0x020 #define _PAGE_DIRTY 0x040 #define _PAGE_PSE 0x080 /* 2MB page */ -#define _PAGE_FILE 0x040 /* set:pagecache, unset:swap */ +#define _PAGE_FILE 0x040 /* nonlinear file mapping, saved PTE; unset:swap */ #define _PAGE_GLOBAL 0x100 /* Global TLB entry */ #define _PAGE_PROTNONE 0x080 /* If not present */ diff -puN include/asm-um/pgtable.h~correct-_PAGE_FILE-comment include/asm-um/pgtable.h --- linux-2.6.git/include/asm-um/pgtable.h~correct-_PAGE_FILE-comment 2005-08-11 11:17:04.0 +0200 +++ linux-2.6.git-paolo/include/asm-um/pgtable.h 2005-08-11 11:17:04.0 +0200 @@ -16,13 +16,15 @@ #define _PAGE_PRESENT 0x001 #define _PAGE_NEWPAGE 0x002 -#define _PAGE_NEWPROT 0x004 -#define _PAGE_FILE 0x008 /* set:pagecache unset:swap */ -#define _PAGE_PROTNONE 0x010 /* If not present */ +#define _PAGE_NEWPROT 0x004 #define _PAGE_RW 0x020 #define _PAGE_USER 0x040 #define _PAGE_ACCESSED 0x080 #define _PAGE_DIRTY 0x100 +/* If _PAGE_PRESENT is clear, we use these: */ +#define _PAGE_FILE 0x008 /* nonlinear file mapping, saved PTE; unset:swap */ +#define _PAGE_PROTNONE 0x010 /* if the user mapped it with PROT_NONE; + pte_present gives true */ 
#ifdef CONFIG_3_LEVEL_PGTABLES #include "asm/pgtable-3level.h" _ - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[patch 33/39] remap_file_pages protection support: VM_FAULT_SIGSEGV permission checking rework
From: Paolo 'Blaisorblade' Giarrusso <[EMAIL PROTECTED]> Simplify the generic arch permission checking: the previous one was clumsy, as it didn't account for arch-specific implications (read implies exec, write implies read, and so on). The fixes for the archs (i386 and UML) which were modified for the previous scheme still need to be undone. Signed-off-by: Paolo 'Blaisorblade' Giarrusso <[EMAIL PROTECTED]> --- linux-2.6.git-paolo/mm/memory.c | 49 ++-- 1 files changed, 33 insertions(+), 16 deletions(-) diff -puN mm/memory.c~rfp-sigsegv-4 mm/memory.c --- linux-2.6.git/mm/memory.c~rfp-sigsegv-4 2005-08-12 17:18:55.0 +0200 +++ linux-2.6.git-paolo/mm/memory.c 2005-08-12 17:18:55.0 +0200 @@ -1923,6 +1923,35 @@ oom: goto out; } +static inline int check_perms(struct vm_area_struct * vma, int access_mask) { + if (unlikely(vma->vm_flags & VM_NONUNIFORM)) { + /* we used to check protections in arch handler, but with +* VM_NONUNIFORM the check is skipped. */ +#if 0 + if ((access_mask & VM_WRITE) > (vma->vm_flags & VM_WRITE)) + goto err; + if ((access_mask & VM_READ) > (vma->vm_flags & VM_READ)) + goto err; + if ((access_mask & VM_EXEC) > (vma->vm_flags & VM_EXEC)) + goto err; +#else + /* access_mask contains the type of the access, vm_flags are the +* declared protections, pte has the protection which will be +* given to the PTE's in that area. */ + //pte_t pte = pfn_pte(0UL, protection_map[vma->vm_flags & 0x0f|VM_SHARED]); + pte_t pte = pfn_pte(0UL, vma->vm_page_prot); + if ((access_mask & VM_WRITE) && ! pte_write(pte)) + goto err; + if ((access_mask & VM_READ) && ! pte_read(pte)) + goto err; + if ((access_mask & VM_EXEC) && ! pte_exec(pte)) + goto err; +#endif + } + return 0; +err: + return -EPERM; +} /* * Fault of a previously existing named mapping. Repopulate the pte * from the encoded file_pte if possible. 
This enables swappable @@ -1944,14 +1973,8 @@ static int do_file_page(struct mm_struct ((access_mask & VM_WRITE) && !(vma->vm_flags & VM_SHARED))) { /* We're behaving as if pte_file was cleared, so check * protections like in handle_pte_fault. */ - if (unlikely(vma->vm_flags & VM_NONUNIFORM)) { - if ((access_mask & VM_WRITE) > (vma->vm_flags & VM_WRITE)) - goto out_segv; - if ((access_mask & VM_READ) > (vma->vm_flags & VM_READ)) - goto out_segv; - if ((access_mask & VM_EXEC) > (vma->vm_flags & VM_EXEC)) - goto out_segv; - } + if (check_perms(vma, access_mask)) + goto out_segv; pte_clear(mm, address, pte); return do_no_page(mm, vma, address, access_mask & VM_WRITE, pte, pmd); @@ -2007,14 +2030,8 @@ static inline int handle_pte_fault(struc /* when pte_file(), the VMA protections are useless. Otherwise, * we used to check protections in arch handler, but with * VM_NONUNIFORM the check is skipped. */ - if (unlikely(vma->vm_flags & VM_NONUNIFORM) && !pte_file(entry)) { - if ((access_mask & VM_WRITE) > (vma->vm_flags & VM_WRITE)) - goto out_segv; - if ((access_mask & VM_READ) > (vma->vm_flags & VM_READ)) - goto out_segv; - if ((access_mask & VM_EXEC) > (vma->vm_flags & VM_EXEC)) - goto out_segv; - } + if (!pte_file(entry) && check_perms(vma, access_mask)) + goto out_segv; /* * If it truly wasn't present, we know that kswapd _ - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[patch 12/39] remap_file_pages protection support: enhance syscall interface and swapout code
From: Ingo Molnar <[EMAIL PROTECTED]>, Paolo 'Blaisorblade' Giarrusso <[EMAIL PROTECTED]> This is the "main" patch for the syscall code, containing the core of what was sent by Ingo Molnar, variously reworked. Unlike his patch, I've *not* added a new syscall, choosing instead to add a new flag (MAP_NOINHERIT) which the application must specify to get the new behavior (prot != 0 is accepted and prot == 0 means PROT_NONE). The changes to the page fault handler have been separated, not least because they required a considerable amount of effort. Handle the possibility that remap_file_pages changes protections in various places. * Enable the 'prot' parameter for shared-writable mappings (the ones which are the primary target for remap_file_pages), without breaking up the vma * Use pte_file PTEs also when protections don't match, not only when the offset doesn't match; and add set_nonlinear_pte() for this test * Save the current protection too when clearing a nonlinear PTE, by replacing pgoff_to_pte() uses with pgoff_prot_to_pte(). * Use the supplied protections on restore and on populate (partially incomplete, fixed in subsequent patches) Signed-off-by: Paolo 'Blaisorblade' Giarrusso <[EMAIL PROTECTED]> --- linux-2.6.git-paolo/include/linux/pagemap.h | 19 ++ linux-2.6.git-paolo/mm/fremap.c | 50 +--- linux-2.6.git-paolo/mm/memory.c | 14 --- linux-2.6.git-paolo/mm/rmap.c |3 - 4 files changed, 60 insertions(+), 26 deletions(-) diff -puN include/linux/pagemap.h~rfp-enhance-syscall-and-swapout-code include/linux/pagemap.h --- linux-2.6.git/include/linux/pagemap.h~rfp-enhance-syscall-and-swapout-code 2005-08-11 22:59:47.0 +0200 +++ linux-2.6.git-paolo/include/linux/pagemap.h 2005-08-11 22:59:47.0 +0200 @@ -159,6 +159,25 @@ static inline pgoff_t linear_page_index( return pgoff >> (PAGE_CACHE_SHIFT - PAGE_SHIFT); } +/*** + * Checks if the PTE is nonlinear, and if yes sets it. 
+ * @vma: the VMA in which @addr is; we don't check if it's VM_NONLINEAR, just + * if this PTE is nonlinear. + * @addr: the addr which @pte refers to. + * @pte: the old PTE value (to read its protections. + * @ptep: the PTE pointer (for setting it). + * @mm: passed to set_pte_at. + * @page: the page which was installed (to read its ->index, i.e. the old + * offset inside the file. + */ +static inline void set_nonlinear_pte(pte_t pte, pte_t * ptep, struct vm_area_struct *vma, struct mm_struct *mm, struct page* page, unsigned long addr) +{ + pgprot_t pgprot = pte_to_pgprot(pte); + if(linear_page_index(vma, addr) != page->index || + pgprot_val(pgprot) != pgprot_val(vma->vm_page_prot)) + set_pte_at(mm, addr, ptep, pgoff_prot_to_pte(page->index, pgprot)); +} + extern void FASTCALL(__lock_page(struct page *page)); extern void FASTCALL(unlock_page(struct page *page)); diff -puN mm/fremap.c~rfp-enhance-syscall-and-swapout-code mm/fremap.c --- linux-2.6.git/mm/fremap.c~rfp-enhance-syscall-and-swapout-code 2005-08-11 22:59:47.0 +0200 +++ linux-2.6.git-paolo/mm/fremap.c 2005-08-11 23:01:14.0 +0200 @@ -54,7 +54,7 @@ static inline void zap_pte(struct mm_str * previously existing mapping. */ int install_page(struct mm_struct *mm, struct vm_area_struct *vma, - unsigned long addr, struct page *page, pgprot_t prot) + unsigned long addr, struct page *page, pgprot_t pgprot) { struct inode *inode; pgoff_t size; @@ -94,7 +94,7 @@ int install_page(struct mm_struct *mm, s inc_mm_counter(mm,rss); flush_icache_page(vma, page); - set_pte_at(mm, addr, pte, mk_pte(page, prot)); + set_pte_at(mm, addr, pte, mk_pte(page, pgprot)); page_add_file_rmap(page); pte_val = *pte; pte_unmap(pte); @@ -113,7 +113,7 @@ EXPORT_SYMBOL(install_page); * previously existing mapping. 
*/ int install_file_pte(struct mm_struct *mm, struct vm_area_struct *vma, - unsigned long addr, unsigned long pgoff, pgprot_t prot) + unsigned long addr, unsigned long pgoff, pgprot_t pgprot) { int err = -ENOMEM; pte_t *pte; @@ -139,7 +139,7 @@ int install_file_pte(struct mm_struct *m zap_pte(mm, vma, addr, pte); - set_pte_at(mm, addr, pte, pgoff_to_pte(pgoff)); + set_pte_at(mm, addr, pte, pgoff_prot_to_pte(pgoff, pgprot)); pte_val = *pte; pte_unmap(pte); update_mmu_cache(vma, addr, pte_val); @@ -157,31 +157,28 @@ err_unlock: *file within an existing vma. * @start: start of the remapped virtual memory range * @size: size of the remapped virtual memory range - * @prot: new protection bits of the range + * @prot: new protection bits of the range, must be 0 if not us
[patch 30/39] remap_file_pages protection support: ia64 bits
From: Ingo Molnar <[EMAIL PROTECTED]> I've attached a 'blind' port of the prot bits of fremap to ia64. I've compiled it with a cross-compiler but otherwise it's untested. (and it's very likely i got the pte bits wrong - but it's roughly OK.) This should at least make ia64 compile. Signed-off-by: Paolo 'Blaisorblade' Giarrusso <[EMAIL PROTECTED]> --- linux-2.6.git-paolo/include/asm-ia64/pgtable.h | 17 + 1 files changed, 13 insertions(+), 4 deletions(-) diff -puN include/asm-ia64/pgtable.h~rfp-arch-ia64 include/asm-ia64/pgtable.h --- linux-2.6.git/include/asm-ia64/pgtable.h~rfp-arch-ia64 2005-08-12 19:27:03.0 +0200 +++ linux-2.6.git-paolo/include/asm-ia64/pgtable.h 2005-08-12 19:27:03.0 +0200 @@ -433,7 +433,8 @@ extern void paging_init (void); * Format of file pte: * bit 0 : present bit (must be zero) * bit 1 : _PAGE_FILE (must be one) - * bits 2-62: file_offset/PAGE_SIZE + * bit 2 : _PAGE_AR_RW + * bits 3-62: file_offset/PAGE_SIZE * bit 63 : _PAGE_PROTNONE bit */ #define __swp_type(entry) (((entry).val >> 2) & 0x7f) @@ -442,9 +443,17 @@ extern void paging_init (void); #define __pte_to_swp_entry(pte)((swp_entry_t) { pte_val(pte) }) #define __swp_entry_to_pte(x) ((pte_t) { (x).val }) -#define PTE_FILE_MAX_BITS 61 -#define pte_to_pgoff(pte) ((pte_val(pte) << 1) >> 3) -#define pgoff_to_pte(off) ((pte_t) { ((off) << 2) | _PAGE_FILE }) +#define PTE_FILE_MAX_BITS 59 +#define pte_to_pgoff(pte) ((pte_val(pte) << 1) >> 4) + +#define pte_to_pgprot(pte) \ + __pgprot((pte_val(pte) & (_PAGE_AR_RW | _PAGE_PROTNONE)) \ + | ((pte_val(pte) & _PAGE_PROTNONE) ? 0 : \ + (__ACCESS_BITS | _PAGE_PL_3))) + +#define pgoff_prot_to_pte(off, prot) \ + ((pte_t) { _PAGE_FILE + \ + (pgprot_val(prot) & (_PAGE_AR_RW | _PAGE_PROTNONE)) + (off) }) /* XXX is this right? 
*/ #define io_remap_page_range(vma, vaddr, paddr, size, prot) \ _ - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[patch 27/39] remap_file_pages protection support: fixups to ppc32 bits
From: Paul Mackerras <[EMAIL PROTECTED]> When I tried -mm4 on a ppc32 box, it hit a BUG because I hadn't excluded _PAGE_FILE from the bits used for swap entries. While looking at that I realised that the pte_to_pgoff and pgoff_prot_to_pte macros were wrong for 4xx and 8xx (embedded) PPC chips, since they use Signed-off-by: Paolo 'Blaisorblade' Giarrusso <[EMAIL PROTECTED]> --- linux-2.6.git-paolo/include/asm-ppc/pgtable.h | 48 +- 1 files changed, 39 insertions(+), 9 deletions(-) diff -puN include/asm-ppc/pgtable.h~rfp-arch-ppc32-pgtable-fixes include/asm-ppc/pgtable.h --- linux-2.6.git/include/asm-ppc/pgtable.h~rfp-arch-ppc32-pgtable-fixes 2005-08-12 18:18:44.0 +0200 +++ linux-2.6.git-paolo/include/asm-ppc/pgtable.h 2005-08-12 18:18:44.0 +0200 @@ -205,6 +205,7 @@ extern unsigned long ioremap_bot, iorema */ #define _PAGE_PRESENT 0x0001 /* S: PTE valid */ #define _PAGE_RW 0x0002 /* S: Write permission */ +#define _PAGE_FILE 0x0004 /* S: nonlinear file mapping */ #define _PAGE_DIRTY 0x0004 /* S: Page dirty */ #define _PAGE_ACCESSED 0x0008 /* S: Page referenced */ #define _PAGE_HWWRITE 0x0010 /* H: Dirty & RW */ @@ -213,7 +214,6 @@ extern unsigned long ioremap_bot, iorema #define _PAGE_ENDIAN 0x0080 /* H: E bit */ #define _PAGE_GUARDED 0x0100 /* H: G bit */ #define _PAGE_COHERENT 0x0200 /* H: M bit */ -#define _PAGE_FILE 0x0400 /* S: nonlinear file mapping */ #define _PAGE_NO_CACHE 0x0400 /* H: I bit */ #define _PAGE_WRITETHRU 0x0800 /* H: W bit */ @@ -724,20 +724,50 @@ extern void paging_init(void); #define __swp_type(entry) ((entry).val & 0x1f) #define __swp_offset(entry) ((entry).val >> 5) #define __swp_entry(type, offset) ((swp_entry_t) { (type) | ((offset) << 5) }) + +#if defined(CONFIG_4xx) || defined(CONFIG_8xx) +/* _PAGE_FILE and _PAGE_PRESENT are in the bottom 3 bits on all these chips. 
*/ #define __pte_to_swp_entry(pte) ((swp_entry_t) { pte_val(pte) >> 3 }) #define __swp_entry_to_pte(x) ((pte_t) { (x).val << 3 }) +#else /* Classic PPC */ +#define __pte_to_swp_entry(pte) \ +((swp_entry_t) { ((pte_val(pte) >> 3) & ~1) | ((pte_val(pte) >> 2) & 1) }) +#define __swp_entry_to_pte(x) \ +((pte_t) { (((x).val & ~1) << 3) | (((x).val & 1) << 2) }) +#endif /* Encode and decode a nonlinear file mapping entry */ -#define PTE_FILE_MAX_BITS 27 -#define pte_to_pgoff(pte) (((pte_val(pte) & ~0x7ff) >> 5) \ -| ((pte_val(pte) & 0x3f0) >> 4)) -#define pte_to_pgprot(pte) \ -__pgprot((pte_val(pte) & (_PAGE_USER|_PAGE_RW|_PAGE_PRESENT)) | _PAGE_ACCESSED) +/* We can't use any of the _PAGE_PRESENT, _PAGE_FILE, _PAGE_USER, _PAGE_RW, + or _PAGE_HASHPTE bits for storing a page offset. */ +#if defined(CONFIG_40x) +/* 40x, avoid the 0x53 bits - to simplify things, avoid 0x73 */ +#define __pgoff_split(x) ((((x) << 5) & ~0x7f) | (((x) << 2) & 0xc)) +#define __pgoff_glue(x) ((((x) & ~0x7f) >> 5) | (((x) & 0xc) >> 2)) +#elif defined(CONFIG_44x) +/* 44x, avoid the 0x47 bits */ +#define __pgoff_split(x) ((((x) << 4) & ~0x7f) | (((x) << 3) & 0x38)) +#define __pgoff_glue(x) ((((x) & ~0x7f) >> 4) | (((x) & 0x38) >> 3)) +#elif defined(CONFIG_8xx) +/* 8xx, avoid the 0x843 bits */ +#define __pgoff_split(x) ((((x) << 4) & ~0xfff) | (((x) << 3) & 0x780) \ +| (((x) << 2) & 0x3c)) +#define __pgoff_glue(x) ((((x) & ~0xfff) >> 4) | (((x) & 0x780) >> 3) \ +| (((x) & 0x3c) >> 2)) +#else +/* classic PPC, avoid the 0x40f bits */ +#define __pgoff_split(x) ((((x) << 5) & ~0x7ff) | (((x) << 4) & 0x3f0)) +#define __pgoff_glue(x) ((((x) & ~0x7ff) >> 5) | (((x) & 0x3f0) >> 4)) +#endif +#define PTE_FILE_MAX_BITS 27 +#define pte_to_pgoff(pte) __pgoff_glue(pte_val(pte)) #define pgoff_prot_to_pte(off, prot) \ - ((pte_t) { (((off) << 5) & ~0x7ff) | (((off) << 4) & 0x3f0) \ - | (pgprot_val(prot) & (_PAGE_USER|_PAGE_RW)) \ - | _PAGE_FILE }) + ((pte_t) { __pgoff_split(off) | _PAGE_FILE |\ + (pgprot_val(prot) & (_PAGE_USER|_PAGE_RW)) }) + +#de
[patch 11/39] remap_file_pages protection support: add MAP_NOINHERIT flag
From: Paolo 'Blaisorblade' Giarrusso <[EMAIL PROTECTED]> Add the MAP_NOINHERIT flag to arch headers, for use with remap-file-pages. Signed-off-by: Paolo 'Blaisorblade' Giarrusso <[EMAIL PROTECTED]> --- linux-2.6.git-paolo/include/asm-i386/mman.h |1 + linux-2.6.git-paolo/include/asm-ia64/mman.h |1 + linux-2.6.git-paolo/include/asm-ppc/mman.h |1 + linux-2.6.git-paolo/include/asm-ppc64/mman.h |1 + linux-2.6.git-paolo/include/asm-s390/mman.h |1 + linux-2.6.git-paolo/include/asm-x86_64/mman.h |1 + 6 files changed, 6 insertions(+) diff -puN include/asm-i386/mman.h~rfp-map-noinherit include/asm-i386/mman.h --- linux-2.6.git/include/asm-i386/mman.h~rfp-map-noinherit 2005-08-11 12:06:40.000000000 +0200 +++ linux-2.6.git-paolo/include/asm-i386/mman.h 2005-08-11 12:06:40.000000000 +0200 @@ -22,6 +22,7 @@ #define MAP_NORESERVE 0x4000 /* don't check for reservations */ #define MAP_POPULATE 0x8000 /* populate (prefault) pagetables */ #define MAP_NONBLOCK 0x10000 /* do not block on IO */ +#define MAP_NOINHERIT 0x20000 /* don't inherit the protection bits of the underlying vma */ #define MS_ASYNC 1 /* sync memory asynchronously */ #define MS_INVALIDATE 2 /* invalidate the caches */ diff -puN include/asm-ia64/mman.h~rfp-map-noinherit include/asm-ia64/mman.h --- linux-2.6.git/include/asm-ia64/mman.h~rfp-map-noinherit 2005-08-11 12:06:40.000000000 +0200 +++ linux-2.6.git-paolo/include/asm-ia64/mman.h 2005-08-11 12:06:40.000000000 +0200 @@ -30,6 +30,7 @@ #define MAP_NORESERVE 0x04000 /* don't check for reservations */ #define MAP_POPULATE 0x08000 /* populate (prefault) pagetables */ #define MAP_NONBLOCK 0x10000 /* do not block on IO */ +#define MAP_NOINHERIT 0x20000 /* don't inherit the protection bits of the underlying vma */ #define MS_ASYNC 1 /* sync memory asynchronously */ #define MS_INVALIDATE 2 /* invalidate the caches */ diff -puN include/asm-ppc64/mman.h~rfp-map-noinherit include/asm-ppc64/mman.h --- linux-2.6.git/include/asm-ppc64/mman.h~rfp-map-noinherit 2005-08-11 12:06:40.000000000 +0200 +++
linux-2.6.git-paolo/include/asm-ppc64/mman.h 2005-08-11 12:06:40.000000000 +0200 @@ -38,6 +38,7 @@ #define MAP_POPULATE 0x8000 /* populate (prefault) pagetables */ #define MAP_NONBLOCK 0x10000 /* do not block on IO */ +#define MAP_NOINHERIT 0x20000 /* don't inherit the protection bits of the underlying vma */ #define MADV_NORMAL 0x0 /* default page-in behavior */ #define MADV_RANDOM 0x1 /* page-in minimum required */ diff -puN include/asm-ppc/mman.h~rfp-map-noinherit include/asm-ppc/mman.h --- linux-2.6.git/include/asm-ppc/mman.h~rfp-map-noinherit 2005-08-11 12:06:40.000000000 +0200 +++ linux-2.6.git-paolo/include/asm-ppc/mman.h 2005-08-11 12:06:40.000000000 +0200 @@ -23,6 +23,7 @@ #define MAP_EXECUTABLE 0x1000 /* mark it as an executable */ #define MAP_POPULATE 0x8000 /* populate (prefault) pagetables */ #define MAP_NONBLOCK 0x10000 /* do not block on IO */ +#define MAP_NOINHERIT 0x20000 /* don't inherit the protection bits of the underlying vma */ #define MS_ASYNC 1 /* sync memory asynchronously */ #define MS_INVALIDATE 2 /* invalidate the caches */ diff -puN include/asm-s390/mman.h~rfp-map-noinherit include/asm-s390/mman.h --- linux-2.6.git/include/asm-s390/mman.h~rfp-map-noinherit 2005-08-11 12:06:40.000000000 +0200 +++ linux-2.6.git-paolo/include/asm-s390/mman.h 2005-08-11 12:06:40.000000000 +0200 @@ -30,6 +30,7 @@ #define MAP_NORESERVE 0x4000 /* don't check for reservations */ #define MAP_POPULATE 0x8000 /* populate (prefault) pagetables */ #define MAP_NONBLOCK 0x10000 /* do not block on IO */ +#define MAP_NOINHERIT 0x20000 /* don't inherit the protection bits of the underlying vma */ #define MS_ASYNC 1 /* sync memory asynchronously */ #define MS_INVALIDATE 2 /* invalidate the caches */ diff -puN include/asm-x86_64/mman.h~rfp-map-noinherit include/asm-x86_64/mman.h --- linux-2.6.git/include/asm-x86_64/mman.h~rfp-map-noinherit 2005-08-11 12:06:40.000000000 +0200 +++ linux-2.6.git-paolo/include/asm-x86_64/mman.h 2005-08-11 12:06:40.000000000 +0200 @@ -23,6 +23,7 @@ #define MAP_NORESERVE 0x4000 /* don't check for reservations */ #define
MAP_POPULATE 0x8000 /* populate (prefault) pagetables */ #define MAP_NONBLOCK 0x10000 /* do not block on IO */ +#define MAP_NOINHERIT 0x20000 /* don't inherit the protection bits of the underlying vma */ #define MS_ASYNC 1 /* sync memory asynchronously */ #define
[patch 14/39] remap_file_pages protection support: assume VM_SHARED never disappears
From: Paolo 'Blaisorblade' Giarrusso <[EMAIL PROTECTED]> Assume that even after dropping and reacquiring the lock, (vma->vm_flags & VM_SHARED) won't change, thus moving a check earlier. Signed-off-by: Paolo 'Blaisorblade' Giarrusso <[EMAIL PROTECTED]> --- linux-2.6.git-paolo/mm/fremap.c | 12 ++-- 1 files changed, 2 insertions(+), 10 deletions(-) diff -puN mm/fremap.c~rfp-assume-VM_PRIVATE-stays mm/fremap.c --- linux-2.6.git/mm/fremap.c~rfp-assume-VM_PRIVATE-stays 2005-08-11 12:58:07.000000000 +0200 +++ linux-2.6.git-paolo/mm/fremap.c 2005-08-11 13:38:56.000000000 +0200 @@ -232,6 +232,8 @@ retry: /* Must set VM_NONLINEAR before any pages are populated. */ if (pgoff != linear_page_index(vma, start)) { + if (!(vma->vm_flags & VM_SHARED)) + goto out_unlock; if (!(vma->vm_flags & VM_NONLINEAR)) { if (!has_write_lock) { up_read(&mm->mmap_sem); @@ -239,12 +241,6 @@ retry: has_write_lock = 1; goto retry; } - /* XXX: we check VM_SHARED after re-getting the -* (write) semaphore but I guess that we could -* check it earlier as we're not allowed to turn -* a VM_PRIVATE vma into a VM_SHARED one! */ - if (!(vma->vm_flags & VM_SHARED)) - goto out_unlock; mapping = vma->vm_file->f_mapping; spin_lock(&mapping->i_mmap_lock); @@ -254,10 +250,6 @@ retry: vma_nonlinear_insert(vma, &mapping->i_mmap_nonlinear); flush_dcache_mmap_unlock(mapping); spin_unlock(&mapping->i_mmap_lock); - } else { - /* Won't drop the lock, check it here.*/ - if (!(vma->vm_flags & VM_SHARED)) - goto out_unlock; } } _
[patch 35/39] remap_file_pages protection support: avoid redundant pte_file PTE's
From: Paolo 'Blaisorblade' Giarrusso <[EMAIL PROTECTED]> For linear VMA's, there is no need to install pte_file PTEs to remember the offset. We could probably go as far as checking directly the address and protection like in include/linux/pagemap.h:set_nonlinear_pte(), instead of vma->vm_flags. Also add some warnings on the path which used to cope with such PTE's. Untested yet. Signed-off-by: Paolo 'Blaisorblade' Giarrusso <[EMAIL PROTECTED]> --- linux-2.6.git-paolo/mm/fremap.c | 12 ++-- linux-2.6.git-paolo/mm/memory.c |5 + 2 files changed, 11 insertions(+), 6 deletions(-) diff -puN mm/fremap.c~rfp-linear-optim-v3 mm/fremap.c --- linux-2.6.git/mm/fremap.c~rfp-linear-optim-v3 2005-08-11 23:20:09.000000000 +0200 +++ linux-2.6.git-paolo/mm/fremap.c 2005-08-11 23:20:09.000000000 +0200 @@ -125,6 +125,12 @@ int install_file_pte(struct mm_struct *m BUG_ON(!uniform && !(vma->vm_flags & VM_SHARED)); + /* We're being called by mmap(MAP_NONBLOCK|MAP_POPULATE) on a uniform +* VMA. So we don't need to take the lock, nor to install a PTE for the +* page we'd fault in anyway. */ + if (uniform) + return 0; + pgd = pgd_offset(mm, addr); spin_lock(&mm->page_table_lock); @@ -139,12 +145,6 @@ int install_file_pte(struct mm_struct *m pte = pte_alloc_map(mm, pmd, addr); if (!pte) goto err_unlock; - /* -* Skip uniform non-existent ptes: -*/ - err = 0; - if (uniform && pte_none(*pte)) - goto err_unlock; zap_pte(mm, vma, addr, pte); diff -puN mm/memory.c~rfp-linear-optim-v3 mm/memory.c --- linux-2.6.git/mm/memory.c~rfp-linear-optim-v3 2005-08-11 23:20:09.000000000 +0200 +++ linux-2.6.git-paolo/mm/memory.c 2005-08-11 23:20:09.000000000 +0200 @@ -1969,9 +1969,14 @@ static int do_file_page(struct mm_struct /* * Fall back to the linear mapping if the fs does not support * ->populate; in this case do the protection checks. +* Could have been installed by install_file_pte, for a MAP_NONBLOCK +* pagetable population.
*/ if (!vma->vm_ops->populate || ((access_mask & VM_WRITE) && !(vma->vm_flags & VM_SHARED))) { + /* remap_file_pages should disallow this, now that +* install_file_pte skips linear ones. */ + WARN_ON(1); /* We're behaving as if pte_file was cleared, so check * protections like in handle_pte_fault. */ if (check_perms(vma, access_mask)) _
[patch 03/39] add swap cache mapping comment
From: Paolo 'Blaisorblade' Giarrusso <[EMAIL PROTECTED]> Add some more comments about page->mapping and swapper_space, explaining their (historical and current) relationship. Such material can be extracted from the old GIT history (which I used for reference), but having it in the source is more useful. Signed-off-by: Paolo 'Blaisorblade' Giarrusso <[EMAIL PROTECTED]> --- linux-2.6.git-paolo/./mm/swap_state.c |5 + 1 files changed, 5 insertions(+) diff -puN ./mm/swap_state.c~swap-cache-mapping-comment ./mm/swap_state.c --- linux-2.6.git/./mm/swap_state.c~swap-cache-mapping-comment 2005-08-11 11:12:57.000000000 +0200 +++ linux-2.6.git-paolo/./mm/swap_state.c 2005-08-11 11:12:57.000000000 +0200 @@ -21,6 +21,11 @@ * swapper_space is a fiction, retained to simplify the path through * vmscan's shrink_list, to make sync_page look nicer, and to allow * future use of radix_tree tags in the swap cache. + * + * In 2.4 and until 2.6.6 pages in the swap cache also had page->mapping == + * &swapper_space (this was the definition of PageSwapCache), but this is no + * longer true. Instead, we use page->flags for that, and page->mapping is + * *ignored* here. However, also take a look at page_mapping(). */ static struct address_space_operations swap_aops = { .writepage = swap_writepage, _
[patch 31/39] remap_file_pages protection support: s390 bits
From: Martin Schwidefsky <[EMAIL PROTECTED]> s390 memory management changes for remap-file-pages-prot patch: - Add pgoff_prot_to_pte/pte_to_pgprot, remove pgoff_to_pte (required for 'prot' parameter in shared-writeable mappings). - Handle VM_FAULT_SIGSEGV from handle_mm_fault in do_exception. Signed-off-by: Paolo 'Blaisorblade' Giarrusso <[EMAIL PROTECTED]> --- linux-2.6.git-paolo/arch/s390/mm/fault.c |2 linux-2.6.git-paolo/include/asm-s390/pgtable.h | 90 - 2 files changed, 60 insertions(+), 32 deletions(-) diff -puN arch/s390/mm/fault.c~rfp-arch-s390 arch/s390/mm/fault.c --- linux-2.6.git/arch/s390/mm/fault.c~rfp-arch-s390 2005-08-12 19:27:58.000000000 +0200 +++ linux-2.6.git-paolo/arch/s390/mm/fault.c 2005-08-12 19:27:58.000000000 +0200 @@ -260,6 +260,8 @@ survive: goto do_sigbus; case VM_FAULT_OOM: goto out_of_memory; + case VM_FAULT_SIGSEGV: + goto bad_area; default: BUG(); } diff -puN include/asm-s390/pgtable.h~rfp-arch-s390 include/asm-s390/pgtable.h --- linux-2.6.git/include/asm-s390/pgtable.h~rfp-arch-s390 2005-08-12 19:27:58.000000000 +0200 +++ linux-2.6.git-paolo/include/asm-s390/pgtable.h 2005-08-12 19:27:58.000000000 +0200 @@ -211,16 +211,41 @@ extern char empty_zero_page[PAGE_SIZE]; * C : changed bit */ -/* Hardware bits in the page table entry */ +/* Hardware bits in the page table entry. */ #define _PAGE_RO 0x200 /* HW read-only */ #define _PAGE_INVALID 0x400 /* HW invalid */ -/* Mask and four different kinds of invalid pages. */ -#define _PAGE_INVALID_MASK 0x601 +/* Software bits in the page table entry.
*/ +#define _PAGE_FILE 0x001 +#define _PAGE_PROTNONE 0x002 + +/* + * We have 8 different page "types", two valid types and 6 invalid types + * (p = page address, o = swap offset, t = swap type, f = file offset): + * 0 xxx 0IP0 yy NF + * valid rw: 0 <p> <--0-> 00 + * valid ro: 0 <p> 0010 <--0-> 00 + * invalid none: 0 <p> 0100 <--0-> 10 + * invalid empty: 0 <0> 0100 <--0-> 00 + * invalid swap: 0 <o> 0110 <--t-> 00 + * invalid file rw:0 <f> 0100 <--f-> 01 + * invalid file ro:0 <f> 0110 <--f-> 01 + * invalid file none: 0 <f> 0100 <--f-> 11 + * + * The format for 64 bit is almost identical, there isn't a leading zero + * and the number of bits in the page address part of the pte is 52 bits + * instead of 19. + */ + #define _PAGE_INVALID_EMPTY0x400 -#define _PAGE_INVALID_NONE 0x401 #define _PAGE_INVALID_SWAP 0x600 -#define _PAGE_INVALID_FILE 0x601 +#define _PAGE_INVALID_FILE 0x401 + +#define _PTE_IS_VALID(__pte) (!(pte_val(__pte) & _PAGE_INVALID)) +#define _PTE_IS_NONE(__pte)((pte_val(__pte) & 0x603) == 0x402) +#define _PTE_IS_EMPTY(__pte) ((pte_val(__pte) & 0x603) == 0x400) +#define _PTE_IS_SWAP(__pte)((pte_val(__pte) & 0x603) == 0x600) +#define _PTE_IS_FILE(__pte)((pte_val(__pte) & 0x401) == 0x401) #ifndef __s390x__ @@ -281,13 +306,11 @@ extern char empty_zero_page[PAGE_SIZE]; /* * No mapping available */ -#define PAGE_NONE_SHARED __pgprot(_PAGE_INVALID_NONE) -#define PAGE_NONE_PRIVATE __pgprot(_PAGE_INVALID_NONE) -#define PAGE_RO_SHARED __pgprot(_PAGE_RO) -#define PAGE_RO_PRIVATE __pgprot(_PAGE_RO) -#define PAGE_COPY__pgprot(_PAGE_RO) -#define PAGE_SHARED __pgprot(0) -#define PAGE_KERNEL __pgprot(0) +#define PAGE_NONE __pgprot(_PAGE_INVALID | _PAGE_PROTNONE) +#define PAGE_READONLY __pgprot(_PAGE_RO) +#define PAGE_COPY __pgprot(_PAGE_RO) +#define PAGE_SHARED__pgprot(0) +#define PAGE_KERNEL__pgprot(0) /* * The S390 can't do page protection for execute, and considers that the @@ -295,21 +318,21 @@ extern char empty_zero_page[PAGE_SIZE]; * the closest we can get..
*/ /*xwr*/ -#define __P000 PAGE_NONE_PRIVATE -#define __P001 PAGE_RO_PRIVATE +#define __P000 PAGE_NONE +#define __P001 PAGE_READONLY #define __P010 PAGE_COPY #define __P011 PAGE_COPY -#define __P100 PAGE_RO_PRIVATE -#define __P101 PAGE_RO_PRIVATE +#define __P100 PAGE_READONLY +#define __P101 PAGE_READONLY #define __P110 PAGE_COPY #define __P111 PAGE_COPY -#define __S000 PAGE_NONE_SHARED -#define __S001 PAGE_RO_SHARED +#define __S000 PAGE_NONE +#define __S001 PAGE_READONLY #define __S010 PAGE_SHARED #define __S011 PAGE_SHARED -#define __S100 PAGE_RO_SHARED -#define __S101 PAGE_RO_SHARED +#define __S100 PAGE_READONLY +#define __S101 PAGE_READONLY #define __S110 PAGE_
[patch 16/39] remap_file_pages protection support: readd lock downgrading
From: Paolo 'Blaisorblade' Giarrusso <[EMAIL PROTECTED]> Even now, we'll sometimes take the write lock. So, in that case, we could downgrade it; after a tiny bit of thought, I've chosen to do that when we'll either do any I/O or alter a lot of PTEs. About how much "a lot" is, I've copied the values from this code in mm/memory.c: #ifdef CONFIG_PREEMPT # define ZAP_BLOCK_SIZE (8 * PAGE_SIZE) #else /* No preempt: go for improved straight-line efficiency */ # define ZAP_BLOCK_SIZE (1024 * PAGE_SIZE) #endif I'm not sure about the trade-offs - we used to have a down_write(), now we have a down_read() and a possible up_read() + down_write(), and with this patch the fast path still takes only down_read(), but the slow path will do down_read(), down_write(), downgrade_write(). This increases the number of atomic operations but improves concurrency wrt mmap and similar operations - I don't know how much contention there is on that lock. Also, drop a stale comment: we cannot clear VM_NONLINEAR simply because code elsewhere is going to use it. At the very least, madvise_dontneed() relies on that flag being set (non-linear truncation also reads the mapping list), but the list is probably longer and going to grow in the next patches of this series. Just in case this wasn't clear: this patch is not strictly related to protection support, I was just too lazy to move it up in the hierarchy.
Signed-off-by: Paolo 'Blaisorblade' Giarrusso <[EMAIL PROTECTED]> --- linux-2.6.git-paolo/mm/fremap.c | 18 +- 1 files changed, 13 insertions(+), 5 deletions(-) diff -puN mm/fremap.c~rfp-downgrade-lock mm/fremap.c --- linux-2.6.git/mm/fremap.c~rfp-downgrade-lock 2005-08-11 23:04:39.000000000 +0200 +++ linux-2.6.git-paolo/mm/fremap.c 2005-08-11 23:04:39.000000000 +0200 @@ -152,6 +152,13 @@ err_unlock: } +#ifdef CONFIG_PREEMPT +# define INSTALL_SIZE (8 * PAGE_SIZE) +#else +/* No preempt: go for improved straight-line efficiency */ +# define INSTALL_SIZE (1024 * PAGE_SIZE) +#endif + /*** * sys_remap_file_pages - remap arbitrary pages of a shared backing store *file within an existing vma. @@ -266,14 +273,15 @@ retry: } } + /* Do NOT hold the write lock while doing any I/O, nor when +* iterating over too many PTEs. Values might need tuning. */ + if (has_write_lock && (!(flags & MAP_NONBLOCK) || size > INSTALL_SIZE)) { + downgrade_write(&mm->mmap_sem); + has_write_lock = 0; + } err = vma->vm_ops->populate(vma, start, size, pgprot, pgoff, flags & MAP_NONBLOCK); - /* -* We can't clear VM_NONLINEAR because we'd have to do -* it after ->populate completes, and that would prevent -* downgrading the lock. (Locks can't be upgraded). -*/ } out_unlock: _
[patch 34/39] remap_file_pages protection support: restrict permission testing
From: Paolo 'Blaisorblade' Giarrusso <[EMAIL PROTECTED]> Yet to test. Currently we install a PTE when one is missing irrespective of the fault type, and if the access type is prohibited we'll get another fault and kill the process only then. With this, we check the access type on the first fault. We could also use this code for testing present PTE's, if the current assumption (fault on present PTE's in VM_NONUNIFORM vma's means access violation) proves problematic for architectures other than UML (which I already fixed), but I hope it's not needed. Signed-off-by: Paolo 'Blaisorblade' Giarrusso <[EMAIL PROTECTED]> --- linux-2.6.git-paolo/mm/memory.c | 16 1 files changed, 16 insertions(+) diff -puN mm/memory.c~rfp-fault-sigsegv-3 mm/memory.c --- linux-2.6.git/mm/memory.c~rfp-fault-sigsegv-3 2005-08-12 17:19:17.000000000 +0200 +++ linux-2.6.git-paolo/mm/memory.c 2005-08-12 17:19:17.000000000 +0200 @@ -1963,6 +1963,7 @@ static int do_file_page(struct mm_struct unsigned long pgoff; pgprot_t pgprot; int err; + pte_t test_entry; BUG_ON(!vma->vm_ops || !vma->vm_ops->nopage); /* @@ -1983,6 +1984,21 @@ static int do_file_page(struct mm_struct pgoff = pte_to_pgoff(*pte); pgprot = vma->vm_flags & VM_NONUNIFORM ? pte_to_pgprot(*pte): vma->vm_page_prot; + /* If this is not enabled, we'll get another fault after return next +* time, check we handle that one, and that this code works.
*/ +#if 1 + /* We just want to test pte_{read,write,exec} */ + test_entry = mk_pte(0, pgprot); + if (unlikely(vma->vm_flags & VM_NONUNIFORM) && !pte_file(*pte)) { + if ((access_mask & VM_WRITE) && !pte_write(test_entry)) + goto out_segv; + if ((access_mask & VM_READ) && !pte_read(test_entry)) + goto out_segv; + if ((access_mask & VM_EXEC) && !pte_exec(test_entry)) + goto out_segv; + } +#endif + pte_unmap(pte); spin_unlock(&mm->page_table_lock); _
[patch 08/39] remap_file_pages protection support: uml bits
Update pte encoding macros for UML. Signed-off-by: Paolo 'Blaisorblade' Giarrusso <[EMAIL PROTECTED]> --- linux-2.6.git-paolo/include/asm-um/pgtable-2level.h | 15 ++ linux-2.6.git-paolo/include/asm-um/pgtable-3level.h | 21 +++- 2 files changed, 27 insertions(+), 9 deletions(-) diff -puN include/asm-um/pgtable-2level.h~rfp-arch-uml include/asm-um/pgtable-2level.h --- linux-2.6.git/include/asm-um/pgtable-2level.h~rfp-arch-uml 2005-08-11 11:23:21.0 +0200 +++ linux-2.6.git-paolo/include/asm-um/pgtable-2level.h 2005-08-11 11:23:21.0 +0200 @@ -72,12 +72,19 @@ static inline void set_pte(pte_t *pteptr ((unsigned long) __va(pmd_val(pmd) & PAGE_MASK)) /* - * Bits 0 through 3 are taken + * Bits 0 to 5 are taken, split up the 26 bits of offset + * into this range: */ -#define PTE_FILE_MAX_BITS 28 +#define PTE_FILE_MAX_BITS 26 -#define pte_to_pgoff(pte) (pte_val(pte) >> 4) +#define pte_to_pgoff(pte) (pte_val(pte) >> 6) +#define pte_to_pgprot(pte) \ + __pgprot((pte_val(pte) & (_PAGE_RW | _PAGE_PROTNONE)) \ + | ((pte_val(pte) & _PAGE_PROTNONE) ? 0 : \ + (_PAGE_USER | _PAGE_PRESENT)) | _PAGE_ACCESSED) -#define pgoff_to_pte(off) ((pte_t) { ((off) << 4) + _PAGE_FILE }) +#define pgoff_prot_to_pte(off, prot) \ + ((pte_t) { ((off) << 6) + \ +(pgprot_val(prot) & (_PAGE_RW | _PAGE_PROTNONE)) + _PAGE_FILE }) #endif diff -puN include/asm-um/pgtable-3level.h~rfp-arch-uml include/asm-um/pgtable-3level.h --- linux-2.6.git/include/asm-um/pgtable-3level.h~rfp-arch-uml 2005-08-11 11:23:21.0 +0200 +++ linux-2.6.git-paolo/include/asm-um/pgtable-3level.h 2005-08-11 11:23:21.0 +0200 @@ -140,25 +140,36 @@ static inline pmd_t pfn_pmd(pfn_t page_n } /* - * Bits 0 through 3 are taken in the low part of the pte, + * Bits 0 through 5 are taken in the low part of the pte, * put the 32 bits of offset into the high part. 
*/ #define PTE_FILE_MAX_BITS 32 + #ifdef CONFIG_64BIT #define pte_to_pgoff(p) ((p).pte >> 32) - -#define pgoff_to_pte(off) ((pte_t) { ((off) << 32) | _PAGE_FILE }) +#define pgoff_prot_to_pte(off, prot) ((pte_t) { ((off) << 32) | _PAGE_FILE | \ + (pgprot_val(prot) & (_PAGE_RW | _PAGE_PROTNONE)) }) +#define pte_flags(pte) pte_val(pte) #else #define pte_to_pgoff(pte) ((pte).pte_high) - -#define pgoff_to_pte(off) ((pte_t) { _PAGE_FILE, (off) }) +#define pgoff_prot_to_pte(off, prot) ((pte_t) { \ + (pgprot_val(prot) & (_PAGE_RW | _PAGE_PROTNONE)) | _PAGE_FILE, \ + (off) }) +/* Don't use pte_val below, useless to join the two halves */ +#define pte_flags(pte) ((pte).pte_low) #endif +#define pte_to_pgprot(pte) \ + __pgprot((pte_flags(pte) & (_PAGE_RW | _PAGE_PROTNONE)) \ + | ((pte_flags(pte) & _PAGE_PROTNONE) ? 0 : \ + (_PAGE_USER | _PAGE_PRESENT)) | _PAGE_ACCESSED) +#undef pte_flags + #endif /* _
[patch 22/39] remap file pages protection support: use FAULT_SIGSEGV for protection checking, uml bits
From: Paolo 'Blaisorblade' Giarrusso <[EMAIL PROTECTED]> This adapts the changes to the i386 handler to the UML one. It isn't enough to make UML work, however, because UML has some peculiarities. Subsequent patches fix this. Signed-off-by: Paolo 'Blaisorblade' Giarrusso <[EMAIL PROTECTED]> --- linux-2.6.git-paolo/arch/um/kernel/trap_kern.c | 32 + 1 files changed, 27 insertions(+), 5 deletions(-) diff -puN arch/um/kernel/trap_kern.c~rfp-fault-sigsegv-2-uml arch/um/kernel/trap_kern.c --- linux-2.6.git/arch/um/kernel/trap_kern.c~rfp-fault-sigsegv-2-uml 2005-08-11 23:09:32.000000000 +0200 +++ linux-2.6.git-paolo/arch/um/kernel/trap_kern.c 2005-08-11 23:09:32.000000000 +0200 @@ -37,6 +37,7 @@ int handle_page_fault(unsigned long addr pmd_t *pmd; pte_t *pte; int err = -EFAULT; + int access_mask = 0; *code_out = SEGV_MAPERR; down_read(&mm->mmap_sem); @@ -55,14 +56,15 @@ int handle_page_fault(unsigned long addr good_area: *code_out = SEGV_ACCERR; if(is_write && !(vma->vm_flags & VM_WRITE)) - goto out; + goto prot_bad; if(!(vma->vm_flags & (VM_READ | VM_EXEC))) -goto out; +goto prot_bad; + access_mask = is_write ? VM_WRITE : 0; do { -survive: - switch (handle_mm_fault(mm, vma, address, is_write)){ +handle_fault: + switch (__handle_mm_fault(mm, vma, address, access_mask)) { case VM_FAULT_MINOR: current->min_flt++; break; @@ -72,6 +74,9 @@ survive: case VM_FAULT_SIGBUS: err = -EACCES; goto out; + case VM_FAULT_SIGSEGV: + err = -EFAULT; + goto out; case VM_FAULT_OOM: err = -ENOMEM; goto out_of_memory; @@ -87,10 +92,27 @@ survive: *pte = pte_mkyoung(*pte); if(pte_write(*pte)) *pte = pte_mkdirty(*pte); flush_tlb_page(vma, address); + + /* If the PTE is not present, the vma protections are not accurate if +* VM_NONUNIFORM; present PTE's are correct for VM_NONUNIFORM and were +* already handled otherwise. */ out: up_read(&mm->mmap_sem); return(err); +prot_bad: + if (unlikely(vma->vm_flags & VM_NONUNIFORM)) { + access_mask = is_write ?
VM_WRITE : 0; + /* Otherwise, on a legitimate read fault on a page mapped as +* exec-only, we get problems. Probably, we should lower +* requirements... we should always test just +* pte_read/write/exec, on vma->vm_page_prot! This way is +* cumbersome. However, for now things should work for UML. */ + access_mask |= vma->vm_flags & VM_EXEC ? VM_EXEC : VM_READ; + goto handle_fault; + } + goto out; + /* * We ran out of memory, or some other thing happened to us that made * us unable to handle the page fault gracefully. @@ -100,7 +122,7 @@ out_of_memory: up_read(&mm->mmap_sem); yield(); down_read(&mm->mmap_sem); - goto survive; + goto handle_fault; } goto out; } _
[patch 17/39] remap_file_pages protection support: safety net for lazy arches
From: Paolo 'Blaisorblade' Giarrusso <[EMAIL PROTECTED]> Since proper support requires that the arch at the very least handles VM_FAULT_SIGSEGV, as in the next patch (otherwise the arch may BUG), and things are even more complex (see the next patches), and it's triggerable only with VM_NONUNIFORM vma's, simply refuse to create them if the arch doesn't declare itself ready. This is a very temporary hack, so I've clearly marked it as such. At the current release pace, that gives arches about 6 months to get ready; reducing this time is perfectly ok with me. Signed-off-by: Paolo 'Blaisorblade' Giarrusso <[EMAIL PROTECTED]> --- linux-2.6.git-paolo/Documentation/feature-removal-schedule.txt | 12 ++ linux-2.6.git-paolo/include/asm-i386/pgtable.h |3 ++ linux-2.6.git-paolo/include/asm-um/pgtable.h |3 ++ linux-2.6.git-paolo/mm/fremap.c|5 4 files changed, 23 insertions(+) diff -puN mm/fremap.c~rfp-safety-net-for-archs mm/fremap.c --- linux-2.6.git/mm/fremap.c~rfp-safety-net-for-archs 2005-08-11 13:46:49.000000000 +0200 +++ linux-2.6.git-paolo/mm/fremap.c 2005-08-11 13:55:02.000000000 +0200 @@ -184,6 +184,11 @@ asmlinkage long sys_remap_file_pages(uns int err = -EINVAL; int has_write_lock = 0; + /* Hack for not-updated archs, KILLME after 2.6.16! */ +#ifndef __ARCH_SUPPORTS_VM_NONUNIFORM + if (flags & MAP_NOINHERIT) + goto out; +#endif if (prot && !(flags & MAP_NOINHERIT)) goto out; /* diff -puN include/asm-i386/pgtable.h~rfp-safety-net-for-archs include/asm-i386/pgtable.h --- linux-2.6.git/include/asm-i386/pgtable.h~rfp-safety-net-for-archs 2005-08-11 13:46:49.000000000 +0200 +++ linux-2.6.git-paolo/include/asm-i386/pgtable.h 2005-08-11 13:55:02.000000000 +0200 @@ -419,4 +419,7 @@ extern void noexec_setup(const char *str #define __HAVE_ARCH_PTE_SAME #include +/* Hack for not-updated archs, KILLME after 2.6.16!
*/ +#define __ARCH_SUPPORTS_VM_NONUNIFORM + #endif /* _I386_PGTABLE_H */ diff -puN include/asm-um/pgtable.h~rfp-safety-net-for-archs include/asm-um/pgtable.h --- linux-2.6.git/include/asm-um/pgtable.h~rfp-safety-net-for-archs 2005-08-11 13:46:49.000000000 +0200 +++ linux-2.6.git-paolo/include/asm-um/pgtable.h 2005-08-11 13:55:02.000000000 +0200 @@ -361,6 +361,9 @@ static inline pte_t pte_modify(pte_t pte #include +/* Hack for not-updated archs, KILLME after 2.6.16! */ +#define __ARCH_SUPPORTS_VM_NONUNIFORM + #endif #endif diff -puN Documentation/feature-removal-schedule.txt~rfp-safety-net-for-archs Documentation/feature-removal-schedule.txt --- linux-2.6.git/Documentation/feature-removal-schedule.txt~rfp-safety-net-for-archs 2005-08-11 14:06:00.000000000 +0200 +++ linux-2.6.git-paolo/Documentation/feature-removal-schedule.txt 2005-08-11 14:10:34.000000000 +0200 @@ -135,3 +135,15 @@ Why: With the 16-bit PCMCIA subsystem no pcmciautils package available at http://kernel.org/pub/linux/utils/kernel/pcmcia/ Who: Dominik Brodowski <[EMAIL PROTECTED]> + +--- + +What: __ARCH_SUPPORTS_VM_NONUNIFORM +When: December 2005 +Files: mm/fremap.c, include/asm-*/pgtable.h +Why: It's just there to allow arches to update their page fault handlers to + support VM_FAULT_SIGSEGV, for remap_file_pages protection support. + Since they may BUG if this support is not added, the syscall code + refuses this new operation mode unless the arch declares itself as + "VM_FAULT_SIGSEGV-aware" with this macro. +Who: Paolo 'Blaisorblade' Giarrusso <[EMAIL PROTECTED]> _
[patch 24/39] remap_file_pages protection support: adapt to uml peculiarities
From: Paolo 'Blaisorblade' Giarrusso <[EMAIL PROTECTED]> UML is peculiar compared with other architectures (and possibly this should be fixed) in that our arch fault handler handles both TLB faults and page faults indifferently. In particular, we may get to call handle_mm_fault() when the PTE is already correct, but simply not flushed. And rfp-fault-sigsegv-2 breaks this, because when getting a fault on a pte_present PTE and a non-uniform VMA, it assumes the fault is due to a protection violation, and signals the caller that a SIGSEGV must be sent. This isn't the final fix for UML; that's the next one. Signed-off-by: Paolo 'Blaisorblade' Giarrusso <[EMAIL PROTECTED]> --- linux-2.6.git-paolo/arch/um/kernel/trap_kern.c | 19 +++ 1 files changed, 15 insertions(+), 4 deletions(-) diff -puN arch/um/kernel/trap_kern.c~rfp-sigsegv-uml-3 arch/um/kernel/trap_kern.c --- linux-2.6.git/arch/um/kernel/trap_kern.c~rfp-sigsegv-uml-3 2005-08-11 23:13:06.000000000 +0200 +++ linux-2.6.git-paolo/arch/um/kernel/trap_kern.c 2005-08-11 23:14:26.000000000 +0200 @@ -75,8 +75,21 @@ handle_fault: err = -EACCES; goto out; case VM_FAULT_SIGSEGV: - err = -EFAULT; - goto out; + /* Duplicate this code here. */ + pgd = pgd_offset(mm, address); + pud = pud_offset(pgd, address); + pmd = pmd_offset(pud, address); + pte = pte_offset_kernel(pmd, address); + if (likely (pte_newpage(*pte) || pte_newprot(*pte))) { + /* This wasn't done by __handle_mm_fault(), and +* the page hadn't been flushed.
*/ + *pte = pte_mkyoung(*pte); + if(pte_write(*pte)) *pte = pte_mkdirty(*pte); + break; + } else { + err = -EFAULT; + goto out; + } case VM_FAULT_OOM: err = -ENOMEM; goto out_of_memory; @@ -89,8 +102,6 @@ handle_fault: pte = pte_offset_kernel(pmd, address); } while(!pte_present(*pte)); err = 0; - *pte = pte_mkyoung(*pte); - if(pte_write(*pte)) *pte = pte_mkdirty(*pte); flush_tlb_page(vma, address); /* If the PTE is not present, the vma protection are not accurate if _ - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[patch 09/39] remap_file_pages protection support: improvement for UML bits
Recover one bit by additionally using _PAGE_NEWPROT. Since I wasn't sure this would work, I've split this out. We rely on the fact that pte_newprot always checks first if the PTE is marked present. Signed-off-by: Paolo 'Blaisorblade' Giarrusso <[EMAIL PROTECTED]> --- linux-2.6.git-paolo/include/asm-um/pgtable-2level.h | 10 +- 1 files changed, 5 insertions(+), 5 deletions(-) diff -puN include/asm-um/pgtable-2level.h~rfp-arch-uml-improv include/asm-um/pgtable-2level.h --- linux-2.6.git/include/asm-um/pgtable-2level.h~rfp-arch-uml-improv 2005-08-07 19:09:34.0 +0200 +++ linux-2.6.git-paolo/include/asm-um/pgtable-2level.h 2005-08-07 19:09:34.0 +0200 @@ -72,19 +72,19 @@ static inline void set_pte(pte_t *pteptr ((unsigned long) __va(pmd_val(pmd) & PAGE_MASK)) /* - * Bits 0 to 5 are taken, split up the 26 bits of offset + * Bits 0, 1, 3 to 5 are taken, split up the 27 bits of offset * into this range: */ -#define PTE_FILE_MAX_BITS 26 +#define PTE_FILE_MAX_BITS 27 -#define pte_to_pgoff(pte) (pte_val(pte) >> 6) +#define pte_to_pgoff(pte) (((pte_val(pte) >> 6) << 1) | ((pte_val(pte) >> 2) & 0x1)) #define pte_to_pgprot(pte) \ __pgprot((pte_val(pte) & (_PAGE_RW | _PAGE_PROTNONE)) \ | ((pte_val(pte) & _PAGE_PROTNONE) ? 0 : \ (_PAGE_USER | _PAGE_PRESENT)) | _PAGE_ACCESSED) #define pgoff_prot_to_pte(off, prot) \ - ((pte_t) { ((off) << 6) + \ -(pgprot_val(prot) & (_PAGE_RW | _PAGE_PROTNONE)) + _PAGE_FILE }) + __pte((((off) >> 1) << 6) + (((off) & 0x1) << 2) + \ +(pgprot_val(prot) & (_PAGE_RW | _PAGE_PROTNONE)) + _PAGE_FILE) #endif _ - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[patch 26/39] remap_file_pages protection support: ppc32 bits
From: Ingo Molnar <[EMAIL PROTECTED]> PPC32 bits of RFP - as in original patch. Signed-off-by: Paolo 'Blaisorblade' Giarrusso <[EMAIL PROTECTED]> --- linux-2.6.git-paolo/include/asm-ppc/pgtable.h | 15 +++ 1 files changed, 11 insertions(+), 4 deletions(-) diff -puN include/asm-ppc/pgtable.h~rfp-arch-ppc include/asm-ppc/pgtable.h --- linux-2.6.git/include/asm-ppc/pgtable.h~rfp-arch-ppc2005-08-12 18:18:43.0 +0200 +++ linux-2.6.git-paolo/include/asm-ppc/pgtable.h 2005-08-12 18:39:57.0 +0200 @@ -309,8 +309,8 @@ extern unsigned long ioremap_bot, iorema /* Definitions for 60x, 740/750, etc. */ #define _PAGE_PRESENT 0x001 /* software: pte contains a translation */ #define _PAGE_HASHPTE 0x002 /* hash_page has made an HPTE for this pte */ -#define _PAGE_FILE 0x004 /* when !present: nonlinear file mapping */ #define _PAGE_USER 0x004 /* usermode access allowed */ +#define _PAGE_FILE 0x008 /* when !present: nonlinear file mapping */ #define _PAGE_GUARDED 0x008 /* G: prohibit speculative access */ #define _PAGE_COHERENT 0x010 /* M: enforce memory coherence (SMP systems) */ #define _PAGE_NO_CACHE 0x020 /* I: cache inhibit */ @@ -728,9 +728,16 @@ extern void paging_init(void); #define __swp_entry_to_pte(x) ((pte_t) { (x).val << 3 }) /* Encode and decode a nonlinear file mapping entry */ -#define PTE_FILE_MAX_BITS 29 -#define pte_to_pgoff(pte) (pte_val(pte) >> 3) -#define pgoff_to_pte(off) ((pte_t) { ((off) << 3) | _PAGE_FILE }) +#define PTE_FILE_MAX_BITS 27 +#define pte_to_pgoff(pte) (((pte_val(pte) & ~0x7ff) >> 5) \ +| ((pte_val(pte) & 0x3f0) >> 4)) +#define pte_to_pgprot(pte) \ +__pgprot((pte_val(pte) & (_PAGE_USER|_PAGE_RW|_PAGE_PRESENT)) | _PAGE_ACCESSED) + +#define pgoff_prot_to_pte(off, prot) \ + ((pte_t) { (((off) << 5) & ~0x7ff) | (((off) << 4) & 0x3f0) \ + | (pgprot_val(prot) & (_PAGE_USER|_PAGE_RW)) \ + | _PAGE_FILE }) /* CONFIG_APUS */ /* For virtual address to physical address conversion */ _ - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in 
the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[patch 19/39] remap file pages protection support: use FAULT_SIGSEGV for protection checking
From: Paolo 'Blaisorblade' Giarrusso <[EMAIL PROTECTED]> The arch handler used to check protection itself; now, if the VMA is non-uniform, we must possibly move that check into the generic VM. For now, do_file_page installs the PTE without checking the fault type; if it was wrong, the process will fault again and die only then. I've left this as-is for now to exercise the code more (and it works anyway). I've also changed do_no_page to fault in pages with their *exact* permissions for non-uniform VMAs. The checking approach is a bit clumsy because we are given a VM_{READ,WRITE,EXEC} mask, so we do *strict* checking. For instance, a VM_EXEC mapping (which won't have VM_READ in vma->vm_flags) would fault on read. To fix that properly, we should get a pgprot mask and test pte_read()/pte_write()/pte_exec(); for now I work around that in the i386/UML handler, and I have patches fixing this properly later in the series. Also, there is a (potential) problem: on VM_NONUNIFORM vmas, in handle_pte_fault(), if the PTE is present we return VM_FAULT_SIGSEGV. This has proven to be a bit strict, at least for UML, so it may break other arches too (only for new functionality) - at least peculiar ones: for UML the problem was that handle_mm_fault() is called for TLB faults, not only PTE faults. Another problem I've just discovered is that PTRACE_POKETEXT access_process_vm on VM_NONUNIFORM write-protected vma's won't work. That's not a big problem. 
Signed-off-by: Paolo 'Blaisorblade' Giarrusso <[EMAIL PROTECTED]> --- linux-2.6.git-paolo/arch/i386/mm/fault.c | 28 +++-- linux-2.6.git-paolo/include/linux/mm.h | 11 +++ linux-2.6.git-paolo/mm/memory.c | 96 --- 3 files changed, 108 insertions(+), 27 deletions(-) diff -puN arch/i386/mm/fault.c~rfp-fault-sigsegv-2 arch/i386/mm/fault.c --- linux-2.6.git/arch/i386/mm/fault.c~rfp-fault-sigsegv-2 2005-08-11 14:21:01.0 +0200 +++ linux-2.6.git-paolo/arch/i386/mm/fault.c2005-08-11 16:12:46.0 +0200 @@ -219,6 +219,7 @@ fastcall void do_page_fault(struct pt_re unsigned long address; unsigned long page; int write; + int access_mask = 0; siginfo_t info; /* get the address */ @@ -324,23 +325,24 @@ good_area: /* fall through */ case 2: /* write, not present */ if (!(vma->vm_flags & VM_WRITE)) - goto bad_area; + goto bad_area_prot; write++; break; - case 1: /* read, present */ + case 1: /* read, present - when does this happen? Maybe for NX exceptions? */ goto bad_area; case 0: /* read, not present */ if (!(vma->vm_flags & (VM_READ | VM_EXEC))) - goto bad_area; + goto bad_area_prot; } - survive: + access_mask = write ? VM_WRITE : 0; +handle_fault: /* * If for any reason at all we couldn't handle the fault, * make sure we exit gracefully rather than endlessly redo * the fault. */ - switch (handle_mm_fault(mm, vma, address, write)) { + switch (__handle_mm_fault(mm, vma, address, access_mask)) { case VM_FAULT_MINOR: tsk->min_flt++; break; @@ -368,6 +370,20 @@ good_area: up_read(&mm->mmap_sem); return; + /* If the PTE is not present, the vma protection are not accurate if +* VM_NONUNIFORM; present PTE's are correct for VM_NONUNIFORM and were +* already handled otherwise. */ +bad_area_prot: + if (unlikely(vma->vm_flags & VM_NONUNIFORM)) { + access_mask = write ? VM_WRITE : 0; + /* Otherwise, on a legitimate read fault on a page mapped as +* exec-only, we get problems. Probably, we should lower +* requirements... we should always test just +* pte_read/write/exec, on vma->vm_page_prot! 
This way is +* cumbersome. However, for now things should work for i386. */ + access_mask |= vma->vm_flags & VM_EXEC ? VM_EXEC : VM_READ; + goto handle_fault; + } /* * Something tried to access memory that isn't in our memory map.. * Fix it, but check if it's kernel or user first.. @@ -481,7 +497,7 @@ out_of_memory: if (tsk->pid == 1) { yield(); down_read(&mm->mmap_sem); - goto survive; + goto handle_fault; } printk("VM: killing process %s\n", tsk->comm); if (error_code & 4) diff -puN mm/memory.c~rfp-fault-
[patch 1/1] uml: fixes performance regression in activate_mm and thus exec()
From: Paolo 'Blaisorblade' Giarrusso <[EMAIL PROTECTED]> CC: Benjamin LaHaise <[EMAIL PROTECTED]> Normally, activate_mm() is called from exec(), and thus it used to be a no-op because we use a completely new "MM context" on the host (for instance, a new process), and so we didn't need to flush any "TLB entries" (which for us are the set of memory mappings for the host process from the virtual "RAM" file). Kernel threads, instead, are usually handled in a different way. So, when for AIO we call use_mm(), things used to break and so Benjamin implemented activate_mm(). However, that is only needed for AIO, and could slow down exec() inside UML, so be smart: detect being called for AIO (via PF_BORROWED_MM) and do the full flush only in that situation. Comment also the caller so that people won't go breaking UML without noticing. I also rely on the caller's locks for testing current->flags. Signed-off-by: Paolo 'Blaisorblade' Giarrusso <[EMAIL PROTECTED]> --- linux-2.6.git-paolo/fs/aio.c |2 ++ linux-2.6.git-paolo/include/asm-um/mmu_context.h |8 +++- 2 files changed, 9 insertions(+), 1 deletion(-) diff -puN include/asm-um/mmu_context.h~uml-optimize-activate-mm include/asm-um/mmu_context.h --- linux-2.6.git/include/asm-um/mmu_context.h~uml-optimize-activate-mm 2005-08-06 12:53:30.141344264 +0200 +++ linux-2.6.git-paolo/include/asm-um/mmu_context.h2005-08-06 12:58:49.682766584 +0200 @@ -20,7 +20,13 @@ extern void force_flush_all(void); static inline void activate_mm(struct mm_struct *old, struct mm_struct *new) { - if (old != new) + /* This is called by fs/exec.c and fs/aio.c. In the first case, for an +* exec, we don't need to do anything as we're called from userspace +* and thus going to use a new host PID. In the second, we're called +* from a kernel thread, and thus need to go doing the mmap's on the +* host. Since they're very expensive, we want to avoid that as far as +* possible. 
*/ + if (old != new && (current->flags & PF_BORROWED_MM)) force_flush_all(); } diff -puN fs/aio.c~uml-optimize-activate-mm fs/aio.c --- linux-2.6.git/fs/aio.c~uml-optimize-activate-mm 2005-08-06 12:59:14.393010056 +0200 +++ linux-2.6.git-paolo/fs/aio.c2005-08-06 13:03:07.163623544 +0200 @@ -567,6 +567,8 @@ static void use_mm(struct mm_struct *mm) atomic_inc(&mm->mm_count); tsk->mm = mm; tsk->active_mm = mm; + /* Note that on UML this *requires* PF_BORROWED_MM to be set, otherwise +* it won't work. Update it accordingly if you change it here. */ activate_mm(active_mm, mm); task_unlock(tsk); _ - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[patch 03/39] add swap cache mapping comment
From: Paolo 'Blaisorblade' Giarrusso <[EMAIL PROTECTED]> Add some more comments about page->mapping and swapper_space, explaining their (historical and current) relationship. Such material can be extracted from the old GIT history (which I used for reference), but having it in the source is more useful. Signed-off-by: Paolo 'Blaisorblade' Giarrusso <[EMAIL PROTECTED]> --- linux-2.6.git-paolo/./mm/swap_state.c |5 + 1 files changed, 5 insertions(+) diff -puN ./mm/swap_state.c~swap-cache-mapping-comment ./mm/swap_state.c --- linux-2.6.git/./mm/swap_state.c~swap-cache-mapping-comment 2005-08-11 11:12:57.0 +0200 +++ linux-2.6.git-paolo/./mm/swap_state.c 2005-08-11 11:12:57.0 +0200 @@ -21,6 +21,11 @@ * swapper_space is a fiction, retained to simplify the path through * vmscan's shrink_list, to make sync_page look nicer, and to allow * future use of radix_tree tags in the swap cache. + * + * In 2.4 and until 2.6.6 pages in the swap cache also had page->mapping == + * &swapper_space (this was the definition of PageSwapCache), but this is no + * more true. Instead, we use page->flags for that, and page->mapping is + * *ignored* here. However, also take a look at page_mapping(). */ static struct address_space_operations swap_aops = { .writepage = swap_writepage, _ - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[patch 15/39] remap_file_pages protection support: add VM_NONUNIFORM to fix existing usage of mprotect()
From: Paolo 'Blaisorblade' Giarrusso <[EMAIL PROTECTED]> Distinguish between a "normal" VMA and a VMA with non-uniform protection. This will also be useful for fault handling (we must ignore VM_{READ,WRITE,EXEC} in the arch fault handler). As said before, with remap-file-pages-prot we must punt on private VMAs even when we're just changing protections. Also, the remap_file_pages protection support does introduce a regression in remap_file_pages vs. mprotect. mprotect alters the VMA protections and walks each installed PTE. mprotect'ing a nonlinear VMA used to work, obviously, but now doesn't, because the protections must now be read from the PTEs, which mprotect hasn't updated; so, to avoid changing behaviour for old binaries, on uniform VMAs we ignore the protections in the PTE, as we did before. On non-uniform VMAs, instead, mprotect is currently broken; however, we've never supported it there, so this is acceptable. What it does is split the VMA if needed, assign the new protection to the VMA, and enforce the new protections on all present pages, ignoring all absent ones (including pte_file() ones), which keep their current protections. So the application has no reliable way to know which pages would actually be remapped. What's more, there is IMHO no reason to support mprotect on non-uniform VMAs. The only exception is changing the VMA's default protection (which is used for pages not individually remapped), but that should still ignore the page tables. The only use for that is changing protections without changing the indexes, which with remap_file_pages must be done one page at a time, re-specifying the indexes. It would be more reasonable to let remap_file_pages change protections on a PTE range without changing the offsets. I've not implemented this, but I can if wanted. For sure, UML doesn't need this interface. For now, however, I've made no change to mprotect(); I'd like to get some feedback first about which way to go. 
Signed-off-by: Paolo 'Blaisorblade' Giarrusso <[EMAIL PROTECTED]> --- linux-2.6.git-paolo/include/linux/mm.h |7 +++ linux-2.6.git-paolo/mm/fremap.c| 13 + linux-2.6.git-paolo/mm/memory.c|2 +- 3 files changed, 21 insertions(+), 1 deletion(-) diff -puN mm/fremap.c~rfp-add-VM_NONUNIFORM mm/fremap.c --- linux-2.6.git/mm/fremap.c~rfp-add-VM_NONUNIFORM 2005-08-11 23:03:51.0 +0200 +++ linux-2.6.git-paolo/mm/fremap.c 2005-08-11 23:03:51.0 +0200 @@ -252,6 +252,19 @@ retry: spin_unlock(&mapping->i_mmap_lock); } } + if (pgprot_val(pgprot) != pgprot_val(vma->vm_page_prot)) { + if (!(vma->vm_flags & VM_SHARED)) + goto out_unlock; + if (!(vma->vm_flags & VM_NONUNIFORM)) { + if (!has_write_lock) { + up_read(&mm->mmap_sem); + down_write(&mm->mmap_sem); + has_write_lock = 1; + goto retry; + } + vma->vm_flags |= VM_NONUNIFORM; + } + } err = vma->vm_ops->populate(vma, start, size, pgprot, pgoff, flags & MAP_NONBLOCK); diff -puN include/linux/mm.h~rfp-add-VM_NONUNIFORM include/linux/mm.h --- linux-2.6.git/include/linux/mm.h~rfp-add-VM_NONUNIFORM 2005-08-11 23:03:51.0 +0200 +++ linux-2.6.git-paolo/include/linux/mm.h 2005-08-11 23:03:51.0 +0200 @@ -160,7 +160,14 @@ extern unsigned int kobjsize(const void #define VM_ACCOUNT 0x0010 /* Is a VM accounted object */ #define VM_HUGETLB 0x0040 /* Huge TLB Page VM */ #define VM_NONLINEAR 0x0080 /* Is non-linear (remap_file_pages) */ + +#ifndef CONFIG_MMU #define VM_MAPPED_COPY 0x0100 /* T if mapped copy of data (nommu mmap) */ +#else +#define VM_NONUNIFORM 0x0100 /* The VM individual pages have + different protections + (remap_file_pages)*/ +#endif #ifndef VM_STACK_DEFAULT_FLAGS /* arch can override this */ #define VM_STACK_DEFAULT_FLAGS VM_DATA_DEFAULT_FLAGS diff -puN mm/memory.c~rfp-add-VM_NONUNIFORM mm/memory.c --- linux-2.6.git/mm/memory.c~rfp-add-VM_NONUNIFORM 2005-08-11 23:03:51.0 +0200 +++ linux-2.6.git-paolo/mm/memory.c 2005-08-11 23:03:51.0 +0200 @@ -1941,7 +1941,7 @@ static int do_file_page(struct mm_struct } pgoff = pte_to_pgoff(*pte); 
- pgprot = pte_to_pgprot(*pte); + pgpr
[patch 14/39] remap_file_pages protection support: assume VM_SHARED never disappears
From: Paolo 'Blaisorblade' Giarrusso <[EMAIL PROTECTED]> Assume that even after dropping and reacquiring the lock, (vma->vm_flags & VM_SHARED) won't change, thus moving a check earlier. Signed-off-by: Paolo 'Blaisorblade' Giarrusso <[EMAIL PROTECTED]> --- linux-2.6.git-paolo/mm/fremap.c | 12 ++-- 1 files changed, 2 insertions(+), 10 deletions(-) diff -puN mm/fremap.c~rfp-assume-VM_PRIVATE-stays mm/fremap.c --- linux-2.6.git/mm/fremap.c~rfp-assume-VM_PRIVATE-stays 2005-08-11 12:58:07.0 +0200 +++ linux-2.6.git-paolo/mm/fremap.c 2005-08-11 13:38:56.0 +0200 @@ -232,6 +232,8 @@ retry: /* Must set VM_NONLINEAR before any pages are populated. */ if (pgoff != linear_page_index(vma, start)) { + if (!(vma->vm_flags & VM_SHARED)) + goto out_unlock; if (!(vma->vm_flags & VM_NONLINEAR)) { if (!has_write_lock) { up_read(&mm->mmap_sem); @@ -239,12 +241,6 @@ retry: has_write_lock = 1; goto retry; } - /* XXX: we check VM_SHARED after re-getting the -* (write) semaphore but I guess that we could -* check it earlier as we're not allowed to turn -* a VM_PRIVATE vma into a VM_SHARED one! */ - if (!(vma->vm_flags & VM_SHARED)) - goto out_unlock; mapping = vma->vm_file->f_mapping; spin_lock(&mapping->i_mmap_lock); @@ -254,10 +250,6 @@ retry: vma_nonlinear_insert(vma, &mapping->i_mmap_nonlinear); flush_dcache_mmap_unlock(mapping); spin_unlock(&mapping->i_mmap_lock); - } else { - /* Won't drop the lock, check it here.*/ - if (!(vma->vm_flags & VM_SHARED)) - goto out_unlock; } } _ - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[patch 06/39] correct _PAGE_FILE comment
_PAGE_FILE does not indicate whether a page is in the page cache or swap cache; it is set just for non-linear PTEs. Correct the comment for i386, x86_64, UML. Also clarify _PAGE_PROTNONE. Signed-off-by: Paolo 'Blaisorblade' Giarrusso <[EMAIL PROTECTED]> --- linux-2.6.git-paolo/include/asm-i386/pgtable.h | 10 +- linux-2.6.git-paolo/include/asm-um/pgtable.h |8 +--- linux-2.6.git-paolo/include/asm-x86_64/pgtable.h |2 +- 3 files changed, 11 insertions(+), 9 deletions(-) diff -puN include/asm-i386/pgtable.h~correct-_PAGE_FILE-comment include/asm-i386/pgtable.h --- linux-2.6.git/include/asm-i386/pgtable.h~correct-_PAGE_FILE-comment 2005-08-11 11:17:04.0 +0200 +++ linux-2.6.git-paolo/include/asm-i386/pgtable.h 2005-08-11 11:17:04.0 +0200 @@ -86,9 +86,7 @@ void paging_init(void); #endif /* - * The 4MB page is guessing.. Detailed in the infamous "Chapter H" - * of the Pentium details, but assuming intel did the straightforward - * thing, this bit set in the page directory entry just means that + * _PAGE_PSE set in the page directory entry just means that * the page directory entry points directly to a 4MB-aligned block of * memory. 
*/ @@ -119,8 +117,10 @@ void paging_init(void); #define _PAGE_UNUSED2 0x400 #define _PAGE_UNUSED3 0x800 -#define _PAGE_FILE 0x040 /* set:pagecache unset:swap */ -#define _PAGE_PROTNONE 0x080 /* If not present */ +/* If _PAGE_PRESENT is clear, we use these: */ +#define _PAGE_FILE 0x040 /* nonlinear file mapping, saved PTE; unset:swap */ +#define _PAGE_PROTNONE 0x080 /* if the user mapped it with PROT_NONE; + pte_present gives true */ #ifdef CONFIG_X86_PAE #define _PAGE_NX (1ULL<<_PAGE_BIT_NX) #else diff -puN include/asm-x86_64/pgtable.h~correct-_PAGE_FILE-comment include/asm-x86_64/pgtable.h --- linux-2.6.git/include/asm-x86_64/pgtable.h~correct-_PAGE_FILE-comment 2005-08-11 11:17:04.0 +0200 +++ linux-2.6.git-paolo/include/asm-x86_64/pgtable.h2005-08-11 11:17:04.0 +0200 @@ -143,7 +143,7 @@ extern inline void pgd_clear (pgd_t * pg #define _PAGE_ACCESSED 0x020 #define _PAGE_DIRTY0x040 #define _PAGE_PSE 0x080 /* 2MB page */ -#define _PAGE_FILE 0x040 /* set:pagecache, unset:swap */ +#define _PAGE_FILE 0x040 /* nonlinear file mapping, saved PTE; unset:swap */ #define _PAGE_GLOBAL 0x100 /* Global TLB entry */ #define _PAGE_PROTNONE 0x080 /* If not present */ diff -puN include/asm-um/pgtable.h~correct-_PAGE_FILE-comment include/asm-um/pgtable.h --- linux-2.6.git/include/asm-um/pgtable.h~correct-_PAGE_FILE-comment 2005-08-11 11:17:04.0 +0200 +++ linux-2.6.git-paolo/include/asm-um/pgtable.h2005-08-11 11:17:04.0 +0200 @@ -16,13 +16,15 @@ #define _PAGE_PRESENT 0x001 #define _PAGE_NEWPAGE 0x002 -#define _PAGE_NEWPROT 0x004 -#define _PAGE_FILE 0x008 /* set:pagecache unset:swap */ -#define _PAGE_PROTNONE 0x010 /* If not present */ +#define _PAGE_NEWPROT 0x004 #define _PAGE_RW 0x020 #define _PAGE_USER 0x040 #define _PAGE_ACCESSED 0x080 #define _PAGE_DIRTY0x100 +/* If _PAGE_PRESENT is clear, we use these: */ +#define _PAGE_FILE 0x008 /* nonlinear file mapping, saved PTE; unset:swap */ +#define _PAGE_PROTNONE 0x010 /* if the user mapped it with PROT_NONE; + pte_present gives true */ 
#ifdef CONFIG_3_LEVEL_PGTABLES #include "asm/pgtable-3level.h" _ - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[patch 17/39] remap_file_pages protection support: safety net for lazy arches
From: Paolo 'Blaisorblade' Giarrusso <[EMAIL PROTECTED]> Proper support requires that the arch at the very least handle VM_FAULT_SIGSEGV, as in the next patch (otherwise the arch may BUG), and things are even more complex than that (see the next patches). Since the problem is triggerable only with VM_NONUNIFORM vma's, simply refuse to create them if the arch doesn't declare itself ready. This is a very temporary hack, so I've clearly marked it as such. At the current pace, I've given arches about six months to get ready; reducing this time is perfectly ok with me. Signed-off-by: Paolo 'Blaisorblade' Giarrusso <[EMAIL PROTECTED]> --- linux-2.6.git-paolo/Documentation/feature-removal-schedule.txt | 12 ++ linux-2.6.git-paolo/include/asm-i386/pgtable.h |3 ++ linux-2.6.git-paolo/include/asm-um/pgtable.h |3 ++ linux-2.6.git-paolo/mm/fremap.c|5 4 files changed, 23 insertions(+) diff -puN mm/fremap.c~rfp-safety-net-for-archs mm/fremap.c --- linux-2.6.git/mm/fremap.c~rfp-safety-net-for-archs 2005-08-11 13:46:49.0 +0200 +++ linux-2.6.git-paolo/mm/fremap.c 2005-08-11 13:55:02.0 +0200 @@ -184,6 +184,11 @@ asmlinkage long sys_remap_file_pages(uns int err = -EINVAL; int has_write_lock = 0; + /* Hack for not-updated archs, KILLME after 2.6.16! */ +#ifndef __ARCH_SUPPORTS_VM_NONUNIFORM + if (flags & MAP_NOINHERIT) + goto out; +#endif if (prot && !(flags & MAP_NOINHERIT)) goto out; /* diff -puN include/asm-i386/pgtable.h~rfp-safety-net-for-archs include/asm-i386/pgtable.h --- linux-2.6.git/include/asm-i386/pgtable.h~rfp-safety-net-for-archs 2005-08-11 13:46:49.0 +0200 +++ linux-2.6.git-paolo/include/asm-i386/pgtable.h 2005-08-11 13:55:02.0 +0200 @@ -419,4 +419,7 @@ extern void noexec_setup(const char *str #define __HAVE_ARCH_PTE_SAME #include +/* Hack for not-updated archs, KILLME after 2.6.16! 
*/ +#define __ARCH_SUPPORTS_VM_NONUNIFORM + #endif /* _I386_PGTABLE_H */ diff -puN include/asm-um/pgtable.h~rfp-safety-net-for-archs include/asm-um/pgtable.h --- linux-2.6.git/include/asm-um/pgtable.h~rfp-safety-net-for-archs 2005-08-11 13:46:49.0 +0200 +++ linux-2.6.git-paolo/include/asm-um/pgtable.h2005-08-11 13:55:02.0 +0200 @@ -361,6 +361,9 @@ static inline pte_t pte_modify(pte_t pte #include +/* Hack for not-updated archs, KILLME after 2.6.16! */ +#define __ARCH_SUPPORTS_VM_NONUNIFORM + #endif #endif diff -puN Documentation/feature-removal-schedule.txt~rfp-safety-net-for-archs Documentation/feature-removal-schedule.txt --- linux-2.6.git/Documentation/feature-removal-schedule.txt~rfp-safety-net-for-archs 2005-08-11 14:06:00.0 +0200 +++ linux-2.6.git-paolo/Documentation/feature-removal-schedule.txt 2005-08-11 14:10:34.0 +0200 @@ -135,3 +135,15 @@ Why: With the 16-bit PCMCIA subsystem no pcmciautils package available at http://kernel.org/pub/linux/utils/kernel/pcmcia/ Who: Dominik Brodowski <[EMAIL PROTECTED]> + +--- + +What: __ARCH_SUPPORTS_VM_NONUNIFORM +When: December 2005 +Files: mm/fremap.c, include/asm-*/pgtable.h +Why: It's just there to allow arches to update their page fault handlers to + support VM_FAULT_SIGSEGV, for remap_file_pages protection support. + Since they may BUG if this support is not added, the syscall code + refuses this new operation mode unless the arch declares itself as + "VM_FAULT_SIGSEGV-aware" with this macro. +Who: Paolo 'Blaisorblade' Giarrusso <[EMAIL PROTECTED]> _ - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[patch 11/39] remap_file_pages protection support: add MAP_NOINHERIT flag
From: Paolo 'Blaisorblade' Giarrusso <[EMAIL PROTECTED]> Add the MAP_NOINHERIT flag to arch headers, for use with remap-file-pages. Signed-off-by: Paolo 'Blaisorblade' Giarrusso <[EMAIL PROTECTED]> --- linux-2.6.git-paolo/include/asm-i386/mman.h |1 + linux-2.6.git-paolo/include/asm-ia64/mman.h |1 + linux-2.6.git-paolo/include/asm-ppc/mman.h|1 + linux-2.6.git-paolo/include/asm-ppc64/mman.h |1 + linux-2.6.git-paolo/include/asm-s390/mman.h |1 + linux-2.6.git-paolo/include/asm-x86_64/mman.h |1 + 6 files changed, 6 insertions(+) diff -puN include/asm-i386/mman.h~rfp-map-noinherit include/asm-i386/mman.h --- linux-2.6.git/include/asm-i386/mman.h~rfp-map-noinherit 2005-08-11 12:06:40.0 +0200 +++ linux-2.6.git-paolo/include/asm-i386/mman.h 2005-08-11 12:06:40.0 +0200 @@ -22,6 +22,7 @@ #define MAP_NORESERVE 0x4000 /* don't check for reservations */ #define MAP_POPULATE 0x8000 /* populate (prefault) pagetables */ #define MAP_NONBLOCK 0x1 /* do not block on IO */ +#define MAP_NOINHERIT 0x2 /* don't inherit the protection bits of the underlying vma*/ #define MS_ASYNC 1 /* sync memory asynchronously */ #define MS_INVALIDATE 2 /* invalidate the caches */ diff -puN include/asm-ia64/mman.h~rfp-map-noinherit include/asm-ia64/mman.h --- linux-2.6.git/include/asm-ia64/mman.h~rfp-map-noinherit 2005-08-11 12:06:40.0 +0200 +++ linux-2.6.git-paolo/include/asm-ia64/mman.h 2005-08-11 12:06:40.0 +0200 @@ -30,6 +30,7 @@ #define MAP_NORESERVE 0x04000 /* don't check for reservations */ #define MAP_POPULATE 0x08000 /* populate (prefault) pagetables */ #define MAP_NONBLOCK 0x1 /* do not block on IO */ +#define MAP_NOINHERIT 0x2 /* don't inherit the protection bits of the underlying vma*/ #define MS_ASYNC 1 /* sync memory asynchronously */ #define MS_INVALIDATE 2 /* invalidate the caches */ diff -puN include/asm-ppc64/mman.h~rfp-map-noinherit include/asm-ppc64/mman.h --- linux-2.6.git/include/asm-ppc64/mman.h~rfp-map-noinherit2005-08-11 12:06:40.0 +0200 +++ 
linux-2.6.git-paolo/include/asm-ppc64/mman.h2005-08-11 12:06:40.0 +0200 @@ -38,6 +38,7 @@ #define MAP_POPULATE 0x8000 /* populate (prefault) pagetables */ #define MAP_NONBLOCK 0x1 /* do not block on IO */ +#define MAP_NOINHERIT 0x2 /* don't inherit the protection bits of the underlying vma*/ #define MADV_NORMAL0x0 /* default page-in behavior */ #define MADV_RANDOM0x1 /* page-in minimum required */ diff -puN include/asm-ppc/mman.h~rfp-map-noinherit include/asm-ppc/mman.h --- linux-2.6.git/include/asm-ppc/mman.h~rfp-map-noinherit 2005-08-11 12:06:40.0 +0200 +++ linux-2.6.git-paolo/include/asm-ppc/mman.h 2005-08-11 12:06:40.0 +0200 @@ -23,6 +23,7 @@ #define MAP_EXECUTABLE 0x1000 /* mark it as an executable */ #define MAP_POPULATE 0x8000 /* populate (prefault) pagetables */ #define MAP_NONBLOCK 0x1 /* do not block on IO */ +#define MAP_NOINHERIT 0x2 /* don't inherit the protection bits of the underlying vma*/ #define MS_ASYNC 1 /* sync memory asynchronously */ #define MS_INVALIDATE 2 /* invalidate the caches */ diff -puN include/asm-s390/mman.h~rfp-map-noinherit include/asm-s390/mman.h --- linux-2.6.git/include/asm-s390/mman.h~rfp-map-noinherit 2005-08-11 12:06:40.0 +0200 +++ linux-2.6.git-paolo/include/asm-s390/mman.h 2005-08-11 12:06:40.0 +0200 @@ -30,6 +30,7 @@ #define MAP_NORESERVE 0x4000 /* don't check for reservations */ #define MAP_POPULATE 0x8000 /* populate (prefault) pagetables */ #define MAP_NONBLOCK 0x1 /* do not block on IO */ +#define MAP_NOINHERIT 0x2 /* don't inherit the protection bits of the underlying vma*/ #define MS_ASYNC 1 /* sync memory asynchronously */ #define MS_INVALIDATE 2 /* invalidate the caches */ diff -puN include/asm-x86_64/mman.h~rfp-map-noinherit include/asm-x86_64/mman.h --- linux-2.6.git/include/asm-x86_64/mman.h~rfp-map-noinherit 2005-08-11 12:06:40.0 +0200 +++ linux-2.6.git-paolo/include/asm-x86_64/mman.h 2005-08-11 12:06:40.0 +0200 @@ -23,6 +23,7 @@ #define MAP_NORESERVE 0x4000 /* don't check for reservations */ #define 
MAP_POPULATE 0x8000 /* populate (prefault) pagetables */ #define MAP_NONBLOCK 0x1 /* do not block on IO */ +#define MAP_NOINHERIT 0x2 /* don't inherit the protection bits of the underlying vma*/ #define MS_ASYNC 1 /* sync memory asynchronously */ #define
[patch 10/39] remap_file_pages protection support: i386 and x86-64 bits
Update pte encoding macros for i386 and x86-64. Signed-off-by: Paolo 'Blaisorblade' Giarrusso <[EMAIL PROTECTED]> --- linux-2.6.git-paolo/include/asm-i386/pgtable-2level.h | 15 ++- linux-2.6.git-paolo/include/asm-i386/pgtable-3level.h | 11 ++- linux-2.6.git-paolo/include/asm-x86_64/pgtable.h | 12 +++- 3 files changed, 31 insertions(+), 7 deletions(-) diff -puN include/asm-i386/pgtable-2level.h~rfp-arch-i386-x86_64 include/asm-i386/pgtable-2level.h --- linux-2.6.git/include/asm-i386/pgtable-2level.h~rfp-arch-i386-x86_64 2005-08-11 11:42:28.0 +0200 +++ linux-2.6.git-paolo/include/asm-i386/pgtable-2level.h 2005-08-11 11:42:28.0 +0200 @@ -48,16 +48,21 @@ static inline int pte_exec_kernel(pte_t } /* - * Bits 0, 6 and 7 are taken, split up the 29 bits of offset + * Bits 0, 1, 6 and 7 are taken, split up the 28 bits of offset * into this range: */ -#define PTE_FILE_MAX_BITS 29 +#define PTE_FILE_MAX_BITS 28 #define pte_to_pgoff(pte) \ - pte).pte_low >> 1) & 0x1f ) + (((pte).pte_low >> 8) << 5 )) + pte).pte_low >> 2) & 0xf ) + (((pte).pte_low >> 8) << 4 )) +#define pte_to_pgprot(pte) \ + __pgprot(((pte).pte_low & (_PAGE_RW | _PAGE_PROTNONE)) \ + | (((pte).pte_low & _PAGE_PROTNONE) ? 0 : \ + (_PAGE_USER | _PAGE_PRESENT)) | _PAGE_ACCESSED) -#define pgoff_to_pte(off) \ - ((pte_t) { (((off) & 0x1f) << 1) + (((off) >> 5) << 8) + _PAGE_FILE }) +#define pgoff_prot_to_pte(off, prot) \ + ((pte_t) { (((off) & 0xf) << 2) + (((off) >> 4) << 8) + \ +(pgprot_val(prot) & (_PAGE_RW | _PAGE_PROTNONE)) + _PAGE_FILE }) /* Encode and de-code a swap entry */ #define __swp_type(x) (((x).val >> 1) & 0x1f) diff -puN include/asm-i386/pgtable-3level.h~rfp-arch-i386-x86_64 include/asm-i386/pgtable-3level.h --- linux-2.6.git/include/asm-i386/pgtable-3level.h~rfp-arch-i386-x86_64 2005-08-11 11:42:28.0 +0200 +++ linux-2.6.git-paolo/include/asm-i386/pgtable-3level.h 2005-08-11 11:42:28.0 +0200 @@ -145,7 +145,16 @@ static inline pmd_t pfn_pmd(unsigned lon * put the 32 bits of offset into the high part. 
*/ #define pte_to_pgoff(pte) ((pte).pte_high) -#define pgoff_to_pte(off) ((pte_t) { _PAGE_FILE, (off) }) + +#define pte_to_pgprot(pte) \ + __pgprot(((pte).pte_low & (_PAGE_RW | _PAGE_PROTNONE)) \ + | (((pte).pte_low & _PAGE_PROTNONE) ? 0 : \ + (_PAGE_USER | _PAGE_PRESENT)) | _PAGE_ACCESSED) + +#define pgoff_prot_to_pte(off, prot) \ + ((pte_t) { _PAGE_FILE + \ + (pgprot_val(prot) & (_PAGE_RW | _PAGE_PROTNONE)) , (off) }) + #define PTE_FILE_MAX_BITS 32 /* Encode and de-code a swap entry */ diff -puN include/asm-x86_64/pgtable.h~rfp-arch-i386-x86_64 include/asm-x86_64/pgtable.h --- linux-2.6.git/include/asm-x86_64/pgtable.h~rfp-arch-i386-x86_64 2005-08-11 11:42:28.0 +0200 +++ linux-2.6.git-paolo/include/asm-x86_64/pgtable.h2005-08-11 11:42:28.0 +0200 @@ -343,9 +343,19 @@ static inline pud_t *__pud_offset_k(pud_ #define pmd_pfn(x) ((pmd_val(x) >> PAGE_SHIFT) & __PHYSICAL_MASK) #define pte_to_pgoff(pte) ((pte_val(pte) & PHYSICAL_PAGE_MASK) >> PAGE_SHIFT) -#define pgoff_to_pte(off) ((pte_t) { ((off) << PAGE_SHIFT) | _PAGE_FILE }) #define PTE_FILE_MAX_BITS __PHYSICAL_MASK_SHIFT +#define pte_to_pgprot(pte) \ + __pgprot((pte_val(pte) & (_PAGE_RW | _PAGE_PROTNONE)) \ + | ((pte_val(pte) & _PAGE_PROTNONE) ? 0 : \ + (_PAGE_USER | _PAGE_PRESENT)) | _PAGE_ACCESSED) + +#define pgoff_prot_to_pte(off, prot) \ + ((pte_t) { _PAGE_FILE + \ + (pgprot_val(prot) & (_PAGE_RW | _PAGE_PROTNONE)) + \ + ((off) << PAGE_SHIFT) }) + + /* PTE - Level 1 access. */ /* page, protection -> pte */ _ - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[patch 19/39] remap_file_pages protection support: use VM_FAULT_SIGSEGV for protection checking
From: Paolo 'Blaisorblade' Giarrusso <[EMAIL PROTECTED]> The arch handler used to check protection itself; now we must possibly move that to the generic VM when the VMA is non-uniform. For now, do_file_page installs the PTE without checking the fault type: if it was wrong, the task takes another fault and only dies then. I've left it this way for now to exercise the code more (and it works anyway). I've also changed do_no_page to fault in pages with their *exact* permissions for non-uniform VMAs. The approach to checking is a bit clumsy because we are given a VM_{READ,WRITE,EXEC} mask, so we do *strict* checking. For instance, a VM_EXEC mapping (which won't have VM_READ in vma->vm_flags) would fault on a read. To fix that properly, we should get a pgprot mask and test pte_read()/pte_write()/pte_exec(); for now I work around that in the i386/UML handlers, and I have patches fixing it subsequently. Also, there is a (potential) problem: on VM_NONUNIFORM vmas, in handle_pte_fault(), if the PTE is present we return VM_FAULT_SIGSEGV. This has proven to be a bit strict, at least for UML - so it may break other arches too (only for the new functionality), at least peculiar ones: on UML the problem was due to handle_mm_fault() being called for TLB faults rather than PTE faults. Another problem I've just discovered is that PTRACE_POKETEXT / access_process_vm on VM_NONUNIFORM write-protected vmas won't work. That's not a big problem. 
Signed-off-by: Paolo 'Blaisorblade' Giarrusso <[EMAIL PROTECTED]> --- linux-2.6.git-paolo/arch/i386/mm/fault.c | 28 +++-- linux-2.6.git-paolo/include/linux/mm.h | 11 +++ linux-2.6.git-paolo/mm/memory.c | 96 --- 3 files changed, 108 insertions(+), 27 deletions(-) diff -puN arch/i386/mm/fault.c~rfp-fault-sigsegv-2 arch/i386/mm/fault.c --- linux-2.6.git/arch/i386/mm/fault.c~rfp-fault-sigsegv-2 2005-08-11 14:21:01.0 +0200 +++ linux-2.6.git-paolo/arch/i386/mm/fault.c2005-08-11 16:12:46.0 +0200 @@ -219,6 +219,7 @@ fastcall void do_page_fault(struct pt_re unsigned long address; unsigned long page; int write; + int access_mask = 0; siginfo_t info; /* get the address */ @@ -324,23 +325,24 @@ good_area: /* fall through */ case 2: /* write, not present */ if (!(vma->vm_flags & VM_WRITE)) - goto bad_area; + goto bad_area_prot; write++; break; - case 1: /* read, present */ + case 1: /* read, present - when does this happen? Maybe for NX exceptions? */ goto bad_area; case 0: /* read, not present */ if (!(vma->vm_flags & (VM_READ | VM_EXEC))) - goto bad_area; + goto bad_area_prot; } - survive: + access_mask = write ? VM_WRITE : 0; +handle_fault: /* * If for any reason at all we couldn't handle the fault, * make sure we exit gracefully rather than endlessly redo * the fault. */ - switch (handle_mm_fault(mm, vma, address, write)) { + switch (__handle_mm_fault(mm, vma, address, access_mask)) { case VM_FAULT_MINOR: tsk->min_flt++; break; @@ -368,6 +370,20 @@ good_area: up_read(&mm->mmap_sem); return; + /* If the PTE is not present, the vma protection are not accurate if +* VM_NONUNIFORM; present PTE's are correct for VM_NONUNIFORM and were +* already handled otherwise. */ +bad_area_prot: + if (unlikely(vma->vm_flags & VM_NONUNIFORM)) { + access_mask = write ? VM_WRITE : 0; + /* Otherwise, on a legitimate read fault on a page mapped as +* exec-only, we get problems. Probably, we should lower +* requirements... we should always test just +* pte_read/write/exec, on vma->vm_page_prot! 
This way is +* cumbersome. However, for now things should work for i386. */ + access_mask |= vma->vm_flags & VM_EXEC ? VM_EXEC : VM_READ; + goto handle_fault; + } /* * Something tried to access memory that isn't in our memory map.. * Fix it, but check if it's kernel or user first.. @@ -481,7 +497,7 @@ out_of_memory: if (tsk->pid == 1) { yield(); down_read(&mm->mmap_sem); - goto survive; + goto handle_fault; } printk("VM: killing process %s\n", tsk->comm); if (error_code & 4) diff -puN mm/memory.c~rfp-fault-
[patch 12/39] remap_file_pages protection support: enhance syscall interface and swapout code
From: Ingo Molnar <[EMAIL PROTECTED]>, Paolo 'Blaisorblade' Giarrusso <[EMAIL PROTECTED]> This is the "main" patch for the syscall code, containing the core of what was sent by Ingo Molnar, variously reworked. Differently from his patch, I've *not* added a new syscall, choosing to add a new flag (MAP_NOINHERIT) which the application must specify to get the new behavior (prot != 0 is accepted and prot == 0 means PROT_NONE). The changes to the page fault handler have been separated, even because that has required considerable amount of effort. Handle the possibility that remap_file_pages changes protections in various places. * Enable the 'prot' parameter for shared-writable mappings (the ones which are the primary target for remap_file_pages), without breaking up the vma * Use pte_file PTE's also when protections don't match, not only when the offset doesn't match; and add set_nonlinear_pte() for this testing * Save the current protection too when clearing a nonlinear PTE, by replacing pgoff_to_pte() uses with pgoff_prot_to_pte(). * Use the supplied protections on restore and on populate (partially uncomplete, fixed in subsequent patches) Signed-off-by: Paolo 'Blaisorblade' Giarrusso <[EMAIL PROTECTED]> --- linux-2.6.git-paolo/include/linux/pagemap.h | 19 ++ linux-2.6.git-paolo/mm/fremap.c | 50 +--- linux-2.6.git-paolo/mm/memory.c | 14 --- linux-2.6.git-paolo/mm/rmap.c |3 - 4 files changed, 60 insertions(+), 26 deletions(-) diff -puN include/linux/pagemap.h~rfp-enhance-syscall-and-swapout-code include/linux/pagemap.h --- linux-2.6.git/include/linux/pagemap.h~rfp-enhance-syscall-and-swapout-code 2005-08-11 22:59:47.0 +0200 +++ linux-2.6.git-paolo/include/linux/pagemap.h 2005-08-11 22:59:47.0 +0200 @@ -159,6 +159,25 @@ static inline pgoff_t linear_page_index( return pgoff >> (PAGE_CACHE_SHIFT - PAGE_SHIFT); } +/*** + * Checks if the PTE is nonlinear, and if yes sets it. 
+ * @vma: the VMA in which @addr is; we don't check if it's VM_NONLINEAR, just + * if this PTE is nonlinear. + * @addr: the addr which @pte refers to. + * @pte: the old PTE value (to read its protections. + * @ptep: the PTE pointer (for setting it). + * @mm: passed to set_pte_at. + * @page: the page which was installed (to read its ->index, i.e. the old + * offset inside the file. + */ +static inline void set_nonlinear_pte(pte_t pte, pte_t * ptep, struct vm_area_struct *vma, struct mm_struct *mm, struct page* page, unsigned long addr) +{ + pgprot_t pgprot = pte_to_pgprot(pte); + if(linear_page_index(vma, addr) != page->index || + pgprot_val(pgprot) != pgprot_val(vma->vm_page_prot)) + set_pte_at(mm, addr, ptep, pgoff_prot_to_pte(page->index, pgprot)); +} + extern void FASTCALL(__lock_page(struct page *page)); extern void FASTCALL(unlock_page(struct page *page)); diff -puN mm/fremap.c~rfp-enhance-syscall-and-swapout-code mm/fremap.c --- linux-2.6.git/mm/fremap.c~rfp-enhance-syscall-and-swapout-code 2005-08-11 22:59:47.0 +0200 +++ linux-2.6.git-paolo/mm/fremap.c 2005-08-11 23:01:14.0 +0200 @@ -54,7 +54,7 @@ static inline void zap_pte(struct mm_str * previously existing mapping. */ int install_page(struct mm_struct *mm, struct vm_area_struct *vma, - unsigned long addr, struct page *page, pgprot_t prot) + unsigned long addr, struct page *page, pgprot_t pgprot) { struct inode *inode; pgoff_t size; @@ -94,7 +94,7 @@ int install_page(struct mm_struct *mm, s inc_mm_counter(mm,rss); flush_icache_page(vma, page); - set_pte_at(mm, addr, pte, mk_pte(page, prot)); + set_pte_at(mm, addr, pte, mk_pte(page, pgprot)); page_add_file_rmap(page); pte_val = *pte; pte_unmap(pte); @@ -113,7 +113,7 @@ EXPORT_SYMBOL(install_page); * previously existing mapping. 
*/ int install_file_pte(struct mm_struct *mm, struct vm_area_struct *vma, - unsigned long addr, unsigned long pgoff, pgprot_t prot) + unsigned long addr, unsigned long pgoff, pgprot_t pgprot) { int err = -ENOMEM; pte_t *pte; @@ -139,7 +139,7 @@ int install_file_pte(struct mm_struct *m zap_pte(mm, vma, addr, pte); - set_pte_at(mm, addr, pte, pgoff_to_pte(pgoff)); + set_pte_at(mm, addr, pte, pgoff_prot_to_pte(pgoff, pgprot)); pte_val = *pte; pte_unmap(pte); update_mmu_cache(vma, addr, pte_val); @@ -157,31 +157,28 @@ err_unlock: *file within an existing vma. * @start: start of the remapped virtual memory range * @size: size of the remapped virtual memory range - * @prot: new protection bits of the range + * @prot: new protection bits of the range, must be 0 if not us
[patch 09/39] remap_file_pages protection support: improvement for UML bits
Recover one bit by additionally using _PAGE_NEWPROT. Since I wasn't sure this would work, I've split this out. We rely on the fact that pte_newprot always checks first if the PTE is marked present. Signed-off-by: Paolo 'Blaisorblade' Giarrusso <[EMAIL PROTECTED]> --- linux-2.6.git-paolo/include/asm-um/pgtable-2level.h | 10 +- 1 files changed, 5 insertions(+), 5 deletions(-) diff -puN include/asm-um/pgtable-2level.h~rfp-arch-uml-improv include/asm-um/pgtable-2level.h --- linux-2.6.git/include/asm-um/pgtable-2level.h~rfp-arch-uml-improv 2005-08-07 19:09:34.0 +0200 +++ linux-2.6.git-paolo/include/asm-um/pgtable-2level.h 2005-08-07 19:09:34.0 +0200 @@ -72,19 +72,19 @@ static inline void set_pte(pte_t *pteptr ((unsigned long) __va(pmd_val(pmd) & PAGE_MASK)) /* - * Bits 0 to 5 are taken, split up the 26 bits of offset + * Bits 0, 1, 3 to 5 are taken, split up the 27 bits of offset * into this range: */ -#define PTE_FILE_MAX_BITS 26 +#define PTE_FILE_MAX_BITS 27 -#define pte_to_pgoff(pte) (pte_val(pte) >> 6) +#define pte_to_pgoff(pte) (((pte_val(pte) >> 6) << 1) | ((pte_val(pte) >> 2) & 0x1)) #define pte_to_pgprot(pte) \ __pgprot((pte_val(pte) & (_PAGE_RW | _PAGE_PROTNONE)) \ | ((pte_val(pte) & _PAGE_PROTNONE) ? 0 : \ (_PAGE_USER | _PAGE_PRESENT)) | _PAGE_ACCESSED) #define pgoff_prot_to_pte(off, prot) \ - ((pte_t) { ((off) << 6) + \ -(pgprot_val(prot) & (_PAGE_RW | _PAGE_PROTNONE)) + _PAGE_FILE }) + __pteoff) >> 1) << 6) + (((off) & 0x1) << 2) + \ +(pgprot_val(prot) & (_PAGE_RW | _PAGE_PROTNONE)) + _PAGE_FILE) #endif _ - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[patch 07/39] uml: fault handler micro-cleanups
From: Paolo 'Blaisorblade' Giarrusso <[EMAIL PROTECTED]> Avoid chomping low bits of address for functions doing it by themselves, fix whitespace, add a correctness checking. I did this for remap-file-pages protection support, it was useful on its own too. Signed-off-by: Paolo 'Blaisorblade' Giarrusso <[EMAIL PROTECTED]> --- linux-2.6.git-paolo/arch/um/kernel/trap_kern.c | 28 +++-- 1 files changed, 13 insertions(+), 15 deletions(-) diff -puN arch/um/kernel/trap_kern.c~uml-fault-handler-changes arch/um/kernel/trap_kern.c --- linux-2.6.git/arch/um/kernel/trap_kern.c~uml-fault-handler-changes 2005-08-11 11:18:03.0 +0200 +++ linux-2.6.git-paolo/arch/um/kernel/trap_kern.c 2005-08-11 11:19:56.0 +0200 @@ -26,6 +26,7 @@ #include "mem.h" #include "mem_kern.h" +/* Note this is constrained to return 0, -EFAULT, -EACCESS, -ENOMEM by segv(). */ int handle_page_fault(unsigned long address, unsigned long ip, int is_write, int is_user, int *code_out) { @@ -35,7 +36,6 @@ int handle_page_fault(unsigned long addr pud_t *pud; pmd_t *pmd; pte_t *pte; - unsigned long page; int err = -EFAULT; *code_out = SEGV_MAPERR; @@ -52,7 +52,7 @@ int handle_page_fault(unsigned long addr else if(expand_stack(vma, address)) goto out; - good_area: +good_area: *code_out = SEGV_ACCERR; if(is_write && !(vma->vm_flags & VM_WRITE)) goto out; @@ -60,9 +60,8 @@ int handle_page_fault(unsigned long addr if(!(vma->vm_flags & (VM_READ | VM_EXEC))) goto out; - page = address & PAGE_MASK; do { - survive: +survive: switch (handle_mm_fault(mm, vma, address, is_write)){ case VM_FAULT_MINOR: current->min_flt++; @@ -79,16 +78,16 @@ int handle_page_fault(unsigned long addr default: BUG(); } - pgd = pgd_offset(mm, page); - pud = pud_offset(pgd, page); - pmd = pmd_offset(pud, page); - pte = pte_offset_kernel(pmd, page); + pgd = pgd_offset(mm, address); + pud = pud_offset(pgd, address); + pmd = pmd_offset(pud, address); + pte = pte_offset_kernel(pmd, address); } while(!pte_present(*pte)); err = 0; *pte = pte_mkyoung(*pte); 
if(pte_write(*pte)) *pte = pte_mkdirty(*pte); - flush_tlb_page(vma, page); - out: + flush_tlb_page(vma, address); +out: up_read(&mm->mmap_sem); return(err); @@ -144,19 +143,18 @@ unsigned long segv(struct faultinfo fi, panic("Kernel mode fault at addr 0x%lx, ip 0x%lx", address, ip); - if(err == -EACCES){ + if (err == -EACCES) { si.si_signo = SIGBUS; si.si_errno = 0; si.si_code = BUS_ADRERR; si.si_addr = (void *)address; current->thread.arch.faultinfo = fi; force_sig_info(SIGBUS, &si, current); - } - else if(err == -ENOMEM){ + } else if (err == -ENOMEM) { printk("VM: killing process %s\n", current->comm); do_exit(SIGKILL); - } - else { + } else { + BUG_ON(err != -EFAULT); si.si_signo = SIGSEGV; si.si_addr = (void *) address; current->thread.arch.faultinfo = fi; _ - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[patch 20/39] remap_file_pages protection support: optimize install_file_pte for MAP_POPULATE
From: Paolo 'Blaisorblade' Giarrusso <[EMAIL PROTECTED]> Add an optimization to install_file_pte: if the VMA is uniform and the PTE was null, it will be installed correctly at fault time if needed - we thus avoid touching the page tables, though we must still do the walk... I'd like to avoid the walk altogether when detecting that the VMA is uniform. Why might the PTE hold a wrong value at all? It could be a pte_file PTE installed by a previous MAP_POPULATE or remap_file_pages call with MAP_NONBLOCK, but that would either have been zapped (if we're handling MAP_POPULATE) or be correct (if called by remap_file_pages, which is unlikely since we're in a uniform VMA). The protections must be correct, or we'd detect it by seeing VM_NONUNIFORM; likewise the offset, or we'd see VM_NONLINEAR. Thus this path is effectively only used for MAP_POPULATE|MAP_NONBLOCK. Signed-off-by: Paolo 'Blaisorblade' Giarrusso <[EMAIL PROTECTED]> --- linux-2.6.git-paolo/mm/fremap.c |9 + 1 files changed, 9 insertions(+) diff -puN mm/fremap.c~rfp-linear-optim-v2 mm/fremap.c --- linux-2.6.git/mm/fremap.c~rfp-linear-optim-v2 2005-08-11 22:46:58.0 +0200 +++ linux-2.6.git-paolo/mm/fremap.c 2005-08-11 22:57:49.0 +0200 @@ -121,6 +121,9 @@ int install_file_pte(struct mm_struct *m pud_t *pud; pgd_t *pgd; pte_t pte_val; + int uniform = !(vma->vm_flags & (VM_NONUNIFORM | VM_NONLINEAR)); + + BUG_ON(!uniform && !(vma->vm_flags & VM_SHARED)); pgd = pgd_offset(mm, addr); spin_lock(&mm->page_table_lock); @@ -136,6 +139,12 @@ int install_file_pte(struct mm_struct *m pte = pte_alloc_map(mm, pmd, addr); if (!pte) goto err_unlock; + /* +* Skip uniform non-existent ptes: +*/ + err = 0; + if (uniform && pte_none(*pte)) + goto err_unlock; zap_pte(mm, vma, addr, pte); _
[patch 01/39] comment typo fix
From: Paolo 'Blaisorblade' Giarrusso <[EMAIL PROTECTED]> smp_entry_t -> swap_entry_t Too short changelog entry? Signed-off-by: Paolo 'Blaisorblade' Giarrusso <[EMAIL PROTECTED]> --- linux-2.6.git-paolo/include/linux/swapops.h |2 +- 1 files changed, 1 insertion(+), 1 deletion(-) diff -puN include/linux/swapops.h~fix-typo include/linux/swapops.h --- linux-2.6.git/include/linux/swapops.h~fix-typo 2005-08-11 11:12:23.0 +0200 +++ linux-2.6.git-paolo/include/linux/swapops.h 2005-08-11 11:12:24.0 +0200 @@ -4,7 +4,7 @@ * the low-order bits. * * We arrange the `type' and `offset' fields so that `type' is at the five - * high-order bits of the smp_entry_t and `offset' is right-aligned in the + * high-order bits of the swap_entry_t and `offset' is right-aligned in the * remaining bits. * * swp_entry_t's are *never* stored anywhere in their arch-dependent format. _
[patch 08/39] remap_file_pages protection support: uml bits
Update pte encoding macros for UML. Signed-off-by: Paolo 'Blaisorblade' Giarrusso <[EMAIL PROTECTED]> --- linux-2.6.git-paolo/include/asm-um/pgtable-2level.h | 15 ++ linux-2.6.git-paolo/include/asm-um/pgtable-3level.h | 21 +++- 2 files changed, 27 insertions(+), 9 deletions(-) diff -puN include/asm-um/pgtable-2level.h~rfp-arch-uml include/asm-um/pgtable-2level.h --- linux-2.6.git/include/asm-um/pgtable-2level.h~rfp-arch-uml 2005-08-11 11:23:21.0 +0200 +++ linux-2.6.git-paolo/include/asm-um/pgtable-2level.h 2005-08-11 11:23:21.0 +0200 @@ -72,12 +72,19 @@ static inline void set_pte(pte_t *pteptr ((unsigned long) __va(pmd_val(pmd) & PAGE_MASK)) /* - * Bits 0 through 3 are taken + * Bits 0 to 5 are taken, split up the 26 bits of offset + * into this range: */ -#define PTE_FILE_MAX_BITS 28 +#define PTE_FILE_MAX_BITS 26 -#define pte_to_pgoff(pte) (pte_val(pte) >> 4) +#define pte_to_pgoff(pte) (pte_val(pte) >> 6) +#define pte_to_pgprot(pte) \ + __pgprot((pte_val(pte) & (_PAGE_RW | _PAGE_PROTNONE)) \ + | ((pte_val(pte) & _PAGE_PROTNONE) ? 0 : \ + (_PAGE_USER | _PAGE_PRESENT)) | _PAGE_ACCESSED) -#define pgoff_to_pte(off) ((pte_t) { ((off) << 4) + _PAGE_FILE }) +#define pgoff_prot_to_pte(off, prot) \ + ((pte_t) { ((off) << 6) + \ +(pgprot_val(prot) & (_PAGE_RW | _PAGE_PROTNONE)) + _PAGE_FILE }) #endif diff -puN include/asm-um/pgtable-3level.h~rfp-arch-uml include/asm-um/pgtable-3level.h --- linux-2.6.git/include/asm-um/pgtable-3level.h~rfp-arch-uml 2005-08-11 11:23:21.0 +0200 +++ linux-2.6.git-paolo/include/asm-um/pgtable-3level.h 2005-08-11 11:23:21.0 +0200 @@ -140,25 +140,36 @@ static inline pmd_t pfn_pmd(pfn_t page_n } /* - * Bits 0 through 3 are taken in the low part of the pte, + * Bits 0 through 5 are taken in the low part of the pte, * put the 32 bits of offset into the high part. 
*/ #define PTE_FILE_MAX_BITS 32 + #ifdef CONFIG_64BIT #define pte_to_pgoff(p) ((p).pte >> 32) - -#define pgoff_to_pte(off) ((pte_t) { ((off) << 32) | _PAGE_FILE }) +#define pgoff_to_pte(off) ((pte_t) { ((off) << 32) | _PAGE_FILE | \ + (pgprot_val(prot) & (_PAGE_RW | _PAGE_PROTNONE)) }) +#define pte_flags(pte) pte_val(pte) #else #define pte_to_pgoff(pte) ((pte).pte_high) - -#define pgoff_to_pte(off) ((pte_t) { _PAGE_FILE, (off) }) +#define pgoff_prot_to_pte(off, prot) ((pte_t) { \ + (pgprot_val(prot) & (_PAGE_RW | _PAGE_PROTNONE)) | _PAGE_FILE, \ + (off) }) +/* Don't use pte_val below, useless to join the two halves */ +#define pte_flags(pte) ((pte).pte_low) #endif +#define pte_to_pgprot(pte) \ + __pgprot((pte_flags(pte) & (_PAGE_RW | _PAGE_PROTNONE)) \ + | ((pte_flags(pte) & _PAGE_PROTNONE) ? 0 : \ + (_PAGE_USER | _PAGE_PRESENT)) | _PAGE_ACCESSED) +#undef pte_flags + #endif /* _ - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[patch 05/39] remove stale comment from swapfile.c
From: Paolo 'Blaisorblade' Giarrusso <[EMAIL PROTECTED]> Seems like in 2.4.9.4 this comment got out of sync ;-) I'm not completely sure on what basis we no longer need to do what the comment suggests, but it seems that when faulting the same swap page in a second time, can_share_swap_page() returns false and we do an early COW break, so there's no need to write-protect the page. No idea why we don't defer the COW break. Reference commit from GIT version of BKCVS history: 5ee46c7964de4b1969fc5be036167eb2da0de4e2, BKRev 3c603c81PtWl2I1NnVuphvsItrD1hg (v2.4.9.3 -> v2.4.9.4). Signed-off-by: Paolo 'Blaisorblade' Giarrusso <[EMAIL PROTECTED]> --- linux-2.6.git-paolo/mm/swapfile.c |5 + 1 files changed, 1 insertion(+), 4 deletions(-) diff -puN mm/swapfile.c~remove-stale-comment-swap-file mm/swapfile.c --- linux-2.6.git/mm/swapfile.c~remove-stale-comment-swap-file 2005-08-11 11:13:18.0 +0200 +++ linux-2.6.git-paolo/mm/swapfile.c 2005-08-11 11:13:18.0 +0200 @@ -388,10 +388,7 @@ void free_swap_and_cache(swp_entry_t ent } /* - * Always set the resulting pte to be nowrite (the same as COW pages - * after one process has exited). We don't know just how many PTEs will - * share this swap entry, so be cautious and let do_wp_page work out - * what to do if a write is requested later. + * Since we're swapping it in, we mark it as old. * * vma->vm_mm->page_table_lock is held. */ _
[patch 13/39] remap_file_pages protection support: support private vma for MAP_POPULATE
From: Ingo Molnar <[EMAIL PROTECTED]> If we're not rearranging pages, support even PRIVATE vma. This is needed to make MAP_POPULATE|MAP_PRIVATE to work, since it calls remap_file_pages. Notes from: Paolo 'Blaisorblade' Giarrusso <[EMAIL PROTECTED]> We don't support private VMA because when they're swapped out we need to store the swap entry in the PTE, not the file offset and protections; so, I suppose that with remap-file-pages-prot, we must punt on private VMA even when we're just changing protections. This change is in a separate patch. Signed-off-by: Paolo 'Blaisorblade' Giarrusso <[EMAIL PROTECTED]> --- linux-2.6.git-paolo/mm/fremap.c | 55 linux-2.6.git-paolo/mm/mmap.c |4 ++ 2 files changed, 38 insertions(+), 21 deletions(-) diff -puN mm/fremap.c~rfp-private-vma mm/fremap.c --- linux-2.6.git/mm/fremap.c~rfp-private-vma 2005-08-11 23:02:45.0 +0200 +++ linux-2.6.git-paolo/mm/fremap.c 2005-08-11 23:02:45.0 +0200 @@ -218,34 +218,47 @@ retry: goto out_unlock; if (((prot & PROT_EXEC) && !(vma->vm_flags & VM_MAYEXEC))) goto out_unlock; + err = -EINVAL; pgprot = protection_map[calc_vm_prot_bits(prot) | VM_SHARED]; } else pgprot = vma->vm_page_prot; - if ((vma->vm_flags & VM_SHARED) && - (!vma->vm_private_data || - (vma->vm_flags & (VM_NONLINEAR|VM_RESERVED))) && - vma->vm_ops && vma->vm_ops->populate && - end > start && start >= vma->vm_start && - end <= vma->vm_end) { + if (!vma->vm_ops || !vma->vm_ops->populate || end <= start || start < + vma->vm_start || end > vma->vm_end) + goto out_unlock; + + if (!vma->vm_private_data || + (vma->vm_flags & (VM_NONLINEAR|VM_RESERVED))) { /* Must set VM_NONLINEAR before any pages are populated. 
*/ - if (pgoff != linear_page_index(vma, start) && - !(vma->vm_flags & VM_NONLINEAR)) { - if (!has_write_lock) { - up_read(&mm->mmap_sem); - down_write(&mm->mmap_sem); - has_write_lock = 1; - goto retry; + if (pgoff != linear_page_index(vma, start)) { + if (!(vma->vm_flags & VM_NONLINEAR)) { + if (!has_write_lock) { + up_read(&mm->mmap_sem); + down_write(&mm->mmap_sem); + has_write_lock = 1; + goto retry; + } + /* XXX: we check VM_SHARED after re-getting the +* (write) semaphore but I guess that we could +* check it earlier as we're not allowed to turn +* a VM_PRIVATE vma into a VM_SHARED one! */ + if (!(vma->vm_flags & VM_SHARED)) + goto out_unlock; + + mapping = vma->vm_file->f_mapping; + spin_lock(&mapping->i_mmap_lock); + flush_dcache_mmap_lock(mapping); + vma->vm_flags |= VM_NONLINEAR; + vma_prio_tree_remove(vma, &mapping->i_mmap); + vma_nonlinear_insert(vma, &mapping->i_mmap_nonlinear); + flush_dcache_mmap_unlock(mapping); + spin_unlock(&mapping->i_mmap_lock); + } else { + /* Won't drop the lock, check it here.*/ + if (!(vma->vm_flags & VM_SHARED)) + goto out_unlock; } - mapping = vma->vm_file->f_mapping; - spin_lock(&mapping->i_mmap_lock); - flush_dcache_mmap_lock(mapping); - vma->vm_flags |= VM_NONLINEAR; - vma_prio_tree_remove(vma, &mapping->i_mmap); - vma_nonlinear_insert(vma, &mapping->i_mmap_nonlinear); - flush_dcache_mmap_unlock(mapping); - spin_unlock(&mappi
[patch 16/39] remap_file_pages protection support: readd lock downgrading
From: Paolo 'Blaisorblade' Giarrusso <[EMAIL PROTECTED]> Even now, we'll sometimes take the write lock. In that case we can downgrade it; after a bit of thought, I've chosen to do that whenever we'll either do any I/O or alter a lot of PTEs. For how much "a lot" is, I've copied the values from this code in mm/memory.c: #ifdef CONFIG_PREEMPT # define ZAP_BLOCK_SIZE (8 * PAGE_SIZE) #else /* No preempt: go for improved straight-line efficiency */ # define ZAP_BLOCK_SIZE (1024 * PAGE_SIZE) #endif I'm not sure about the trade-offs: we used to have a plain down_write(), now we have a down_read() and a possible up_read()/down_write(), and with this patch the fast path still takes only down_read(), while the slow path does down_read(), down_write(), downgrade_write(). This increases the number of atomic operations but improves concurrency wrt mmap and similar operations - I don't know how much contention there is on that lock. Also, drop a busted comment: we cannot clear VM_NONLINEAR simply because code elsewhere is going to use it. At the very least, madvise_dontneed() relies on that flag being set (non-linear truncation also reads the nonlinear mapping list), and the list of users is probably longer and going to grow in the next patches of this series. Just in case this wasn't clear: this patch is not strictly related to protection support; I was just too lazy to move it up in the series. 
Signed-off-by: Paolo 'Blaisorblade' Giarrusso <[EMAIL PROTECTED]> --- linux-2.6.git-paolo/mm/fremap.c | 18 +- 1 files changed, 13 insertions(+), 5 deletions(-) diff -puN mm/fremap.c~rfp-downgrade-lock mm/fremap.c --- linux-2.6.git/mm/fremap.c~rfp-downgrade-lock2005-08-11 23:04:39.0 +0200 +++ linux-2.6.git-paolo/mm/fremap.c 2005-08-11 23:04:39.0 +0200 @@ -152,6 +152,13 @@ err_unlock: } +#ifdef CONFIG_PREEMPT +# define INSTALL_SIZE (8 * PAGE_SIZE) +#else +/* No preempt: go for improved straight-line efficiency */ +# define INSTALL_SIZE (1024 * PAGE_SIZE) +#endif + /*** * sys_remap_file_pages - remap arbitrary pages of a shared backing store *file within an existing vma. @@ -266,14 +273,15 @@ retry: } } + /* Do NOT hold the write lock while doing any I/O, nor when +* iterating over too many PTEs. Values might need tuning. */ + if (has_write_lock && (!(flags & MAP_NONBLOCK) || size > INSTALL_SIZE)) { + downgrade_write(&mm->mmap_sem); + has_write_lock = 0; + } err = vma->vm_ops->populate(vma, start, size, pgprot, pgoff, flags & MAP_NONBLOCK); - /* -* We can't clear VM_NONLINEAR because we'd have to do -* it after ->populate completes, and that would prevent -* downgrading the lock. (Locks can't be upgraded). -*/ } out_unlock: _ - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[patch 02/39] shmem_populate: avoid a useless check, and some comments
From: Paolo 'Blaisorblade' Giarrusso <[EMAIL PROTECTED]> Either shmem_getpage returns a failure, or it found a page, or it was told it couldn't do any I/O. So it's useless to check nonblock in the else branch. We could add a BUG() there but I preferred to comment the offending function. This was taken out from one Ingo Molnar's old patch I'm resurrecting. References: commit b103e8b204b317d52834671d5f09db95645523c2 of old-2.6-bkcvs, pointing to BKrev: 3f5ed0c1llm6NnNwNXtPv-Z0IYzkwA Signed-off-by: Paolo 'Blaisorblade' Giarrusso <[EMAIL PROTECTED]> --- linux-2.6.git-paolo/mm/filemap.c |7 +++ linux-2.6.git-paolo/mm/shmem.c |6 +- 2 files changed, 12 insertions(+), 1 deletion(-) diff -puN mm/shmem.c~mm-populate-optim-comment mm/shmem.c --- linux-2.6.git/mm/shmem.c~mm-populate-optim-comment 2005-08-11 11:12:39.0 +0200 +++ linux-2.6.git-paolo/mm/shmem.c 2005-08-11 11:12:39.0 +0200 @@ -1195,6 +1195,7 @@ static int shmem_populate(struct vm_area err = shmem_getpage(inode, pgoff, &page, sgp, NULL); if (err) return err; + /* Page may still be null, but only if nonblock was set. */ if (page) { mark_page_accessed(page); err = install_page(mm, vma, addr, page, prot); @@ -1202,7 +1203,10 @@ static int shmem_populate(struct vm_area page_cache_release(page); return err; } - } else if (nonblock) { + } else { + /* No page was found just because we can't read it in +* now (being here implies nonblock != 0), but the page +* may exist, so set the PTE to fault it in later. */ err = install_file_pte(mm, vma, addr, pgoff, prot); if (err) return err; diff -puN mm/filemap.c~mm-populate-optim-comment mm/filemap.c --- linux-2.6.git/mm/filemap.c~mm-populate-optim-comment2005-08-11 11:12:39.0 +0200 +++ linux-2.6.git-paolo/mm/filemap.c2005-08-11 11:12:39.0 +0200 @@ -1505,8 +1505,12 @@ repeat: return -EINVAL; page = filemap_getpage(file, pgoff, nonblock); + + /* XXX: This is wrong, a filesystem I/O error may have happened. 
Fix that as +* done in shmem_populate calling shmem_getpage */ if (!page && !nonblock) return -ENOMEM; + if (page) { err = install_page(mm, vma, addr, page, prot); if (err) { @@ -1514,6 +1518,9 @@ repeat: return err; } } else { + /* No page was found just because we can't read it in now (being +* here implies nonblock != 0), but the page may exist, so set +* the PTE to fault it in later. */ err = install_file_pte(mm, vma, addr, pgoff, prot); if (err) return err; _ - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[patch 18/39] remap_file_pages protection support: add VM_FAULT_SIGSEGV
From: Ingo Molnar <[EMAIL PROTECTED]> Since with remap_file_pages w/prot we may put PROT_NONE on a single PTE rather than on a whole VMA, we must handle that inside handle_mm_fault. The new return value must be handled in the arch-specific fault handlers, and this change must be ported to every arch in the world; since the new support is not in a separate syscall, this *must* be done unless we want stability / security issues (the *BUG()* on unknown return values of handle_mm_fault() is triggerable from userspace by calling remap_file_pages, and on other archs we'd get VM_FAULT_OOM, which is worse). However, I've alleviated this need via the previous "safety net" patch. This patch includes the arch-specific part for i386. Note, however, that _proper_ support is more intrusive: we may have to allow a write on a read-only VMA, but the arch fault handler currently stops that; it should instead test VM_NONUNIFORM and, if set, call handle_mm_fault(), doing all protection checks on its own. This is in the following patches. 
Signed-off-by: Paolo 'Blaisorblade' Giarrusso <[EMAIL PROTECTED]> --- linux-2.6.git-paolo/arch/i386/mm/fault.c |2 ++ linux-2.6.git-paolo/include/linux/mm.h |9 + linux-2.6.git-paolo/mm/memory.c | 12 3 files changed, 19 insertions(+), 4 deletions(-) diff -puN arch/i386/mm/fault.c~rfp-add-vm_fault_sigsegv arch/i386/mm/fault.c --- linux-2.6.git/arch/i386/mm/fault.c~rfp-add-vm_fault_sigsegv 2005-08-11 14:19:57.0 +0200 +++ linux-2.6.git-paolo/arch/i386/mm/fault.c2005-08-11 14:19:58.0 +0200 @@ -351,6 +351,8 @@ good_area: goto do_sigbus; case VM_FAULT_OOM: goto out_of_memory; + case VM_FAULT_SIGSEGV: + goto bad_area; default: BUG(); } diff -puN include/linux/mm.h~rfp-add-vm_fault_sigsegv include/linux/mm.h --- linux-2.6.git/include/linux/mm.h~rfp-add-vm_fault_sigsegv 2005-08-11 14:19:58.0 +0200 +++ linux-2.6.git-paolo/include/linux/mm.h 2005-08-11 14:19:58.0 +0200 @@ -632,10 +632,11 @@ static inline int page_mapped(struct pag * Used to decide whether a process gets delivered SIGBUS or * just gets major/minor fault counters bumped up. */ -#define VM_FAULT_OOM (-1) -#define VM_FAULT_SIGBUS0 -#define VM_FAULT_MINOR 1 -#define VM_FAULT_MAJOR 2 +#define VM_FAULT_OOM (-1) +#define VM_FAULT_SIGBUS0 +#define VM_FAULT_MINOR 1 +#define VM_FAULT_MAJOR 2 +#define VM_FAULT_SIGSEGV 3 #define offset_in_page(p) ((unsigned long)(p) & ~PAGE_MASK) diff -puN mm/memory.c~rfp-add-vm_fault_sigsegv mm/memory.c --- linux-2.6.git/mm/memory.c~rfp-add-vm_fault_sigsegv 2005-08-11 14:19:58.0 +0200 +++ linux-2.6.git-paolo/mm/memory.c 2005-08-11 14:19:58.0 +0200 @@ -1995,6 +1995,18 @@ static inline int handle_pte_fault(struc return do_swap_page(mm, vma, address, pte, pmd, entry, write_access); } + /* +* Generate a SIGSEGV if a PROT_NONE page is accessed; this is handled +* in arch-specific code if the whole VMA has PROT_NONE, and here if +* just this pte has PROT_NONE (which can be done only with +* remap_file_pages). 
+*/ + if (pgprot_val(pte_to_pgprot(entry)) == pgprot_val(__P000)) { + pte_unmap(pte); + spin_unlock(&mm->page_table_lock); + return VM_FAULT_SIGSEGV; + } + if (write_access) { if (!pte_write(entry)) return do_wp_page(mm, vma, address, pte, pmd, entry); _ - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[RFC] [patch 0/39] remap_file_pages protection support, try 2
Ok, I've spent the past two weeks learning the Linux VM well, understanding Ingo's remap_file_pages protection support and its various weaknesses (due to lack of time on his part), and splitting and finishing it. Here follows a series of 39 _little_ patches against git commit id 889371f61fd5bb914d0331268f12432590cf7e85, i.e. between 2.6.13-rc4 and -rc5. Actually, the first 7 are unrelated trivial cleanups which somehow got in the way of this work and can probably be merged even now (many are just comment fixes). Since I was a VM newbie until two weeks ago, I've separated my changes into many little patches. To avoid the noise, I'm CC:ing many people only on this message, while I'm sending the full patch series only to akpm, mingo and LKML. Or actually, I'm trying - my provider seems not to like me sending so many patches. I attached an exported tarball to this mail, since it's very small. I hope these changes can be included in -mm, but I guess they'll probably conflict with the page fault scalability patches, and that some of them are not completely polished. Still, the patch is IMHO in better shape, in many ways, than when it was last in -mm. I'll appreciate any comments.

== Changes from the 2.6.5-mm1/dropped version of the patches: ==

*) Actually implemented _real_ and _anal_ protection support, safe against swapout; programs get SIGSEGV *always* when they should. I've used the attached test program (an improved version of Ingo's) to check that. I tested just up to patch 25, on UML. The subsequent ones are either patches for foreign archs or proposed
*) Fixed many changes present in the patches.
*) Fixed UML bits
*) Added several headaches for arch ports. I've also included some patches which reduce this
*) No more use of a new syscall slot: to use the new interface, applications will use the new MAP_NOINHERIT flag I've added. I've still got the patches to use the old -mm ABI, if there's any reason they're needed.
*) Fixed a regression wrt using mprotect() against a remapped area (see patch 15)

== Still to do: ==

*) fix the mprotect vs. remap_file_pages(MAP_NOINHERIT) interaction - see the long discussion in the patch 15 changelog
*) ->populate flushes each TLB entry individually, instead of using mmu_gathers as it should; this was suggested even by Ingo when he sent the patch, but it seems he didn't get the time to finish it. It seems rewriting the kernel locking is quite a time-consuming task!

== Patch summaries ==

Each patch has an attached changelog, but I'm giving a summary here (sorry for using the patch numbers, but I found no other way). The first 7 are just generic cleanups (mostly of comments) which bugged me along the way; however, some of them are needed for the subsequent patches to apply. 08-11 are arch bits for some arches (the ones I have access to). 12 is the core change to generic code; 13-17 are various changes to the syscall code, as are 20, 21, 23, 35 and 36, to review individually. Most of those changes (except #23, which is a fix for try_to_unmap_one I missed initially) are just speedups, and it should be possible to drop them individually. 18, 19, 22, 32, 33 and 34 partially move the handling of protection checks from the arches' page fault handlers to the generic code, by introducing VM_FAULT_SIGSEGV. In fact, the VMA protections are not reliable for VM_NONUNIFORM areas. This aspect was only begun in Ingo's code, and was the weakest area of his patch. I must now pass the *full* kind of fault to the generic code, and test it against the PTE or possibly the VMA protections. However, in these patches it's done in a kludgy way, because we check the VMA protections against VM_READ/WRITE/EXEC with no consideration of the architecture-specific dependencies between them (like READ_IMPLIES_EXEC and so on), so arches have to work around this. This is fixed in patch 33, which is however untested. 24 and 25 are some fixes to the UML code, needed to make it work even with this change.
26-31 are other arches' compile fixes for the introduction of pte_to_pgoff. The last three (37-39) are not meant to be applied - they are possible changes I'm either really uncertain about, or which I'm sure are wrong in that form but express possibly correct ideas. 36 should be a fixed version of #37, but I wrote it in the past few minutes.
-- Inform me of my mistakes, so I can keep imitating Homer Simpson's "Doh!". Paolo Giarrusso, aka Blaisorblade (Skype ID "PaoloGiarrusso", ICQ 215621894) http://www.user-mode-linux.org/~blaisorblade
fremap-prot-complete-broken-out.tar.bz2 Description: application/tbz
fremap-test-complete.c.bz2 Description: BZip2 compressed data
Re: [patch 1/3] uml: share page bits handling between 2 and 3 level pagetables
On Saturday 30 July 2005 18:02, Jeff Dike wrote:
> On Thu, Jul 28, 2005 at 08:56:53PM +0200, [EMAIL PROTECTED] wrote:
> > As obvious, a "core code nice cleanup" is not a "stability-friendly
> > patch" so usual care applies.
> These look reasonable, as they are what we discussed in Ottawa.
> I'll put them in my tree and see if I see any problems. I would
> suggest sending these in early after 2.6.13 if they seem OK.
I've discovered that we're not the only ones to lack the dirty / accessed "hardware" bits: see include/asm-alpha/pgtable.h (they don't have the accessed bit). So maybe we could drop the "fault-on-access" thing. Also, note the comment before handle_pte_fault:

/*
 * These routines also need to handle stuff like marking pages dirty
 * and/or accessed for architectures that don't do it in hardware (most
 * RISC architectures). The early dirtying is also good on the i386.
 */

I'm not able to find where we clear the dirty bit on a PTE; however, it's not done only by pte_mkclean - there are some macros like ptep_clear... in asm-generic/pgtable.h
-- Inform me of my mistakes, so I can keep imitating Homer Simpson's "Doh!". Paolo Giarrusso, aka Blaisorblade (Skype ID "PaoloGiarrusso", ICQ 215621894) http://www.user-mode-linux.org/~blaisorblade
[patch 27/39] remap_file_pages protection support: fixups to ppc32 bits
From: Paul Mackerras <[EMAIL PROTECTED]> When I tried -mm4 on a ppc32 box, it hit a BUG because I hadn't excluded _PAGE_FILE from the bits used for swap entries. While looking at that I realised that the pte_to_pgoff and pgoff_prot_to_pte macros were wrong for 4xx and 8xx (embedded) PPC chips, since they use Signed-off-by: Paolo 'Blaisorblade' Giarrusso <[EMAIL PROTECTED]> --- linux-2.6.git-paolo/include/asm-ppc/pgtable.h | 48 +- 1 files changed, 39 insertions(+), 9 deletions(-) diff -puN include/asm-ppc/pgtable.h~rfp-arch-ppc32-pgtable-fixes include/asm-ppc/pgtable.h --- linux-2.6.git/include/asm-ppc/pgtable.h~rfp-arch-ppc32-pgtable-fixes 2005-08-12 18:18:44.0 +0200 +++ linux-2.6.git-paolo/include/asm-ppc/pgtable.h 2005-08-12 18:18:44.0 +0200 @@ -205,6 +205,7 @@ extern unsigned long ioremap_bot, iorema */ #define _PAGE_PRESENT 0x0001 /* S: PTE valid */ #define_PAGE_RW0x0002 /* S: Write permission */ +#define _PAGE_FILE 0x0004 /* S: nonlinear file mapping */ #define_PAGE_DIRTY 0x0004 /* S: Page dirty */ #define _PAGE_ACCESSED 0x0008 /* S: Page referenced */ #define _PAGE_HWWRITE 0x0010 /* H: Dirty & RW */ @@ -213,7 +214,6 @@ extern unsigned long ioremap_bot, iorema #define_PAGE_ENDIAN0x0080 /* H: E bit */ #define_PAGE_GUARDED 0x0100 /* H: G bit */ #define_PAGE_COHERENT 0x0200 /* H: M bit */ -#define _PAGE_FILE 0x0400 /* S: nonlinear file mapping */ #define_PAGE_NO_CACHE 0x0400 /* H: I bit */ #define_PAGE_WRITETHRU 0x0800 /* H: W bit */ @@ -724,20 +724,50 @@ extern void paging_init(void); #define __swp_type(entry) ((entry).val & 0x1f) #define __swp_offset(entry)((entry).val >> 5) #define __swp_entry(type, offset) ((swp_entry_t) { (type) | ((offset) << 5) }) + +#if defined(CONFIG_4xx) || defined(CONFIG_8xx) +/* _PAGE_FILE and _PAGE_PRESENT are in the bottom 3 bits on all these chips. 
*/ #define __pte_to_swp_entry(pte)((swp_entry_t) { pte_val(pte) >> 3 }) #define __swp_entry_to_pte(x) ((pte_t) { (x).val << 3 }) +#else /* Classic PPC */ +#define __pte_to_swp_entry(pte)\ +((swp_entry_t) { ((pte_val(pte) >> 3) & ~1) | ((pte_val(pte) >> 2) & 1) }) +#define __swp_entry_to_pte(x) \ +((pte_t) { (((x).val & ~1) << 3) | (((x).val & 1) << 2) }) +#endif /* Encode and decode a nonlinear file mapping entry */ -#define PTE_FILE_MAX_BITS 27 -#define pte_to_pgoff(pte) (((pte_val(pte) & ~0x7ff) >> 5) \ -| ((pte_val(pte) & 0x3f0) >> 4)) -#define pte_to_pgprot(pte) \ -__pgprot((pte_val(pte) & (_PAGE_USER|_PAGE_RW|_PAGE_PRESENT)) | _PAGE_ACCESSED) +/* We can't use any the _PAGE_PRESENT, _PAGE_FILE, _PAGE_USER, _PAGE_RW, + or _PAGE_HASHPTE bits for storing a page offset. */ +#if defined(CONFIG_40x) +/* 40x, avoid the 0x53 bits - to simplify things, avoid 0x73 */ */ +#define __pgoff_split(x) x) << 5) & ~0x7f) | (((x) << 2) & 0xc)) +#define __pgoff_glue(x)x) & ~0x7f) >> 5) | (((x) & 0xc) >> 2)) +#elif defined(CONFIG_44x) +/* 44x, avoid the 0x47 bits */ +#define __pgoff_split(x) x) << 4) & ~0x7f) | (((x) << 3) & 0x38)) +#define __pgoff_glue(x)x) & ~0x7f) >> 4) | (((x) & 0x38) >> 3)) +#elif defined(CONFIG_8xx) +/* 8xx, avoid the 0x843 bits */ +#define __pgoff_split(x) x) << 4) & ~0xfff) | (((x) << 3) & 0x780) \ +| (((x) << 2) & 0x3c)) +#define __pgoff_glue(x)x) & ~0xfff) >> 4) | (((x) & 0x780) >> 3))\ +| (((x) & 0x3c) >> 2)) +#else +/* classic PPC, avoid the 0x40f bits */ +#define __pgoff_split(x) x) << 5) & ~0x7ff) | (((x) << 4) & 0x3f0)) +#define __pgoff_glue(x)x) & ~0x7ff) >> 5) | (((x) & 0x3f0) >> 4)) +#endif +#define PTE_FILE_MAX_BITS 27 +#define pte_to_pgoff(pte) __pgoff_glue(pte_val(pte)) #define pgoff_prot_to_pte(off, prot) \ - ((pte_t) { (((off) << 5) & ~0x7ff) | (((off) << 4) & 0x3f0) \ - | (pgprot_val(prot) & (_PAGE_USER|_PAGE_RW)) \ - | _PAGE_FILE }) + ((pte_t) { __pgoff_split(off) | _PAGE_FILE |\ + (pgprot_val(prot) & (_PAGE_USER|_PAGE_RW)) }) + +#de
[patch 24/39] remap_file_pages protection support: adapt to uml peculiarities
From: Paolo 'Blaisorblade' Giarrusso <[EMAIL PROTECTED]> Uml is particular in respect with other architectures (and possibly this is to fix) in the fact that our arch fault handler handles indifferently both TLB and page faults. In particular, we may get to call handle_mm_fault() when the PTE is already correct, but simply it's not flushed. And rfp-fault-sigsegv-2 breaks this, because when getting a fault on a pte_present PTE and non-uniform VMA, it assumes the fault is due to a protection fault, and signals the caller a SIGSEGV must be sent. This isn't the final fix for UML, that's the next one. Signed-off-by: Paolo 'Blaisorblade' Giarrusso <[EMAIL PROTECTED]> --- linux-2.6.git-paolo/arch/um/kernel/trap_kern.c | 19 +++ 1 files changed, 15 insertions(+), 4 deletions(-) diff -puN arch/um/kernel/trap_kern.c~rfp-sigsegv-uml-3 arch/um/kernel/trap_kern.c --- linux-2.6.git/arch/um/kernel/trap_kern.c~rfp-sigsegv-uml-3 2005-08-11 23:13:06.0 +0200 +++ linux-2.6.git-paolo/arch/um/kernel/trap_kern.c 2005-08-11 23:14:26.0 +0200 @@ -75,8 +75,21 @@ handle_fault: err = -EACCES; goto out; case VM_FAULT_SIGSEGV: - err = -EFAULT; - goto out; + /* Duplicate this code here. */ + pgd = pgd_offset(mm, address); + pud = pud_offset(pgd, address); + pmd = pmd_offset(pud, address); + pte = pte_offset_kernel(pmd, address); + if (likely (pte_newpage(*pte) || pte_newprot(*pte))) { + /* This wasn't done by __handle_mm_fault(), and +* the page hadn't been flushed. 
*/ + *pte = pte_mkyoung(*pte); + if(pte_write(*pte)) *pte = pte_mkdirty(*pte); + break; + } else { + err = -EFAULT; + goto out; + } case VM_FAULT_OOM: err = -ENOMEM; goto out_of_memory; @@ -89,8 +102,6 @@ handle_fault: pte = pte_offset_kernel(pmd, address); } while(!pte_present(*pte)); err = 0; - *pte = pte_mkyoung(*pte); - if(pte_write(*pte)) *pte = pte_mkdirty(*pte); flush_tlb_page(vma, address); /* If the PTE is not present, the vma protection are not accurate if _ - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[patch 34/39] remap_file_pages protection support: restrict permission testing
From: Paolo 'Blaisorblade' Giarrusso <[EMAIL PROTECTED]> Yet to test. Currently we install a PTE when one is missing irrispective of the fault type, and if the access type is prohibited we'll get another fault and kill the process only then. With this, we check the access type on the 1st fault. We could also use this code for testing present PTE's, if the current assumption (fault on present PTE's in VM_NONUNIFORM vma's means access violation) proves problematic for architectures other than UML (which I already fixed), but I hope it's not needed. Signed-off-by: Paolo 'Blaisorblade' Giarrusso <[EMAIL PROTECTED]> --- linux-2.6.git-paolo/mm/memory.c | 16 1 files changed, 16 insertions(+) diff -puN mm/memory.c~rfp-fault-sigsegv-3 mm/memory.c --- linux-2.6.git/mm/memory.c~rfp-fault-sigsegv-3 2005-08-12 17:19:17.0 +0200 +++ linux-2.6.git-paolo/mm/memory.c 2005-08-12 17:19:17.0 +0200 @@ -1963,6 +1963,7 @@ static int do_file_page(struct mm_struct unsigned long pgoff; pgprot_t pgprot; int err; + pte_t test_entry; BUG_ON(!vma->vm_ops || !vma->vm_ops->nopage); /* @@ -1983,6 +1984,21 @@ static int do_file_page(struct mm_struct pgoff = pte_to_pgoff(*pte); pgprot = vma->vm_flags & VM_NONUNIFORM ? pte_to_pgprot(*pte): vma->vm_page_prot; + /* If this is not enabled, we'll get another fault after return next +* time, check we handle that one, and that this code works. 
*/ +#if 1 + /* We just want to test pte_{read,write,exec} */ + test_entry = mk_pte(0, pgprot); + if (unlikely(vma->vm_flags & VM_NONUNIFORM) && !pte_file(*pte)) { + if ((access_mask & VM_WRITE) && !pte_write(test_entry)) + goto out_segv; + if ((access_mask & VM_READ) && !pte_read(test_entry)) + goto out_segv; + if ((access_mask & VM_EXEC) && !pte_exec(test_entry)) + goto out_segv; + } +#endif + pte_unmap(pte); spin_unlock(&mm->page_table_lock); _ - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[patch 26/39] remap_file_pages protection support: ppc32 bits
From: Ingo Molnar <[EMAIL PROTECTED]> PPC32 bits of RFP - as in original patch. Signed-off-by: Paolo 'Blaisorblade' Giarrusso <[EMAIL PROTECTED]> --- linux-2.6.git-paolo/include/asm-ppc/pgtable.h | 15 +++ 1 files changed, 11 insertions(+), 4 deletions(-) diff -puN include/asm-ppc/pgtable.h~rfp-arch-ppc include/asm-ppc/pgtable.h --- linux-2.6.git/include/asm-ppc/pgtable.h~rfp-arch-ppc2005-08-12 18:18:43.0 +0200 +++ linux-2.6.git-paolo/include/asm-ppc/pgtable.h 2005-08-12 18:39:57.0 +0200 @@ -309,8 +309,8 @@ extern unsigned long ioremap_bot, iorema /* Definitions for 60x, 740/750, etc. */ #define _PAGE_PRESENT 0x001 /* software: pte contains a translation */ #define _PAGE_HASHPTE 0x002 /* hash_page has made an HPTE for this pte */ -#define _PAGE_FILE 0x004 /* when !present: nonlinear file mapping */ #define _PAGE_USER 0x004 /* usermode access allowed */ +#define _PAGE_FILE 0x008 /* when !present: nonlinear file mapping */ #define _PAGE_GUARDED 0x008 /* G: prohibit speculative access */ #define _PAGE_COHERENT 0x010 /* M: enforce memory coherence (SMP systems) */ #define _PAGE_NO_CACHE 0x020 /* I: cache inhibit */ @@ -728,9 +728,16 @@ extern void paging_init(void); #define __swp_entry_to_pte(x) ((pte_t) { (x).val << 3 }) /* Encode and decode a nonlinear file mapping entry */ -#define PTE_FILE_MAX_BITS 29 -#define pte_to_pgoff(pte) (pte_val(pte) >> 3) -#define pgoff_to_pte(off) ((pte_t) { ((off) << 3) | _PAGE_FILE }) +#define PTE_FILE_MAX_BITS 27 +#define pte_to_pgoff(pte) (((pte_val(pte) & ~0x7ff) >> 5) \ +| ((pte_val(pte) & 0x3f0) >> 4)) +#define pte_to_pgprot(pte) \ +__pgprot((pte_val(pte) & (_PAGE_USER|_PAGE_RW|_PAGE_PRESENT)) | _PAGE_ACCESSED) + +#define pgoff_prot_to_pte(off, prot) \ + ((pte_t) { (((off) << 5) & ~0x7ff) | (((off) << 4) & 0x3f0) \ + | (pgprot_val(prot) & (_PAGE_USER|_PAGE_RW)) \ + | _PAGE_FILE }) /* CONFIG_APUS */ /* For virtual address to physical address conversion */ _ - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in 
the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[patch 33/39] remap_file_pages protection support: VM_FAULT_SIGSEGV permission checking rework
From: Paolo 'Blaisorblade' Giarrusso <[EMAIL PROTECTED]> Simplify the generic arch permission checking: the previous one was clumsy, as it didn't account arch-specific implications (read implies exec, write implies read, and so on). Still to undo fixes for the archs (i386 and UML) which were modified for the previous scheme. Signed-off-by: Paolo 'Blaisorblade' Giarrusso <[EMAIL PROTECTED]> --- linux-2.6.git-paolo/mm/memory.c | 49 ++-- 1 files changed, 33 insertions(+), 16 deletions(-) diff -puN mm/memory.c~rfp-sigsegv-4 mm/memory.c --- linux-2.6.git/mm/memory.c~rfp-sigsegv-4 2005-08-12 17:18:55.0 +0200 +++ linux-2.6.git-paolo/mm/memory.c 2005-08-12 17:18:55.0 +0200 @@ -1923,6 +1923,35 @@ oom: goto out; } +static inline int check_perms(struct vm_area_struct * vma, int access_mask) { + if (unlikely(vm_flags & VM_NONUNIFORM)) { + /* we used to check protections in arch handler, but with +* VM_NONUNIFORM the check is skipped. */ +#if 0 + if ((access_mask & VM_WRITE) > (vm_flags & VM_WRITE)) + goto err; + if ((access_mask & VM_READ) > (vm_flags & VM_READ)) + goto err; + if ((access_mask & VM_EXEC) > (vm_flags & VM_EXEC)) + goto err; +#else + /* access_mask contains the type of the access, vm_flags are the +* declared protections, pte has the protection which will be +* given to the PTE's in that area. */ + //pte_t pte = pfn_pte(0UL, protection_map[vm_flags & 0x0f|VM_SHARED]); + pte_t pte = pfn_pte(0UL, vma->vm_page_prot); + if ((access_mask & VM_WRITE) && ! pte_write(pte)) + goto err; + if ((access_mask & VM_READ) && ! pte_read(pte)) + goto err; + if ((access_mask & VM_EXEC) && ! pte_exec(pte)) + goto err; +#endif + } + return 0; +err: + return -EPERM; +} /* * Fault of a previously existing named mapping. Repopulate the pte * from the encoded file_pte if possible. 
This enables swappable @@ -1944,14 +1973,8 @@ static int do_file_page(struct mm_struct ((access_mask & VM_WRITE) && !(vma->vm_flags & VM_SHARED))) { /* We're behaving as if pte_file was cleared, so check * protections like in handle_pte_fault. */ - if (unlikely(vma->vm_flags & VM_NONUNIFORM)) { - if ((access_mask & VM_WRITE) > (vma->vm_flags & VM_WRITE)) - goto out_segv; - if ((access_mask & VM_READ) > (vma->vm_flags & VM_READ)) - goto out_segv; - if ((access_mask & VM_EXEC) > (vma->vm_flags & VM_EXEC)) - goto out_segv; - } + if (check_perms(vma, access_mask)) + goto out_segv; pte_clear(mm, address, pte); return do_no_page(mm, vma, address, access_mask & VM_WRITE, pte, pmd); @@ -2007,14 +2030,8 @@ static inline int handle_pte_fault(struc /* when pte_file(), the VMA protections are useless. Otherwise, * we used to check protections in arch handler, but with * VM_NONUNIFORM the check is skipped. */ - if (unlikely(vma->vm_flags & VM_NONUNIFORM) && !pte_file(entry)) { - if ((access_mask & VM_WRITE) > (vma->vm_flags & VM_WRITE)) - goto out_segv; - if ((access_mask & VM_READ) > (vma->vm_flags & VM_READ)) - goto out_segv; - if ((access_mask & VM_EXEC) > (vma->vm_flags & VM_EXEC)) - goto out_segv; - } + if (!pte_file(entry) && check_perms(vma, access_mask)) + goto out_segv; /* * If it truly wasn't present, we know that kswapd _ - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[patch 30/39] remap_file_pages protection support: ia64 bits
From: Ingo Molnar <[EMAIL PROTECTED]> I've attached a 'blind' port of the prot bits of fremap to ia64. I've compiled it with a cross-compiler but otherwise it's untested. (and it's very likely i got the pte bits wrong - but it's roughly OK.) This should at least make ia64 compile. Signed-off-by: Paolo 'Blaisorblade' Giarrusso <[EMAIL PROTECTED]> --- linux-2.6.git-paolo/include/asm-ia64/pgtable.h | 17 + 1 files changed, 13 insertions(+), 4 deletions(-) diff -puN include/asm-ia64/pgtable.h~rfp-arch-ia64 include/asm-ia64/pgtable.h --- linux-2.6.git/include/asm-ia64/pgtable.h~rfp-arch-ia64 2005-08-12 19:27:03.0 +0200 +++ linux-2.6.git-paolo/include/asm-ia64/pgtable.h 2005-08-12 19:27:03.0 +0200 @@ -433,7 +433,8 @@ extern void paging_init (void); * Format of file pte: * bit 0 : present bit (must be zero) * bit 1 : _PAGE_FILE (must be one) - * bits 2-62: file_offset/PAGE_SIZE + * bit 2 : _PAGE_AR_RW + * bits 3-62: file_offset/PAGE_SIZE * bit 63 : _PAGE_PROTNONE bit */ #define __swp_type(entry) (((entry).val >> 2) & 0x7f) @@ -442,9 +443,17 @@ extern void paging_init (void); #define __pte_to_swp_entry(pte)((swp_entry_t) { pte_val(pte) }) #define __swp_entry_to_pte(x) ((pte_t) { (x).val }) -#define PTE_FILE_MAX_BITS 61 -#define pte_to_pgoff(pte) ((pte_val(pte) << 1) >> 3) -#define pgoff_to_pte(off) ((pte_t) { ((off) << 2) | _PAGE_FILE }) +#define PTE_FILE_MAX_BITS 59 +#define pte_to_pgoff(pte) ((pte_val(pte) << 1) >> 4) + +#define pte_to_pgprot(pte) \ + __pgprot((pte_val(pte) & (_PAGE_AR_RW | _PAGE_PROTNONE)) \ + | ((pte_val(pte) & _PAGE_PROTNONE) ? 0 : \ + (__ACCESS_BITS | _PAGE_PL_3))) + +#define pgoff_prot_to_pte(off, prot) \ + ((pte_t) { _PAGE_FILE + \ + (pgprot_val(prot) & (_PAGE_AR_RW | _PAGE_PROTNONE)) + (off) }) /* XXX is this right? 
*/ #define io_remap_page_range(vma, vaddr, paddr, size, prot) \ _ - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[patch 25/39] remap_file_pages protection support: fix unflushed TLB errors detection
From: Paolo 'Blaisorblade' Giarrusso <[EMAIL PROTECTED]> We got unflushed PTE's marked up-to-date, because they were protected to get dirtying / accessing faults. So, don't test the PTE for being up-to-date, but check directly the permission (since the PTE is not protected for that). Signed-off-by: Paolo 'Blaisorblade' Giarrusso <[EMAIL PROTECTED]> --- linux-2.6.git-paolo/arch/um/kernel/trap_kern.c | 28 +++-- 1 files changed, 22 insertions(+), 6 deletions(-) diff -puN arch/um/kernel/trap_kern.c~rfp-sigsegv-uml-3-fix arch/um/kernel/trap_kern.c --- linux-2.6.git/arch/um/kernel/trap_kern.c~rfp-sigsegv-uml-3-fix 2005-08-11 23:14:58.0 +0200 +++ linux-2.6.git-paolo/arch/um/kernel/trap_kern.c 2005-08-11 23:14:58.0 +0200 @@ -35,7 +35,7 @@ int handle_page_fault(unsigned long addr pgd_t *pgd; pud_t *pud; pmd_t *pmd; - pte_t *pte; + pte_t *pte, entry; int err = -EFAULT; int access_mask = 0; @@ -75,16 +75,32 @@ handle_fault: err = -EACCES; goto out; case VM_FAULT_SIGSEGV: + WARN_ON(!(vma->vm_flags & VM_NONUNIFORM)); /* Duplicate this code here. */ pgd = pgd_offset(mm, address); pud = pud_offset(pgd, address); pmd = pmd_offset(pud, address); pte = pte_offset_kernel(pmd, address); - if (likely (pte_newpage(*pte) || pte_newprot(*pte))) { - /* This wasn't done by __handle_mm_fault(), and -* the page hadn't been flushed. */ - *pte = pte_mkyoung(*pte); - if(pte_write(*pte)) *pte = pte_mkdirty(*pte); + if (likely (pte_newpage(*pte) || pte_newprot(*pte)) || + (is_write ? pte_write(*pte) : pte_read(*pte)) ) { + /* The page hadn't been flushed, or it had been +* flushed but without access to get a dirtying +* / accessing fault. */ + + /* __handle_mm_fault() didn't dirty / young this +* PTE, probably we won't get another fault for +* this page, so fix things now. */ + entry = *pte; + entry = pte_mkyoung(*pte); + if(pte_write(entry)) + entry = pte_mkdirty(entry); + /* Yes, this will set the page as NEWPAGE. We +* want this, otherwise things won't work. 
+* Indeed, the +* *pte = pte_mkyoung(*pte); +* we used to have (uselessly) didn't work at +* all! */ + set_pte(pte, entry); break; } else { err = -EFAULT; _ - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [uml-devel] Re: [RFC] [patch 0/39] remap_file_pages protection support, try 2
On Friday 12 August 2005 20:29, David S. Miller wrote:
> Please do not BOMB linux-kernel with 39 patches in one
> go, that will kill the list server.
> Try to consolidate your patch groups into smaller pieces,
> like so about 10 or 15 at a time. And send any that remain
> on some later date.
Whoops - unfortunately, "some later date" for me means either a week away or just some minutes away. I'm trying for the latter. However, I sent the initial tarball containing all of them, so I hope that will be useful.
-- Inform me of my mistakes, so I can keep imitating Homer Simpson's "Doh!". Paolo Giarrusso, aka Blaisorblade (Skype ID "PaoloGiarrusso", ICQ 215621894) http://www.user-mode-linux.org/~blaisorblade
[patch 31/39] remap_file_pages protection support: s390 bits
From: Martin Schwidefsky <[EMAIL PROTECTED]> s390 memory management changes for remap-file-pages-prot patch: - Add pgoff_prot_to_pte/pte_to_pgprot, remove pgoff_to_pte (required for 'prot' parameteter in shared-writeable mappings). - Handle VM_FAULT_SIGSEGV from handle_mm_fault in do_exception. Signed-off-by: Paolo 'Blaisorblade' Giarrusso <[EMAIL PROTECTED]> --- linux-2.6.git-paolo/arch/s390/mm/fault.c |2 linux-2.6.git-paolo/include/asm-s390/pgtable.h | 90 - 2 files changed, 60 insertions(+), 32 deletions(-) diff -puN arch/s390/mm/fault.c~rfp-arch-s390 arch/s390/mm/fault.c --- linux-2.6.git/arch/s390/mm/fault.c~rfp-arch-s3902005-08-12 19:27:58.0 +0200 +++ linux-2.6.git-paolo/arch/s390/mm/fault.c2005-08-12 19:27:58.0 +0200 @@ -260,6 +260,8 @@ survive: goto do_sigbus; case VM_FAULT_OOM: goto out_of_memory; + case VM_FAULT_SIGSEGV: + goto bad_area; default: BUG(); } diff -puN include/asm-s390/pgtable.h~rfp-arch-s390 include/asm-s390/pgtable.h --- linux-2.6.git/include/asm-s390/pgtable.h~rfp-arch-s390 2005-08-12 19:27:58.0 +0200 +++ linux-2.6.git-paolo/include/asm-s390/pgtable.h 2005-08-12 19:27:58.0 +0200 @@ -211,16 +211,41 @@ extern char empty_zero_page[PAGE_SIZE]; * C : changed bit */ -/* Hardware bits in the page table entry */ +/* Hardware bits in the page table entry. */ #define _PAGE_RO0x200 /* HW read-only */ #define _PAGE_INVALID 0x400 /* HW invalid */ -/* Mask and four different kinds of invalid pages. */ -#define _PAGE_INVALID_MASK 0x601 +/* Software bits in the page table entry. 
*/ +#define _PAGE_FILE 0x001 +#define _PAGE_PROTNONE 0x002 + +/* + * We have 8 different page "types", two valid types and 6 invalid types + * (p = page address, o = swap offset, t = swap type, f = file offset): + * 0 xxx 0IP0 yy NF + * valid rw: 0 <p> <--0-> 00 + * valid ro: 0 <p> 0010 <--0-> 00 + * invalid none: 0 <p> 0100 <--0-> 10 + * invalid empty: 0 <0> 0100 <--0-> 00 + * invalid swap: 0 <o> 0110 <--t-> 00 + * invalid file rw:0 <f> 0100 <--f-> 01 + * invalid file ro:0 <f> 0110 <--f-> 01 + * invaild file none: 0 <f> 0100 <--f-> 11 + * + * The format for 64 bit is almost identical, there isn't a leading zero + * and the number of bits in the page address part of the pte is 52 bits + * instead of 19. + */ + #define _PAGE_INVALID_EMPTY0x400 -#define _PAGE_INVALID_NONE 0x401 #define _PAGE_INVALID_SWAP 0x600 -#define _PAGE_INVALID_FILE 0x601 +#define _PAGE_INVALID_FILE 0x401 + +#define _PTE_IS_VALID(__pte) (!(pte_val(__pte) & _PAGE_INVALID)) +#define _PTE_IS_NONE(__pte)((pte_val(__pte) & 0x603) == 0x402) +#define _PTE_IS_EMPTY(__pte) ((pte_val(__pte) & 0x603) == 0x400) +#define _PTE_IS_SWAP(__pte)((pte_val(__pte) & 0x603) == 0x600) +#define _PTE_IS_FILE(__pte)((pte_val(__pte) & 0x401) == 0x401) #ifndef __s390x__ @@ -281,13 +306,11 @@ extern char empty_zero_page[PAGE_SIZE]; /* * No mapping available */ -#define PAGE_NONE_SHARED __pgprot(_PAGE_INVALID_NONE) -#define PAGE_NONE_PRIVATE __pgprot(_PAGE_INVALID_NONE) -#define PAGE_RO_SHARED __pgprot(_PAGE_RO) -#define PAGE_RO_PRIVATE __pgprot(_PAGE_RO) -#define PAGE_COPY__pgprot(_PAGE_RO) -#define PAGE_SHARED __pgprot(0) -#define PAGE_KERNEL __pgprot(0) +#define PAGE_NONE __pgprot(_PAGE_INVALID | _PAGE_PROTNONE) +#define PAGE_READONLY __pgprot(_PAGE_RO) +#define PAGE_COPY __pgprot(_PAGE_RO) +#define PAGE_SHARED__pgprot(0) +#define PAGE_KERNEL__pgprot(0) /* * The S390 can't do page protection for execute, and considers that the @@ -295,21 +318,21 @@ extern char empty_zero_page[PAGE_SIZE]; * the closest we can get.. 
*/ /*xwr*/ -#define __P000 PAGE_NONE_PRIVATE -#define __P001 PAGE_RO_PRIVATE +#define __P000 PAGE_NONE +#define __P001 PAGE_READONLY #define __P010 PAGE_COPY #define __P011 PAGE_COPY -#define __P100 PAGE_RO_PRIVATE -#define __P101 PAGE_RO_PRIVATE +#define __P100 PAGE_READONLY +#define __P101 PAGE_READONLY #define __P110 PAGE_COPY #define __P111 PAGE_COPY -#define __S000 PAGE_NONE_SHARED -#define __S001 PAGE_RO_SHARED +#define __S000 PAGE_NONE +#define __S001 PAGE_READONLY #define __S010 PAGE_SHARED #define __S011 PAGE_SHARED -#define __S100 PAGE_RO_SHARED -#define __S101 PAGE_RO_SHARED +#define __S100 PAGE_READONLY +#define __S101 PAGE_READONLY #define __S110 PAGE_
[patch 28/39] remap_file_pages protection support: sparc64 bits.
From: William Lee Irwin III <[EMAIL PROTECTED]>

Implement remap_file_pages-with-per-page-protections for sparc64. See
ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.4/2.6.4-mm1/broken-out/remap-file-pages-prot-2.6.4-rc1-mm1-A1.patch
and
ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.4/2.6.4-mm1/broken-out/remap-file-pages-prot-ia64-2.6.4-rc2-mm1-A0.patch

Signed-off-by: Paolo 'Blaisorblade' Giarrusso <[EMAIL PROTECTED]>
---

 linux-2.6.git-paolo/include/asm-sparc64/pgtable.h | 13 ++---
 1 files changed, 10 insertions(+), 3 deletions(-)

diff -puN include/asm-sparc64/pgtable.h~rfp-arch-sparc64 include/asm-sparc64/pgtable.h
--- linux-2.6.git/include/asm-sparc64/pgtable.h~rfp-arch-sparc64	2005-08-12 18:41:31.0 +0200
+++ linux-2.6.git-paolo/include/asm-sparc64/pgtable.h	2005-08-12 18:41:31.0 +0200
@@ -367,9 +367,16 @@ static inline pte_t mk_pte_io(unsigned l
 
 /* File offset in PTE support. */
 #define pte_file(pte)		(pte_val(pte) & _PAGE_FILE)
-#define pte_to_pgoff(pte)	(pte_val(pte) >> PAGE_SHIFT)
-#define pgoff_to_pte(off)	(__pte(((off) << PAGE_SHIFT) | _PAGE_FILE))
-#define PTE_FILE_MAX_BITS	(64UL - PAGE_SHIFT - 1UL)
+#define __pte_to_pgprot(pte) \
+	__pgprot(pte_val(pte) & (_PAGE_READ|_PAGE_WRITE))
+#define __file_pte_to_pgprot(pte) \
+	__pgprot(((pte_val(pte) >> PAGE_SHIFT) & 0x3UL) << 8)
+#define pte_to_pgprot(pte) \
+	(pte_file(pte) ? __file_pte_to_pgprot(pte) : __pte_to_pgprot(pte))
+#define pte_to_pgoff(pte)	(pte_val(pte) >> (PAGE_SHIFT+2))
+#define pgoff_prot_to_pte(off, prot) \
+	((__pte(((off) | ((pgprot_val(prot) >> 8) & 0x3UL << (PAGE_SHIFT+2) | _PAGE_FILE)
+#define PTE_FILE_MAX_BITS	(64UL - PAGE_SHIFT - 3UL)
 
 extern unsigned long prom_virt_to_phys(unsigned long, int *);
_
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
[patch 21/39] remap_file_pages protection support: use EOVERFLOW ret code
From: Paolo 'Blaisorblade' Giarrusso <[EMAIL PROTECTED]>

Use -EOVERFLOW ("Value too large for defined data type") rather than -EINVAL
when we cannot store the file offset in the PTE.

Signed-off-by: Paolo 'Blaisorblade' Giarrusso <[EMAIL PROTECTED]>
---

 linux-2.6.git-paolo/mm/fremap.c |    2 +-
 1 files changed, 1 insertion(+), 1 deletion(-)

diff -puN mm/fremap.c~rfp-ef2big-ret-code mm/fremap.c
--- linux-2.6.git/mm/fremap.c~rfp-ef2big-ret-code	2005-08-11 23:04:59.0 +0200
+++ linux-2.6.git-paolo/mm/fremap.c	2005-08-11 23:04:59.0 +0200
@@ -213,7 +213,7 @@ asmlinkage long sys_remap_file_pages(uns
 
 	/* Can we represent this offset inside this architecture's pte's? */
 #if PTE_FILE_MAX_BITS < BITS_PER_LONG
 	if (pgoff + (size >> PAGE_SHIFT) >= (1UL << PTE_FILE_MAX_BITS))
-		return err;
+		return -EOVERFLOW;
 #endif
 
 	/* We need down_write() to change vma->vm_flags. */
_
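For reference, the overflow condition being changed is easy to reproduce in userspace. The sketch below mirrors the check in sys_remap_file_pages() with made-up values: PAGE_SHIFT = 12 and a hypothetical 20-bit file-offset field, not any real architecture's PTE_FILE_MAX_BITS.

```c
/* Sketch of the representability check in sys_remap_file_pages(), in
 * userspace form. PAGE_SHIFT and PTE_FILE_MAX_BITS are illustrative
 * values, not any real architecture's. */
#include <assert.h>
#include <errno.h>

#define PAGE_SHIFT		12
#define PTE_FILE_MAX_BITS	20UL

/* Return -EOVERFLOW when the window's last page offset does not fit in
 * a file PTE, 0 when it does. */
static long check_pgoff(unsigned long pgoff, unsigned long size)
{
	if (pgoff + (size >> PAGE_SHIFT) >= (1UL << PTE_FILE_MAX_BITS))
		return -EOVERFLOW;
	return 0;
}
```

With these toy values, any remap whose window reaches page offset 2^20 or beyond is rejected with the new, more descriptive error code.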
[patch 29/39] remap_file_pages protection support: ppc64 bits
From: Paul Mackerras <[EMAIL PROTECTED]>

ppc64 bits for remap_file_pages w/prot (no syscall table).

Signed-off-by: Paolo 'Blaisorblade' Giarrusso <[EMAIL PROTECTED]>
---

 linux-2.6.git-paolo/include/asm-ppc64/pgtable.h | 12 +---
 1 files changed, 9 insertions(+), 3 deletions(-)

diff -puN include/asm-ppc64/pgtable.h~rfp-arch-ppc64 include/asm-ppc64/pgtable.h
--- linux-2.6.git/include/asm-ppc64/pgtable.h~rfp-arch-ppc64	2005-08-12 18:42:20.0 +0200
+++ linux-2.6.git-paolo/include/asm-ppc64/pgtable.h	2005-08-12 18:42:20.0 +0200
@@ -62,8 +62,8 @@
  */
 #define _PAGE_PRESENT	0x0001 /* software: pte contains a translation */
 #define _PAGE_USER	0x0002 /* matches one of the PP bits */
-#define _PAGE_FILE	0x0002 /* (!present only) software: pte holds file offset */
 #define _PAGE_EXEC	0x0004 /* No execute on POWER4 and newer (we invert) */
+#define _PAGE_FILE	0x0008 /* !present: pte holds file offset */
 #define _PAGE_GUARDED	0x0008
 #define _PAGE_COHERENT	0x0010 /* M: enforce memory coherence (SMP systems) */
 #define _PAGE_NO_CACHE	0x0020 /* I: cache inhibit */
@@ -492,9 +492,15 @@ extern void update_mmu_cache(struct vm_a
 #define __swp_entry(type, offset) ((swp_entry_t) { ((type) << 1) | ((offset) << 8) })
 #define __pte_to_swp_entry(pte)	((swp_entry_t) { pte_val(pte) >> PTE_SHIFT })
 #define __swp_entry_to_pte(x)	((pte_t) { (x).val << PTE_SHIFT })
-#define pte_to_pgoff(pte)	(pte_val(pte) >> PTE_SHIFT)
-#define pgoff_to_pte(off)	((pte_t) {((off) << PTE_SHIFT)|_PAGE_FILE})
+
 #define PTE_FILE_MAX_BITS	(BITS_PER_LONG - PTE_SHIFT)
+#define pte_to_pgoff(pte)	(pte_val(pte) >> PTE_SHIFT)
+#define pte_to_pgprot(pte)	\
+	__pgprot((pte_val(pte) & (_PAGE_USER|_PAGE_RW|_PAGE_PRESENT)) | _PAGE_ACCESSED)
+
+#define pgoff_prot_to_pte(off, prot) \
+	((pte_t) { ((off) << PTE_SHIFT) | _PAGE_FILE		\
+		 | (pgprot_val(prot) & (_PAGE_USER|_PAGE_RW)) })
 
 /*
  * kern_addr_valid is intended to indicate whether an address is a valid
_
[patch 22/39] remap_file_pages protection support: use FAULT_SIGSEGV for protection checking, uml bits
From: Paolo 'Blaisorblade' Giarrusso <[EMAIL PROTECTED]>

This adapts the changes to the i386 handler to the UML one. It isn't enough
to make UML work, however, because UML has some peculiarities. Subsequent
patches fix this.

Signed-off-by: Paolo 'Blaisorblade' Giarrusso <[EMAIL PROTECTED]>
---

 linux-2.6.git-paolo/arch/um/kernel/trap_kern.c | 32 +
 1 files changed, 27 insertions(+), 5 deletions(-)

diff -puN arch/um/kernel/trap_kern.c~rfp-fault-sigsegv-2-uml arch/um/kernel/trap_kern.c
--- linux-2.6.git/arch/um/kernel/trap_kern.c~rfp-fault-sigsegv-2-uml	2005-08-11 23:09:32.0 +0200
+++ linux-2.6.git-paolo/arch/um/kernel/trap_kern.c	2005-08-11 23:09:32.0 +0200
@@ -37,6 +37,7 @@ int handle_page_fault(unsigned long addr
 	pmd_t *pmd;
 	pte_t *pte;
 	int err = -EFAULT;
+	int access_mask = 0;
 
 	*code_out = SEGV_MAPERR;
 	down_read(&mm->mmap_sem);
@@ -55,14 +56,15 @@ int handle_page_fault(unsigned long addr
 good_area:
 	*code_out = SEGV_ACCERR;
 	if(is_write && !(vma->vm_flags & VM_WRITE))
-		goto out;
+		goto prot_bad;
 
 	if(!(vma->vm_flags & (VM_READ | VM_EXEC)))
-		goto out;
+		goto prot_bad;
 
+	access_mask = is_write ? VM_WRITE : 0;
 	do {
-survive:
-		switch (handle_mm_fault(mm, vma, address, is_write)){
+handle_fault:
+		switch (__handle_mm_fault(mm, vma, address, access_mask)) {
 		case VM_FAULT_MINOR:
 			current->min_flt++;
 			break;
@@ -72,6 +74,9 @@ survive:
 		case VM_FAULT_SIGBUS:
 			err = -EACCES;
 			goto out;
+		case VM_FAULT_SIGSEGV:
+			err = -EFAULT;
+			goto out;
 		case VM_FAULT_OOM:
 			err = -ENOMEM;
 			goto out_of_memory;
@@ -87,10 +92,27 @@ survive:
 	*pte = pte_mkyoung(*pte);
 	if(pte_write(*pte)) *pte = pte_mkdirty(*pte);
 	flush_tlb_page(vma, address);
+
+	/* If the PTE is not present, the vma protections are not accurate if
+	 * VM_NONUNIFORM; present PTE's are correct for VM_NONUNIFORM and were
+	 * already handled otherwise. */
 out:
 	up_read(&mm->mmap_sem);
 	return(err);
 
+prot_bad:
+	if (unlikely(vma->vm_flags & VM_NONUNIFORM)) {
+		access_mask = is_write ? VM_WRITE : 0;
+		/* Otherwise, on a legitimate read fault on a page mapped as
+		 * exec-only, we get problems. Probably, we should lower
+		 * requirements... we should always test just
+		 * pte_read/write/exec, on vma->vm_page_prot! This way is
+		 * cumbersome. However, for now things should work for UML. */
+		access_mask |= vma->vm_flags & VM_EXEC ? VM_EXEC : VM_READ;
+		goto handle_fault;
+	}
+	goto out;
+
 	/*
 	 * We ran out of memory, or some other thing happened to us that made
 	 * us unable to handle the page fault gracefully.
@@ -100,7 +122,7 @@ out_of_memory:
 		up_read(&mm->mmap_sem);
 		yield();
 		down_read(&mm->mmap_sem);
-		goto survive;
+		goto handle_fault;
 	}
 	goto out;
 }
_
[patch 23/39] remap_file_pages protection support: fix try_to_unmap_one for VM_NONUNIFORM vma's
From: Paolo 'Blaisorblade' Giarrusso <[EMAIL PROTECTED]>

When unmapping linear but non-uniform VMA's in try_to_unmap_one, we must
encode the prots in the PTE. However, we shouldn't use the generic
set_nonlinear_pte() function as it allows for nonlinear offsets, on which we
should instead BUG() in this code path.

Additionally, add a missing TLB flush in both locations. However, there is
some excess of flushes in these functions.

Signed-off-by: Paolo 'Blaisorblade' Giarrusso <[EMAIL PROTECTED]>
---

 linux-2.6.git-paolo/mm/rmap.c |    5 +
 1 files changed, 5 insertions(+)

diff -puN mm/rmap.c~rfp-fix-unmap-linear mm/rmap.c
--- linux-2.6.git/mm/rmap.c~rfp-fix-unmap-linear	2005-08-11 23:07:12.0 +0200
+++ linux-2.6.git-paolo/mm/rmap.c	2005-08-11 23:07:12.0 +0200
@@ -543,6 +543,10 @@ static int try_to_unmap_one(struct page
 	flush_cache_page(vma, address, page_to_pfn(page));
 	pteval = ptep_clear_flush(vma, address, pte);
 
+	/* If nonlinear, store the file page offset in the pte. */
+	set_nonlinear_pte(pteval, pte, vma, mm, page, address);
+	flush_tlb_page(vma, address);
+
 	/* Move the dirty bit to the physical page now the pte is gone. */
 	if (pte_dirty(pteval))
 		set_page_dirty(page);
@@ -661,6 +665,7 @@ static void try_to_unmap_cluster(unsigne
 
 		/* If nonlinear, store the file page offset in the pte. */
 		set_nonlinear_pte(pteval, pte, vma, mm, page, address);
+		flush_tlb_page(vma, address);
 
 		/* Move the dirty bit to the physical page now the pte is gone. */
 		if (pte_dirty(pteval))
_
[patch 36/39] remap_file_pages protection support: avoid lookup of pages for PROT_NONE remapping
From: Paolo 'Blaisorblade' Giarrusso <[EMAIL PROTECTED]>

This optimization avoids looking up pages for PROT_NONE mappings. The idea
was taken from the "wrong "historical" code for review - 1" patch (the next
one) from mingo, but I fixed it by adding another "detail" parameter. I've
also fixed the other callers to clear this parameter, and fixed
madvise_dontneed() to use memset(0) on its parameter - currently it's
probably a bug.

Not even compile-tested, just written off the top of my head.

Signed-off-by: Paolo 'Blaisorblade' Giarrusso <[EMAIL PROTECTED]>
---

 linux-2.6.git-paolo/include/linux/mm.h |    1 +
 linux-2.6.git-paolo/mm/filemap.c       |   18 ++
 linux-2.6.git-paolo/mm/madvise.c       |   10 ++
 linux-2.6.git-paolo/mm/memory.c        |   11 ---
 linux-2.6.git-paolo/mm/shmem.c         |   11 +++
 5 files changed, 44 insertions(+), 7 deletions(-)

diff -puN mm/filemap.c~rfp-avoid-lookup-pages-miss-mapping mm/filemap.c
--- linux-2.6.git/mm/filemap.c~rfp-avoid-lookup-pages-miss-mapping	2005-08-12 18:42:23.0 +0200
+++ linux-2.6.git-paolo/mm/filemap.c	2005-08-12 19:14:39.0 +0200
@@ -1495,6 +1495,24 @@ int filemap_populate(struct vm_area_stru
 	struct page *page;
 	int err;
 
+	/*
+	 * mapping-removal fastpath:
+	 */
+	if ((vma->vm_flags & VM_SHARED) &&
+	    (pgprot_val(prot) == pgprot_val(PAGE_NONE))) {
+		struct zap_details details;
+
+		/* Still do error-checking! */
+		size = (i_size_read(inode) + PAGE_CACHE_SIZE - 1) >> PAGE_CACHE_SHIFT;
+		if (pgoff + (len >> PAGE_CACHE_SHIFT) > size)
+			return -EINVAL;
+
+		memset(&details, 0, sizeof(details));
+		details.prot_none_ptes = 1;
+		zap_page_range(vma, addr, len, &details);
+		return 0;
+	}
+
 	if (!nonblock)
 		force_page_cache_readahead(mapping, vma->vm_file,
 					pgoff, len >> PAGE_CACHE_SHIFT);
diff -puN mm/shmem.c~rfp-avoid-lookup-pages-miss-mapping mm/shmem.c
--- linux-2.6.git/mm/shmem.c~rfp-avoid-lookup-pages-miss-mapping	2005-08-12 18:42:23.0 +0200
+++ linux-2.6.git-paolo/mm/shmem.c	2005-08-12 19:11:52.0 +0200
@@ -1186,6 +1186,17 @@ static int shmem_populate(struct vm_area
 	if (pgoff >= size || pgoff + (len >> PAGE_SHIFT) > size)
 		return -EINVAL;
 
+	/*
+	 * mapping-removal fastpath:
+	 */
+	if ((vma->vm_flags & VM_SHARED) &&
+	    (pgprot_val(prot) == pgprot_val(PAGE_NONE))) {
+		memset(&details, 0, sizeof(details));
+		details.prot_none_ptes = 1;
+		zap_page_range(vma, addr, len, &details);
+		return 0;
+	}
+
 	while ((long) len > 0) {
 		struct page *page = NULL;
 		int err;
diff -puN mm/memory.c~rfp-avoid-lookup-pages-miss-mapping mm/memory.c
--- linux-2.6.git/mm/memory.c~rfp-avoid-lookup-pages-miss-mapping	2005-08-12 18:44:29.0 +0200
+++ linux-2.6.git-paolo/mm/memory.c	2005-08-12 19:09:50.0 +0200
@@ -575,11 +575,14 @@ static void zap_pte_range(struct mmu_gat
 				 * If details->check_mapping, we leave swap entries;
 				 * if details->nonlinear_vma, we leave file entries.
 				 */
-				if (unlikely(details))
+				if (unlikely(details) && !details->prot_none_ptes)
 					continue;
 				if (!pte_file(ptent))
 					free_swap_and_cache(pte_to_swp_entry(ptent));
-			pte_clear(tlb->mm, addr, pte);
+			if (unlikely(details->prot_none_ptes))
+				set_pte_at(mm, addr, pte, pfn_pte(0, __S000));
+			else
+				pte_clear(tlb->mm, addr, pte);
 		} while (pte++, addr += PAGE_SIZE, addr != end);
 		pte_unmap(pte - 1);
 	}
@@ -623,7 +626,8 @@ static void unmap_page_range(struct mmu_
 	pgd_t *pgd;
 	unsigned long next;
 
-	if (details && !details->check_mapping && !details->nonlinear_vma)
+	if (details && !details->check_mapping && !details->nonlinear_vma &&
+	    !details->prot_none_ptes)
 		details = NULL;
 
 	BUG_ON(addr >= end);
@@ -1499,6 +1503,7 @@ void unmap_mapping_range(struct address_
 	if (details.last_index < details.first_index)
 		details.last_index = ULONG_MAX;
 	details.i_mmap_lock = &mapping->i_mmap_lock;
+	details.prot_none_ptes = 0;
 
 	spin_lock(&mapping->i_mmap_lock);
 
diff -puN include/lin
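The fastpath condition added to both populate routines boils down to a small predicate; this userspace sketch models it with stand-in values (VM_SHARED and the pgprot constants below are illustrative, not the kernel's definitions):

```c
/* Userspace model of the fastpath test in filemap_populate() and
 * shmem_populate() above. VM_SHARED and the pgprot values are
 * stand-ins, not the kernel's definitions. */
#include <assert.h>

#define VM_SHARED	0x08UL
#define PGPROT_NONE	0x0UL	/* pretend pgprot_val(PAGE_NONE) */
#define PGPROT_RO	0x1UL	/* pretend pgprot_val(PAGE_READONLY) */

/* Non-zero when a PROT_NONE remap of a shared mapping can skip the
 * page lookup and simply zap the range. */
static int use_zap_fastpath(unsigned long vm_flags, unsigned long prot)
{
	return (vm_flags & VM_SHARED) && prot == PGPROT_NONE;
}
```

Private mappings and any protection other than PROT_NONE fall through to the normal page-lookup loop.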
[patch 37/39] remap_file_pages protection support: wrong "historical" code for review - 1
From: Ingo Molnar <[EMAIL PROTECTED]>, Paolo 'Blaisorblade' Giarrusso <[EMAIL PROTECTED]>

This "fast-path" was contained in the original
remap-file-pages-prot-2.6.4-rc1-mm1-A1.patch from Ingo Molnar*; I think this
code is wrong, but I'm sending it for review anyway, because I'm unsure (and
in fact, in the end I found the reason for this).

What I think is that this patch (done only for filemap_populate, not for
shmem_populate) calls zap_page_range() when installing mappings with
PROT_NONE protection. The purpose is to avoid a useless page lookup; but the
PTE's will simply be marked as absent, not as _PAGE_NONE. So, with this
fastpath, pages would be remapped again in their "default" position.

In this case, probably a possible fix is to add yet another param in
"zap_details" to mark all PTE's as PROT_NONE ones. Using
details->nonlinear_vma has the inconvenience of using
details->{first,last}_index and of leaving file entries unchanged.

* available at
http://www.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.5/2.6.5-mm1/dropped/remap-file-pages-prot-2.6.4-rc1-mm1-A1.patch

Signed-off-by: Paolo 'Blaisorblade' Giarrusso <[EMAIL PROTECTED]>
---

 linux-2.6.git-paolo/mm/filemap.c |    9 +
 1 files changed, 9 insertions(+)

diff -puN mm/filemap.c~rfp-wrong2 mm/filemap.c
--- linux-2.6.git/mm/filemap.c~rfp-wrong2	2005-08-12 18:31:32.0 +0200
+++ linux-2.6.git-paolo/mm/filemap.c	2005-08-12 18:31:32.0 +0200
@@ -1495,6 +1495,15 @@ int filemap_populate(struct vm_area_stru
 	struct page *page;
 	int err;
 
+	/*
+	 * mapping-removal fastpath:
+	 */
+	if ((vma->vm_flags & VM_SHARED) &&
+	    (pgprot_val(prot) == pgprot_val(PAGE_NONE))) {
+		zap_page_range(vma, addr, len, NULL);
+		return 0;
+	}
+
 	if (!nonblock)
 		force_page_cache_readahead(mapping, vma->vm_file,
 					pgoff, len >> PAGE_CACHE_SHIFT);
_
[patch 38/39] [RFC] remap_file_pages protection support: avoid dirtying on read faults for NONUNIFORM pages
From: Paolo 'Blaisorblade' Giarrusso <[EMAIL PROTECTED]>

When installing pages on non-uniform VMA's, even for read faults we must
install them writable if the VMA is writable (we won't have a chance to fix
that). Normally, on write faults, we install the PTE as dirty (there's a
comment about 80386 on this), but maybe it's not needed here on read faults.

I've looked for more info about that comment - unfortunately, it's there
almost unchanged since 2.4.0, so I've found no info. However, UML does
depend on the old behaviour currently (trivial to cure, anyway). And if
other arch's don't have a hardware "dirty" bit, they'll depend on this too.

Signed-off-by: Paolo 'Blaisorblade' Giarrusso <[EMAIL PROTECTED]>
---

 linux-2.6.git-paolo/mm/memory.c |    4 +++-
 1 files changed, 3 insertions(+), 1 deletion(-)

diff -puN mm/memory.c~rfp-fault-optim-risky mm/memory.c
--- linux-2.6.git/mm/memory.c~rfp-fault-optim-risky	2005-08-12 19:25:16.0 +0200
+++ linux-2.6.git-paolo/mm/memory.c	2005-08-12 19:25:16.0 +0200
@@ -1899,8 +1899,10 @@ retry:
 	 * been set (we can have a writeable VMA with a read-only PTE),
 	 * so we must set the *exact* permission on fault, and avoid
 	 * calling do_wp_page on write faults. */
-	if (write_access || unlikely(vma->vm_flags & VM_NONUNIFORM))
+	if (write_access)
 		entry = maybe_mkwrite(pte_mkdirty(entry), vma);
+	else if (unlikely(vma->vm_flags & VM_NONUNIFORM))
+		entry = maybe_mkwrite(entry, vma);
 	set_pte_at(mm, address, page_table, entry);
 	if (anon) {
 		lru_cache_add_active(new_page);
_
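The new permission logic in the patched hunk can be modelled in userspace: write faults get a dirty (and, via maybe_mkwrite, possibly writable) PTE, while read faults on a VM_NONUNIFORM vma get a writable but *clean* PTE. The flag and PTE bit values below are illustrative stand-ins, and maybe_mkwrite() is reduced to the VM_WRITE test it performs.

```c
/* Userspace model of the patched hunk in do_no_page(). Bit values are
 * illustrative; maybe_mkwrite() is reduced to its VM_WRITE test. */
#include <assert.h>

#define VM_WRITE	0x2UL
#define VM_NONUNIFORM	0x100UL
#define PTE_WRITE	0x1UL
#define PTE_DIRTY	0x2UL

static unsigned long maybe_mkwrite(unsigned long entry, unsigned long vm_flags)
{
	if (vm_flags & VM_WRITE)
		entry |= PTE_WRITE;
	return entry;
}

/* Bits of the freshly-installed PTE, per fault type and vma flags. */
static unsigned long new_pte_bits(unsigned long vm_flags, int write_access)
{
	unsigned long entry = 0;

	if (write_access)
		entry = maybe_mkwrite(entry | PTE_DIRTY, vm_flags);
	else if (vm_flags & VM_NONUNIFORM)
		entry = maybe_mkwrite(entry, vm_flags);
	return entry;
}
```

The key difference from the old code is the read-fault VM_NONUNIFORM case: writable, but no longer dirtied.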
[patch 35/39] remap_file_pages protection support: avoid redundant pte_file PTE's
From: Paolo 'Blaisorblade' Giarrusso <[EMAIL PROTECTED]>

For linear VMA's, there is no need to install pte_file PTEs to remember the
offset. We could probably go as far as checking directly the address and
protection like in include/linux/pagemap.h:set_nonlinear_pte(), instead of
vma->vm_flags.

Also add some warnings on the path which used to cope with such PTE's.

Untested yet.

Signed-off-by: Paolo 'Blaisorblade' Giarrusso <[EMAIL PROTECTED]>
---

 linux-2.6.git-paolo/mm/fremap.c | 12 ++--
 linux-2.6.git-paolo/mm/memory.c |    5 +
 2 files changed, 11 insertions(+), 6 deletions(-)

diff -puN mm/fremap.c~rfp-linear-optim-v3 mm/fremap.c
--- linux-2.6.git/mm/fremap.c~rfp-linear-optim-v3	2005-08-11 23:20:09.0 +0200
+++ linux-2.6.git-paolo/mm/fremap.c	2005-08-11 23:20:09.0 +0200
@@ -125,6 +125,12 @@ int install_file_pte(struct mm_struct *m
 
 	BUG_ON(!uniform && !(vma->vm_flags & VM_SHARED));
 
+	/* We're being called by mmap(MAP_NONBLOCK|MAP_POPULATE) on a uniform
+	 * VMA. So we don't need to take the lock, nor to install a PTE for a
+	 * page we'd fault in anyway. */
+	if (uniform)
+		return 0;
+
 	pgd = pgd_offset(mm, addr);
 	spin_lock(&mm->page_table_lock);
 
@@ -139,12 +145,6 @@ int install_file_pte(struct mm_struct *m
 	pte = pte_alloc_map(mm, pmd, addr);
 	if (!pte)
 		goto err_unlock;
-	/*
-	 * Skip uniform non-existent ptes:
-	 */
-	err = 0;
-	if (uniform && pte_none(*pte))
-		goto err_unlock;
 
 	zap_pte(mm, vma, addr, pte);
 
diff -puN mm/memory.c~rfp-linear-optim-v3 mm/memory.c
--- linux-2.6.git/mm/memory.c~rfp-linear-optim-v3	2005-08-11 23:20:09.0 +0200
+++ linux-2.6.git-paolo/mm/memory.c	2005-08-11 23:20:09.0 +0200
@@ -1969,9 +1969,14 @@ static int do_file_page(struct mm_struct
 	/*
 	 * Fall back to the linear mapping if the fs does not support
 	 * ->populate; in this case do the protection checks.
+	 * Could have been installed by install_file_pte, for a MAP_NONBLOCK
+	 * pagetable population.
 	 */
 	if (!vma->vm_ops->populate ||
 	    ((access_mask & VM_WRITE) && !(vma->vm_flags & VM_SHARED))) {
+		/* remap_file_pages should disallow this, now that
+		 * install_file_pte skips linear ones. */
+		WARN_ON(1);
 		/* We're behaving as if pte_file was cleared, so check
 		 * protections like in handle_pte_fault. */
 		if (check_perms(vma, access_mask))
_
[patch 32/39] remap_file_pages protection support: fix i386 handler
From: Paolo 'Blaisorblade' Giarrusso <[EMAIL PROTECTED]>

Actually, with the current model, we should get a failure with VMA's mapped
with only PROT_WRITE (even if I wasn't able to verify that in UML, which has
similar code). To test!

Signed-off-by: Paolo 'Blaisorblade' Giarrusso <[EMAIL PROTECTED]>
---

 linux-2.6.git-paolo/arch/i386/mm/fault.c |    3 ++-
 1 files changed, 2 insertions(+), 1 deletion(-)

diff -puN arch/i386/mm/fault.c~rfp-fault-sigsegv-3-i386 arch/i386/mm/fault.c
--- linux-2.6.git/arch/i386/mm/fault.c~rfp-fault-sigsegv-3-i386	2005-08-12 17:12:51.0 +0200
+++ linux-2.6.git-paolo/arch/i386/mm/fault.c	2005-08-12 17:12:51.0 +0200
@@ -381,7 +381,8 @@ bad_area_prot:
 		 * requirements... we should always test just
 		 * pte_read/write/exec, on vma->vm_page_prot! This way is
 		 * cumbersome. However, for now things should work for i386. */
-		access_mask |= vma->vm_flags & VM_EXEC ? VM_EXEC : VM_READ;
+		access_mask |= vma->vm_flags & VM_EXEC ? VM_EXEC :
+			       (vma->vm_flags & VM_READ ? VM_READ : VM_WRITE);
 		goto handle_fault;
 	}
 	/*
_
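The corrected fallback can be checked in isolation: pick one access right the vma actually has (exec preferred, then read, then write), so a PROT_WRITE-only mapping no longer fails the check. Again a userspace sketch with illustrative flag values:

```c
/* Userspace model of the corrected access_mask fallback above.
 * Flag values are illustrative, not the kernel's. */
#include <assert.h>

#define VM_READ		0x1UL
#define VM_WRITE	0x2UL
#define VM_EXEC		0x4UL

static unsigned long nonuniform_access_mask(unsigned long vm_flags, int is_write)
{
	unsigned long access_mask = is_write ? VM_WRITE : 0;

	/* Prefer exec, then read, then write - mirroring the patch. */
	access_mask |= vm_flags & VM_EXEC ? VM_EXEC :
		       (vm_flags & VM_READ ? VM_READ : VM_WRITE);
	return access_mask;
}
```

With the old code, a write-only vma would have had VM_READ forced into the mask and the retried fault would still fail.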
[patch 39/39] remap_file_pages protection support: wrong "historical" code for review - 2
From: Ingo Molnar <[EMAIL PROTECTED]>, Paolo 'Blaisorblade' Giarrusso <[EMAIL PROTECTED]>

This "fast-path" was contained in the original
remap-file-pages-prot-2.6.4-rc1-mm1-A1.patch from Ingo Molnar; I think this
code is wrong, but I'm sending it for review anyway, because I'm unsure (and
in fact, in the end I found the reason for this).

I guess this code is intended for when we're called by sys_remap_file_pages,
without altering pgoff or protections (otherwise we'd refuse operation on a
private mapping). This cannot happen with mmap(MAP_POPULATE) because we
clear old mappings. And the code makes sense only if we COW'ed a page,
because otherwise the old mapping is already correct. I'm not sure whether
we should fail here - maybe skipping the PTE would be more appropriate. Or
we could anyway turn the nonblock param into a bitmask and pass O_TRUNC
there.

However, this is wrong because both routines can be called from within
do_file_page, which is called when !pte_present(pte) && !pte_none(pte) &&
pte_file(pte). I.e. the pte is not zeroed, so it has been used, but the page
has been swapped out, or the page hasn't been loaded in the first place (for
instance for MAP_NONBLOCK). More accurately, in that situation ->populate is
called with nonblock == 0, so only install_page can be called there. If
->populate fails, the faulting process will get an inappropriate SIGBUS.

Signed-off-by: Paolo 'Blaisorblade' Giarrusso <[EMAIL PROTECTED]>
---

 linux-2.6.git-paolo/mm/fremap.c | 15 +++
 1 files changed, 15 insertions(+)

diff -puN mm/fremap.c~rfp-wrong mm/fremap.c
--- linux-2.6.git/mm/fremap.c~rfp-wrong	2005-08-12 18:42:23.0 +0200
+++ linux-2.6.git-paolo/mm/fremap.c	2005-08-12 18:42:23.0 +0200
@@ -90,6 +90,14 @@ int install_page(struct mm_struct *mm, s
 	if (!page->mapping || page->index >= size)
 		goto err_unlock;
 
+	/*
+	 * Only install a new page for a non-shared mapping if it's
+	 * not existent yet:
+	 */
+	err = -EEXIST;
+	if (!pte_none(*pte) && !(vma->vm_flags & VM_SHARED))
+		goto err_unlock;
+
 	zap_pte(mm, vma, addr, pte);
 
 	inc_mm_counter(mm,rss);
@@ -145,6 +153,13 @@ int install_file_pte(struct mm_struct *m
 	pte = pte_alloc_map(mm, pmd, addr);
 	if (!pte)
 		goto err_unlock;
+	/*
+	 * Only install a new page for a non-shared mapping if it's
+	 * not existent yet:
+	 */
+	err = -EEXIST;
+	if (!pte_none(*pte) && !(vma->vm_flags & VM_SHARED))
+		goto err_unlock;
 
 	zap_pte(mm, vma, addr, pte);
 
_
Re: [patch 4/8] irq code: Add coherence test for PREEMPT_ACTIVE
On Friday 27 May 2005 02:38, [EMAIL PROTECTED] wrote:
> After porting this fixlet to UML:
>
> http://linux.bkbits.net:8080/linux-2.5/[EMAIL PROTECTED]
>
> , I've also added a warning which should refuse compilation with insane
> values for PREEMPT_ACTIVE... maybe we should simply move PREEMPT_ACTIVE out
> of architectures using GENERIC_IRQS.

Ok, a grep shows that the possible culprits (i.e. giving success to
grep GENERIC_HARDIRQS arch/*/Kconfig, and using 0x400 as PREEMPT_ACTIVE, as
given by grep PREEMPT_ACTIVE include/asm-*/thread_info.h) are (at a first
glance): frv, sh, sh64.

After a bit of checking, I also verified whether they had overridden the
value of HARDIRQ_BITS. Which they haven't (it seems it's defined exactly
where CONFIG_HARDIRQS is not used, i.e. nobody is currently using the
ability to override it).

This was not a very deep investigation, however, so feel free to verify this
better.

-- 
Paolo Giarrusso, aka Blaisorblade
Skype user "PaoloGiarrusso"
Linux registered user n. 292729
http://www.user-mode-linux.org/~blaisorblade

___ 
Yahoo! Mail: gratis 1GB per i messaggi e allegati da 10MB
http://mail.yahoo.it
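For the record, the coherence test in question amounts to requiring that PREEMPT_ACTIVE sit above the whole preempt/softirq/hardirq count field of preempt_count, or incrementing the counters would corrupt it. A userspace sketch, with illustrative bit widths (not any particular architecture's):

```c
/* Userspace model of the PREEMPT_ACTIVE sanity check discussed above.
 * The field widths are illustrative defaults, not any particular
 * architecture's real values. */
#include <assert.h>

#define PREEMPT_BITS	8
#define SOFTIRQ_BITS	8
#define HARDIRQ_BITS	4

#define PREEMPT_SHIFT	0
#define SOFTIRQ_SHIFT	(PREEMPT_SHIFT + PREEMPT_BITS)
#define HARDIRQ_SHIFT	(SOFTIRQ_SHIFT + SOFTIRQ_BITS)

/* A sane PREEMPT_ACTIVE must lie above every count bit. */
static int preempt_active_ok(unsigned long preempt_active)
{
	return preempt_active >= (1UL << (HARDIRQ_SHIFT + HARDIRQ_BITS));
}
```

With these widths, the 0x400 value found in the frv/sh/sh64 headers fails the check (it lands inside the softirq count field), which is exactly what the compile-time warning is meant to catch.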
Re: [patch 4/8] irq code: Add coherence test for PREEMPT_ACTIVE
On Friday 27 May 2005 15:33, David Howells wrote:
> Blaisorblade <[EMAIL PROTECTED]> wrote:
> > Ok, a grep shows that possible culprits (i.e. giving success to
> > grep GENERIC_HARDIRQS arch/*/Kconfig, and using 0x400 as
> > PREEMPT_ACTIVE, as given by grep PREEMPT_ACTIVE
> > include/asm-*/thread_info.h) are (at a first glance): frv, sh, sh64.
>
> For FRV that's simply because it got copied from the parent arch along with
> other stuff. Feel free to move it... Do you want me to make you up a patch
> to do so?

Sorry, but please fix that yourself; otherwise there's a chance I'll forget,
since I'm quite busy. Thanks a lot for your attention.

-- 
Paolo Giarrusso, aka Blaisorblade
Skype user "PaoloGiarrusso"
Linux registered user n. 292729
http://www.user-mode-linux.org/~blaisorblade
Re: [patch 4/8] irq code: Add coherence test for PREEMPT_ACTIVE
On Friday 27 May 2005 05:31, Paul Mundt wrote:
> On Fri, May 27, 2005 at 03:06:09AM +0200, Blaisorblade wrote:
> > On Friday 27 May 2005 02:38, [EMAIL PROTECTED] wrote:
> > Ok, a grep shows that possible culprits (i.e. giving success to
> > grep GENERIC_HARDIRQS arch/*/Kconfig, and using 0x400 as
> > PREEMPT_ACTIVE, as given by grep PREEMPT_ACTIVE
> > include/asm-*/thread_info.h) are (at a first glance): frv, sh, sh64.
>
> Yeah, that's bogus for sh and sh64 anyways, this should do it.
>
> It would be nice to move PREEMPT_ACTIVE so it isn't per-arch anymore,
> there's not many users that use a different value (at least for the ones
> using generic hardirqs, ia64 seems to be the only one?).

Then in the generic headers

#ifndef PREEMPT_ACTIVE
#define PREEMPT_ACTIVE
#else
#endif

Would be ok, right?

-- 
Paolo Giarrusso, aka Blaisorblade
Skype user "PaoloGiarrusso"
Linux registered user n. 292729
http://www.user-mode-linux.org/~blaisorblade