Re: Bug with multiple help messages, the last one is shown
Sorry for the late answer, Yahoo put your mail in its Spam folder, and I didn't check until now.

On Tuesday 22 March 2005 21:00, Roman Zippel wrote:
> Hi,
>
> On Tue, 22 Mar 2005, Blaisorblade wrote:
> > I've verified multiple times that if we have a situation like this
> >
> > bool A
> > depends on TRUE
> > help
> >   Bla bla1
> >
> > and
> >
> > bool A
> > depends on FALSE
> > help
> >   Bla bla2
> >
> > even if the first option is the displayed one, the help text used is the
> > one for the second option (the absence of "prompt" is not relevant here)!
>
> Is this based on a real problem?

Yes, look at the multiple help texts in lib/Kconfig.debug in vanilla 2.6.11, or, in the current bk tree, in lib/Kconfig.debug and arch/um/Kconfig for MAGIC_SYSRQ.

For UML we need different help texts, so I'd like this solved. If you definitely don't want to fix this, we can use the old 2.4 trick of having CONFIG_MAGIC_SYSRQ2, for instance, with the right help text and defining MAGIC_SYSRQ as equal to MAGIC_SYSRQ2.

> I know that there's currently one help text per symbol
> and the behaviour for multiple help texts is basically undefined.

Yes, that's what I saw (actually I guessed, and seem to have verified, that the last help text read is the one used).

-- 
Paolo Giarrusso, aka Blaisorblade
Linux registered user n. 292729
http://www.user-mode-linux.org/~blaisorblade
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
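The duplicate-symbol pattern under discussion reduces to a fragment like the following (hypothetical symbol and dependency names; in the real trees the two definitions sit in different Kconfig files, e.g. lib/Kconfig.debug and arch/um/Kconfig):

```kconfig
config MAGIC_SYSRQ
	bool "Magic SysRq key"
	depends on ARCH_FOO
	help
	  Bla bla1 - the help text intended for this definition.

config MAGIC_SYSRQ
	bool "Magic SysRq key"
	depends on !ARCH_FOO
	help
	  Bla bla2 - with the behaviour described above, this text is the
	  one shown even when the first definition is the visible prompt.
```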
Re: [08/08] uml: va_copy fix
On Tuesday 05 April 2005 20:47, Renate Meijer wrote:
> On Apr 5, 2005, at 6:48 PM, Greg KH wrote:
> > -stable review patch. If anyone has any objections, please let us
> > know.
> >
> > --
> >
> > Uses __va_copy instead of va_copy since some old versions of gcc
> > (2.95.4 for instance) don't accept va_copy.
>
> Are there many kernels still being built with 2.95.4? It's quite
> antiquated, as far as i'm aware.
>
> The use of '__' violates compiler namespace.

Why? The symbol is defined by the compiler itself.

> If 2.95.4 were not easily replaced by a much better version (3.3.x?
> 3.4.x) I would see a reason to disregard this, but a fix merely to
> satisfy an obsolete compiler?

Let's not flame. Linus Torvalds said "we support GCC 2.95.3, because the newer versions are worse compilers in most cases". One user complained, partly because he uses Debian, and I cannot do less than make sure we comply with the requirements we have chosen (compiling with that GCC).

Please, let's not start a flame war over this. Consider me as having no opinion on this except not wanting to break Debian users on purpose. If you want, submit a patch removing GCC 2.95.3 from the supported versions, and get ready to fight for it (and probably lose).

Also, that GCC has discovered some syscall table errors in UML - I sent a separate patch, which was a bit big sadly (in the reduced version, about 70 lines + description).

-- 
Paolo Giarrusso, aka Blaisorblade
Linux registered user n. 292729
http://www.user-mode-linux.org/~blaisorblade
Re: [08/08] uml: va_copy fix
For Jörn Engel and the issue he opened: at the end of this mail I describe another bug caught by 2.95 and not by 3.x.

On Tuesday 05 April 2005 22:18, Renate Meijer wrote:
> On Apr 5, 2005, at 8:53 PM, Blaisorblade wrote:
> > On Tuesday 05 April 2005 20:47, Renate Meijer wrote:
> >> On Apr 5, 2005, at 6:48 PM, Greg KH wrote:
> >> The use of '__' violates compiler namespace.
> >
> > Why? The symbol is defined by the compiler itself.
>
> If a function is prefixed with a double underscore, this implies the
> function is internal to the compiler, and may change at any time, since
> it's not governed by some sort of standard. Hence that code may start
> suffering from bitrot, and complaining to the compiler guys won't help.
> They'll just tell you to RTFM.

Ok, agreed in general. However, the -stable tree is for "current" GCC. Your objections would better apply to the fact that the same patch has already been merged into the main trunk. Also, they have no point in doing this, probably. And the __va_copy name was used in the draft C99 standard, so it's widespread (I've read this in "man 3 va_copy").

> >> If 2.95.4 were not easily replaced by a much better version (3.3.x?
> >> 3.4.x) I would see a reason to disregard this, but a fix merely to
> >> satisfy an obsolete compiler?
> >
> > Let's not flame, Linus Torvalds said "we support GCC 2.95.3, because
> > the newer versions are worse compilers in most cases".
>
> You make it sound as if you were reciting Ye Holy Scribings. When did
> Linus Torvalds say this? In the Redhat-2.96 debacle? Before or after
> 3.3? I have searched for that quote,

Sorry for the quote marks; it was a summary of what he said (and on re-reading, it's still a correct summary).

> but could not find it, and having suffered under 3.1.1, I can well
> understand his weariness for the earlier versions.

I've read the same kerneltrap article you quote.

> See
>
> http://kerneltrap.org/node/4126, halfway down.

Ok, read.
> For the cold, hard facts...
>
> http://www.suse.de/~aj/SPEC/

Linus pointed out in that article that SPEC performance is not a good test case for kernel compilation. Point out a kernel compilation case.

> > Consider me as having no opinion on this except not wanting to break
> > on purpose Debian users.
>
> If Debian users are stuck with a pretty outdated compiler, i'd
> seriously suggest migrating to some other distro which allows more
> freedom.

I guess they can, if they want, upgrade some selected packages from newer trees, maybe by recompiling (at least on Gentoo it's trivial; on a binary distro like Debian it may be harder).

> If linux itself is holding them back, there's a need for some serious
> patching. If there are serious issues in the gcc compiler, which hinder
> migration to a more up-to-date version, our efforts should be directed
> at solving them in that project, not this.

Linus spoke about compiler speed, which isn't such a bad reason. He's probably unfair in saying that GCC 3.x does not optimize better than older releases; I guess that the compilation flags (I refer to -fno-strict-aliasing, which disables some optimizations) make some difference, as do the memory barriers (as pointed out in the comments).

> > If you want, submit a patch removing Gcc 2.95.3 from supported
> > versions, and get ready to fight for it (and probably lose).
>
> I don't fight over things like that, i'm not interested in politics. I
> merely point out the problem. And yes, I do think support for an
> obsolete compiler should be dumped in favor of a more modern version.
> Especially if that compiler requires invasions of compiler-namespace.
> The patch, as presented, is not guaranteed to be portable over
> versions, and may thus introduce another problem with future versions
> of GCC.

When and if that happens, I'll come up with a hack.
UML already needs some GCC-version-specific behaviour (see arch/um/kernel/gmon_syms.c in a recent BitKeeper snapshot; even -rc1-bk5 has this code).

> > Also, that GCC has discovered some syscall table errors in UML - I
> > sent a separate patch, which was a bit big sadly (in the reduced
> > version, about 70 lines + description).
>
> I am not quite sure what is intended here... Please explain.

I'm reattaching the patch, so that you can look at the changelog (I'm also resending it as a separate email so that it is reviewed and possibly merged). Basically, this is an error with GCC 2 and not with GCC 3:

int list[] = { [0] = 1, [0] = 1 };

(I've not tested the above itself.)
Re: [08/08] uml: va_copy fix
On Wednesday 06 April 2005 14:04, Renate Meijer wrote:
> On Apr 6, 2005, at 1:32 PM, Jörn Engel wrote:
> > On Tue, 5 April 2005 22:18:26 +0200, Renate Meijer wrote:
> >
> > You did read include/linux/compiler.h, didn't you?
>
> So instead of applying this patch, simply
>
> #if VERSION_MINOR < WHATEVER
> #define va_copy __va_copy
> #endif
>
> in include/linux/compiler-gcc2.h
>
> Thus solving the problem without having to invade compiler namespace all
> over the place, but doing so in *one* place only.

About this one: thanks for suggesting this and for being constructive; I'll do this ASAP (if I don't forget) for the -bk tree. However, I think that Greg KH, for the -stable tree, would prefer a local, tested patch rather than a global one with possible side effects - right, Greg?

Also, I hope this discussion does not count as a vote for inclusion in the -stable tree (since dropping GCC 2 support in the -stable tree is exactly the purpose of this tree, right ;-) ?).

-- 
Paolo Giarrusso, aka Blaisorblade
Linux registered user n. 292729
http://www.user-mode-linux.org/~blaisorblade
Fwd: [uml-devel] [UML/2.6] -bk7 tree does not run when compiled as SKAS-only
Andrew, could you please put this in your -rc regressions folder? Thanks.

--  Forwarded Message  --

Subject: [uml-devel] [UML/2.6] -bk7 tree does not run when compiled as SKAS-only
Date: Tuesday 22 March 2005 18:32
From: Blaisorblade <[EMAIL PROTECTED]>
To: Jeff Dike <[EMAIL PROTECTED]>, Bodo Stroesser <[EMAIL PROTECTED]>
Cc: user-mode-linux-devel@lists.sourceforge.net

Just verified: with TT mode disabled at compile time, the 2.6.11-bk7 tree compiles (when CONFIG_SYSCALL_DEBUG is disabled) but does not run. I've verified this with a clean compile (I had this doubt), both with static link enabled and disabled.

Sample output:

./vmlinux ubd0=~/Uml/toms.rootfs
Checking for /proc/mm...found
Checking for the skas3 patch in the host...found
Checking PROT_EXEC mmap in /tmp...OK

[end of output]

2.6.11 works in the same situation (both with static link enabled and disabled). I'm investigating, but I'm busy with other stuff; however, there are not many patches which went in for this release. Jeff, any ideas?

-- 
Paolo Giarrusso, aka Blaisorblade
Linux registered user n. 292729
http://www.user-mode-linux.org/~blaisorblade
[patch 1/1] uml: quick fix syscall table [urgent]
CC: <[EMAIL PROTECTED]>

I'm resending this for inclusion in the -stable tree. I've deleted the whitespace cleanups, and hope this can be merged. I've been asked to split the former patch; I don't know if I must split this one again, partly because I don't want to split this correct patch into multiple non-correct ones by mistake.

UML 2.6.11 does not compile with gcc 2.95.4 because some syscall table entries are duplicated, and that GCC does not accept this (unlike gcc 3). Plus there are various other bugs in the syscall table definitions, resulting in probably wrong syscall entries:

*) 223 is a syscall hole (i.e. ni_syscall) only on i386; on x86_64 it's a valid syscall (thus a duplicated one).

*) __NR_vserver must appear only once, as sys_ni_syscall, and not multiple times with different values!

*) syscalls duplicated between the SUBARCHs and the common files (thus assigning twice to the same array entry and causing the gcc 2.95.4 failure mentioned above): sys_utimes, which is common, and sys_fadvise64_64, sys_statfs64, sys_fstatfs64, which exist only on i386.

*) syscalls duplicated in each SUBARCH, to be moved to the common files: sys_remap_file_pages, sys_utimes, sys_fadvise64.

*) 285 is a syscall hole (i.e. ni_syscall) only on i386; on x86_64 the syscall range does not reach that point.

*) on x86_64, the macro name is __NR_kexec_load and not __NR_sys_kexec_load. Use the correct name in either case.

Note: as you can see, part of the syscall table definition in UML is arch-independent (for syscalls defined everywhere), and part is arch-dependent. This has created confusion (some syscalls are listed in both places, some in the wrong one, some are wrong on one arch or another).
Signed-off-by: Paolo 'Blaisorblade' Giarrusso <[EMAIL PROTECTED]>
---

 clean-linux-2.6.11-paolo/arch/um/include/sysdep-i386/syscalls.h   | 12 +-
 clean-linux-2.6.11-paolo/arch/um/include/sysdep-x86_64/syscalls.h |  5 
 clean-linux-2.6.11-paolo/arch/um/kernel/sys_call_table.c          | 11 +++--
 3 files changed, 10 insertions(+), 18 deletions(-)

diff -puN arch/um/include/sysdep-i386/syscalls.h~uml-quick-fix-syscall-table-for-stable arch/um/include/sysdep-i386/syscalls.h
--- clean-linux-2.6.11/arch/um/include/sysdep-i386/syscalls.h~uml-quick-fix-syscall-table-for-stable	2005-04-05 16:56:57.0 +0200
+++ clean-linux-2.6.11-paolo/arch/um/include/sysdep-i386/syscalls.h	2005-04-05 16:56:57.0 +0200
@@ -23,6 +23,9 @@ extern long sys_mmap2(unsigned long addr
 			unsigned long prot, unsigned long flags,
 			unsigned long fd, unsigned long pgoff);
 
+/* On i386 they choose a meaningless naming.*/
+#define __NR_kexec_load __NR_sys_kexec_load
+
 #define ARCH_SYSCALLS \
 	[ __NR_waitpid ] = (syscall_handler_t *) sys_waitpid, \
 	[ __NR_break ] = (syscall_handler_t *) sys_ni_syscall, \
@@ -101,15 +104,12 @@ extern long sys_mmap2(unsigned long addr
 	[ 223 ] = (syscall_handler_t *) sys_ni_syscall, \
 	[ __NR_set_thread_area ] = (syscall_handler_t *) sys_ni_syscall, \
 	[ __NR_get_thread_area ] = (syscall_handler_t *) sys_ni_syscall, \
-	[ __NR_fadvise64 ] = (syscall_handler_t *) sys_fadvise64, \
 	[ 251 ] = (syscall_handler_t *) sys_ni_syscall, \
-[ __NR_remap_file_pages ] = (syscall_handler_t *) sys_remap_file_pages, \
-	[ __NR_utimes ] = (syscall_handler_t *) sys_utimes, \
-	[ __NR_vserver ] = (syscall_handler_t *) sys_ni_syscall,
-
+	[ 285 ] = (syscall_handler_t *) sys_ni_syscall,
+
 /* 222 doesn't yet have a name in include/asm-i386/unistd.h */
 
-#define LAST_ARCH_SYSCALL __NR_vserver
+#define LAST_ARCH_SYSCALL 285
 
 /*
  * Overrides for Emacs so that we follow Linus's tabbing style.
diff -puN arch/um/include/sysdep-x86_64/syscalls.h~uml-quick-fix-syscall-table-for-stable arch/um/include/sysdep-x86_64/syscalls.h
--- clean-linux-2.6.11/arch/um/include/sysdep-x86_64/syscalls.h~uml-quick-fix-syscall-table-for-stable	2005-04-05 16:56:57.0 +0200
+++ clean-linux-2.6.11-paolo/arch/um/include/sysdep-x86_64/syscalls.h	2005-04-05 16:56:57.0 +0200
@@ -71,12 +71,7 @@ extern syscall_handler_t sys_arch_prctl;
 	[ __NR_iopl ] = (syscall_handler_t *) sys_ni_syscall, \
 	[ __NR_set_thread_area ] = (syscall_handler_t *) sys_ni_syscall, \
 	[ __NR_get_thread_area ] = (syscall_handler_t *) sys_ni_syscall, \
-[ __NR_remap_file_pages ] = (syscall_handler_t *) sys_remap_file_pages, \
 	[ __NR_semtimedop ] = (syscall_handler_t *) sys_semtimedop, \
-	[ __NR_fadvise64 ] = (syscall_handler_t *) sys_fadvise64, \
-	[ 223 ] = (syscall_handler_t *) sys_ni_syscall, \
-	[ __NR_utimes ] = (syscall_handler_t *) sys_utimes, \
-	[ __NR_vserver ] = (syscall_handler_t *) sys_ni_syscall, \
 	[ 251 ] = (syscall_handler_t *) sys_ni_syscall,
 
 #define LAST_ARCH_SYSCALL 251

diff -puN a
Re: [uml-devel] [linux-2.6-bk] UML compile broken!
On Wednesday 06 April 2005 15:16, Anton Altaparmakov wrote:
> UML compile is broken in current linus bk 2.6:
>
>   CC      arch/um/kernel/ptrace.o
> arch/um/kernel/ptrace.c: In function `send_sigtrap':
> arch/um/kernel/ptrace.c:324: warning: implicit declaration of function `SC_IP'
> arch/um/kernel/ptrace.c:324: error: union has no member named `tt'
> arch/um/kernel/ptrace.c:324: error: union has no member named `tt'
> arch/um/kernel/ptrace.c:324: error: invalid lvalue in unary `&'
> make[1]: *** [arch/um/kernel/ptrace.o] Error 1
> make: *** [arch/um/kernel] Error 2
>
> My .config is attached. I suspect it is because I am not compiling in
> TT support and only SKAS...

Well, good guess - you're getting more and more used to UML! Yes, the fix is in -mm. Quoting from the -rc2-mm1 announcement:

+uml-fix-compilation-for-__choose_mode-addition.patch

 UML fix

Andrew, can you merge it now, if you want, after Anton verifies it's indeed the correct fix for his problem? I *do* expect his situation to fail without the patch, but just to be more sure.

However, I recall a slightly different problem with 2.6.11-bk7, when compiling only SKAS mode in, and I don't think this has been fixed:

[uml-devel] [UML/2.6] -bk7 tree does not run when compiled as SKAS-only

I'm forwarding that mail to LKML and to you, Andrew - for your -rc regressions mail folder.

-- 
Paolo Giarrusso, aka Blaisorblade
Linux registered user n. 292729
http://www.user-mode-linux.org/~blaisorblade
Re: [stable] [patch 1/1] uml: quick fix syscall table [urgent]
On Wednesday 06 April 2005 22:21, Greg KH wrote:
> On Wed, Apr 06, 2005 at 08:38:00PM +0200, [EMAIL PROTECTED] wrote:
> > CC: <[EMAIL PROTECTED]>
> >
> > I'm resending this for inclusion in the -stable tree. I've deleted
> > whitespace cleanups, and hope this can be merged. I've been asked to
> > split the former patch, I don't know if I must split again this one, even
> > because I don't want to split this correct patch into multiple
> > non-correct ones by mistake.
>
> Is this patch already in 2.6.12-rc2?

Yes, with whitespace cleanups.

-- 
Paolo Giarrusso, aka Blaisorblade
Linux registered user n. 292729
http://www.user-mode-linux.org/~blaisorblade
Re: [08/08] uml: va_copy fix
On Thursday 07 April 2005 11:16, Renate Meijer wrote:
> On Apr 6, 2005, at 9:09 PM, Blaisorblade wrote:
> > Btw: I've not investigated which one of the two behaviours is the
> > buggy one - if you know, maybe you or I can report it.
>
> From a strict ISO-C point of view, both are. It's a gcc-specific
> "feature" which (agreed) does come in handy sometimes.

Well, for "range" assignments GCC mustn't complain, but for the rest, tolerating the double assignment is laziness that is not very useful. Could they at least add a -Wsomething, inside -Wall or -W, for this problem?

> However it makes it quite hard to say which is the buggy version, since
> the "appropriate" behavior is a question of definition (by the
> gcc-folks). They may even argue that, having changed their minds about
> it, neither is buggy, but both conform to the specifications (for that
> specific functionality).
>
> That's pretty much the trouble with relying on gcc-extensions: since
> there's no standard, it's difficult to tell what's wrong and what's
> right. I'll dive into it.
>
> Regards,
>
> Renate Meijer.

-- 
Paolo Giarrusso, aka Blaisorblade
Linux registered user n. 292729
http://www.user-mode-linux.org/~blaisorblade
[patch 1/1] reiserfs: make resize option auto-get new device size
Cc: <[EMAIL PROTECTED]>, , <[EMAIL PROTECTED]>

It's trivial for the resize option to auto-get the underlying device size, while it's harder for the user. I've copied the code from jfs.

Because of the different reiserfs option parser (which does not use the superior match_token parser used by almost every other filesystem), I've had to use a "resize=auto" option, rather than a bare "resize", to specify this behaviour. Changing the option parser to the kernel one wouldn't be bad, but I've no time to do this cleanup at the moment.

Btw, the mount(8) man page should be updated to include this option. Cc the relevant people, please (I hope I cc'ed the right people).

Signed-off-by: Paolo 'Blaisorblade' Giarrusso <[EMAIL PROTECTED]>
---

 linux-2.6.11-paolo/fs/reiserfs/super.c | 21 ++---
 1 files changed, 14 insertions(+), 7 deletions(-)

diff -puN fs/reiserfs/super.c~reiserfs-resize-option-like-jfs-auto-get fs/reiserfs/super.c
--- linux-2.6.11/fs/reiserfs/super.c~reiserfs-resize-option-like-jfs-auto-get	2005-04-07 20:37:58.0 +0200
+++ linux-2.6.11-paolo/fs/reiserfs/super.c	2005-04-08 01:01:18.0 +0200
@@ -889,12 +889,18 @@ static int reiserfs_parse_options (struc
 	    char * p;
 
 	    p = NULL;
-	    /* "resize=NNN" */
-	    *blocks = simple_strtoul (arg, &p, 0);
-	    if (*p != '\0') {
-		/* NNN does not look like a number */
-		reiserfs_warning (s, "reiserfs_parse_options: bad value %s", arg);
-		return 0;
+	    /* "resize=NNN" or "resize=auto" */
+
+	    if (!strcmp(arg, "auto")) {
+		    /* From JFS code, to auto-get the size.*/
+		    *blocks = s->s_bdev->bd_inode->i_size >> s->s_blocksize_bits;
+	    } else {
+		    *blocks = simple_strtoul (arg, &p, 0);
+		    if (*p != '\0') {
+			/* NNN does not look like a number */
+			reiserfs_warning (s, "reiserfs_parse_options: bad value %s", arg);
+			return 0;
+		    }
 	    }
 	}
 
@@ -903,7 +909,8 @@ static int reiserfs_parse_options (struc
 	    unsigned long val = simple_strtoul (arg, &p, 0);
 	    /* commit=NNN (time in seconds) */
 	    if ( *p != '\0' || val >= (unsigned int)-1) {
-		reiserfs_warning (s, "reiserfs_parse_options: bad value %s", arg); return 0;
+		reiserfs_warning (s, "reiserfs_parse_options: bad value %s", arg);
+		return 0;
 	    }
 	    *commit_max_age = (unsigned int)val;
 	}
_
Re: [patch 1/1] reiserfs: make resize option auto-get new device size
On Friday 08 April 2005 10:10, Alex Zarochentsev wrote:
> Hi,
>
> On Fri, Apr 08, 2005 at 06:55:50AM +0200, [EMAIL PROTECTED] wrote:
> > Cc: <[EMAIL PROTECTED]>, , <[EMAIL PROTECTED]>
> >
> > It's trivial for the resize option to auto-get the underlying device
> > size, while it's harder for the user. I've copied the code from jfs.
> >
> > Since of the different reiserfs option parser (which does not use the
> > superior match_token used by almost every other filesystem), I've had to
> > use the "resize=auto" and not "resize" option to specify this behaviour.
> > Changing the option parser to the kernel one wouldn't be bad but I've no
> > time to do this cleanup in this moment.
>
> do people really need it?

Note we are speaking of 2 lines of code, and there's no point in omitting this.

> the user-level utility resize_reiserfs, being called w/o a size
> argument, calculates the device size and uses the resize mount option
> with the correct value.

Yes, I know this. But the old versions (the one shipped on Mdk) didn't work for online resizing (I verified this, with lots of warnings and an Oops in reiserfs code); in fact, this ability is so new that it is not even documented in the manpages.

-- 
Paolo Giarrusso, aka Blaisorblade
Linux registered user n. 292729
http://www.user-mode-linux.org/~blaisorblade
[patch 1/1] uml: add nfsd syscall when nfsd is modular
CC: <[EMAIL PROTECTED]>

This trick is useless, because sys_ni.c will handle this problem by itself, like it does even on UML for other syscalls.

Signed-off-by: Paolo 'Blaisorblade' Giarrusso <[EMAIL PROTECTED]>
---

 clean-linux-2.6.11-paolo/arch/um/kernel/sys_call_table.c |    8 +---
 1 files changed, 1 insertion(+), 7 deletions(-)

diff -puN arch/um/kernel/sys_call_table.c~uml-nfsd-syscall arch/um/kernel/sys_call_table.c
--- clean-linux-2.6.11/arch/um/kernel/sys_call_table.c~uml-nfsd-syscall	2005-04-10 13:50:29.0 +0200
+++ clean-linux-2.6.11-paolo/arch/um/kernel/sys_call_table.c	2005-04-10 13:51:19.0 +0200
@@ -14,12 +14,6 @@
 #include "sysdep/syscalls.h"
 #include "kern_util.h"
 
-#ifdef CONFIG_NFSD
-#define NFSSERVCTL sys_nfsservctl
-#else
-#define NFSSERVCTL sys_ni_syscall
-#endif
-
 #define LAST_GENERIC_SYSCALL __NR_keyctl
 
 #if LAST_GENERIC_SYSCALL > LAST_ARCH_SYSCALL
@@ -190,7 +184,7 @@ syscall_handler_t *sys_call_table[] = {
 	[ __NR_getresuid ] = (syscall_handler_t *) sys_getresuid16,
 	[ __NR_query_module ] = (syscall_handler_t *) sys_ni_syscall,
 	[ __NR_poll ] = (syscall_handler_t *) sys_poll,
-	[ __NR_nfsservctl ] = (syscall_handler_t *) NFSSERVCTL,
+	[ __NR_nfsservctl ] = (syscall_handler_t *) sys_nfsservctl,
 	[ __NR_setresgid ] = (syscall_handler_t *) sys_setresgid16,
 	[ __NR_getresgid ] = (syscall_handler_t *) sys_getresgid16,
 	[ __NR_prctl ] = (syscall_handler_t *) sys_prctl,
_
[patch 1/1] uml: add nfsd syscall when nfsd is modular
CC: <[EMAIL PROTECTED]>

This trick is useless, because sys_ni.c will handle this problem by itself, like it does even on UML for other syscalls. Also, the trick does not provide the NFSD syscall when NFSD is compiled as a module, which is a big problem.

This should be merged into both 2.6.11-stable and the current tree.

Signed-off-by: Paolo 'Blaisorblade' Giarrusso <[EMAIL PROTECTED]>
---

 clean-linux-2.6.11-paolo/arch/um/kernel/sys_call_table.c |    8 +---
 1 files changed, 1 insertion(+), 7 deletions(-)

diff -puN arch/um/kernel/sys_call_table.c~uml-nfsd-syscall arch/um/kernel/sys_call_table.c
--- clean-linux-2.6.11/arch/um/kernel/sys_call_table.c~uml-nfsd-syscall	2005-04-10 13:50:29.0 +0200
+++ clean-linux-2.6.11-paolo/arch/um/kernel/sys_call_table.c	2005-04-10 13:51:19.0 +0200
@@ -14,12 +14,6 @@
 #include "sysdep/syscalls.h"
 #include "kern_util.h"
 
-#ifdef CONFIG_NFSD
-#define NFSSERVCTL sys_nfsservctl
-#else
-#define NFSSERVCTL sys_ni_syscall
-#endif
-
 #define LAST_GENERIC_SYSCALL __NR_keyctl
 
 #if LAST_GENERIC_SYSCALL > LAST_ARCH_SYSCALL
@@ -190,7 +184,7 @@ syscall_handler_t *sys_call_table[] = {
 	[ __NR_getresuid ] = (syscall_handler_t *) sys_getresuid16,
 	[ __NR_query_module ] = (syscall_handler_t *) sys_ni_syscall,
 	[ __NR_poll ] = (syscall_handler_t *) sys_poll,
-	[ __NR_nfsservctl ] = (syscall_handler_t *) NFSSERVCTL,
+	[ __NR_nfsservctl ] = (syscall_handler_t *) sys_nfsservctl,
 	[ __NR_setresgid ] = (syscall_handler_t *) sys_setresgid16,
 	[ __NR_getresgid ] = (syscall_handler_t *) sys_getresgid16,
 	[ __NR_prctl ] = (syscall_handler_t *) sys_prctl,
_
CONFIG_REGPARM - prevent_tail_call doubts (context: SKAS3 bug in detail)
I just diagnosed (and announced) a big bug affecting the SKAS3 patch: namely, syscall parameter values stored in registers may be corrupted on return for some syscalls, when called through int 0x80 and when CONFIG_REGPARM is enabled.

Ok, the diagnosis of the SKAS3 bug I just noticed is that, simply, this construct:

int do_foo(params...)
{
	...
}

asmlinkage int sys_foo(params...)
{
	return do_foo(a_new_param, params...);
}

does not work, because sys_foo() is optimized to reorder the parameters on the stack and tail-call do_foo(). The corrupted parameters on the stack will then be restored (when calling with int $0x80) into the userspace registers.

From entry.S, especially from this comment:

	/* if something modifies registers it must also disable sysexit */

it's clear that when using SYSENTER, registers are not restored (also verified through the sys_iopl() code, which touches EFLAGS).

I've used prevent_tail_call to fix this, and it works (verified with tests and assembly inspection). I even think I've understood why it works... it's clear why it disallows the tail call, but I thought that GCC could still create a normal call reusing some space from the stack frame of sys_foo() to create the stack frame of do_foo()... it's just that it wouldn't improve speed.

This construct is used for four syscalls (sys_mmap2, old_mmap, sys_mprotect, sys_modify_ldt); I verified the bug for sys_mmap2 and sys_mprotect, and I'm sure about modify_ldt because the compiled code is identical to sys_mprotect().

I initially noticed this with the errno-vs-NPTL fix I and Al Viro discussed some time ago: it indeed used mmap2() and triggered the bug. Luckily, strace reads the correct data (since syscall params are read before the syscall is done), so I couldn't do anything else than understand that something bad was happening.

-- 
Inform me of my mistakes, so I can keep imitating Homer Simpson's "Doh!".
Paolo Giarrusso, aka Blaisorblade
(Skype ID "PaoloGiarrusso", ICQ 215621894)
http://www.user-mode-linux.org/~blaisorblade
Re: [uml-devel] Re: [patch 1/1] uml: fix lvalue for gcc4
On Saturday 09 July 2005 13:07, Russell King wrote:
> On Sat, Jul 09, 2005 at 01:01:33PM +0200, [EMAIL PROTECTED] wrote:
> > diff -puN arch/um/sys-x86_64/signal.c~uml-fix-for-gcc4-lvalue arch/um/sys-x86_64/signal.c
> > --- linux-2.6.git/arch/um/sys-x86_64/signal.c~uml-fix-for-gcc4-lvalue	2005-07-09 13:01:03.0 +0200
> > +++ linux-2.6.git-paolo/arch/um/sys-x86_64/signal.c	2005-07-09 13:01:03.0 +0200
> > @@ -168,7 +168,7 @@ int setup_signal_stack_si(unsigned long
> >
> >  	frame = (struct rt_sigframe __user *)
> >  		round_down(stack_top - sizeof(struct rt_sigframe), 16) - 8;
> > -	((unsigned char *) frame) -= 128;
> > +	frame -= 128 / sizeof(frame);
>
> Are you sure these two are identical?

SORRY, I've become crazy - I meant sizeof(*frame)... thanks for noticing.

> The above code fragment looks suspicious anyway, particularly:
>
> 	frame = (struct rt_sigframe __user *)
> 		round_down(stack_top - sizeof(struct rt_sigframe), 16) - 8;
>
> which will put the frame at 8 * sizeof(struct rt_sigframe) below
> the point which round_down() would return (which would be 1 struct
> rt_sigframe below stack_top, rounded down).

You're completely right. The code is copied from arch/x86_64/kernel/signal.c:setup_rt_frame(), so it should make some sense; but in the source, the cast is to (void *). Surely Jeff, seeing that the result is assigned to a struct rt_sigframe __user *, "fixed" it.

The line I'm patching is new from Jeff, and I don't know what it's about (I just remember that

Also, the access_ok() below, called on fp (which is still NULL), is surely completely wrong, though it won't fail (after all, NULL is below TASK_SIZE, right?).

On x86_64 the code path is always the one from arch/um/kernel/signal_kern.c, since the relevant CONFIG_* option is not enabled.

-- 
Inform me of my mistakes, so I can keep imitating Homer Simpson's "Doh!".
Paolo Giarrusso, aka Blaisorblade
(Skype ID "PaoloGiarrusso", ICQ 215621894)
http://www.user-mode-linux.org/~blaisorblade
Re: [uml-devel] unregister_netdevice: waiting for tap24 to become free
On Monday 11 July 2005 21:12, Peter wrote: > Hi. I am hitting a bug that manifests in an unregister_netdevice error > message. After the problem is triggered processes like ifconfig, tunctl > and route refuse to exit, even with killed. Even from the "D" state below, it's clear that there was a deadlock on some semaphore, related to tap24... Could you search your kernel logs for traces of an Oops? > And the only solution I > have found to regaining control of the server is issuing a reboot. > The server is running a number of tap devices. (It is a UML host server > running the skas patches http://www.user-mode-linux.org/~blaisorblade/). > > Regards, Peter > > # uname -r > 2.6.11.7-skas3-v8 > > unregister_netdevice: waiting for tap24 to become free. Usage count = 1 > unregister_netdevice: waiting for tap24 to become free. Usage count = 1 > unregister_netdevice: waiting for tap24 to become free. Usage count = 1 > unregister_netdevice: waiting for tap24 to become free. Usage count = 1 > unregister_netdevice: waiting for tap24 to become free. Usage count = 1 > unregister_netdevice: waiting for tap24 to become free. Usage count = 1 > unregister_netdevice: waiting for tap24 to become free. Usage count = 1 > > > 30684 ?DW 0:45 \_ [tunctl] > 31974 ?S 0:00 /bin/bash ./monitorbw.sh > 31976 ?S 0:00 \_ /bin/bash ./monitorbw.sh > 31978 ?D 0:00 \_ /sbin/ifconfig > 31979 ?S 0:00 \_ grep \(tap\)\|\(RX bytes\) > 32052 ?S 0:00 /bin/bash /opt/uml/umlcontrol.sh start --user > gildersleeve.de > 32112 ?S 0:00 \_ /bin/bash /opt/uml/umlrun.sh --user > gildersleeve.de > 32152 ?S 0:00 \_ /bin/bash ./umlnetworksetup.sh > --check --user gildersleeve.de > 32176 ?D 0:00 \_ tunctl -u gildersleeve.de -t tap24 > > > --- > This SF.Net email is sponsored by the 'Do More With Dual!' webinar > happening July 14 at 8am PDT/11am EDT. We invite you to explore the latest > in dual core and dual graphics technology at this free one hour event > hosted by HP, AMD, and NVIDIA. 
-- Inform me of my mistakes, so I can keep imitating Homer Simpson's "Doh!". Paolo Giarrusso, aka Blaisorblade (Skype ID "PaoloGiarrusso", ICQ 215621894) http://www.user-mode-linux.org/~blaisorblade
Re: unregister_netdevice: waiting for tap24 to become free
On Tuesday 12 July 2005 00:26, Peter wrote: > Nothing in the logs prior to the first error message. > > I've hit this before at different times on other servers. If there are > some commands I can run to gather more diagnostics on the problem, > please let me know and I'll capture more information next time. > > I see the error was reported with older 2.6 kernels and a patch was > floating around. I'm not sure if that is integrated into the current > 2.6.11 kernel. The patch named there has been integrated, as is verifiable at http://linux.bkbits.net:8080/linux-2.6/[EMAIL PROTECTED] However, this time the bug is probably due to something entirely different; the message is not very specific. Have you tried 2.6.12? SKAS has already been updated for it (plus there's an important update for SKAS, from -V8 to -V8.2). > http://www.google.com/search?q=unregister_netdevice%3A+waiting > > Regards, Peter -- Inform me of my mistakes, so I can keep imitating Homer Simpson's "Doh!". Paolo Giarrusso, aka Blaisorblade (Skype ID "PaoloGiarrusso", ICQ 215621894) http://www.user-mode-linux.org/~blaisorblade
[patch 1/1] uml: fix TT mode by reverting "use fork instead of clone"
From: Jeff Dike <[EMAIL PROTECTED]>, Paolo 'Blaisorblade' Giarrusso <[EMAIL PROTECTED]> Revert the following patch, because of miscompilation problems in different environments leading to UML not working *at all* in TT mode; it was merged late in the 2.6 development cycle, a little after being written, and has caused problems for lots of people. I know it's a bit too long, but it shouldn't have been merged in the first place, so I still apply for inclusion in the -stable tree. Anyone using this feature currently is either using some older kernel (some reports even used 2.6.12-rc4-mm2) or using this patch, as included in my -bs patchset. There is not yet a fix for the reverted patch, so for now the best thing is to drop it (which was widely reported to give a working kernel). "Convert the boot-time host ptrace testing from clone to fork. They were essentially doing fork anyway. This cleans up the code a bit, and makes valgrind a bit happier about grinding it." URL: http://www.kernel.org/git/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=98fdffccea6cc3fe9dba32c0fcc310bcb5d71529 Signed-off-by: Jeff Dike <[EMAIL PROTECTED]> Signed-off-by: Paolo 'Blaisorblade' Giarrusso <[EMAIL PROTECTED]> --- vanilla-linux-2.6.12-paolo/arch/um/kernel/process.c | 48 1 files changed, 29 insertions(+), 19 deletions(-) diff -puN arch/um/kernel/process.c~uml-revert-fork-instead-of-clone arch/um/kernel/process.c --- vanilla-linux-2.6.12/arch/um/kernel/process.c~uml-revert-fork-instead-of-clone 2005-07-12 18:22:03.0 +0200 +++ vanilla-linux-2.6.12-paolo/arch/um/kernel/process.c 2005-07-12 18:22:03.0 +0200 @@ -130,7 +130,7 @@ int start_fork_tramp(void *thread_arg, u return(arg.pid); } -static int ptrace_child(void) +static int ptrace_child(void *arg) { int ret; int pid = os_getpid(), ppid = getppid(); @@ -159,16 +159,20 @@ static int ptrace_child(void) _exit(ret); } -static int start_ptraced_child(void) +static int start_ptraced_child(void **stack_out) { + void *stack; + unsigned long sp; int
pid, n, status; - pid = fork(); - if(pid == 0) - ptrace_child(); - + stack = mmap(NULL, PAGE_SIZE, PROT_READ | PROT_WRITE | PROT_EXEC, +MAP_PRIVATE | MAP_ANONYMOUS, -1, 0); + if(stack == MAP_FAILED) + panic("check_ptrace : mmap failed, errno = %d", errno); + sp = (unsigned long) stack + PAGE_SIZE - sizeof(void *); + pid = clone(ptrace_child, (void *) sp, SIGCHLD, NULL); if(pid < 0) - panic("check_ptrace : fork failed, errno = %d", errno); + panic("check_ptrace : clone failed, errno = %d", errno); CATCH_EINTR(n = waitpid(pid, &status, WUNTRACED)); if(n < 0) panic("check_ptrace : wait failed, errno = %d", errno); @@ -176,6 +180,7 @@ static int start_ptraced_child(void) panic("check_ptrace : expected SIGSTOP, got status = %d", status); + *stack_out = stack; return(pid); } @@ -183,12 +188,12 @@ static int start_ptraced_child(void) * just avoid using sysemu, not panic, but only if SYSEMU features are broken. * So only for SYSEMU features we test mustpanic, while normal host features * must work anyway!*/ -static int stop_ptraced_child(int pid, int exitcode, int mustexit) +static int stop_ptraced_child(int pid, void *stack, int exitcode, int mustpanic) { int status, n, ret = 0; if(ptrace(PTRACE_CONT, pid, 0, 0) < 0) - panic("stop_ptraced_child : ptrace failed, errno = %d", errno); + panic("check_ptrace : ptrace failed, errno = %d", errno); CATCH_EINTR(n = waitpid(pid, &status, 0)); if(!WIFEXITED(status) || (WEXITSTATUS(status) != exitcode)) { int exit_with = WEXITSTATUS(status); @@ -199,13 +204,15 @@ static int stop_ptraced_child(int pid, i printk("check_ptrace : child exited with exitcode %d, while " "expecting %d; status 0x%x", exit_with, exitcode, status); - if (mustexit) + if (mustpanic) panic("\n"); else printk("\n"); ret = -1; } + if(munmap(stack, PAGE_SIZE) < 0) + panic("check_ptrace : munmap failed, errno = %d", errno); return ret; } @@ -227,11 +234,12 @@ __uml_setup("nosysemu", nosysemu_cmd_par static void __init check_sysemu(void) { + void *stack; int pid, 
syscall, n, status, count=0; printk("Checking syscall emulation patch for ptrace..."); sysemu_
Re: [stable] [patch 1/1] uml: fix TT mode by reverting "use fork instead of clone"
On Tuesday 12 July 2005 20:50, Chris Wright wrote: > * [EMAIL PROTECTED] ([EMAIL PROTECTED]) wrote: > > Revert the following patch, because of miscompilation problems in > > different environments leading to UML not working *at all* in TT mode; it > > was merged lately in 2.6 development cycle, a little after being written, > > and has caused problems to lots of people; I know it's a bit too long, > > but it shouldn't have been merged in first place, so I still apply for > > inclusion in the -stable tree. Anyone using this feature currently is > > either using some older kernel (some reports even used 2.6.12-rc4-mm2) or > > using this patch, as included in my -bs patchset. > > For now there's not yet a fix for this patch, so for now the best thing > > is to drop it (which was widely reported to give a working kernel). > And upstream will leave this in, working to real fix? Preferably yes, but this depends on whether the fix is found. Otherwise this exact patch will be merged upstream too. -- Inform me of my mistakes, so I can keep imitating Homer Simpson's "Doh!". Paolo Giarrusso, aka Blaisorblade (Skype ID "PaoloGiarrusso", ICQ 215621894) http://www.user-mode-linux.org/~blaisorblade ___ Yahoo! Mail: gratis 1GB per i messaggi e allegati da 10MB http://mail.yahoo.it - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[patch 3/9] uml: consolidate modify_ldt
From: Paolo 'Blaisorblade' Giarrusso <[EMAIL PROTECTED]> *) Reorganize the two cases of sys_modify_ldt to share all the reasonably common code. *) Avoid memory allocation when unneeded (i.e. when we are writing and the passed buffer size is known), thus not returning ENOMEM (which isn't allowed for this syscall, even if there is no strict "specification"). *) Add copy_{from,to}_user to modify_ldt for TT mode. Signed-off-by: Paolo 'Blaisorblade' Giarrusso <[EMAIL PROTECTED]> --- linux-2.6.git-broken-paolo/arch/um/sys-i386/ldt.c | 112 +++--- linux-2.6.git-broken-paolo/include/asm-um/ldt.h |5 2 files changed, 66 insertions(+), 51 deletions(-) diff -puN arch/um/sys-i386/ldt.c~uml-modify-ldt-consolidate arch/um/sys-i386/ldt.c --- linux-2.6.git-broken/arch/um/sys-i386/ldt.c~uml-modify-ldt-consolidate 2005-07-13 19:41:00.0 +0200 +++ linux-2.6.git-broken-paolo/arch/um/sys-i386/ldt.c 2005-07-13 19:41:00.0 +0200 @@ -4,96 +4,106 @@ */ #include "linux/config.h" +#include "linux/sched.h" #include "linux/slab.h" +#include "linux/types.h" #include "asm/uaccess.h" #include "asm/ptrace.h" +#include "asm/smp.h" +#include "asm/ldt.h" #include "choose-mode.h" #include "kern.h" +#include "mode_kern.h" #ifdef CONFIG_MODE_TT -extern int modify_ldt(int func, void *ptr, unsigned long bytecount); -/* XXX this needs copy_to_user and copy_from_user */ +extern int modify_ldt(int func, void *ptr, unsigned long bytecount); -int sys_modify_ldt_tt(int func, void __user *ptr, unsigned long bytecount) +static int do_modify_ldt_tt(int func, void *ptr, unsigned long bytecount) { - if (!access_ok(VERIFY_READ, ptr, bytecount)) - return -EFAULT; - return modify_ldt(func, ptr, bytecount); } + #endif #ifdef CONFIG_MODE_SKAS -extern int userspace_pid[]; +#include "skas.h" #include "skas_ptrace.h" -int sys_modify_ldt_skas(int func, void __user *ptr, unsigned long bytecount) +static int do_modify_ldt_skas(int func, void *ptr, unsigned long bytecount) { struct ptrace_ldt ldt; - void *buf; - int res, n; + u32 
cpu; + int res; + + ldt = ((struct ptrace_ldt) { .func = func, +.ptr = ptr, +.bytecount = bytecount }); - buf = kmalloc(bytecount, GFP_KERNEL); - if(buf == NULL) - return(-ENOMEM); + cpu = get_cpu(); + res = ptrace(PTRACE_LDT, userspace_pid[cpu], 0, (unsigned long) &ldt); + put_cpu(); - res = 0; + return res; +} +#endif + +int sys_modify_ldt(int func, void __user *ptr, unsigned long bytecount) +{ + struct user_desc info; + int res = 0; + void *buf = NULL; + void *p = NULL; /* What we pass to host. */ switch(func){ case 1: - case 0x11: - res = copy_from_user(buf, ptr, bytecount); + case 0x11: /* write_ldt */ + /* Do this check now to avoid overflows. */ + if (bytecount != sizeof(struct user_desc)) { + res = -EINVAL; + goto out; + } + + if(copy_from_user(&info, ptr, sizeof(info))) { + res = -EFAULT; + goto out; + } + + p = &info; break; - } + case 0: + case 2: /* read_ldt */ - if(res != 0){ - res = -EFAULT; + /* The use of info avoids kmalloc on the write case, not on the +* read one. */ + buf = kmalloc(bytecount, GFP_KERNEL); + if (!buf) { + res = -ENOMEM; + goto out; + } + p = buf; + default: + res = -ENOSYS; goto out; } - ldt = ((struct ptrace_ldt) { .func = func, -.ptr = buf, -.bytecount = bytecount }); -#warning Need to look up userspace_pid by cpu - res = ptrace(PTRACE_LDT, userspace_pid[0], 0, (unsigned long) &ldt); + res = CHOOSE_MODE_PROC(do_modify_ldt_tt, do_modify_ldt_skas, func, + p, bytecount); if(res < 0) goto out; switch(func){ case 0: case 2: - n = res; - res = copy_to_user(ptr, buf, n); - if(res != 0) + /* Modify_ldt was for reading and returned the number of read +* bytes.*/ + if(copy_to_user(ptr, p, res)) res = -EFAULT; - else - res = n;
[patch 2/9] uml: workaround host bug in "TT mode vs. NPTL link fix"
A big bug has been diagnosed on hosts running the SKAS patch and built with CONFIG_REGPARM, due to some missing prevent_tail_call() annotations. On these hosts, this workaround is needed to avoid triggering that bug, because "to" is kept by GCC only in EBX, which is corrupted at the return of mmap2(). Since int 0x80 must be used for the call to trigger this bug, it rarely manifests itself, so I'd prefer to get this merged to work around that host bug, since it should cause no functional change. Still, you might prefer to drop it; I'll leave that to you. Signed-off-by: Paolo 'Blaisorblade' Giarrusso <[EMAIL PROTECTED]> --- linux-2.6.git-broken-paolo/arch/um/sys-i386/unmap.c |2 +- linux-2.6.git-broken-paolo/arch/um/sys-x86_64/unmap.c |2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff -puN arch/um/sys-i386/unmap.c~uml-fix-link-tt-mode-against-nptl arch/um/sys-i386/unmap.c --- linux-2.6.git-broken/arch/um/sys-i386/unmap.c~uml-fix-link-tt-mode-against-nptl 2005-07-13 19:37:10.0 +0200 +++ linux-2.6.git-broken-paolo/arch/um/sys-i386/unmap.c 2005-07-13 19:37:32.0 +0200 @@ -15,7 +15,7 @@ int switcheroo(int fd, int prot, void *f if(munmap(to, size) < 0){ return(-1); } - if(mmap2(to, size, prot, MAP_SHARED | MAP_FIXED, fd, 0) != to){ + if(mmap2(to, size, prot, MAP_SHARED | MAP_FIXED, fd, 0) == (void*) -1 ){ return(-1); } if(munmap(from, size) < 0){ diff -puN arch/um/sys-x86_64/unmap.c~uml-fix-link-tt-mode-against-nptl arch/um/sys-x86_64/unmap.c --- linux-2.6.git-broken/arch/um/sys-x86_64/unmap.c~uml-fix-link-tt-mode-against-nptl 2005-07-13 19:37:10.0 +0200 +++ linux-2.6.git-broken-paolo/arch/um/sys-x86_64/unmap.c 2005-07-13 19:37:32.0 +0200 @@ -15,7 +15,7 @@ int switcheroo(int fd, int prot, void *f if(munmap(to, size) < 0){ return(-1); } - if(mmap(to, size, prot, MAP_SHARED | MAP_FIXED, fd, 0) != to){ + if(mmap(to, size, prot, MAP_SHARED | MAP_FIXED, fd, 0) == (void*) -1){ return(-1); } if(munmap(from, size) < 0){ _
"unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[patch 5/9] uml: fix hppfs error path
Fix the error message to refer to the error code, i.e. err, not count, plus a few cosmetic fixes. Signed-off-by: Paolo 'Blaisorblade' Giarrusso <[EMAIL PROTECTED]> --- linux-2.6.git-broken-paolo/fs/hppfs/hppfs_kern.c |6 +++--- 1 files changed, 3 insertions(+), 3 deletions(-) diff -puN fs/hppfs/hppfs_kern.c~uml-hppfs-error-case fs/hppfs/hppfs_kern.c --- linux-2.6.git-broken/fs/hppfs/hppfs_kern.c~uml-hppfs-error-case 2005-07-13 19:41:36.0 +0200 +++ linux-2.6.git-broken-paolo/fs/hppfs/hppfs_kern.c2005-07-13 19:41:36.0 +0200 @@ -233,7 +233,7 @@ static ssize_t read_proc(struct file *fi set_fs(USER_DS); if(ppos) *ppos = file->f_pos; - return(n); + return n; } static ssize_t hppfs_read_file(int fd, char *buf, ssize_t count) @@ -254,7 +254,7 @@ static ssize_t hppfs_read_file(int fd, c err = os_read_file(fd, new_buf, cur); if(err < 0){ printk("hppfs_read : read failed, errno = %d\n", - count); + err); n = err; goto out_free; } @@ -271,7 +271,7 @@ static ssize_t hppfs_read_file(int fd, c out_free: kfree(new_buf); out: - return(n); + return n; } static ssize_t hppfs_read(struct file *file, char *buf, size_t count, _
[patch 8/9] uml - hostfs : unuse ROOT_DEV
From: Paolo 'Blaisorblade' Giarrusso <[EMAIL PROTECTED]> CC: Christoph Hellwig <[EMAIL PROTECTED]> Minimal patch removing the uses of ROOT_DEV; the next patch unexports it. I had opposed this, but I plan to reintroduce the functionality without using ROOT_DEV. Signed-off-by: Paolo 'Blaisorblade' Giarrusso <[EMAIL PROTECTED]> --- linux-2.6.git-broken-paolo/fs/hostfs/hostfs_kern.c |9 - 1 files changed, 9 deletions(-) diff -puN fs/hostfs/hostfs_kern.c~uml-hostfs-remove-root_dev-simple fs/hostfs/hostfs_kern.c --- linux-2.6.git-broken/fs/hostfs/hostfs_kern.c~uml-hostfs-remove-root_dev-simple 2005-07-13 19:58:18.0 +0200 +++ linux-2.6.git-broken-paolo/fs/hostfs/hostfs_kern.c 2005-07-13 19:58:18.0 +0200 @@ -15,7 +15,6 @@ #include #include #include -#include #include #include #include @@ -160,8 +159,6 @@ static int read_name(struct inode *ino, ino->i_size = i_size; ino->i_blksize = i_blksize; ino->i_blocks = i_blocks; - if((ino->i_sb->s_dev == ROOT_DEV) && (ino->i_uid == getuid())) - ino->i_uid = 0; return(0); } @@ -841,16 +838,10 @@ int hostfs_setattr(struct dentry *dentry attrs.ia_mode = attr->ia_mode; } if(attr->ia_valid & ATTR_UID){ - if((dentry->d_inode->i_sb->s_dev == ROOT_DEV) && - (attr->ia_uid == 0)) - attr->ia_uid = getuid(); attrs.ia_valid |= HOSTFS_ATTR_UID; attrs.ia_uid = attr->ia_uid; } if(attr->ia_valid & ATTR_GID){ - if((dentry->d_inode->i_sb->s_dev == ROOT_DEV) && - (attr->ia_gid == 0)) - attr->ia_gid = getgid(); attrs.ia_valid |= HOSTFS_ATTR_GID; attrs.ia_gid = attr->ia_gid; } _
[patch 9/9] remove EXPORT_SYMBOL for root_dev
From: Paolo 'Blaisorblade' Giarrusso <[EMAIL PROTECTED]> CC: Christoph Hellwig <[EMAIL PROTECTED]> Remove the EXPORT_SYMBOL for ROOT_DEV, now that the previous patch removed its last modular user, as requested some time ago by Christoph Hellwig. Signed-off-by: Paolo 'Blaisorblade' Giarrusso <[EMAIL PROTECTED]> --- linux-2.6.git-broken-paolo/init/do_mounts.c |2 -- 1 files changed, 2 deletions(-) diff -puN init/do_mounts.c~remove-export-root_dev init/do_mounts.c --- linux-2.6.git-broken/init/do_mounts.c~remove-export-root_dev 2005-07-13 19:59:50.0 +0200 +++ linux-2.6.git-broken-paolo/init/do_mounts.c 2005-07-13 19:59:50.0 +0200 @@ -25,8 +25,6 @@ static char __initdata saved_root_name[6 /* this is initialized in init/main.c */ dev_t ROOT_DEV; -EXPORT_SYMBOL(ROOT_DEV); - static int __init load_ramdisk(char *str) { rd_doload = simple_strtol(str,NULL,0) & 3; _
[patch 6/9] uml: reintroduce pcap support
The pcap support was not working because of some linking problems (expressing the construct in Kbuild was a bit difficult) and because there was no user demand. Now that a request has come in, here's the support again. This has been tested and works on both 32-bit and 64-bit hosts, even when "cross-"building 32-bit binaries. Signed-off-by: Paolo 'Blaisorblade' Giarrusso <[EMAIL PROTECTED]> --- linux-2.6.git-broken-paolo/arch/um/Kconfig_net |2 +- linux-2.6.git-broken-paolo/arch/um/Makefile | 14 +- linux-2.6.git-broken-paolo/arch/um/drivers/Makefile | 17 ++--- 3 files changed, 24 insertions(+), 9 deletions(-) diff -puN arch/um/drivers/Makefile~uml-reallow-pcap arch/um/drivers/Makefile --- linux-2.6.git-broken/arch/um/drivers/Makefile~uml-reallow-pcap 2005-07-13 19:43:05.0 +0200 +++ linux-2.6.git-broken-paolo/arch/um/drivers/Makefile 2005-07-13 19:43:30.0 +0200 @@ -10,7 +10,6 @@ slip-objs := slip_kern.o slip_user.o slirp-objs := slirp_kern.o slirp_user.o daemon-objs := daemon_kern.o daemon_user.o mcast-objs := mcast_kern.o mcast_user.o -#pcap-objs := pcap_kern.o pcap_user.o $(PCAP) net-objs := net_kern.o net_user.o mconsole-objs := mconsole_kern.o mconsole_user.o hostaudio-objs := hostaudio_kern.o @@ -18,6 +17,17 @@ ubd-objs := ubd_kern.o ubd_user.o port-objs := port_kern.o port_user.o harddog-objs := harddog_kern.o harddog_user.o +LDFLAGS_pcap.o := -r $(shell $(CC) $(CFLAGS) -print-file-name=libpcap.a) + +$(obj)/pcap.o: $(obj)/pcap_kern.o $(obj)/pcap_user.o + $(LD) -r -dp -o $@ $^ $(LDFLAGS) $(LDFLAGS_pcap.o) +#XXX: The call below does not work because the flags are added before the +# object name, so nothing from the library gets linked. +#$(call if_changed,ld) + +# When the above is fixed, don't forget to add this too!
+#targets := $(obj)/pcap.o + obj-y := stdio_console.o fd.o chan_kern.o chan_user.o line.o obj-$(CONFIG_SSL) += ssl.o obj-$(CONFIG_STDERR_CONSOLE) += stderr_console.o @@ -26,7 +36,7 @@ obj-$(CONFIG_UML_NET_SLIP) += slip.o sli obj-$(CONFIG_UML_NET_SLIRP) += slirp.o slip_common.o obj-$(CONFIG_UML_NET_DAEMON) += daemon.o obj-$(CONFIG_UML_NET_MCAST) += mcast.o -#obj-$(CONFIG_UML_NET_PCAP) += pcap.o $(PCAP) +obj-$(CONFIG_UML_NET_PCAP) += pcap.o obj-$(CONFIG_UML_NET) += net.o obj-$(CONFIG_MCONSOLE) += mconsole.o obj-$(CONFIG_MMAPPER) += mmapper_kern.o @@ -41,6 +51,7 @@ obj-$(CONFIG_UML_WATCHDOG) += harddog.o obj-$(CONFIG_BLK_DEV_COW_COMMON) += cow_user.o obj-$(CONFIG_UML_RANDOM) += random.o -USER_OBJS := fd.o null.o pty.o tty.o xterm.o slip_common.o +# pcap_user.o must be added explicitly. +USER_OBJS := fd.o null.o pty.o tty.o xterm.o slip_common.o pcap_user.o include arch/um/scripts/Makefile.rules diff -puN arch/um/Kconfig_net~uml-reallow-pcap arch/um/Kconfig_net --- linux-2.6.git-broken/arch/um/Kconfig_net~uml-reallow-pcap 2005-07-13 19:43:05.0 +0200 +++ linux-2.6.git-broken-paolo/arch/um/Kconfig_net 2005-07-13 19:43:05.0 +0200 @@ -135,7 +135,7 @@ config UML_NET_MCAST config UML_NET_PCAP bool "pcap transport" - depends on UML_NET && BROKEN + depends on UML_NET && EXPERIMENTAL help The pcap transport makes a pcap packet stream on the host look like an ethernet device inside UML. This is useful for making diff -puN arch/um/Makefile~uml-reallow-pcap arch/um/Makefile --- linux-2.6.git-broken/arch/um/Makefile~uml-reallow-pcap 2005-07-13 19:43:05.0 +0200 +++ linux-2.6.git-broken-paolo/arch/um/Makefile 2005-07-13 19:43:05.0 +0200 @@ -56,17 +56,21 @@ include $(srctree)/$(ARCH_DIR)/Makefile- core-y += $(SUBARCH_CORE) libs-y += $(SUBARCH_LIBS) -# -Derrno=kernel_errno - This turns all kernel references to errno into -# kernel_errno to separate them from the libc errno. This allows -fno-common -# in CFLAGS. Otherwise, it would cause ld to complain about the two different -# errnos. 
+# -Dvmap=kernel_vmap affects everything, and prevents anything from +# referencing the libpcap.o symbol so named. CFLAGS += $(CFLAGS-y) -D__arch_um__ -DSUBARCH=\"$(SUBARCH)\" \ - $(ARCH_INCLUDE) $(MODE_INCLUDE) + $(ARCH_INCLUDE) $(MODE_INCLUDE) -Dvmap=kernel_vmap USER_CFLAGS := $(patsubst -I%,,$(CFLAGS)) USER_CFLAGS := $(patsubst -D__KERNEL__,,$(USER_CFLAGS)) $(ARCH_INCLUDE) \ $(MODE_INCLUDE) $(ARCH_USER_CFLAGS) + +# -Derrno=kernel_errno - This turns all kernel references to errno into +# kernel_errno to separate them from the libc errno. This allows -fno-common +# in CFLAGS. Otherwise, it would cause ld to complain about the two different +# errnos. + CFLAGS += -Derrno=kernel_errno -Dsigprocmask=kernel_sigprocmask CFLAGS += $(call cc-option,-fno-unit-at-a-time,) _ - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[patch 7/9] uml: allow building as 32-bit binary on 64bit host
This patch makes the command: make ARCH=um SUBARCH=i386 work on x86_64 hosts (with support for building 32-bit binaries). This is especially needed since 64-bit UMLs currently don't support 32-bit emulation for guest binaries. This has been tested in all possible cases and works. The only exception is that I've built, but not tested, a 64-bit binary, because I didn't have a 64-bit filesystem available. Signed-off-by: Paolo 'Blaisorblade' Giarrusso <[EMAIL PROTECTED]> --- linux-2.6.git-broken-paolo/arch/um/Makefile | 11 + linux-2.6.git-broken-paolo/arch/um/Makefile-i386 | 30 +- linux-2.6.git-broken-paolo/arch/um/Makefile-x86_64|6 +- linux-2.6.git-broken-paolo/arch/um/scripts/Makefile.unmap |4 - 4 files changed, 31 insertions(+), 20 deletions(-) diff -puN arch/um/Makefile-i386~uml-build-on-64bit-host arch/um/Makefile-i386 --- linux-2.6.git-broken/arch/um/Makefile-i386~uml-build-on-64bit-host 2005-07-13 19:46:33.0 +0200 +++ linux-2.6.git-broken-paolo/arch/um/Makefile-i3862005-07-13 19:46:33.0 +0200 @@ -1,4 +1,4 @@ -SUBARCH_CORE := arch/um/sys-i386/ arch/i386/crypto/ +core-y += arch/um/sys-i386/ arch/i386/crypto/ TOP_ADDR := $(CONFIG_TOP_ADDR) @@ -8,21 +8,33 @@ ifeq ($(CONFIG_MODE_SKAS),y) endif endif +LDFLAGS+= -m elf_i386 +ELF_ARCH := $(SUBARCH) +ELF_FORMAT := elf32-$(SUBARCH) +OBJCOPYFLAGS := -O binary -R .note -R .comment -S + +ifeq ("$(origin SUBARCH)", "command line") +ifneq ("$(shell uname -m | sed -e s/i.86/i386/)", "$(SUBARCH)") +CFLAGS += $(call cc-option,-m32) +USER_CFLAGS+= $(call cc-option,-m32) +HOSTCFLAGS += $(call cc-option,-m32) +HOSTLDFLAGS+= $(call cc-option,-m32) +AFLAGS += $(call cc-option,-m32) +LINK-y += $(call cc-option,-m32) +UML_OBJCOPYFLAGS += -F $(ELF_FORMAT) + +export LDFLAGS HOSTCFLAGS HOSTLDFLAGS UML_OBJCOPYFLAGS +endif +endif + CFLAGS += -U__$(SUBARCH)__ -U$(SUBARCH) -ARCH_USER_CFLAGS := ifneq ($(CONFIG_GPROF),y) ARCH_CFLAGS += -DUM_FASTCALL endif -ELF_ARCH := $(SUBARCH) -ELF_FORMAT := elf32-$(SUBARCH) - -OBJCOPYFLAGS := -O binary -R
.comment -S - SYS_UTIL_DIR := $(ARCH_DIR)/sys-i386/util - -SYS_HEADERS := $(SYS_DIR)/sc.h $(SYS_DIR)/thread.h +SYS_HEADERS:= $(SYS_DIR)/sc.h $(SYS_DIR)/thread.h prepare: $(SYS_HEADERS) diff -puN arch/um/Makefile~uml-build-on-64bit-host arch/um/Makefile --- linux-2.6.git-broken/arch/um/Makefile~uml-build-on-64bit-host 2005-07-13 19:46:33.0 +0200 +++ linux-2.6.git-broken-paolo/arch/um/Makefile 2005-07-13 19:46:33.0 +0200 @@ -51,11 +51,6 @@ MRPROPER_DIRS+= $(ARCH_DIR)/include2 endif SYS_DIR:= $(ARCH_DIR)/include/sysdep-$(SUBARCH) -include $(srctree)/$(ARCH_DIR)/Makefile-$(SUBARCH) - -core-y += $(SUBARCH_CORE) -libs-y += $(SUBARCH_LIBS) - # -Dvmap=kernel_vmap affects everything, and prevents anything from # referencing the libpcap.o symbol so named. @@ -64,7 +59,7 @@ CFLAGS += $(CFLAGS-y) -D__arch_um__ -DSU USER_CFLAGS := $(patsubst -I%,,$(CFLAGS)) USER_CFLAGS := $(patsubst -D__KERNEL__,,$(USER_CFLAGS)) $(ARCH_INCLUDE) \ - $(MODE_INCLUDE) $(ARCH_USER_CFLAGS) + $(MODE_INCLUDE) # -Derrno=kernel_errno - This turns all kernel references to errno into # kernel_errno to separate them from the libc errno. This allows -fno-common @@ -74,6 +69,8 @@ USER_CFLAGS := $(patsubst -D__KERNEL__,, CFLAGS += -Derrno=kernel_errno -Dsigprocmask=kernel_sigprocmask CFLAGS += $(call cc-option,-fno-unit-at-a-time,) +include $(srctree)/$(ARCH_DIR)/Makefile-$(SUBARCH) + #This will adjust *FLAGS accordingly to the platform. include $(srctree)/$(ARCH_DIR)/Makefile-os-$(OS) @@ -132,7 +129,7 @@ CPPFLAGS_vmlinux.lds = -U$(SUBARCH) \ #The wrappers will select whether using "malloc" or the kernel allocator. 
LINK_WRAPS = -Wl,--wrap,malloc -Wl,--wrap,free -Wl,--wrap,calloc -CFLAGS_vmlinux = $(LINK-y) $(LINK_WRAPS) +CFLAGS_vmlinux := $(LINK-y) $(LINK_WRAPS) define cmd_vmlinux__ $(CC) $(CFLAGS_vmlinux) -o $@ \ -Wl,-T,$(vmlinux-lds) $(vmlinux-init) \ diff -puN arch/um/Makefile-x86_64~uml-build-on-64bit-host arch/um/Makefile-x86_64 --- linux-2.6.git-broken/arch/um/Makefile-x86_64~uml-build-on-64bit-host 2005-07-13 19:46:33.0 +0200 +++ linux-2.6.git-broken-paolo/arch/um/Makefile-x86_64 2005-07-13 19:46:33.0 +0200 @@ -1,11 +1,13 @@ # Copyright 2003 - 2004 Pathscale, Inc # Released under the GPL -SUBARCH_LIBS := arch/um/sys-x86_64/ +libs-y += arch/um/sys-x86_64/ START := 0x6000 +#We #undef __x86_64__ for kernelspace, not for userspace where +#it's needed for headers to work! CFLAGS += -U__$(SUBARCH)__ -fno-builtin -ARCH_USER_CFLAGS := -D__x86_64__ +USER_CFLAGS += -fno-builtin ELF_ARCH := i386:x8
[patch 1/9] uml: fix lvalue for gcc4
From: Paolo 'Blaisorblade' Giarrusso <[EMAIL PROTECTED]>, Russell King <[EMAIL PROTECTED]> This construct is refused by GCC 4, so here's the (corrected) fix. Thanks to Russell for noticing a stupid mistake I made when first sending this. As he noted, the code is largely suboptimal; however, it currently works and will be fixed shortly. Just look at the access_ok check on fp, which is NULL, or the pointer arithmetic below, which should be done with a cast to void*: frame = (struct rt_sigframe __user *) round_down(stack_top - sizeof(struct rt_sigframe), 16) - 8; The code clearly shows that it has been taken from arch/x86_64/kernel/signal.c:setup_rt_frame(), maybe in a bit of a hurry. Signed-off-by: Paolo 'Blaisorblade' Giarrusso <[EMAIL PROTECTED]> --- linux-2.6.git-broken-paolo/arch/um/sys-x86_64/signal.c |2 +- 1 files changed, 1 insertion(+), 1 deletion(-) diff -puN arch/um/sys-x86_64/signal.c~uml-fix-for-gcc4-lvalue arch/um/sys-x86_64/signal.c --- linux-2.6.git-broken/arch/um/sys-x86_64/signal.c~uml-fix-for-gcc4-lvalue 2005-07-13 19:30:43.0 +0200 +++ linux-2.6.git-broken-paolo/arch/um/sys-x86_64/signal.c 2005-07-13 19:30:44.0 +0200 @@ -168,7 +168,7 @@ int setup_signal_stack_si(unsigned long frame = (struct rt_sigframe __user *) round_down(stack_top - sizeof(struct rt_sigframe), 16) - 8; - ((unsigned char *) frame) -= 128; + frame -= 128 / sizeof(*frame); if (!access_ok(VERIFY_WRITE, fp, sizeof(struct _fpstate))) goto out; _
[patch 4/9] uml: gcc 2.95 fix and Makefile cleanup
From: Paolo 'Blaisorblade' Giarrusso <[EMAIL PROTECTED]> CC: Raphael Bossek <[EMAIL PROTECTED]> 1) Clean up some ugly hyper-nested code in the Makefile (now only the arithmetic expression is passed through the host bash). 2) Fix a problem with GCC 2.95: according to a report from Raphael Bossek, .remap_data : { arch/um/sys-SUBARCH/unmap_fin.o (.data .bss) } is expanded into: .remap_data : { arch/um/sys-i386 /unmap_fin.o (.data .bss) } (because I didn't use ## to join the two tokens), thus breaking linking. Pass the whole path from the Makefile as a simple and nice fix. Signed-off-by: Paolo 'Blaisorblade' Giarrusso <[EMAIL PROTECTED]> --- linux-2.6.git-broken-paolo/arch/um/Makefile |9 + linux-2.6.git-broken-paolo/arch/um/kernel/uml.lds.S |4 ++-- 2 files changed, 7 insertions(+), 6 deletions(-) diff -puN arch/um/Makefile~uml-cleanup-Makefile-a-bit arch/um/Makefile --- linux-2.6.git-broken/arch/um/Makefile~uml-cleanup-Makefile-a-bit 2005-07-13 19:41:17.0 +0200 +++ linux-2.6.git-broken-paolo/arch/um/Makefile 2005-07-13 19:41:17.0 +0200 @@ -116,13 +116,14 @@ CONFIG_KERNEL_STACK_ORDER ?= 2 STACK_SIZE := $(shell echo $$[ 4096 * (1 << $(CONFIG_KERNEL_STACK_ORDER)) ] ) ifndef START - START = $$(($(TOP_ADDR) - $(SIZE))) + START = $(shell echo $$[ $(TOP_ADDR) - $(SIZE) ] ) endif -CPPFLAGS_vmlinux.lds = $(shell echo -U$(SUBARCH) \ +CPPFLAGS_vmlinux.lds = -U$(SUBARCH) \ -DSTART=$(START) -DELF_ARCH=$(ELF_ARCH) \ - -DELF_FORMAT=\"$(ELF_FORMAT)\" $(CPP_MODE-y) \ - -DKERNEL_STACK_SIZE=$(STACK_SIZE) -DSUBARCH=$(SUBARCH)) + -DELF_FORMAT="$(ELF_FORMAT)" $(CPP_MODE-y) \ + -DKERNEL_STACK_SIZE=$(STACK_SIZE) \ + -DUNMAP_PATH=arch/um/sys-$(SUBARCH)/unmap_fin.o #The wrappers will select whether using "malloc" or the kernel allocator.
LINK_WRAPS = -Wl,--wrap,malloc -Wl,--wrap,free -Wl,--wrap,calloc diff -puN arch/um/kernel/uml.lds.S~uml-cleanup-Makefile-a-bit arch/um/kernel/uml.lds.S --- linux-2.6.git-broken/arch/um/kernel/uml.lds.S~uml-cleanup-Makefile-a-bit 2005-07-13 19:41:17.0 +0200 +++ linux-2.6.git-broken-paolo/arch/um/kernel/uml.lds.S 2005-07-13 19:41:17.0 +0200 @@ -16,8 +16,8 @@ SECTIONS __binary_start = .; #ifdef MODE_TT - .remap_data : { arch/um/sys-SUBARCH/unmap_fin.o (.data .bss) } - .remap : { arch/um/sys-SUBARCH/unmap_fin.o (.text) } + .remap_data : { UNMAP_PATH (.data .bss) } + .remap : { UNMAP_PATH (.text) } . = ALIGN(4096); /* Init code and data */ #endif _ - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
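To make the token story concrete: the linker script is run through the C preprocessor, and older cpp versions (gcc 2.95's among them, per Raphael's report) emitted a space after the expanded macro, splitting the path. This sketch (file names invented) reproduces the setup and the merged fix; the exact spacing of the first output depends on your cpp version, so it is shown as a demonstration, not asserted.

```shell
# The problematic fragment: SUBARCH is a separate preprocessing token,
# so it does get expanded, but gcc 2.95's cpp put a space after it.
cat > uml.lds.in <<'EOF'
.remap_data : { arch/um/sys-SUBARCH/unmap_fin.o (.data .bss) }
EOF
cpp -P -DSUBARCH=i386 uml.lds.in
# gcc 2.95 emitted: .remap_data : { arch/um/sys-i386 /unmap_fin.o ... }

# The fix: the Makefile passes the finished path as one macro, so no
# expansion ever happens in the middle of a path.
cat > uml-fixed.lds.in <<'EOF'
.remap_data : { UNMAP_PATH (.data .bss) }
EOF
cpp -P -DUNMAP_PATH=arch/um/sys-i386/unmap_fin.o uml-fixed.lds.in

rm -f uml.lds.in uml-fixed.lds.in
```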
Re: [uml-devel] Re: [patch 1/9] uml: fix lvalue for gcc4
On Wednesday 13 July 2005 23:29, Andrew Morton wrote: > Please identify which of these patches you consider to be 2.6.13 material. All of them are for 2.6.13... except this one: it's still wrong, I overlooked it a bit too much. It must be replaced by this (I'll post it in a mail if needed): http://user-mode-linux.sourceforge.net/work/current/2.6/2.6.12-mm2/patches/x86_64_compile Bye -- Inform me of my mistakes, so I can keep imitating Homer Simpson's "Doh!". Paolo Giarrusso, aka Blaisorblade (Skype ID "PaoloGiarrusso", ICQ 215621894) http://www.user-mode-linux.org/~blaisorblade - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] readd missing define to arch/um/Makefile-i386
On Sunday 17 July 2005 16:52, Olaf Hering wrote: > New in 2.6.13-rc3-git4: > scripts/Makefile.build:13: /Makefile: No such file or directory > scripts/Makefile.build:64: kbuild: Makefile.build is included improperly > the define was removed, but its still required to build some targets. > Signed-off-by: Olaf Hering <[EMAIL PROTECTED]> Yes, this patch is the correct fix, also for -rc3-mm1 (which has the same problem). Andrew, I haven't had time to look at how you fixed up the rejects in the last merge ([PATCH] uml: allow building as 32-bit binary on 64bit host)*; the rejects came from the SKAS0 merge, and while fixing the patch up you deleted by mistake the line which is re-added in this patch. * http://www.kernel.org/git/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=20d0021394c1b070bf04b22c5bc8fdb437edd4c5 -- Inform me of my mistakes, so I can keep imitating Homer Simpson's "Doh!". Paolo Giarrusso, aka Blaisorblade (Skype ID "PaoloGiarrusso", ICQ 215621894) http://www.user-mode-linux.org/~blaisorblade - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Giving developers clue how many testers verified certain kernel version
Adrian Bunk stusta.de> writes: > On Thu, Jul 21, 2005 at 09:40:43PM -0500, Alejandro Bonilla wrote: > > >How do we know that something is OK or wrong? just by the fact that > > it works or not, it doesn't mean like is OK. > > > > There has to be a process for any user to be able to verify and study a > > problem. We don't have that yet. > If the user doesn't notice the difference then there's no problem for > him. Some performance regressions aren't easily noticeable without benchmarks... and we've had people claiming unnoticed regressions since 2.6.2 (http://kerneltrap.org/node/4940) > If there's a problem the user notices, then the process is to send an > email to linux-kernel and/or open a bug in the kernel Bugzilla and > follow the "please send the output of foo" and "please test patch bar" > instructions. > What comes nearest to what you are talking about is that you run LTP > and/or various benchmarks against every -git and every -mm kernel and > report regressions. But this is simply a task someone could do (and I > don't know how much of it is already done e.g. at OSDL), and not > something every user could contribute to. What about driver testing? That is where most of the bugs hide, and where wide user testing is definitely needed, given the various hardware bugs and the different configurations existing in the real world. IMHO, publishing statistics about kernel patch downloads would be a very Good Thing(tm). Peter, what's your opinion? I think this was even talked about at the Kernel Summit (or at least I thought of it there), but I haven't understood whether it is going to happen. -- Inform me of my mistakes, so I can keep imitating Homer Simpson's "Doh!". 
Paolo Giarrusso, aka Blaisorblade (Skype ID "PaoloGiarrusso", ICQ 215621894) http://www.user-mode-linux.org/~blaisorblade - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch 1/3] uml: share page bits handling between 2 and 3 level pagetables
On Saturday 30 July 2005 18:02, Jeff Dike wrote: > On Thu, Jul 28, 2005 at 08:56:53PM +0200, [EMAIL PROTECTED] wrote: > > As obvious, a "core code nice cleanup" is not a "stability-friendly > > patch" so usual care applies. > > These look reasonable, as they are what we discussed in Ottawa. > > I'll put them in my tree and see if I see any problems. I would > suggest sending these in early after 2.6.13 if they seem OK. Just noticed: you can drop them (except the first, which is a nice cleanup). set_pte() handles that, and include/asm-generic/pgtable.h uses set_pte_at() consistently. I've checked UML with "grep pte": either mk_pte or set_pte is used everywhere. Exceptions: fixaddr_user_init (but that should be OK, as we shouldn't actually map it) and pte_modify() (which handles that only for present pages). But pte_modify is used together with set_pte, so we could probably drop that handling as well. Also look, on the "set_pte" theme, at the attached patch. I realized this when I needed those lines to work - I was getting a segfault loop. After using set_pte(), things worked. I now have an almost perfectly working implementation of remap_file_pages with protection support. There will probably be some other things to update, like the swapping locations, but I can't get this kernel to fail (it's easier to find bugs in the test program; it has grown quite complex). And, I'd like to note, Ingo's original version *DID NOT* work properly (it was not safe against swapout, and it didn't allow write-protecting a page successfully). I'm going to clean up the code and write changelogs, then send the patches for -mm (hoping the page fault scalability patches don't get in the way). -- Inform me of my mistakes, so I can keep imitating Homer Simpson's "Doh!". 
Paolo Giarrusso, aka Blaisorblade (Skype ID "PaoloGiarrusso", ICQ 215621894) http://www.user-mode-linux.org/~blaisorblade From: Paolo 'Blaisorblade' Giarrusso <[EMAIL PROTECTED]> The PTE returned from handle_mm_fault is already marked as dirty and accessed if needed. Also, since this is not set with set_pte() (which sets NEWPAGE and NEWPROT as needed), this wouldn't work anyway. Signed-off-by: Paolo 'Blaisorblade' Giarrusso <[EMAIL PROTECTED]> --- linux-2.6.git-paolo/arch/um/kernel/trap_kern.c |3 +-- 1 files changed, 1 insertion(+), 2 deletions(-) diff -puN arch/um/kernel/trap_kern.c~uml-avoid-already-done-dirtying arch/um/kernel/trap_kern.c --- linux-2.6.git/arch/um/kernel/trap_kern.c~uml-avoid-already-done-dirtying 2005-08-10 19:21:13.0 +0200 +++ linux-2.6.git-paolo/arch/um/kernel/trap_kern.c 2005-08-10 19:21:13.0 +0200 @@ -83,8 +83,7 @@ survive: pte = pte_offset_kernel(pmd, address); } while(!pte_present(*pte)); err = 0; - *pte = pte_mkyoung(*pte); - if(pte_write(*pte)) *pte = pte_mkdirty(*pte); + WARN_ON(!pte_young(*pte) || pte_write(*pte) && !pte_dirty(*pte)); flush_tlb_page(vma, address); out: up_read(&mm->mmap_sem); _
Really BAD granularity example in BKCVS output
I've locally downloaded and installed the GIT version of the BitKeeper tree (the first existing upload - I have been away for a while, so I don't know if there are others), and while browsing the history for some work, I found this commit: http://localhost/~paolo/git/?p=old-2.6-bkcvs/.git;a=commit;h=b035f9332ce7e205af43f7cfdf4e1cf3625f7ad5 (the hashes work on the kernel.org copy of that repository, assuming it wasn't re-exported). Well, that is *awfully* big (543 files touched)! Isn't there anything that can be done about that? What is worse, the commit message is truncated! And yes, sorry if this is a stupid question. -- Inform me of my mistakes, so I can keep imitating Homer Simpson's "Doh!". Paolo Giarrusso, aka Blaisorblade (Skype ID "PaoloGiarrusso", ICQ 215621894) http://www.user-mode-linux.org/~blaisorblade - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Feature removal: ACPI S4bios support, ioctl32_conversion
Looking at Documentation/feature-removal-schedule.txt, I noticed an overdue feature removal assigned to you, so I thought I'd drop you a reminder email. Thanks for your attention -- Inform me of my mistakes, so I can keep imitating Homer Simpson's "Doh!". Paolo Giarrusso, aka Blaisorblade (Skype ID "PaoloGiarrusso", ICQ 215621894) http://www.user-mode-linux.org/~blaisorblade - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[patch 07/39] uml: fault handler micro-cleanups
From: Paolo 'Blaisorblade' Giarrusso <[EMAIL PROTECTED]> Avoid chomping low bits of address for functions doing it by themselves, fix whitespace, add a correctness check. I did this for remap-file-pages protection support; it is useful on its own too. Signed-off-by: Paolo 'Blaisorblade' Giarrusso <[EMAIL PROTECTED]> --- linux-2.6.git-paolo/arch/um/kernel/trap_kern.c | 28 +++-- 1 files changed, 13 insertions(+), 15 deletions(-) diff -puN arch/um/kernel/trap_kern.c~uml-fault-handler-changes arch/um/kernel/trap_kern.c --- linux-2.6.git/arch/um/kernel/trap_kern.c~uml-fault-handler-changes 2005-08-11 11:18:03.0 +0200 +++ linux-2.6.git-paolo/arch/um/kernel/trap_kern.c 2005-08-11 11:19:56.0 +0200 @@ -26,6 +26,7 @@ #include "mem.h" #include "mem_kern.h" +/* Note this is constrained to return 0, -EFAULT, -EACCES, -ENOMEM by segv(). */ int handle_page_fault(unsigned long address, unsigned long ip, int is_write, int is_user, int *code_out) { @@ -35,7 +36,6 @@ int handle_page_fault(unsigned long addr pud_t *pud; pmd_t *pmd; pte_t *pte; - unsigned long page; int err = -EFAULT; *code_out = SEGV_MAPERR; @@ -52,7 +52,7 @@ int handle_page_fault(unsigned long addr else if(expand_stack(vma, address)) goto out; - good_area: +good_area: *code_out = SEGV_ACCERR; if(is_write && !(vma->vm_flags & VM_WRITE)) goto out; @@ -60,9 +60,8 @@ int handle_page_fault(unsigned long addr if(!(vma->vm_flags & (VM_READ | VM_EXEC))) goto out; - page = address & PAGE_MASK; do { - survive: +survive: switch (handle_mm_fault(mm, vma, address, is_write)){ case VM_FAULT_MINOR: current->min_flt++; @@ -79,16 +78,16 @@ int handle_page_fault(unsigned long addr default: BUG(); } - pgd = pgd_offset(mm, page); - pud = pud_offset(pgd, page); - pmd = pmd_offset(pud, page); - pte = pte_offset_kernel(pmd, page); + pgd = pgd_offset(mm, address); + pud = pud_offset(pgd, address); + pmd = pmd_offset(pud, address); + pte = pte_offset_kernel(pmd, address); } while(!pte_present(*pte)); err = 0; *pte = pte_mkyoung(*pte); 
if(pte_write(*pte)) *pte = pte_mkdirty(*pte); - flush_tlb_page(vma, page); - out: + flush_tlb_page(vma, address); +out: up_read(&mm->mmap_sem); return(err); @@ -144,19 +143,18 @@ unsigned long segv(struct faultinfo fi, panic("Kernel mode fault at addr 0x%lx, ip 0x%lx", address, ip); - if(err == -EACCES){ + if (err == -EACCES) { si.si_signo = SIGBUS; si.si_errno = 0; si.si_code = BUS_ADRERR; si.si_addr = (void *)address; current->thread.arch.faultinfo = fi; force_sig_info(SIGBUS, &si, current); - } - else if(err == -ENOMEM){ + } else if (err == -ENOMEM) { printk("VM: killing process %s\n", current->comm); do_exit(SIGKILL); - } - else { + } else { + BUG_ON(err != -EFAULT); si.si_signo = SIGSEGV; si.si_addr = (void *) address; current->thread.arch.faultinfo = fi; _ - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[patch 15/39] remap_file_pages protection support: add VM_NONUNIFORM to fix existing usage of mprotect()
From: Paolo 'Blaisorblade' Giarrusso <[EMAIL PROTECTED]> Distinguish between "normal" VMAs and VMAs with non-uniform protection. This will also be useful for fault handling (we must ignore VM_{READ,WRITE,EXEC} in the arch fault handler). As said before, with remap-file-pages-prot we must punt on private VMAs even when we're just changing protections. Also, with the remap_file_pages protection support, we do have a regression of remap_file_pages vs mprotect. mprotect alters the VMA protections and walks each installed PTE. Mprotect'ing a nonlinear VMA used to work, obviously, but now doesn't, because we must now read the protections from the PTEs, which haven't been updated; so, to avoid changing behaviour for old binaries, on uniform VMAs we ignore protections in the PTE, like we did before. On non-uniform VMAs, instead, mprotect is currently broken; however, we've never supported it, so this is acceptable. What it does is to split the VMA if needed, assign the new protection to the VMA and enforce the new protections on all present pages, ignoring all absent ones (including pte_file() ones), which will keep the current protections. So, the application has no reliable way to know which pages would actually be remapped. What is more, there is IMHO no reason to support using mprotect on non-uniform VMAs. The only exception is to change the VMA's default protection (which is used for non-individually-remapped pages), but it should still ignore the page tables. The only need for that is if I want to change protections without changing the indexes, which with remap_file_pages you must do one page at a time, re-specifying the indexes. It is more reasonable to allow remap_file_pages to change protections on a PTE range without changing the offsets. I've not implemented this, but I can if wanted. For sure, UML doesn't need this interface. However, for now I've implemented no change to mprotect(); I'd like to get some feedback first about which way to go. 
Signed-off-by: Paolo 'Blaisorblade' Giarrusso <[EMAIL PROTECTED]> --- linux-2.6.git-paolo/include/linux/mm.h |7 +++ linux-2.6.git-paolo/mm/fremap.c| 13 + linux-2.6.git-paolo/mm/memory.c|2 +- 3 files changed, 21 insertions(+), 1 deletion(-) diff -puN mm/fremap.c~rfp-add-VM_NONUNIFORM mm/fremap.c --- linux-2.6.git/mm/fremap.c~rfp-add-VM_NONUNIFORM 2005-08-11 23:03:51.0 +0200 +++ linux-2.6.git-paolo/mm/fremap.c 2005-08-11 23:03:51.0 +0200 @@ -252,6 +252,19 @@ retry: spin_unlock(&mapping->i_mmap_lock); } } + if (pgprot_val(pgprot) != pgprot_val(vma->vm_page_prot)) { + if (!(vma->vm_flags & VM_SHARED)) + goto out_unlock; + if (!(vma->vm_flags & VM_NONUNIFORM)) { + if (!has_write_lock) { + up_read(&mm->mmap_sem); + down_write(&mm->mmap_sem); + has_write_lock = 1; + goto retry; + } + vma->vm_flags |= VM_NONUNIFORM; + } + } err = vma->vm_ops->populate(vma, start, size, pgprot, pgoff, flags & MAP_NONBLOCK); diff -puN include/linux/mm.h~rfp-add-VM_NONUNIFORM include/linux/mm.h --- linux-2.6.git/include/linux/mm.h~rfp-add-VM_NONUNIFORM 2005-08-11 23:03:51.0 +0200 +++ linux-2.6.git-paolo/include/linux/mm.h 2005-08-11 23:03:51.0 +0200 @@ -160,7 +160,14 @@ extern unsigned int kobjsize(const void #define VM_ACCOUNT 0x0010 /* Is a VM accounted object */ #define VM_HUGETLB 0x0040 /* Huge TLB Page VM */ #define VM_NONLINEAR 0x0080 /* Is non-linear (remap_file_pages) */ + +#ifndef CONFIG_MMU #define VM_MAPPED_COPY 0x0100 /* T if mapped copy of data (nommu mmap) */ +#else +#define VM_NONUNIFORM 0x0100 /* The VM individual pages have + different protections + (remap_file_pages)*/ +#endif #ifndef VM_STACK_DEFAULT_FLAGS /* arch can override this */ #define VM_STACK_DEFAULT_FLAGS VM_DATA_DEFAULT_FLAGS diff -puN mm/memory.c~rfp-add-VM_NONUNIFORM mm/memory.c --- linux-2.6.git/mm/memory.c~rfp-add-VM_NONUNIFORM 2005-08-11 23:03:51.0 +0200 +++ linux-2.6.git-paolo/mm/memory.c 2005-08-11 23:03:51.0 +0200 @@ -1941,7 +1941,7 @@ static int do_file_page(struct mm_struct } pgoff = pte_to_pgoff(*pte); 
- pgprot = pte_to_pgprot(*pte); + pgpr
[patch 06/39] correct _PAGE_FILE comment
_PAGE_FILE does not indicate whether a page is in the page/swap cache; it is set just for non-linear PTEs. Correct the comment for i386, x86_64 and UML. Also clarify _PAGE_PROTNONE. Signed-off-by: Paolo 'Blaisorblade' Giarrusso <[EMAIL PROTECTED]> --- linux-2.6.git-paolo/include/asm-i386/pgtable.h | 10 +- linux-2.6.git-paolo/include/asm-um/pgtable.h |8 +--- linux-2.6.git-paolo/include/asm-x86_64/pgtable.h |2 +- 3 files changed, 11 insertions(+), 9 deletions(-) diff -puN include/asm-i386/pgtable.h~correct-_PAGE_FILE-comment include/asm-i386/pgtable.h --- linux-2.6.git/include/asm-i386/pgtable.h~correct-_PAGE_FILE-comment 2005-08-11 11:17:04.0 +0200 +++ linux-2.6.git-paolo/include/asm-i386/pgtable.h 2005-08-11 11:17:04.0 +0200 @@ -86,9 +86,7 @@ void paging_init(void); #endif /* - * The 4MB page is guessing.. Detailed in the infamous "Chapter H" - * of the Pentium details, but assuming intel did the straightforward - * thing, this bit set in the page directory entry just means that + * _PAGE_PSE set in the page directory entry just means that * the page directory entry points directly to a 4MB-aligned block of * memory. 
*/ @@ -119,8 +117,10 @@ void paging_init(void); #define _PAGE_UNUSED2 0x400 #define _PAGE_UNUSED3 0x800 -#define _PAGE_FILE 0x040 /* set:pagecache unset:swap */ -#define _PAGE_PROTNONE 0x080 /* If not present */ +/* If _PAGE_PRESENT is clear, we use these: */ +#define _PAGE_FILE 0x040 /* nonlinear file mapping, saved PTE; unset:swap */ +#define _PAGE_PROTNONE 0x080 /* if the user mapped it with PROT_NONE; + pte_present gives true */ #ifdef CONFIG_X86_PAE #define _PAGE_NX (1ULL<<_PAGE_BIT_NX) #else diff -puN include/asm-x86_64/pgtable.h~correct-_PAGE_FILE-comment include/asm-x86_64/pgtable.h --- linux-2.6.git/include/asm-x86_64/pgtable.h~correct-_PAGE_FILE-comment 2005-08-11 11:17:04.0 +0200 +++ linux-2.6.git-paolo/include/asm-x86_64/pgtable.h 2005-08-11 11:17:04.0 +0200 @@ -143,7 +143,7 @@ extern inline void pgd_clear (pgd_t * pg #define _PAGE_ACCESSED 0x020 #define _PAGE_DIRTY 0x040 #define _PAGE_PSE 0x080 /* 2MB page */ -#define _PAGE_FILE 0x040 /* set:pagecache, unset:swap */ +#define _PAGE_FILE 0x040 /* nonlinear file mapping, saved PTE; unset:swap */ #define _PAGE_GLOBAL 0x100 /* Global TLB entry */ #define _PAGE_PROTNONE 0x080 /* If not present */ diff -puN include/asm-um/pgtable.h~correct-_PAGE_FILE-comment include/asm-um/pgtable.h --- linux-2.6.git/include/asm-um/pgtable.h~correct-_PAGE_FILE-comment 2005-08-11 11:17:04.0 +0200 +++ linux-2.6.git-paolo/include/asm-um/pgtable.h 2005-08-11 11:17:04.0 +0200 @@ -16,13 +16,15 @@ #define _PAGE_PRESENT 0x001 #define _PAGE_NEWPAGE 0x002 -#define _PAGE_NEWPROT 0x004 -#define _PAGE_FILE 0x008 /* set:pagecache unset:swap */ -#define _PAGE_PROTNONE 0x010 /* If not present */ +#define _PAGE_NEWPROT 0x004 #define _PAGE_RW 0x020 #define _PAGE_USER 0x040 #define _PAGE_ACCESSED 0x080 #define _PAGE_DIRTY 0x100 +/* If _PAGE_PRESENT is clear, we use these: */ +#define _PAGE_FILE 0x008 /* nonlinear file mapping, saved PTE; unset:swap */ +#define _PAGE_PROTNONE 0x010 /* if the user mapped it with PROT_NONE; + pte_present gives true */ 
#ifdef CONFIG_3_LEVEL_PGTABLES #include "asm/pgtable-3level.h" _ - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[patch 33/39] remap_file_pages protection support: VM_FAULT_SIGSEGV permission checking rework
From: Paolo 'Blaisorblade' Giarrusso <[EMAIL PROTECTED]> Simplify the generic arch permission checking: the previous one was clumsy, as it didn't account for arch-specific implications (read implies exec, write implies read, and so on). The fixes for the archs (i386 and UML) which were modified for the previous scheme still need to be undone. Signed-off-by: Paolo 'Blaisorblade' Giarrusso <[EMAIL PROTECTED]> --- linux-2.6.git-paolo/mm/memory.c | 49 ++-- 1 files changed, 33 insertions(+), 16 deletions(-) diff -puN mm/memory.c~rfp-sigsegv-4 mm/memory.c --- linux-2.6.git/mm/memory.c~rfp-sigsegv-4 2005-08-12 17:18:55.0 +0200 +++ linux-2.6.git-paolo/mm/memory.c 2005-08-12 17:18:55.0 +0200 @@ -1923,6 +1923,35 @@ oom: goto out; } +static inline int check_perms(struct vm_area_struct * vma, int access_mask) { + if (unlikely(vma->vm_flags & VM_NONUNIFORM)) { + /* we used to check protections in arch handler, but with +* VM_NONUNIFORM the check is skipped. */ +#if 0 + if ((access_mask & VM_WRITE) > (vma->vm_flags & VM_WRITE)) + goto err; + if ((access_mask & VM_READ) > (vma->vm_flags & VM_READ)) + goto err; + if ((access_mask & VM_EXEC) > (vma->vm_flags & VM_EXEC)) + goto err; +#else + /* access_mask contains the type of the access, vm_flags are the +* declared protections, pte has the protection which will be +* given to the PTE's in that area. */ + //pte_t pte = pfn_pte(0UL, protection_map[vma->vm_flags & 0x0f|VM_SHARED]); + pte_t pte = pfn_pte(0UL, vma->vm_page_prot); + if ((access_mask & VM_WRITE) && ! pte_write(pte)) + goto err; + if ((access_mask & VM_READ) && ! pte_read(pte)) + goto err; + if ((access_mask & VM_EXEC) && ! pte_exec(pte)) + goto err; +#endif + } + return 0; +err: + return -EPERM; +} /* * Fault of a previously existing named mapping. Repopulate the pte * from the encoded file_pte if possible. 
This enables swappable @@ -1944,14 +1973,8 @@ static int do_file_page(struct mm_struct ((access_mask & VM_WRITE) && !(vma->vm_flags & VM_SHARED))) { /* We're behaving as if pte_file was cleared, so check * protections like in handle_pte_fault. */ - if (unlikely(vma->vm_flags & VM_NONUNIFORM)) { - if ((access_mask & VM_WRITE) > (vma->vm_flags & VM_WRITE)) - goto out_segv; - if ((access_mask & VM_READ) > (vma->vm_flags & VM_READ)) - goto out_segv; - if ((access_mask & VM_EXEC) > (vma->vm_flags & VM_EXEC)) - goto out_segv; - } + if (check_perms(vma, access_mask)) + goto out_segv; pte_clear(mm, address, pte); return do_no_page(mm, vma, address, access_mask & VM_WRITE, pte, pmd); @@ -2007,14 +2030,8 @@ static inline int handle_pte_fault(struc /* when pte_file(), the VMA protections are useless. Otherwise, * we used to check protections in arch handler, but with * VM_NONUNIFORM the check is skipped. */ - if (unlikely(vma->vm_flags & VM_NONUNIFORM) && !pte_file(entry)) { - if ((access_mask & VM_WRITE) > (vma->vm_flags & VM_WRITE)) - goto out_segv; - if ((access_mask & VM_READ) > (vma->vm_flags & VM_READ)) - goto out_segv; - if ((access_mask & VM_EXEC) > (vma->vm_flags & VM_EXEC)) - goto out_segv; - } + if (!pte_file(entry) && check_perms(vma, access_mask)) + goto out_segv; /* * If it truly wasn't present, we know that kswapd _ - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[patch 12/39] remap_file_pages protection support: enhance syscall interface and swapout code
From: Ingo Molnar <[EMAIL PROTECTED]>, Paolo 'Blaisorblade' Giarrusso <[EMAIL PROTECTED]> This is the "main" patch for the syscall code, containing the core of what was sent by Ingo Molnar, variously reworked. Unlike his patch, I've *not* added a new syscall, choosing instead to add a new flag (MAP_NOINHERIT) which the application must specify to get the new behavior (prot != 0 is accepted and prot == 0 means PROT_NONE). The changes to the page fault handler have been separated, not least because they required a considerable amount of effort. Handle the possibility that remap_file_pages changes protections in various places. * Enable the 'prot' parameter for shared-writable mappings (the ones which are the primary target for remap_file_pages), without breaking up the vma * Use pte_file PTEs also when protections don't match, not only when the offset doesn't match; and add set_nonlinear_pte() for this test * Save the current protection too when clearing a nonlinear PTE, by replacing pgoff_to_pte() uses with pgoff_prot_to_pte(). * Use the supplied protections on restore and on populate (partially incomplete, fixed in subsequent patches) Signed-off-by: Paolo 'Blaisorblade' Giarrusso <[EMAIL PROTECTED]> --- linux-2.6.git-paolo/include/linux/pagemap.h | 19 ++ linux-2.6.git-paolo/mm/fremap.c | 50 +--- linux-2.6.git-paolo/mm/memory.c | 14 --- linux-2.6.git-paolo/mm/rmap.c |3 - 4 files changed, 60 insertions(+), 26 deletions(-) diff -puN include/linux/pagemap.h~rfp-enhance-syscall-and-swapout-code include/linux/pagemap.h --- linux-2.6.git/include/linux/pagemap.h~rfp-enhance-syscall-and-swapout-code 2005-08-11 22:59:47.0 +0200 +++ linux-2.6.git-paolo/include/linux/pagemap.h 2005-08-11 22:59:47.0 +0200 @@ -159,6 +159,25 @@ static inline pgoff_t linear_page_index( return pgoff >> (PAGE_CACHE_SHIFT - PAGE_SHIFT); } +/*** + * Checks if the PTE is nonlinear, and if yes sets it. 
+ * @vma: the VMA in which @addr is; we don't check if it's VM_NONLINEAR, just + * if this PTE is nonlinear. + * @addr: the addr which @pte refers to. + * @pte: the old PTE value (to read its protections. + * @ptep: the PTE pointer (for setting it). + * @mm: passed to set_pte_at. + * @page: the page which was installed (to read its ->index, i.e. the old + * offset inside the file. + */ +static inline void set_nonlinear_pte(pte_t pte, pte_t * ptep, struct vm_area_struct *vma, struct mm_struct *mm, struct page* page, unsigned long addr) +{ + pgprot_t pgprot = pte_to_pgprot(pte); + if(linear_page_index(vma, addr) != page->index || + pgprot_val(pgprot) != pgprot_val(vma->vm_page_prot)) + set_pte_at(mm, addr, ptep, pgoff_prot_to_pte(page->index, pgprot)); +} + extern void FASTCALL(__lock_page(struct page *page)); extern void FASTCALL(unlock_page(struct page *page)); diff -puN mm/fremap.c~rfp-enhance-syscall-and-swapout-code mm/fremap.c --- linux-2.6.git/mm/fremap.c~rfp-enhance-syscall-and-swapout-code 2005-08-11 22:59:47.0 +0200 +++ linux-2.6.git-paolo/mm/fremap.c 2005-08-11 23:01:14.0 +0200 @@ -54,7 +54,7 @@ static inline void zap_pte(struct mm_str * previously existing mapping. */ int install_page(struct mm_struct *mm, struct vm_area_struct *vma, - unsigned long addr, struct page *page, pgprot_t prot) + unsigned long addr, struct page *page, pgprot_t pgprot) { struct inode *inode; pgoff_t size; @@ -94,7 +94,7 @@ int install_page(struct mm_struct *mm, s inc_mm_counter(mm,rss); flush_icache_page(vma, page); - set_pte_at(mm, addr, pte, mk_pte(page, prot)); + set_pte_at(mm, addr, pte, mk_pte(page, pgprot)); page_add_file_rmap(page); pte_val = *pte; pte_unmap(pte); @@ -113,7 +113,7 @@ EXPORT_SYMBOL(install_page); * previously existing mapping. 
*/ int install_file_pte(struct mm_struct *mm, struct vm_area_struct *vma, - unsigned long addr, unsigned long pgoff, pgprot_t prot) + unsigned long addr, unsigned long pgoff, pgprot_t pgprot) { int err = -ENOMEM; pte_t *pte; @@ -139,7 +139,7 @@ int install_file_pte(struct mm_struct *m zap_pte(mm, vma, addr, pte); - set_pte_at(mm, addr, pte, pgoff_to_pte(pgoff)); + set_pte_at(mm, addr, pte, pgoff_prot_to_pte(pgoff, pgprot)); pte_val = *pte; pte_unmap(pte); update_mmu_cache(vma, addr, pte_val); @@ -157,31 +157,28 @@ err_unlock: *file within an existing vma. * @start: start of the remapped virtual memory range * @size: size of the remapped virtual memory range - * @prot: new protection bits of the range + * @prot: new protection bits of the range, must be 0 if not us
[patch 30/39] remap_file_pages protection support: ia64 bits
From: Ingo Molnar <[EMAIL PROTECTED]> I've attached a 'blind' port of the prot bits of fremap to ia64. I've compiled it with a cross-compiler but otherwise it's untested. (and it's very likely i got the pte bits wrong - but it's roughly OK.) This should at least make ia64 compile. Signed-off-by: Paolo 'Blaisorblade' Giarrusso <[EMAIL PROTECTED]> --- linux-2.6.git-paolo/include/asm-ia64/pgtable.h | 17 + 1 files changed, 13 insertions(+), 4 deletions(-) diff -puN include/asm-ia64/pgtable.h~rfp-arch-ia64 include/asm-ia64/pgtable.h --- linux-2.6.git/include/asm-ia64/pgtable.h~rfp-arch-ia64 2005-08-12 19:27:03.0 +0200 +++ linux-2.6.git-paolo/include/asm-ia64/pgtable.h 2005-08-12 19:27:03.0 +0200 @@ -433,7 +433,8 @@ extern void paging_init (void); * Format of file pte: * bit 0 : present bit (must be zero) * bit 1 : _PAGE_FILE (must be one) - * bits 2-62: file_offset/PAGE_SIZE + * bit 2 : _PAGE_AR_RW + * bits 3-62: file_offset/PAGE_SIZE * bit 63 : _PAGE_PROTNONE bit */ #define __swp_type(entry) (((entry).val >> 2) & 0x7f) @@ -442,9 +443,17 @@ extern void paging_init (void); #define __pte_to_swp_entry(pte)((swp_entry_t) { pte_val(pte) }) #define __swp_entry_to_pte(x) ((pte_t) { (x).val }) -#define PTE_FILE_MAX_BITS 61 -#define pte_to_pgoff(pte) ((pte_val(pte) << 1) >> 3) -#define pgoff_to_pte(off) ((pte_t) { ((off) << 2) | _PAGE_FILE }) +#define PTE_FILE_MAX_BITS 59 +#define pte_to_pgoff(pte) ((pte_val(pte) << 1) >> 4) + +#define pte_to_pgprot(pte) \ + __pgprot((pte_val(pte) & (_PAGE_AR_RW | _PAGE_PROTNONE)) \ + | ((pte_val(pte) & _PAGE_PROTNONE) ? 0 : \ + (__ACCESS_BITS | _PAGE_PL_3))) + +#define pgoff_prot_to_pte(off, prot) \ + ((pte_t) { _PAGE_FILE + \ + (pgprot_val(prot) & (_PAGE_AR_RW | _PAGE_PROTNONE)) + (off) }) /* XXX is this right? 
*/ #define io_remap_page_range(vma, vaddr, paddr, size, prot) \ _ - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[patch 27/39] remap_file_pages protection support: fixups to ppc32 bits
From: Paul Mackerras <[EMAIL PROTECTED]> When I tried -mm4 on a ppc32 box, it hit a BUG because I hadn't excluded _PAGE_FILE from the bits used for swap entries. While looking at that I realised that the pte_to_pgoff and pgoff_prot_to_pte macros were wrong for 4xx and 8xx (embedded) PPC chips, since they use Signed-off-by: Paolo 'Blaisorblade' Giarrusso <[EMAIL PROTECTED]> --- linux-2.6.git-paolo/include/asm-ppc/pgtable.h | 48 +- 1 files changed, 39 insertions(+), 9 deletions(-) diff -puN include/asm-ppc/pgtable.h~rfp-arch-ppc32-pgtable-fixes include/asm-ppc/pgtable.h --- linux-2.6.git/include/asm-ppc/pgtable.h~rfp-arch-ppc32-pgtable-fixes 2005-08-12 18:18:44.0 +0200 +++ linux-2.6.git-paolo/include/asm-ppc/pgtable.h 2005-08-12 18:18:44.0 +0200 @@ -205,6 +205,7 @@ extern unsigned long ioremap_bot, iorema */ #define _PAGE_PRESENT 0x0001 /* S: PTE valid */ #define _PAGE_RW 0x0002 /* S: Write permission */ +#define _PAGE_FILE 0x0004 /* S: nonlinear file mapping */ #define _PAGE_DIRTY 0x0004 /* S: Page dirty */ #define _PAGE_ACCESSED 0x0008 /* S: Page referenced */ #define _PAGE_HWWRITE 0x0010 /* H: Dirty & RW */ @@ -213,7 +214,6 @@ extern unsigned long ioremap_bot, iorema #define _PAGE_ENDIAN 0x0080 /* H: E bit */ #define _PAGE_GUARDED 0x0100 /* H: G bit */ #define _PAGE_COHERENT 0x0200 /* H: M bit */ -#define _PAGE_FILE 0x0400 /* S: nonlinear file mapping */ #define _PAGE_NO_CACHE 0x0400 /* H: I bit */ #define _PAGE_WRITETHRU 0x0800 /* H: W bit */ @@ -724,20 +724,50 @@ extern void paging_init(void); #define __swp_type(entry) ((entry).val & 0x1f) #define __swp_offset(entry) ((entry).val >> 5) #define __swp_entry(type, offset) ((swp_entry_t) { (type) | ((offset) << 5) }) + +#if defined(CONFIG_4xx) || defined(CONFIG_8xx) +/* _PAGE_FILE and _PAGE_PRESENT are in the bottom 3 bits on all these chips. 
*/ #define __pte_to_swp_entry(pte) ((swp_entry_t) { pte_val(pte) >> 3 }) #define __swp_entry_to_pte(x) ((pte_t) { (x).val << 3 }) +#else /* Classic PPC */ +#define __pte_to_swp_entry(pte) \ +((swp_entry_t) { ((pte_val(pte) >> 3) & ~1) | ((pte_val(pte) >> 2) & 1) }) +#define __swp_entry_to_pte(x) \ +((pte_t) { (((x).val & ~1) << 3) | (((x).val & 1) << 2) }) +#endif /* Encode and decode a nonlinear file mapping entry */ -#define PTE_FILE_MAX_BITS 27 -#define pte_to_pgoff(pte) (((pte_val(pte) & ~0x7ff) >> 5) \ -| ((pte_val(pte) & 0x3f0) >> 4)) -#define pte_to_pgprot(pte) \ -__pgprot((pte_val(pte) & (_PAGE_USER|_PAGE_RW|_PAGE_PRESENT)) | _PAGE_ACCESSED) +/* We can't use any of the _PAGE_PRESENT, _PAGE_FILE, _PAGE_USER, _PAGE_RW, + or _PAGE_HASHPTE bits for storing a page offset. */ +#if defined(CONFIG_40x) +/* 40x, avoid the 0x53 bits - to simplify things, avoid 0x73 */ +#define __pgoff_split(x) ((((x) << 5) & ~0x7f) | (((x) << 2) & 0xc)) +#define __pgoff_glue(x) ((((x) & ~0x7f) >> 5) | (((x) & 0xc) >> 2)) +#elif defined(CONFIG_44x) +/* 44x, avoid the 0x47 bits */ +#define __pgoff_split(x) ((((x) << 4) & ~0x7f) | (((x) << 3) & 0x38)) +#define __pgoff_glue(x) ((((x) & ~0x7f) >> 4) | (((x) & 0x38) >> 3)) +#elif defined(CONFIG_8xx) +/* 8xx, avoid the 0x843 bits */ +#define __pgoff_split(x) ((((x) << 4) & ~0xfff) | (((x) << 3) & 0x780) \ +| (((x) << 2) & 0x3c)) +#define __pgoff_glue(x) ((((x) & ~0xfff) >> 4) | (((x) & 0x780) >> 3) \ +| (((x) & 0x3c) >> 2)) +#else +/* classic PPC, avoid the 0x40f bits */ +#define __pgoff_split(x) ((((x) << 5) & ~0x7ff) | (((x) << 4) & 0x3f0)) +#define __pgoff_glue(x) ((((x) & ~0x7ff) >> 5) | (((x) & 0x3f0) >> 4)) +#endif +#define PTE_FILE_MAX_BITS 27 +#define pte_to_pgoff(pte) __pgoff_glue(pte_val(pte)) #define pgoff_prot_to_pte(off, prot) \ - ((pte_t) { (((off) << 5) & ~0x7ff) | (((off) << 4) & 0x3f0) \ - | (pgprot_val(prot) & (_PAGE_USER|_PAGE_RW)) \ - | _PAGE_FILE }) + ((pte_t) { __pgoff_split(off) | _PAGE_FILE |\ + (pgprot_val(prot) & (_PAGE_USER|_PAGE_RW)) }) + +#de
[patch 11/39] remap_file_pages protection support: add MAP_NOINHERIT flag
From: Paolo 'Blaisorblade' Giarrusso <[EMAIL PROTECTED]> Add the MAP_NOINHERIT flag to arch headers, for use with remap-file-pages. Signed-off-by: Paolo 'Blaisorblade' Giarrusso <[EMAIL PROTECTED]> --- linux-2.6.git-paolo/include/asm-i386/mman.h |1 + linux-2.6.git-paolo/include/asm-ia64/mman.h |1 + linux-2.6.git-paolo/include/asm-ppc/mman.h |1 + linux-2.6.git-paolo/include/asm-ppc64/mman.h |1 + linux-2.6.git-paolo/include/asm-s390/mman.h |1 + linux-2.6.git-paolo/include/asm-x86_64/mman.h |1 + 6 files changed, 6 insertions(+) diff -puN include/asm-i386/mman.h~rfp-map-noinherit include/asm-i386/mman.h --- linux-2.6.git/include/asm-i386/mman.h~rfp-map-noinherit 2005-08-11 12:06:40.000000000 +0200 +++ linux-2.6.git-paolo/include/asm-i386/mman.h 2005-08-11 12:06:40.000000000 +0200 @@ -22,6 +22,7 @@ #define MAP_NORESERVE 0x4000 /* don't check for reservations */ #define MAP_POPULATE 0x8000 /* populate (prefault) pagetables */ #define MAP_NONBLOCK 0x10000 /* do not block on IO */ +#define MAP_NOINHERIT 0x20000 /* don't inherit the protection bits of the underlying vma */ #define MS_ASYNC 1 /* sync memory asynchronously */ #define MS_INVALIDATE 2 /* invalidate the caches */ diff -puN include/asm-ia64/mman.h~rfp-map-noinherit include/asm-ia64/mman.h --- linux-2.6.git/include/asm-ia64/mman.h~rfp-map-noinherit 2005-08-11 12:06:40.000000000 +0200 +++ linux-2.6.git-paolo/include/asm-ia64/mman.h 2005-08-11 12:06:40.000000000 +0200 @@ -30,6 +30,7 @@ #define MAP_NORESERVE 0x04000 /* don't check for reservations */ #define MAP_POPULATE 0x08000 /* populate (prefault) pagetables */ #define MAP_NONBLOCK 0x10000 /* do not block on IO */ +#define MAP_NOINHERIT 0x20000 /* don't inherit the protection bits of the underlying vma */ #define MS_ASYNC 1 /* sync memory asynchronously */ #define MS_INVALIDATE 2 /* invalidate the caches */ diff -puN include/asm-ppc64/mman.h~rfp-map-noinherit include/asm-ppc64/mman.h --- linux-2.6.git/include/asm-ppc64/mman.h~rfp-map-noinherit 2005-08-11 12:06:40.000000000 +0200 +++
linux-2.6.git-paolo/include/asm-ppc64/mman.h 2005-08-11 12:06:40.000000000 +0200 @@ -38,6 +38,7 @@ #define MAP_POPULATE 0x8000 /* populate (prefault) pagetables */ #define MAP_NONBLOCK 0x10000 /* do not block on IO */ +#define MAP_NOINHERIT 0x20000 /* don't inherit the protection bits of the underlying vma */ #define MADV_NORMAL 0x0 /* default page-in behavior */ #define MADV_RANDOM 0x1 /* page-in minimum required */ diff -puN include/asm-ppc/mman.h~rfp-map-noinherit include/asm-ppc/mman.h --- linux-2.6.git/include/asm-ppc/mman.h~rfp-map-noinherit 2005-08-11 12:06:40.000000000 +0200 +++ linux-2.6.git-paolo/include/asm-ppc/mman.h 2005-08-11 12:06:40.000000000 +0200 @@ -23,6 +23,7 @@ #define MAP_EXECUTABLE 0x1000 /* mark it as an executable */ #define MAP_POPULATE 0x8000 /* populate (prefault) pagetables */ #define MAP_NONBLOCK 0x10000 /* do not block on IO */ +#define MAP_NOINHERIT 0x20000 /* don't inherit the protection bits of the underlying vma */ #define MS_ASYNC 1 /* sync memory asynchronously */ #define MS_INVALIDATE 2 /* invalidate the caches */ diff -puN include/asm-s390/mman.h~rfp-map-noinherit include/asm-s390/mman.h --- linux-2.6.git/include/asm-s390/mman.h~rfp-map-noinherit 2005-08-11 12:06:40.000000000 +0200 +++ linux-2.6.git-paolo/include/asm-s390/mman.h 2005-08-11 12:06:40.000000000 +0200 @@ -30,6 +30,7 @@ #define MAP_NORESERVE 0x4000 /* don't check for reservations */ #define MAP_POPULATE 0x8000 /* populate (prefault) pagetables */ #define MAP_NONBLOCK 0x10000 /* do not block on IO */ +#define MAP_NOINHERIT 0x20000 /* don't inherit the protection bits of the underlying vma */ #define MS_ASYNC 1 /* sync memory asynchronously */ #define MS_INVALIDATE 2 /* invalidate the caches */ diff -puN include/asm-x86_64/mman.h~rfp-map-noinherit include/asm-x86_64/mman.h --- linux-2.6.git/include/asm-x86_64/mman.h~rfp-map-noinherit 2005-08-11 12:06:40.000000000 +0200 +++ linux-2.6.git-paolo/include/asm-x86_64/mman.h 2005-08-11 12:06:40.000000000 +0200 @@ -23,6 +23,7 @@ #define MAP_NORESERVE 0x4000 /* don't check for reservations */ #define
MAP_POPULATE 0x8000 /* populate (prefault) pagetables */ #define MAP_NONBLOCK 0x10000 /* do not block on IO */ +#define MAP_NOINHERIT 0x20000 /* don't inherit the protection bits of the underlying vma */ #define MS_ASYNC 1 /* sync memory asynchronously */ #define
[patch 14/39] remap_file_pages protection support: assume VM_SHARED never disappears
From: Paolo 'Blaisorblade' Giarrusso <[EMAIL PROTECTED]> Assume that even after dropping and reacquiring the lock, (vma->vm_flags & VM_SHARED) won't change, thus moving a check earlier. Signed-off-by: Paolo 'Blaisorblade' Giarrusso <[EMAIL PROTECTED]> --- linux-2.6.git-paolo/mm/fremap.c | 12 ++-- 1 files changed, 2 insertions(+), 10 deletions(-) diff -puN mm/fremap.c~rfp-assume-VM_PRIVATE-stays mm/fremap.c --- linux-2.6.git/mm/fremap.c~rfp-assume-VM_PRIVATE-stays 2005-08-11 12:58:07.000000000 +0200 +++ linux-2.6.git-paolo/mm/fremap.c 2005-08-11 13:38:56.000000000 +0200 @@ -232,6 +232,8 @@ retry: /* Must set VM_NONLINEAR before any pages are populated. */ if (pgoff != linear_page_index(vma, start)) { + if (!(vma->vm_flags & VM_SHARED)) + goto out_unlock; if (!(vma->vm_flags & VM_NONLINEAR)) { if (!has_write_lock) { up_read(&mm->mmap_sem); @@ -239,12 +241,6 @@ retry: has_write_lock = 1; goto retry; } - /* XXX: we check VM_SHARED after re-getting the -* (write) semaphore but I guess that we could -* check it earlier as we're not allowed to turn -* a VM_PRIVATE vma into a VM_SHARED one! */ - if (!(vma->vm_flags & VM_SHARED)) - goto out_unlock; mapping = vma->vm_file->f_mapping; spin_lock(&mapping->i_mmap_lock); @@ -254,10 +250,6 @@ retry: vma_nonlinear_insert(vma, &mapping->i_mmap_nonlinear); flush_dcache_mmap_unlock(mapping); spin_unlock(&mapping->i_mmap_lock); - } else { - /* Won't drop the lock, check it here.*/ - if (!(vma->vm_flags & VM_SHARED)) - goto out_unlock; } } _
[patch 35/39] remap_file_pages protection support: avoid redundant pte_file PTE's
From: Paolo 'Blaisorblade' Giarrusso <[EMAIL PROTECTED]> For linear VMA's, there is no need to install pte_file PTEs to remember the offset. We could probably go as far as checking directly the address and protection like in include/linux/pagemap.h:set_nonlinear_pte(), instead of vma->vm_flags. Also add some warnings on the path which used to cope with such PTE's. Untested yet. Signed-off-by: Paolo 'Blaisorblade' Giarrusso <[EMAIL PROTECTED]> --- linux-2.6.git-paolo/mm/fremap.c | 12 ++-- linux-2.6.git-paolo/mm/memory.c |5 + 2 files changed, 11 insertions(+), 6 deletions(-) diff -puN mm/fremap.c~rfp-linear-optim-v3 mm/fremap.c --- linux-2.6.git/mm/fremap.c~rfp-linear-optim-v3 2005-08-11 23:20:09.000000000 +0200 +++ linux-2.6.git-paolo/mm/fremap.c 2005-08-11 23:20:09.000000000 +0200 @@ -125,6 +125,12 @@ int install_file_pte(struct mm_struct *m BUG_ON(!uniform && !(vma->vm_flags & VM_SHARED)); + /* We're being called by mmap(MAP_NONBLOCK|MAP_POPULATE) on a uniform +* VMA. So we don't need to take the lock, nor to install a PTE for the +* page we'd fault in anyway. */ + if (uniform) + return 0; + pgd = pgd_offset(mm, addr); spin_lock(&mm->page_table_lock); @@ -139,12 +145,6 @@ int install_file_pte(struct mm_struct *m pte = pte_alloc_map(mm, pmd, addr); if (!pte) goto err_unlock; - /* -* Skip uniform non-existent ptes: -*/ - err = 0; - if (uniform && pte_none(*pte)) - goto err_unlock; zap_pte(mm, vma, addr, pte); diff -puN mm/memory.c~rfp-linear-optim-v3 mm/memory.c --- linux-2.6.git/mm/memory.c~rfp-linear-optim-v3 2005-08-11 23:20:09.000000000 +0200 +++ linux-2.6.git-paolo/mm/memory.c 2005-08-11 23:20:09.000000000 +0200 @@ -1969,9 +1969,14 @@ static int do_file_page(struct mm_struct /* * Fall back to the linear mapping if the fs does not support * ->populate; in this case do the protection checks. +* Could have been installed by install_file_pte, for a MAP_NONBLOCK +* pagetable population.
*/ if (!vma->vm_ops->populate || ((access_mask & VM_WRITE) && !(vma->vm_flags & VM_SHARED))) { + /* remap_file_pages should disallow this, now that +* install_file_pte skips linear ones. */ + WARN_ON(1); /* We're behaving as if pte_file was cleared, so check * protections like in handle_pte_fault. */ if (check_perms(vma, access_mask)) _
[patch 03/39] add swap cache mapping comment
From: Paolo 'Blaisorblade' Giarrusso <[EMAIL PROTECTED]> Add some more comments about page->mapping and swapper_space, explaining their (historical and current) relationship. Such material can be extracted from the old GIT history (which I used for reference), but having it in the source is more useful. Signed-off-by: Paolo 'Blaisorblade' Giarrusso <[EMAIL PROTECTED]> --- linux-2.6.git-paolo/./mm/swap_state.c |5 + 1 files changed, 5 insertions(+) diff -puN ./mm/swap_state.c~swap-cache-mapping-comment ./mm/swap_state.c --- linux-2.6.git/./mm/swap_state.c~swap-cache-mapping-comment 2005-08-11 11:12:57.000000000 +0200 +++ linux-2.6.git-paolo/./mm/swap_state.c 2005-08-11 11:12:57.000000000 +0200 @@ -21,6 +21,11 @@ * swapper_space is a fiction, retained to simplify the path through * vmscan's shrink_list, to make sync_page look nicer, and to allow * future use of radix_tree tags in the swap cache. + * + * In 2.4 and until 2.6.6 pages in the swap cache also had page->mapping == + * &swapper_space (this was the definition of PageSwapCache), but this is no + * longer true. Instead, we use page->flags for that, and page->mapping is + * *ignored* here. However, also take a look at page_mapping(). */ static struct address_space_operations swap_aops = { .writepage = swap_writepage, _
[patch 31/39] remap_file_pages protection support: s390 bits
From: Martin Schwidefsky <[EMAIL PROTECTED]> s390 memory management changes for remap-file-pages-prot patch: - Add pgoff_prot_to_pte/pte_to_pgprot, remove pgoff_to_pte (required for 'prot' parameter in shared-writeable mappings). - Handle VM_FAULT_SIGSEGV from handle_mm_fault in do_exception. Signed-off-by: Paolo 'Blaisorblade' Giarrusso <[EMAIL PROTECTED]> --- linux-2.6.git-paolo/arch/s390/mm/fault.c |2 linux-2.6.git-paolo/include/asm-s390/pgtable.h | 90 - 2 files changed, 60 insertions(+), 32 deletions(-) diff -puN arch/s390/mm/fault.c~rfp-arch-s390 arch/s390/mm/fault.c --- linux-2.6.git/arch/s390/mm/fault.c~rfp-arch-s390 2005-08-12 19:27:58.000000000 +0200 +++ linux-2.6.git-paolo/arch/s390/mm/fault.c 2005-08-12 19:27:58.000000000 +0200 @@ -260,6 +260,8 @@ survive: goto do_sigbus; case VM_FAULT_OOM: goto out_of_memory; + case VM_FAULT_SIGSEGV: + goto bad_area; default: BUG(); } diff -puN include/asm-s390/pgtable.h~rfp-arch-s390 include/asm-s390/pgtable.h --- linux-2.6.git/include/asm-s390/pgtable.h~rfp-arch-s390 2005-08-12 19:27:58.000000000 +0200 +++ linux-2.6.git-paolo/include/asm-s390/pgtable.h 2005-08-12 19:27:58.000000000 +0200 @@ -211,16 +211,41 @@ extern char empty_zero_page[PAGE_SIZE]; * C : changed bit */ -/* Hardware bits in the page table entry */ +/* Hardware bits in the page table entry. */ #define _PAGE_RO 0x200 /* HW read-only */ #define _PAGE_INVALID 0x400 /* HW invalid */ -/* Mask and four different kinds of invalid pages. */ -#define _PAGE_INVALID_MASK 0x601 +/* Software bits in the page table entry.
*/ +#define _PAGE_FILE 0x001 +#define _PAGE_PROTNONE 0x002 + +/* + * We have 8 different page "types", two valid types and 6 invalid types + * (p = page address, o = swap offset, t = swap type, f = file offset): + * 0 xxx 0IP0 yy NF + * valid rw: 0 <p> <--0-> 00 + * valid ro: 0 <p> 0010 <--0-> 00 + * invalid none: 0 <p> 0100 <--0-> 10 + * invalid empty: 0 <0> 0100 <--0-> 00 + * invalid swap: 0 <o> 0110 <--t-> 00 + * invalid file rw:0 <f> 0100 <--f-> 01 + * invalid file ro:0 <f> 0110 <--f-> 01 + * invalid file none: 0 <f> 0100 <--f-> 11 + * + * The format for 64 bit is almost identical, there isn't a leading zero + * and the number of bits in the page address part of the pte is 52 bits + * instead of 19. + */ + #define _PAGE_INVALID_EMPTY0x400 -#define _PAGE_INVALID_NONE 0x401 #define _PAGE_INVALID_SWAP 0x600 -#define _PAGE_INVALID_FILE 0x601 +#define _PAGE_INVALID_FILE 0x401 + +#define _PTE_IS_VALID(__pte) (!(pte_val(__pte) & _PAGE_INVALID)) +#define _PTE_IS_NONE(__pte)((pte_val(__pte) & 0x603) == 0x402) +#define _PTE_IS_EMPTY(__pte) ((pte_val(__pte) & 0x603) == 0x400) +#define _PTE_IS_SWAP(__pte)((pte_val(__pte) & 0x603) == 0x600) +#define _PTE_IS_FILE(__pte)((pte_val(__pte) & 0x401) == 0x401) #ifndef __s390x__ @@ -281,13 +306,11 @@ extern char empty_zero_page[PAGE_SIZE]; /* * No mapping available */ -#define PAGE_NONE_SHARED __pgprot(_PAGE_INVALID_NONE) -#define PAGE_NONE_PRIVATE __pgprot(_PAGE_INVALID_NONE) -#define PAGE_RO_SHARED __pgprot(_PAGE_RO) -#define PAGE_RO_PRIVATE __pgprot(_PAGE_RO) -#define PAGE_COPY__pgprot(_PAGE_RO) -#define PAGE_SHARED __pgprot(0) -#define PAGE_KERNEL __pgprot(0) +#define PAGE_NONE __pgprot(_PAGE_INVALID | _PAGE_PROTNONE) +#define PAGE_READONLY __pgprot(_PAGE_RO) +#define PAGE_COPY __pgprot(_PAGE_RO) +#define PAGE_SHARED__pgprot(0) +#define PAGE_KERNEL__pgprot(0) /* * The S390 can't do page protection for execute, and considers that the @@ -295,21 +318,21 @@ extern char empty_zero_page[PAGE_SIZE]; * the closest we can get..
*/ /*xwr*/ -#define __P000 PAGE_NONE_PRIVATE -#define __P001 PAGE_RO_PRIVATE +#define __P000 PAGE_NONE +#define __P001 PAGE_READONLY #define __P010 PAGE_COPY #define __P011 PAGE_COPY -#define __P100 PAGE_RO_PRIVATE -#define __P101 PAGE_RO_PRIVATE +#define __P100 PAGE_READONLY +#define __P101 PAGE_READONLY #define __P110 PAGE_COPY #define __P111 PAGE_COPY -#define __S000 PAGE_NONE_SHARED -#define __S001 PAGE_RO_SHARED +#define __S000 PAGE_NONE +#define __S001 PAGE_READONLY #define __S010 PAGE_SHARED #define __S011 PAGE_SHARED -#define __S100 PAGE_RO_SHARED -#define __S101 PAGE_RO_SHARED +#define __S100 PAGE_READONLY +#define __S101 PAGE_READONLY #define __S110 PAGE_
[patch 16/39] remap_file_pages protection support: readd lock downgrading
From: Paolo 'Blaisorblade' Giarrusso <[EMAIL PROTECTED]> Even now, we'll sometimes take the write lock. So, in that case, we could downgrade it; after a tiny bit of thought, I've chosen to do that when we'll either do any I/O or alter a lot of PTEs. About how much "a lot" is, I've copied the values from this code in mm/memory.c: #ifdef CONFIG_PREEMPT # define ZAP_BLOCK_SIZE (8 * PAGE_SIZE) #else /* No preempt: go for improved straight-line efficiency */ # define ZAP_BLOCK_SIZE (1024 * PAGE_SIZE) #endif I'm not sure about the trade-offs - we used to have a down_write(), now we have a down_read() and a possible up_read() + down_write(), and with this patch the fast path still takes only down_read(), but the slow path will do down_read(), down_write(), downgrade_write(). This increases the number of atomic operations but improves concurrency wrt mmap and similar operations - I don't know how much contention there is on that lock. Also, drop a stale comment: we cannot clear VM_NONLINEAR simply because code elsewhere is going to use it. At the very least, madvise_dontneed() relies on that flag being set (non-linear truncation also reads the mapping list), but the list is probably longer and going to grow in the next patches of this series. Just in case this wasn't clear: this patch is not strictly related to protection support, I was just too lazy to move it up in the hierarchy.
Signed-off-by: Paolo 'Blaisorblade' Giarrusso <[EMAIL PROTECTED]> --- linux-2.6.git-paolo/mm/fremap.c | 18 +- 1 files changed, 13 insertions(+), 5 deletions(-) diff -puN mm/fremap.c~rfp-downgrade-lock mm/fremap.c --- linux-2.6.git/mm/fremap.c~rfp-downgrade-lock 2005-08-11 23:04:39.000000000 +0200 +++ linux-2.6.git-paolo/mm/fremap.c 2005-08-11 23:04:39.000000000 +0200 @@ -152,6 +152,13 @@ err_unlock: } +#ifdef CONFIG_PREEMPT +# define INSTALL_SIZE (8 * PAGE_SIZE) +#else +/* No preempt: go for improved straight-line efficiency */ +# define INSTALL_SIZE (1024 * PAGE_SIZE) +#endif + /*** * sys_remap_file_pages - remap arbitrary pages of a shared backing store *file within an existing vma. @@ -266,14 +273,15 @@ retry: } } + /* Do NOT hold the write lock while doing any I/O, nor when +* iterating over too many PTEs. Values might need tuning. */ + if (has_write_lock && (!(flags & MAP_NONBLOCK) || size > INSTALL_SIZE)) { + downgrade_write(&mm->mmap_sem); + has_write_lock = 0; + } err = vma->vm_ops->populate(vma, start, size, pgprot, pgoff, flags & MAP_NONBLOCK); - /* -* We can't clear VM_NONLINEAR because we'd have to do -* it after ->populate completes, and that would prevent -* downgrading the lock. (Locks can't be upgraded). -*/ } out_unlock: _
[patch 34/39] remap_file_pages protection support: restrict permission testing
From: Paolo 'Blaisorblade' Giarrusso <[EMAIL PROTECTED]> Yet to test. Currently we install a PTE when one is missing irrespective of the fault type, and if the access type is prohibited we'll get another fault and kill the process only then. With this, we check the access type on the first fault. We could also use this code for testing present PTE's, if the current assumption (fault on present PTE's in VM_NONUNIFORM vma's means access violation) proves problematic for architectures other than UML (which I already fixed), but I hope it's not needed. Signed-off-by: Paolo 'Blaisorblade' Giarrusso <[EMAIL PROTECTED]> --- linux-2.6.git-paolo/mm/memory.c | 16 1 files changed, 16 insertions(+) diff -puN mm/memory.c~rfp-fault-sigsegv-3 mm/memory.c --- linux-2.6.git/mm/memory.c~rfp-fault-sigsegv-3 2005-08-12 17:19:17.000000000 +0200 +++ linux-2.6.git-paolo/mm/memory.c 2005-08-12 17:19:17.000000000 +0200 @@ -1963,6 +1963,7 @@ static int do_file_page(struct mm_struct unsigned long pgoff; pgprot_t pgprot; int err; + pte_t test_entry; BUG_ON(!vma->vm_ops || !vma->vm_ops->nopage); /* @@ -1983,6 +1984,21 @@ static int do_file_page(struct mm_struct pgoff = pte_to_pgoff(*pte); pgprot = vma->vm_flags & VM_NONUNIFORM ? pte_to_pgprot(*pte): vma->vm_page_prot; + /* If this is not enabled, we'll get another fault after return next +* time, check we handle that one, and that this code works.
*/ +#if 1 + /* We just want to test pte_{read,write,exec} */ + test_entry = mk_pte(0, pgprot); + if (unlikely(vma->vm_flags & VM_NONUNIFORM) && !pte_file(*pte)) { + if ((access_mask & VM_WRITE) && !pte_write(test_entry)) + goto out_segv; + if ((access_mask & VM_READ) && !pte_read(test_entry)) + goto out_segv; + if ((access_mask & VM_EXEC) && !pte_exec(test_entry)) + goto out_segv; + } +#endif + pte_unmap(pte); spin_unlock(&mm->page_table_lock); _
[patch 08/39] remap_file_pages protection support: uml bits
Update pte encoding macros for UML. Signed-off-by: Paolo 'Blaisorblade' Giarrusso <[EMAIL PROTECTED]> --- linux-2.6.git-paolo/include/asm-um/pgtable-2level.h | 15 ++ linux-2.6.git-paolo/include/asm-um/pgtable-3level.h | 21 +++- 2 files changed, 27 insertions(+), 9 deletions(-) diff -puN include/asm-um/pgtable-2level.h~rfp-arch-uml include/asm-um/pgtable-2level.h --- linux-2.6.git/include/asm-um/pgtable-2level.h~rfp-arch-uml 2005-08-11 11:23:21.0 +0200 +++ linux-2.6.git-paolo/include/asm-um/pgtable-2level.h 2005-08-11 11:23:21.0 +0200 @@ -72,12 +72,19 @@ static inline void set_pte(pte_t *pteptr ((unsigned long) __va(pmd_val(pmd) & PAGE_MASK)) /* - * Bits 0 through 3 are taken + * Bits 0 to 5 are taken, split up the 26 bits of offset + * into this range: */ -#define PTE_FILE_MAX_BITS 28 +#define PTE_FILE_MAX_BITS 26 -#define pte_to_pgoff(pte) (pte_val(pte) >> 4) +#define pte_to_pgoff(pte) (pte_val(pte) >> 6) +#define pte_to_pgprot(pte) \ + __pgprot((pte_val(pte) & (_PAGE_RW | _PAGE_PROTNONE)) \ + | ((pte_val(pte) & _PAGE_PROTNONE) ? 0 : \ + (_PAGE_USER | _PAGE_PRESENT)) | _PAGE_ACCESSED) -#define pgoff_to_pte(off) ((pte_t) { ((off) << 4) + _PAGE_FILE }) +#define pgoff_prot_to_pte(off, prot) \ + ((pte_t) { ((off) << 6) + \ +(pgprot_val(prot) & (_PAGE_RW | _PAGE_PROTNONE)) + _PAGE_FILE }) #endif diff -puN include/asm-um/pgtable-3level.h~rfp-arch-uml include/asm-um/pgtable-3level.h --- linux-2.6.git/include/asm-um/pgtable-3level.h~rfp-arch-uml 2005-08-11 11:23:21.0 +0200 +++ linux-2.6.git-paolo/include/asm-um/pgtable-3level.h 2005-08-11 11:23:21.0 +0200 @@ -140,25 +140,36 @@ static inline pmd_t pfn_pmd(pfn_t page_n } /* - * Bits 0 through 3 are taken in the low part of the pte, + * Bits 0 through 5 are taken in the low part of the pte, * put the 32 bits of offset into the high part. 
*/ #define PTE_FILE_MAX_BITS 32 + #ifdef CONFIG_64BIT #define pte_to_pgoff(p) ((p).pte >> 32) - -#define pgoff_to_pte(off) ((pte_t) { ((off) << 32) | _PAGE_FILE }) +#define pgoff_prot_to_pte(off, prot) ((pte_t) { ((off) << 32) | _PAGE_FILE | \ + (pgprot_val(prot) & (_PAGE_RW | _PAGE_PROTNONE)) }) +#define pte_flags(pte) pte_val(pte) #else #define pte_to_pgoff(pte) ((pte).pte_high) - -#define pgoff_to_pte(off) ((pte_t) { _PAGE_FILE, (off) }) +#define pgoff_prot_to_pte(off, prot) ((pte_t) { \ + (pgprot_val(prot) & (_PAGE_RW | _PAGE_PROTNONE)) | _PAGE_FILE, \ + (off) }) +/* Don't use pte_val below, useless to join the two halves */ +#define pte_flags(pte) ((pte).pte_low) #endif +#define pte_to_pgprot(pte) \ + __pgprot((pte_flags(pte) & (_PAGE_RW | _PAGE_PROTNONE)) \ + | ((pte_flags(pte) & _PAGE_PROTNONE) ? 0 : \ + (_PAGE_USER | _PAGE_PRESENT)) | _PAGE_ACCESSED) +#undef pte_flags + #endif /* _
[patch 22/39] remap file pages protection support: use FAULT_SIGSEGV for protection checking, uml bits
From: Paolo 'Blaisorblade' Giarrusso <[EMAIL PROTECTED]> This adapts the changes to the i386 handler to the UML one. It isn't enough to make UML work, however, because UML has some peculiarities. Subsequent patches fix this. Signed-off-by: Paolo 'Blaisorblade' Giarrusso <[EMAIL PROTECTED]> --- linux-2.6.git-paolo/arch/um/kernel/trap_kern.c | 32 + 1 files changed, 27 insertions(+), 5 deletions(-) diff -puN arch/um/kernel/trap_kern.c~rfp-fault-sigsegv-2-uml arch/um/kernel/trap_kern.c --- linux-2.6.git/arch/um/kernel/trap_kern.c~rfp-fault-sigsegv-2-uml 2005-08-11 23:09:32.000000000 +0200 +++ linux-2.6.git-paolo/arch/um/kernel/trap_kern.c 2005-08-11 23:09:32.000000000 +0200 @@ -37,6 +37,7 @@ int handle_page_fault(unsigned long addr pmd_t *pmd; pte_t *pte; int err = -EFAULT; + int access_mask = 0; *code_out = SEGV_MAPERR; down_read(&mm->mmap_sem); @@ -55,14 +56,15 @@ int handle_page_fault(unsigned long addr good_area: *code_out = SEGV_ACCERR; if(is_write && !(vma->vm_flags & VM_WRITE)) - goto out; + goto prot_bad; if(!(vma->vm_flags & (VM_READ | VM_EXEC))) -goto out; +goto prot_bad; + access_mask = is_write ? VM_WRITE : 0; do { -survive: - switch (handle_mm_fault(mm, vma, address, is_write)){ +handle_fault: + switch (__handle_mm_fault(mm, vma, address, access_mask)) { case VM_FAULT_MINOR: current->min_flt++; break; @@ -72,6 +74,9 @@ survive: case VM_FAULT_SIGBUS: err = -EACCES; goto out; + case VM_FAULT_SIGSEGV: + err = -EFAULT; + goto out; case VM_FAULT_OOM: err = -ENOMEM; goto out_of_memory; @@ -87,10 +92,27 @@ survive: *pte = pte_mkyoung(*pte); if(pte_write(*pte)) *pte = pte_mkdirty(*pte); flush_tlb_page(vma, address); + + /* If the PTE is not present, the vma protections are not accurate if +* VM_NONUNIFORM; present PTE's are correct for VM_NONUNIFORM and were +* already handled otherwise. */ out: up_read(&mm->mmap_sem); return(err); +prot_bad: + if (unlikely(vma->vm_flags & VM_NONUNIFORM)) { + access_mask = is_write ?
VM_WRITE : 0; + /* Otherwise, on a legitimate read fault on a page mapped as +* exec-only, we get problems. Probably, we should lower +* requirements... we should always test just +* pte_read/write/exec, on vma->vm_page_prot! This way is +* cumbersome. However, for now things should work for UML. */ + access_mask |= vma->vm_flags & VM_EXEC ? VM_EXEC : VM_READ; + goto handle_fault; + } + goto out; + /* * We ran out of memory, or some other thing happened to us that made * us unable to handle the page fault gracefully. @@ -100,7 +122,7 @@ out_of_memory: up_read(&mm->mmap_sem); yield(); down_read(&mm->mmap_sem); - goto survive; + goto handle_fault; } goto out; } _
[patch 17/39] remap_file_pages protection support: safety net for lazy arches
From: Paolo 'Blaisorblade' Giarrusso <[EMAIL PROTECTED]> Since proper support requires that the arch at the very least handles VM_FAULT_SIGSEGV, as in the next patch (otherwise the arch may BUG), and things are even more complex (see the next patches), and it's triggerable only with VM_NONUNIFORM vma's, simply refuse to create them if the arch doesn't declare itself ready. This is a very temporary hack, so I've clearly marked it as such. At the current release pace, that gives arches about 6 months to get ready; reducing this time is perfectly ok with me. Signed-off-by: Paolo 'Blaisorblade' Giarrusso <[EMAIL PROTECTED]> --- linux-2.6.git-paolo/Documentation/feature-removal-schedule.txt | 12 ++ linux-2.6.git-paolo/include/asm-i386/pgtable.h |3 ++ linux-2.6.git-paolo/include/asm-um/pgtable.h |3 ++ linux-2.6.git-paolo/mm/fremap.c|5 4 files changed, 23 insertions(+) diff -puN mm/fremap.c~rfp-safety-net-for-archs mm/fremap.c --- linux-2.6.git/mm/fremap.c~rfp-safety-net-for-archs 2005-08-11 13:46:49.000000000 +0200 +++ linux-2.6.git-paolo/mm/fremap.c 2005-08-11 13:55:02.000000000 +0200 @@ -184,6 +184,11 @@ asmlinkage long sys_remap_file_pages(uns int err = -EINVAL; int has_write_lock = 0; + /* Hack for not-updated archs, KILLME after 2.6.16! */ +#ifndef __ARCH_SUPPORTS_VM_NONUNIFORM + if (flags & MAP_NOINHERIT) + goto out; +#endif if (prot && !(flags & MAP_NOINHERIT)) goto out; /* diff -puN include/asm-i386/pgtable.h~rfp-safety-net-for-archs include/asm-i386/pgtable.h --- linux-2.6.git/include/asm-i386/pgtable.h~rfp-safety-net-for-archs 2005-08-11 13:46:49.000000000 +0200 +++ linux-2.6.git-paolo/include/asm-i386/pgtable.h 2005-08-11 13:55:02.000000000 +0200 @@ -419,4 +419,7 @@ extern void noexec_setup(const char *str #define __HAVE_ARCH_PTE_SAME #include +/* Hack for not-updated archs, KILLME after 2.6.16!
*/ +#define __ARCH_SUPPORTS_VM_NONUNIFORM + #endif /* _I386_PGTABLE_H */ diff -puN include/asm-um/pgtable.h~rfp-safety-net-for-archs include/asm-um/pgtable.h --- linux-2.6.git/include/asm-um/pgtable.h~rfp-safety-net-for-archs 2005-08-11 13:46:49.000000000 +0200 +++ linux-2.6.git-paolo/include/asm-um/pgtable.h 2005-08-11 13:55:02.000000000 +0200 @@ -361,6 +361,9 @@ static inline pte_t pte_modify(pte_t pte #include +/* Hack for not-updated archs, KILLME after 2.6.16! */ +#define __ARCH_SUPPORTS_VM_NONUNIFORM + #endif #endif diff -puN Documentation/feature-removal-schedule.txt~rfp-safety-net-for-archs Documentation/feature-removal-schedule.txt --- linux-2.6.git/Documentation/feature-removal-schedule.txt~rfp-safety-net-for-archs 2005-08-11 14:06:00.000000000 +0200 +++ linux-2.6.git-paolo/Documentation/feature-removal-schedule.txt 2005-08-11 14:10:34.000000000 +0200 @@ -135,3 +135,15 @@ Why: With the 16-bit PCMCIA subsystem no pcmciautils package available at http://kernel.org/pub/linux/utils/kernel/pcmcia/ Who: Dominik Brodowski <[EMAIL PROTECTED]> + +--- + +What: __ARCH_SUPPORTS_VM_NONUNIFORM +When: December 2005 +Files: mm/fremap.c, include/asm-*/pgtable.h +Why: It's just there to allow arches to update their page fault handlers to + support VM_FAULT_SIGSEGV, for remap_file_pages protection support. + Since they may BUG if this support is not added, the syscall code + refuses this new operation mode unless the arch declares itself as + "VM_FAULT_SIGSEGV-aware" with this macro. +Who: Paolo 'Blaisorblade' Giarrusso <[EMAIL PROTECTED]> _
[patch 24/39] remap_file_pages protection support: adapt to uml peculiarities
From: Paolo 'Blaisorblade' Giarrusso <[EMAIL PROTECTED]> UML is peculiar compared with other architectures (and possibly this should be fixed) in that our arch fault handler handles both TLB faults and page faults indifferently. In particular, we may get to call handle_mm_fault() when the PTE is already correct, but simply not flushed. And rfp-fault-sigsegv-2 breaks this, because when getting a fault on a pte_present PTE and a non-uniform VMA, it assumes the fault is due to a protection violation, and signals the caller that a SIGSEGV must be sent. This isn't the final fix for UML; that's the next one. Signed-off-by: Paolo 'Blaisorblade' Giarrusso <[EMAIL PROTECTED]> --- linux-2.6.git-paolo/arch/um/kernel/trap_kern.c | 19 +++ 1 files changed, 15 insertions(+), 4 deletions(-) diff -puN arch/um/kernel/trap_kern.c~rfp-sigsegv-uml-3 arch/um/kernel/trap_kern.c --- linux-2.6.git/arch/um/kernel/trap_kern.c~rfp-sigsegv-uml-3 2005-08-11 23:13:06.000000000 +0200 +++ linux-2.6.git-paolo/arch/um/kernel/trap_kern.c 2005-08-11 23:14:26.000000000 +0200 @@ -75,8 +75,21 @@ handle_fault: err = -EACCES; goto out; case VM_FAULT_SIGSEGV: - err = -EFAULT; - goto out; + /* Duplicate this code here. */ + pgd = pgd_offset(mm, address); + pud = pud_offset(pgd, address); + pmd = pmd_offset(pud, address); + pte = pte_offset_kernel(pmd, address); + if (likely (pte_newpage(*pte) || pte_newprot(*pte))) { + /* This wasn't done by __handle_mm_fault(), and +* the page hadn't been flushed.
*/ + *pte = pte_mkyoung(*pte); + if(pte_write(*pte)) *pte = pte_mkdirty(*pte); + break; + } else { + err = -EFAULT; + goto out; + } case VM_FAULT_OOM: err = -ENOMEM; goto out_of_memory; @@ -89,8 +102,6 @@ handle_fault: pte = pte_offset_kernel(pmd, address); } while(!pte_present(*pte)); err = 0; - *pte = pte_mkyoung(*pte); - if(pte_write(*pte)) *pte = pte_mkdirty(*pte); flush_tlb_page(vma, address); /* If the PTE is not present, the vma protection are not accurate if _ - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[patch 09/39] remap_file_pages protection support: improvement for UML bits
Recover one bit by additionally using _PAGE_NEWPROT. Since I wasn't sure this would work, I've split this out. We rely on the fact that pte_newprot always checks first if the PTE is marked present. Signed-off-by: Paolo 'Blaisorblade' Giarrusso <[EMAIL PROTECTED]> --- linux-2.6.git-paolo/include/asm-um/pgtable-2level.h | 10 +- 1 files changed, 5 insertions(+), 5 deletions(-) diff -puN include/asm-um/pgtable-2level.h~rfp-arch-uml-improv include/asm-um/pgtable-2level.h --- linux-2.6.git/include/asm-um/pgtable-2level.h~rfp-arch-uml-improv 2005-08-07 19:09:34.0 +0200 +++ linux-2.6.git-paolo/include/asm-um/pgtable-2level.h 2005-08-07 19:09:34.0 +0200 @@ -72,19 +72,19 @@ static inline void set_pte(pte_t *pteptr ((unsigned long) __va(pmd_val(pmd) & PAGE_MASK)) /* - * Bits 0 to 5 are taken, split up the 26 bits of offset + * Bits 0, 1, 3 to 5 are taken, split up the 27 bits of offset * into this range: */ -#define PTE_FILE_MAX_BITS 26 +#define PTE_FILE_MAX_BITS 27 -#define pte_to_pgoff(pte) (pte_val(pte) >> 6) +#define pte_to_pgoff(pte) (((pte_val(pte) >> 6) << 1) | ((pte_val(pte) >> 2) & 0x1)) #define pte_to_pgprot(pte) \ __pgprot((pte_val(pte) & (_PAGE_RW | _PAGE_PROTNONE)) \ | ((pte_val(pte) & _PAGE_PROTNONE) ? 0 : \ (_PAGE_USER | _PAGE_PRESENT)) | _PAGE_ACCESSED) #define pgoff_prot_to_pte(off, prot) \ - ((pte_t) { ((off) << 6) + \ -(pgprot_val(prot) & (_PAGE_RW | _PAGE_PROTNONE)) + _PAGE_FILE }) + __pte((((off) >> 1) << 6) + (((off) & 0x1) << 2) + \ +(pgprot_val(prot) & (_PAGE_RW | _PAGE_PROTNONE)) + _PAGE_FILE) #endif _ - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[patch 26/39] remap_file_pages protection support: ppc32 bits
From: Ingo Molnar <[EMAIL PROTECTED]> PPC32 bits of RFP - as in original patch. Signed-off-by: Paolo 'Blaisorblade' Giarrusso <[EMAIL PROTECTED]> --- linux-2.6.git-paolo/include/asm-ppc/pgtable.h | 15 +++ 1 files changed, 11 insertions(+), 4 deletions(-) diff -puN include/asm-ppc/pgtable.h~rfp-arch-ppc include/asm-ppc/pgtable.h --- linux-2.6.git/include/asm-ppc/pgtable.h~rfp-arch-ppc2005-08-12 18:18:43.0 +0200 +++ linux-2.6.git-paolo/include/asm-ppc/pgtable.h 2005-08-12 18:39:57.0 +0200 @@ -309,8 +309,8 @@ extern unsigned long ioremap_bot, iorema /* Definitions for 60x, 740/750, etc. */ #define _PAGE_PRESENT 0x001 /* software: pte contains a translation */ #define _PAGE_HASHPTE 0x002 /* hash_page has made an HPTE for this pte */ -#define _PAGE_FILE 0x004 /* when !present: nonlinear file mapping */ #define _PAGE_USER 0x004 /* usermode access allowed */ +#define _PAGE_FILE 0x008 /* when !present: nonlinear file mapping */ #define _PAGE_GUARDED 0x008 /* G: prohibit speculative access */ #define _PAGE_COHERENT 0x010 /* M: enforce memory coherence (SMP systems) */ #define _PAGE_NO_CACHE 0x020 /* I: cache inhibit */ @@ -728,9 +728,16 @@ extern void paging_init(void); #define __swp_entry_to_pte(x) ((pte_t) { (x).val << 3 }) /* Encode and decode a nonlinear file mapping entry */ -#define PTE_FILE_MAX_BITS 29 -#define pte_to_pgoff(pte) (pte_val(pte) >> 3) -#define pgoff_to_pte(off) ((pte_t) { ((off) << 3) | _PAGE_FILE }) +#define PTE_FILE_MAX_BITS 27 +#define pte_to_pgoff(pte) (((pte_val(pte) & ~0x7ff) >> 5) \ +| ((pte_val(pte) & 0x3f0) >> 4)) +#define pte_to_pgprot(pte) \ +__pgprot((pte_val(pte) & (_PAGE_USER|_PAGE_RW|_PAGE_PRESENT)) | _PAGE_ACCESSED) + +#define pgoff_prot_to_pte(off, prot) \ + ((pte_t) { (((off) << 5) & ~0x7ff) | (((off) << 4) & 0x3f0) \ + | (pgprot_val(prot) & (_PAGE_USER|_PAGE_RW)) \ + | _PAGE_FILE }) /* CONFIG_APUS */ /* For virtual address to physical address conversion */ _ - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in 
the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[patch 19/39] remap file pages protection support: use FAULT_SIGSEGV for protection checking
From: Paolo 'Blaisorblade' Giarrusso <[EMAIL PROTECTED]> The arch handler used to check protection itself; now, if the VMA is non-uniform, we must possibly move that check into the generic VM. For now, do_file_page installs the PTE without checking the fault type; if it was wrong, the process will fault again and die only then. I've left this as-is for now to exercise the code more (and it works anyway). I've also changed do_no_page to fault in pages with their *exact* permissions for non-uniform VMAs. The checking approach is a bit clumsy because we are given a VM_{READ,WRITE,EXEC} mask, so we do *strict* checking. For instance, a VM_EXEC mapping (which won't have VM_READ in vma->vm_flags) would fault on read. To fix that properly, we should get a pgprot mask and test pte_read()/pte_write()/pte_exec(); for now I work around that in the i386/UML handler, and I have patches fixing this properly later in the series. Also, there is a (potential) problem: on VM_NONUNIFORM vmas, in handle_pte_fault(), if the PTE is present we return VM_FAULT_SIGSEGV. This has proven to be a bit strict, at least for UML, so it may break other arches too (only for new functionality) - at least peculiar ones: for UML the problem was that handle_mm_fault() is called for TLB faults, not only PTE faults. Another problem I've just discovered is that PTRACE_POKETEXT access_process_vm on VM_NONUNIFORM write-protected vma's won't work. That's not a big problem. 
Signed-off-by: Paolo 'Blaisorblade' Giarrusso <[EMAIL PROTECTED]> --- linux-2.6.git-paolo/arch/i386/mm/fault.c | 28 +++-- linux-2.6.git-paolo/include/linux/mm.h | 11 +++ linux-2.6.git-paolo/mm/memory.c | 96 --- 3 files changed, 108 insertions(+), 27 deletions(-) diff -puN arch/i386/mm/fault.c~rfp-fault-sigsegv-2 arch/i386/mm/fault.c --- linux-2.6.git/arch/i386/mm/fault.c~rfp-fault-sigsegv-2 2005-08-11 14:21:01.0 +0200 +++ linux-2.6.git-paolo/arch/i386/mm/fault.c2005-08-11 16:12:46.0 +0200 @@ -219,6 +219,7 @@ fastcall void do_page_fault(struct pt_re unsigned long address; unsigned long page; int write; + int access_mask = 0; siginfo_t info; /* get the address */ @@ -324,23 +325,24 @@ good_area: /* fall through */ case 2: /* write, not present */ if (!(vma->vm_flags & VM_WRITE)) - goto bad_area; + goto bad_area_prot; write++; break; - case 1: /* read, present */ + case 1: /* read, present - when does this happen? Maybe for NX exceptions? */ goto bad_area; case 0: /* read, not present */ if (!(vma->vm_flags & (VM_READ | VM_EXEC))) - goto bad_area; + goto bad_area_prot; } - survive: + access_mask = write ? VM_WRITE : 0; +handle_fault: /* * If for any reason at all we couldn't handle the fault, * make sure we exit gracefully rather than endlessly redo * the fault. */ - switch (handle_mm_fault(mm, vma, address, write)) { + switch (__handle_mm_fault(mm, vma, address, access_mask)) { case VM_FAULT_MINOR: tsk->min_flt++; break; @@ -368,6 +370,20 @@ good_area: up_read(&mm->mmap_sem); return; + /* If the PTE is not present, the vma protection are not accurate if +* VM_NONUNIFORM; present PTE's are correct for VM_NONUNIFORM and were +* already handled otherwise. */ +bad_area_prot: + if (unlikely(vma->vm_flags & VM_NONUNIFORM)) { + access_mask = write ? VM_WRITE : 0; + /* Otherwise, on a legitimate read fault on a page mapped as +* exec-only, we get problems. Probably, we should lower +* requirements... we should always test just +* pte_read/write/exec, on vma->vm_page_prot! 
This way is +* cumbersome. However, for now things should work for i386. */ + access_mask |= vma->vm_flags & VM_EXEC ? VM_EXEC : VM_READ; + goto handle_fault; + } /* * Something tried to access memory that isn't in our memory map.. * Fix it, but check if it's kernel or user first.. @@ -481,7 +497,7 @@ out_of_memory: if (tsk->pid == 1) { yield(); down_read(&mm->mmap_sem); - goto survive; + goto handle_fault; } printk("VM: killing process %s\n", tsk->comm); if (error_code & 4) diff -puN mm/memory.c~rfp-fault-
[patch 1/1] uml: fixes performance regression in activate_mm and thus exec()
From: Paolo 'Blaisorblade' Giarrusso <[EMAIL PROTECTED]> CC: Benjamin LaHaise <[EMAIL PROTECTED]> Normally, activate_mm() is called from exec(), and thus it used to be a no-op because we use a completely new "MM context" on the host (for instance, a new process), and so we didn't need to flush any "TLB entries" (which for us are the set of memory mappings for the host process from the virtual "RAM" file). Kernel threads, instead, are usually handled in a different way. So, when for AIO we call use_mm(), things used to break and so Benjamin implemented activate_mm(). However, that is only needed for AIO, and could slow down exec() inside UML, so be smart: detect being called for AIO (via PF_BORROWED_MM) and do the full flush only in that situation. Comment also the caller so that people won't go breaking UML without noticing. I also rely on the caller's locks for testing current->flags. Signed-off-by: Paolo 'Blaisorblade' Giarrusso <[EMAIL PROTECTED]> --- linux-2.6.git-paolo/fs/aio.c |2 ++ linux-2.6.git-paolo/include/asm-um/mmu_context.h |8 +++- 2 files changed, 9 insertions(+), 1 deletion(-) diff -puN include/asm-um/mmu_context.h~uml-optimize-activate-mm include/asm-um/mmu_context.h --- linux-2.6.git/include/asm-um/mmu_context.h~uml-optimize-activate-mm 2005-08-06 12:53:30.141344264 +0200 +++ linux-2.6.git-paolo/include/asm-um/mmu_context.h2005-08-06 12:58:49.682766584 +0200 @@ -20,7 +20,13 @@ extern void force_flush_all(void); static inline void activate_mm(struct mm_struct *old, struct mm_struct *new) { - if (old != new) + /* This is called by fs/exec.c and fs/aio.c. In the first case, for an +* exec, we don't need to do anything as we're called from userspace +* and thus going to use a new host PID. In the second, we're called +* from a kernel thread, and thus need to go doing the mmap's on the +* host. Since they're very expensive, we want to avoid that as far as +* possible. 
*/ + if (old != new && (current->flags & PF_BORROWED_MM)) force_flush_all(); } diff -puN fs/aio.c~uml-optimize-activate-mm fs/aio.c --- linux-2.6.git/fs/aio.c~uml-optimize-activate-mm 2005-08-06 12:59:14.393010056 +0200 +++ linux-2.6.git-paolo/fs/aio.c2005-08-06 13:03:07.163623544 +0200 @@ -567,6 +567,8 @@ static void use_mm(struct mm_struct *mm) atomic_inc(&mm->mm_count); tsk->mm = mm; tsk->active_mm = mm; + /* Note that on UML this *requires* PF_BORROWED_MM to be set, otherwise +* it won't work. Update it accordingly if you change it here. */ activate_mm(active_mm, mm); task_unlock(tsk); _ - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[patch 03/39] add swap cache mapping comment
From: Paolo 'Blaisorblade' Giarrusso <[EMAIL PROTECTED]> Add some more comments about page->mapping and swapper_space, explaining their (historical and current) relationship. Such material can be extracted from the old GIT history (which I used for reference), but having it in the source is more useful. Signed-off-by: Paolo 'Blaisorblade' Giarrusso <[EMAIL PROTECTED]> --- linux-2.6.git-paolo/./mm/swap_state.c |5 + 1 files changed, 5 insertions(+) diff -puN ./mm/swap_state.c~swap-cache-mapping-comment ./mm/swap_state.c --- linux-2.6.git/./mm/swap_state.c~swap-cache-mapping-comment 2005-08-11 11:12:57.0 +0200 +++ linux-2.6.git-paolo/./mm/swap_state.c 2005-08-11 11:12:57.0 +0200 @@ -21,6 +21,11 @@ * swapper_space is a fiction, retained to simplify the path through * vmscan's shrink_list, to make sync_page look nicer, and to allow * future use of radix_tree tags in the swap cache. + * + * In 2.4 and until 2.6.6 pages in the swap cache also had page->mapping == + * &swapper_space (this was the definition of PageSwapCache), but this is no + * more true. Instead, we use page->flags for that, and page->mapping is + * *ignored* here. However, also take a look at page_mapping(). */ static struct address_space_operations swap_aops = { .writepage = swap_writepage, _ - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[patch 15/39] remap_file_pages protection support: add VM_NONUNIFORM to fix existing usage of mprotect()
From: Paolo 'Blaisorblade' Giarrusso <[EMAIL PROTECTED]> Distinguish between a "normal" VMA and a VMA with non-uniform protection. This will also be useful for fault handling (we must ignore VM_{READ,WRITE,EXEC} in the arch fault handler). As said before, with remap-file-pages-prot we must punt on private VMAs even when we're just changing protections. Also, the remap_file_pages protection support does introduce a regression in remap_file_pages vs. mprotect. mprotect alters the VMA protections and walks each installed PTE. mprotect'ing a nonlinear VMA used to work, obviously, but now doesn't, because the protections must now be read from the PTEs, which mprotect hasn't updated; so, to avoid changing behaviour for old binaries, on uniform VMAs we ignore the protections in the PTE, as we did before. On non-uniform VMAs, instead, mprotect is currently broken; however, we've never supported it there, so this is acceptable. What it does is split the VMA if needed, assign the new protection to the VMA, and enforce the new protections on all present pages, ignoring all absent ones (including pte_file() ones), which keep their current protections. So the application has no reliable way to know which pages would actually be remapped. What's more, there is IMHO no reason to support mprotect on non-uniform VMAs. The only exception is changing the VMA's default protection (which is used for pages not individually remapped), but that should still ignore the page tables. The only use for that is changing protections without changing the indexes, which with remap_file_pages must be done one page at a time, re-specifying the indexes. It would be more reasonable to let remap_file_pages change protections on a PTE range without changing the offsets. I've not implemented this, but I can if wanted. For sure, UML doesn't need this interface. For now, however, I've made no change to mprotect(); I'd like to get some feedback first about which way to go. 
Signed-off-by: Paolo 'Blaisorblade' Giarrusso <[EMAIL PROTECTED]> --- linux-2.6.git-paolo/include/linux/mm.h |7 +++ linux-2.6.git-paolo/mm/fremap.c| 13 + linux-2.6.git-paolo/mm/memory.c|2 +- 3 files changed, 21 insertions(+), 1 deletion(-) diff -puN mm/fremap.c~rfp-add-VM_NONUNIFORM mm/fremap.c --- linux-2.6.git/mm/fremap.c~rfp-add-VM_NONUNIFORM 2005-08-11 23:03:51.0 +0200 +++ linux-2.6.git-paolo/mm/fremap.c 2005-08-11 23:03:51.0 +0200 @@ -252,6 +252,19 @@ retry: spin_unlock(&mapping->i_mmap_lock); } } + if (pgprot_val(pgprot) != pgprot_val(vma->vm_page_prot)) { + if (!(vma->vm_flags & VM_SHARED)) + goto out_unlock; + if (!(vma->vm_flags & VM_NONUNIFORM)) { + if (!has_write_lock) { + up_read(&mm->mmap_sem); + down_write(&mm->mmap_sem); + has_write_lock = 1; + goto retry; + } + vma->vm_flags |= VM_NONUNIFORM; + } + } err = vma->vm_ops->populate(vma, start, size, pgprot, pgoff, flags & MAP_NONBLOCK); diff -puN include/linux/mm.h~rfp-add-VM_NONUNIFORM include/linux/mm.h --- linux-2.6.git/include/linux/mm.h~rfp-add-VM_NONUNIFORM 2005-08-11 23:03:51.0 +0200 +++ linux-2.6.git-paolo/include/linux/mm.h 2005-08-11 23:03:51.0 +0200 @@ -160,7 +160,14 @@ extern unsigned int kobjsize(const void #define VM_ACCOUNT 0x0010 /* Is a VM accounted object */ #define VM_HUGETLB 0x0040 /* Huge TLB Page VM */ #define VM_NONLINEAR 0x0080 /* Is non-linear (remap_file_pages) */ + +#ifndef CONFIG_MMU #define VM_MAPPED_COPY 0x0100 /* T if mapped copy of data (nommu mmap) */ +#else +#define VM_NONUNIFORM 0x0100 /* The VM individual pages have + different protections + (remap_file_pages)*/ +#endif #ifndef VM_STACK_DEFAULT_FLAGS /* arch can override this */ #define VM_STACK_DEFAULT_FLAGS VM_DATA_DEFAULT_FLAGS diff -puN mm/memory.c~rfp-add-VM_NONUNIFORM mm/memory.c --- linux-2.6.git/mm/memory.c~rfp-add-VM_NONUNIFORM 2005-08-11 23:03:51.0 +0200 +++ linux-2.6.git-paolo/mm/memory.c 2005-08-11 23:03:51.0 +0200 @@ -1941,7 +1941,7 @@ static int do_file_page(struct mm_struct } pgoff = pte_to_pgoff(*pte); 
- pgprot = pte_to_pgprot(*pte); + pgpr
[patch 14/39] remap_file_pages protection support: assume VM_SHARED never disappears
From: Paolo 'Blaisorblade' Giarrusso <[EMAIL PROTECTED]> Assume that even after dropping and reacquiring the lock, (vma->vm_flags & VM_SHARED) won't change, thus moving a check earlier. Signed-off-by: Paolo 'Blaisorblade' Giarrusso <[EMAIL PROTECTED]> --- linux-2.6.git-paolo/mm/fremap.c | 12 ++-- 1 files changed, 2 insertions(+), 10 deletions(-) diff -puN mm/fremap.c~rfp-assume-VM_PRIVATE-stays mm/fremap.c --- linux-2.6.git/mm/fremap.c~rfp-assume-VM_PRIVATE-stays 2005-08-11 12:58:07.0 +0200 +++ linux-2.6.git-paolo/mm/fremap.c 2005-08-11 13:38:56.0 +0200 @@ -232,6 +232,8 @@ retry: /* Must set VM_NONLINEAR before any pages are populated. */ if (pgoff != linear_page_index(vma, start)) { + if (!(vma->vm_flags & VM_SHARED)) + goto out_unlock; if (!(vma->vm_flags & VM_NONLINEAR)) { if (!has_write_lock) { up_read(&mm->mmap_sem); @@ -239,12 +241,6 @@ retry: has_write_lock = 1; goto retry; } - /* XXX: we check VM_SHARED after re-getting the -* (write) semaphore but I guess that we could -* check it earlier as we're not allowed to turn -* a VM_PRIVATE vma into a VM_SHARED one! */ - if (!(vma->vm_flags & VM_SHARED)) - goto out_unlock; mapping = vma->vm_file->f_mapping; spin_lock(&mapping->i_mmap_lock); @@ -254,10 +250,6 @@ retry: vma_nonlinear_insert(vma, &mapping->i_mmap_nonlinear); flush_dcache_mmap_unlock(mapping); spin_unlock(&mapping->i_mmap_lock); - } else { - /* Won't drop the lock, check it here.*/ - if (!(vma->vm_flags & VM_SHARED)) - goto out_unlock; } } _ - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[patch 06/39] correct _PAGE_FILE comment
_PAGE_FILE does not indicate whether a page is in the page cache or swap cache; it is set just for non-linear PTEs. Correct the comment for i386, x86_64, UML. Also clarify _PAGE_PROTNONE. Signed-off-by: Paolo 'Blaisorblade' Giarrusso <[EMAIL PROTECTED]> --- linux-2.6.git-paolo/include/asm-i386/pgtable.h | 10 +- linux-2.6.git-paolo/include/asm-um/pgtable.h |8 +--- linux-2.6.git-paolo/include/asm-x86_64/pgtable.h |2 +- 3 files changed, 11 insertions(+), 9 deletions(-) diff -puN include/asm-i386/pgtable.h~correct-_PAGE_FILE-comment include/asm-i386/pgtable.h --- linux-2.6.git/include/asm-i386/pgtable.h~correct-_PAGE_FILE-comment 2005-08-11 11:17:04.0 +0200 +++ linux-2.6.git-paolo/include/asm-i386/pgtable.h 2005-08-11 11:17:04.0 +0200 @@ -86,9 +86,7 @@ void paging_init(void); #endif /* - * The 4MB page is guessing.. Detailed in the infamous "Chapter H" - * of the Pentium details, but assuming intel did the straightforward - * thing, this bit set in the page directory entry just means that + * _PAGE_PSE set in the page directory entry just means that * the page directory entry points directly to a 4MB-aligned block of * memory. 
*/ @@ -119,8 +117,10 @@ void paging_init(void); #define _PAGE_UNUSED2 0x400 #define _PAGE_UNUSED3 0x800 -#define _PAGE_FILE 0x040 /* set:pagecache unset:swap */ -#define _PAGE_PROTNONE 0x080 /* If not present */ +/* If _PAGE_PRESENT is clear, we use these: */ +#define _PAGE_FILE 0x040 /* nonlinear file mapping, saved PTE; unset:swap */ +#define _PAGE_PROTNONE 0x080 /* if the user mapped it with PROT_NONE; + pte_present gives true */ #ifdef CONFIG_X86_PAE #define _PAGE_NX (1ULL<<_PAGE_BIT_NX) #else diff -puN include/asm-x86_64/pgtable.h~correct-_PAGE_FILE-comment include/asm-x86_64/pgtable.h --- linux-2.6.git/include/asm-x86_64/pgtable.h~correct-_PAGE_FILE-comment 2005-08-11 11:17:04.0 +0200 +++ linux-2.6.git-paolo/include/asm-x86_64/pgtable.h2005-08-11 11:17:04.0 +0200 @@ -143,7 +143,7 @@ extern inline void pgd_clear (pgd_t * pg #define _PAGE_ACCESSED 0x020 #define _PAGE_DIRTY0x040 #define _PAGE_PSE 0x080 /* 2MB page */ -#define _PAGE_FILE 0x040 /* set:pagecache, unset:swap */ +#define _PAGE_FILE 0x040 /* nonlinear file mapping, saved PTE; unset:swap */ #define _PAGE_GLOBAL 0x100 /* Global TLB entry */ #define _PAGE_PROTNONE 0x080 /* If not present */ diff -puN include/asm-um/pgtable.h~correct-_PAGE_FILE-comment include/asm-um/pgtable.h --- linux-2.6.git/include/asm-um/pgtable.h~correct-_PAGE_FILE-comment 2005-08-11 11:17:04.0 +0200 +++ linux-2.6.git-paolo/include/asm-um/pgtable.h2005-08-11 11:17:04.0 +0200 @@ -16,13 +16,15 @@ #define _PAGE_PRESENT 0x001 #define _PAGE_NEWPAGE 0x002 -#define _PAGE_NEWPROT 0x004 -#define _PAGE_FILE 0x008 /* set:pagecache unset:swap */ -#define _PAGE_PROTNONE 0x010 /* If not present */ +#define _PAGE_NEWPROT 0x004 #define _PAGE_RW 0x020 #define _PAGE_USER 0x040 #define _PAGE_ACCESSED 0x080 #define _PAGE_DIRTY0x100 +/* If _PAGE_PRESENT is clear, we use these: */ +#define _PAGE_FILE 0x008 /* nonlinear file mapping, saved PTE; unset:swap */ +#define _PAGE_PROTNONE 0x010 /* if the user mapped it with PROT_NONE; + pte_present gives true */ 
#ifdef CONFIG_3_LEVEL_PGTABLES #include "asm/pgtable-3level.h" _ - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[patch 17/39] remap_file_pages protection support: safety net for lazy arches
From: Paolo 'Blaisorblade' Giarrusso <[EMAIL PROTECTED]> Proper support requires that the arch at the very least handle VM_FAULT_SIGSEGV, as in the next patch (otherwise the arch may BUG), and things are even more complex than that (see the next patches). Since the problem is triggerable only with VM_NONUNIFORM vma's, simply refuse to create them if the arch doesn't declare itself ready. This is a very temporary hack, so I've clearly marked it as such. At the current pace, I've given arches about six months to get ready; reducing this time is perfectly ok with me. Signed-off-by: Paolo 'Blaisorblade' Giarrusso <[EMAIL PROTECTED]> --- linux-2.6.git-paolo/Documentation/feature-removal-schedule.txt | 12 ++ linux-2.6.git-paolo/include/asm-i386/pgtable.h |3 ++ linux-2.6.git-paolo/include/asm-um/pgtable.h |3 ++ linux-2.6.git-paolo/mm/fremap.c|5 4 files changed, 23 insertions(+) diff -puN mm/fremap.c~rfp-safety-net-for-archs mm/fremap.c --- linux-2.6.git/mm/fremap.c~rfp-safety-net-for-archs 2005-08-11 13:46:49.0 +0200 +++ linux-2.6.git-paolo/mm/fremap.c 2005-08-11 13:55:02.0 +0200 @@ -184,6 +184,11 @@ asmlinkage long sys_remap_file_pages(uns int err = -EINVAL; int has_write_lock = 0; + /* Hack for not-updated archs, KILLME after 2.6.16! */ +#ifndef __ARCH_SUPPORTS_VM_NONUNIFORM + if (flags & MAP_NOINHERIT) + goto out; +#endif if (prot && !(flags & MAP_NOINHERIT)) goto out; /* diff -puN include/asm-i386/pgtable.h~rfp-safety-net-for-archs include/asm-i386/pgtable.h --- linux-2.6.git/include/asm-i386/pgtable.h~rfp-safety-net-for-archs 2005-08-11 13:46:49.0 +0200 +++ linux-2.6.git-paolo/include/asm-i386/pgtable.h 2005-08-11 13:55:02.0 +0200 @@ -419,4 +419,7 @@ extern void noexec_setup(const char *str #define __HAVE_ARCH_PTE_SAME #include +/* Hack for not-updated archs, KILLME after 2.6.16! 
*/ +#define __ARCH_SUPPORTS_VM_NONUNIFORM + #endif /* _I386_PGTABLE_H */ diff -puN include/asm-um/pgtable.h~rfp-safety-net-for-archs include/asm-um/pgtable.h --- linux-2.6.git/include/asm-um/pgtable.h~rfp-safety-net-for-archs 2005-08-11 13:46:49.0 +0200 +++ linux-2.6.git-paolo/include/asm-um/pgtable.h2005-08-11 13:55:02.0 +0200 @@ -361,6 +361,9 @@ static inline pte_t pte_modify(pte_t pte #include +/* Hack for not-updated archs, KILLME after 2.6.16! */ +#define __ARCH_SUPPORTS_VM_NONUNIFORM + #endif #endif diff -puN Documentation/feature-removal-schedule.txt~rfp-safety-net-for-archs Documentation/feature-removal-schedule.txt --- linux-2.6.git/Documentation/feature-removal-schedule.txt~rfp-safety-net-for-archs 2005-08-11 14:06:00.0 +0200 +++ linux-2.6.git-paolo/Documentation/feature-removal-schedule.txt 2005-08-11 14:10:34.0 +0200 @@ -135,3 +135,15 @@ Why: With the 16-bit PCMCIA subsystem no pcmciautils package available at http://kernel.org/pub/linux/utils/kernel/pcmcia/ Who: Dominik Brodowski <[EMAIL PROTECTED]> + +--- + +What: __ARCH_SUPPORTS_VM_NONUNIFORM +When: December 2005 +Files: mm/fremap.c, include/asm-*/pgtable.h +Why: It's just there to allow arches to update their page fault handlers to + support VM_FAULT_SIGSEGV, for remap_file_pages protection support. + Since they may BUG if this support is not added, the syscall code + refuses this new operation mode unless the arch declares itself as + "VM_FAULT_SIGSEGV-aware" with this macro. +Who: Paolo 'Blaisorblade' Giarrusso <[EMAIL PROTECTED]> _ - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[patch 11/39] remap_file_pages protection support: add MAP_NOINHERIT flag
From: Paolo 'Blaisorblade' Giarrusso <[EMAIL PROTECTED]> Add the MAP_NOINHERIT flag to arch headers, for use with remap-file-pages. Signed-off-by: Paolo 'Blaisorblade' Giarrusso <[EMAIL PROTECTED]> --- linux-2.6.git-paolo/include/asm-i386/mman.h |1 + linux-2.6.git-paolo/include/asm-ia64/mman.h |1 + linux-2.6.git-paolo/include/asm-ppc/mman.h|1 + linux-2.6.git-paolo/include/asm-ppc64/mman.h |1 + linux-2.6.git-paolo/include/asm-s390/mman.h |1 + linux-2.6.git-paolo/include/asm-x86_64/mman.h |1 + 6 files changed, 6 insertions(+) diff -puN include/asm-i386/mman.h~rfp-map-noinherit include/asm-i386/mman.h --- linux-2.6.git/include/asm-i386/mman.h~rfp-map-noinherit 2005-08-11 12:06:40.0 +0200 +++ linux-2.6.git-paolo/include/asm-i386/mman.h 2005-08-11 12:06:40.0 +0200 @@ -22,6 +22,7 @@ #define MAP_NORESERVE 0x4000 /* don't check for reservations */ #define MAP_POPULATE 0x8000 /* populate (prefault) pagetables */ #define MAP_NONBLOCK 0x1 /* do not block on IO */ +#define MAP_NOINHERIT 0x2 /* don't inherit the protection bits of the underlying vma*/ #define MS_ASYNC 1 /* sync memory asynchronously */ #define MS_INVALIDATE 2 /* invalidate the caches */ diff -puN include/asm-ia64/mman.h~rfp-map-noinherit include/asm-ia64/mman.h --- linux-2.6.git/include/asm-ia64/mman.h~rfp-map-noinherit 2005-08-11 12:06:40.0 +0200 +++ linux-2.6.git-paolo/include/asm-ia64/mman.h 2005-08-11 12:06:40.0 +0200 @@ -30,6 +30,7 @@ #define MAP_NORESERVE 0x04000 /* don't check for reservations */ #define MAP_POPULATE 0x08000 /* populate (prefault) pagetables */ #define MAP_NONBLOCK 0x1 /* do not block on IO */ +#define MAP_NOINHERIT 0x2 /* don't inherit the protection bits of the underlying vma*/ #define MS_ASYNC 1 /* sync memory asynchronously */ #define MS_INVALIDATE 2 /* invalidate the caches */ diff -puN include/asm-ppc64/mman.h~rfp-map-noinherit include/asm-ppc64/mman.h --- linux-2.6.git/include/asm-ppc64/mman.h~rfp-map-noinherit2005-08-11 12:06:40.0 +0200 +++ 
linux-2.6.git-paolo/include/asm-ppc64/mman.h2005-08-11 12:06:40.0 +0200 @@ -38,6 +38,7 @@ #define MAP_POPULATE 0x8000 /* populate (prefault) pagetables */ #define MAP_NONBLOCK 0x1 /* do not block on IO */ +#define MAP_NOINHERIT 0x2 /* don't inherit the protection bits of the underlying vma*/ #define MADV_NORMAL0x0 /* default page-in behavior */ #define MADV_RANDOM0x1 /* page-in minimum required */ diff -puN include/asm-ppc/mman.h~rfp-map-noinherit include/asm-ppc/mman.h --- linux-2.6.git/include/asm-ppc/mman.h~rfp-map-noinherit 2005-08-11 12:06:40.0 +0200 +++ linux-2.6.git-paolo/include/asm-ppc/mman.h 2005-08-11 12:06:40.0 +0200 @@ -23,6 +23,7 @@ #define MAP_EXECUTABLE 0x1000 /* mark it as an executable */ #define MAP_POPULATE 0x8000 /* populate (prefault) pagetables */ #define MAP_NONBLOCK 0x1 /* do not block on IO */ +#define MAP_NOINHERIT 0x2 /* don't inherit the protection bits of the underlying vma*/ #define MS_ASYNC 1 /* sync memory asynchronously */ #define MS_INVALIDATE 2 /* invalidate the caches */ diff -puN include/asm-s390/mman.h~rfp-map-noinherit include/asm-s390/mman.h --- linux-2.6.git/include/asm-s390/mman.h~rfp-map-noinherit 2005-08-11 12:06:40.0 +0200 +++ linux-2.6.git-paolo/include/asm-s390/mman.h 2005-08-11 12:06:40.0 +0200 @@ -30,6 +30,7 @@ #define MAP_NORESERVE 0x4000 /* don't check for reservations */ #define MAP_POPULATE 0x8000 /* populate (prefault) pagetables */ #define MAP_NONBLOCK 0x1 /* do not block on IO */ +#define MAP_NOINHERIT 0x2 /* don't inherit the protection bits of the underlying vma*/ #define MS_ASYNC 1 /* sync memory asynchronously */ #define MS_INVALIDATE 2 /* invalidate the caches */ diff -puN include/asm-x86_64/mman.h~rfp-map-noinherit include/asm-x86_64/mman.h --- linux-2.6.git/include/asm-x86_64/mman.h~rfp-map-noinherit 2005-08-11 12:06:40.0 +0200 +++ linux-2.6.git-paolo/include/asm-x86_64/mman.h 2005-08-11 12:06:40.0 +0200 @@ -23,6 +23,7 @@ #define MAP_NORESERVE 0x4000 /* don't check for reservations */ #define 
MAP_POPULATE 0x8000 /* populate (prefault) pagetables */ #define MAP_NONBLOCK 0x1 /* do not block on IO */ +#define MAP_NOINHERIT 0x2 /* don't inherit the protection bits of the underlying vma*/ #define MS_ASYNC 1 /* sync memory asynchronously */ #define
[patch 10/39] remap_file_pages protection support: i386 and x86-64 bits
Update pte encoding macros for i386 and x86-64. Signed-off-by: Paolo 'Blaisorblade' Giarrusso <[EMAIL PROTECTED]> --- linux-2.6.git-paolo/include/asm-i386/pgtable-2level.h | 15 ++- linux-2.6.git-paolo/include/asm-i386/pgtable-3level.h | 11 ++- linux-2.6.git-paolo/include/asm-x86_64/pgtable.h | 12 +++- 3 files changed, 31 insertions(+), 7 deletions(-) diff -puN include/asm-i386/pgtable-2level.h~rfp-arch-i386-x86_64 include/asm-i386/pgtable-2level.h --- linux-2.6.git/include/asm-i386/pgtable-2level.h~rfp-arch-i386-x86_64 2005-08-11 11:42:28.0 +0200 +++ linux-2.6.git-paolo/include/asm-i386/pgtable-2level.h 2005-08-11 11:42:28.0 +0200 @@ -48,16 +48,21 @@ static inline int pte_exec_kernel(pte_t } /* - * Bits 0, 6 and 7 are taken, split up the 29 bits of offset + * Bits 0, 1, 6 and 7 are taken, split up the 28 bits of offset * into this range: */ -#define PTE_FILE_MAX_BITS 29 +#define PTE_FILE_MAX_BITS 28 #define pte_to_pgoff(pte) \ - pte).pte_low >> 1) & 0x1f ) + (((pte).pte_low >> 8) << 5 )) + pte).pte_low >> 2) & 0xf ) + (((pte).pte_low >> 8) << 4 )) +#define pte_to_pgprot(pte) \ + __pgprot(((pte).pte_low & (_PAGE_RW | _PAGE_PROTNONE)) \ + | (((pte).pte_low & _PAGE_PROTNONE) ? 0 : \ + (_PAGE_USER | _PAGE_PRESENT)) | _PAGE_ACCESSED) -#define pgoff_to_pte(off) \ - ((pte_t) { (((off) & 0x1f) << 1) + (((off) >> 5) << 8) + _PAGE_FILE }) +#define pgoff_prot_to_pte(off, prot) \ + ((pte_t) { (((off) & 0xf) << 2) + (((off) >> 4) << 8) + \ +(pgprot_val(prot) & (_PAGE_RW | _PAGE_PROTNONE)) + _PAGE_FILE }) /* Encode and de-code a swap entry */ #define __swp_type(x) (((x).val >> 1) & 0x1f) diff -puN include/asm-i386/pgtable-3level.h~rfp-arch-i386-x86_64 include/asm-i386/pgtable-3level.h --- linux-2.6.git/include/asm-i386/pgtable-3level.h~rfp-arch-i386-x86_64 2005-08-11 11:42:28.0 +0200 +++ linux-2.6.git-paolo/include/asm-i386/pgtable-3level.h 2005-08-11 11:42:28.0 +0200 @@ -145,7 +145,16 @@ static inline pmd_t pfn_pmd(unsigned lon * put the 32 bits of offset into the high part. 
*/ #define pte_to_pgoff(pte) ((pte).pte_high) -#define pgoff_to_pte(off) ((pte_t) { _PAGE_FILE, (off) }) + +#define pte_to_pgprot(pte) \ + __pgprot(((pte).pte_low & (_PAGE_RW | _PAGE_PROTNONE)) \ + | (((pte).pte_low & _PAGE_PROTNONE) ? 0 : \ + (_PAGE_USER | _PAGE_PRESENT)) | _PAGE_ACCESSED) + +#define pgoff_prot_to_pte(off, prot) \ + ((pte_t) { _PAGE_FILE + \ + (pgprot_val(prot) & (_PAGE_RW | _PAGE_PROTNONE)) , (off) }) + #define PTE_FILE_MAX_BITS 32 /* Encode and de-code a swap entry */ diff -puN include/asm-x86_64/pgtable.h~rfp-arch-i386-x86_64 include/asm-x86_64/pgtable.h --- linux-2.6.git/include/asm-x86_64/pgtable.h~rfp-arch-i386-x86_64 2005-08-11 11:42:28.0 +0200 +++ linux-2.6.git-paolo/include/asm-x86_64/pgtable.h2005-08-11 11:42:28.0 +0200 @@ -343,9 +343,19 @@ static inline pud_t *__pud_offset_k(pud_ #define pmd_pfn(x) ((pmd_val(x) >> PAGE_SHIFT) & __PHYSICAL_MASK) #define pte_to_pgoff(pte) ((pte_val(pte) & PHYSICAL_PAGE_MASK) >> PAGE_SHIFT) -#define pgoff_to_pte(off) ((pte_t) { ((off) << PAGE_SHIFT) | _PAGE_FILE }) #define PTE_FILE_MAX_BITS __PHYSICAL_MASK_SHIFT +#define pte_to_pgprot(pte) \ + __pgprot((pte_val(pte) & (_PAGE_RW | _PAGE_PROTNONE)) \ + | ((pte_val(pte) & _PAGE_PROTNONE) ? 0 : \ + (_PAGE_USER | _PAGE_PRESENT)) | _PAGE_ACCESSED) + +#define pgoff_prot_to_pte(off, prot) \ + ((pte_t) { _PAGE_FILE + \ + (pgprot_val(prot) & (_PAGE_RW | _PAGE_PROTNONE)) + \ + ((off) << PAGE_SHIFT) }) + + /* PTE - Level 1 access. */ /* page, protection -> pte */ _ - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[patch 19/39] remap_file_pages protection support: use VM_FAULT_SIGSEGV for protection checking
From: Paolo 'Blaisorblade' Giarrusso <[EMAIL PROTECTED]> The arch handler used to check protection itself; now we must possibly move that to the generic VM when the VMA is non-uniform. For now, do_file_page installs the PTE without checking the fault type: if it was wrong, the task takes another fault and only dies then. I've left it this way for now to exercise the code more (and it works anyway). I've also changed do_no_page to fault in pages with their *exact* permissions for non-uniform VMAs. The approach to checking is a bit clumsy because we are given a VM_{READ,WRITE,EXEC} mask, so we do *strict* checking. For instance, a VM_EXEC mapping (which won't have VM_READ in vma->vm_flags) would fault on a read. To fix that properly, we should get a pgprot mask and test pte_read()/pte_write()/pte_exec(); for now I work around that in the i386/UML handlers, and I have patches fixing it subsequently. Also, there is a (potential) problem: on VM_NONUNIFORM vmas, in handle_pte_fault(), if the PTE is present we return VM_FAULT_SIGSEGV. This has proven to be a bit strict, at least for UML - so it may break other arches too (only for the new functionality), at least peculiar ones: on UML the problem was due to handle_mm_fault() being called for TLB faults rather than PTE faults. Another problem I've just discovered is that PTRACE_POKETEXT / access_process_vm on VM_NONUNIFORM write-protected vmas won't work. That's not a big problem. 
Signed-off-by: Paolo 'Blaisorblade' Giarrusso <[EMAIL PROTECTED]> --- linux-2.6.git-paolo/arch/i386/mm/fault.c | 28 +++-- linux-2.6.git-paolo/include/linux/mm.h | 11 +++ linux-2.6.git-paolo/mm/memory.c | 96 --- 3 files changed, 108 insertions(+), 27 deletions(-) diff -puN arch/i386/mm/fault.c~rfp-fault-sigsegv-2 arch/i386/mm/fault.c --- linux-2.6.git/arch/i386/mm/fault.c~rfp-fault-sigsegv-2 2005-08-11 14:21:01.0 +0200 +++ linux-2.6.git-paolo/arch/i386/mm/fault.c2005-08-11 16:12:46.0 +0200 @@ -219,6 +219,7 @@ fastcall void do_page_fault(struct pt_re unsigned long address; unsigned long page; int write; + int access_mask = 0; siginfo_t info; /* get the address */ @@ -324,23 +325,24 @@ good_area: /* fall through */ case 2: /* write, not present */ if (!(vma->vm_flags & VM_WRITE)) - goto bad_area; + goto bad_area_prot; write++; break; - case 1: /* read, present */ + case 1: /* read, present - when does this happen? Maybe for NX exceptions? */ goto bad_area; case 0: /* read, not present */ if (!(vma->vm_flags & (VM_READ | VM_EXEC))) - goto bad_area; + goto bad_area_prot; } - survive: + access_mask = write ? VM_WRITE : 0; +handle_fault: /* * If for any reason at all we couldn't handle the fault, * make sure we exit gracefully rather than endlessly redo * the fault. */ - switch (handle_mm_fault(mm, vma, address, write)) { + switch (__handle_mm_fault(mm, vma, address, access_mask)) { case VM_FAULT_MINOR: tsk->min_flt++; break; @@ -368,6 +370,20 @@ good_area: up_read(&mm->mmap_sem); return; + /* If the PTE is not present, the vma protection are not accurate if +* VM_NONUNIFORM; present PTE's are correct for VM_NONUNIFORM and were +* already handled otherwise. */ +bad_area_prot: + if (unlikely(vma->vm_flags & VM_NONUNIFORM)) { + access_mask = write ? VM_WRITE : 0; + /* Otherwise, on a legitimate read fault on a page mapped as +* exec-only, we get problems. Probably, we should lower +* requirements... we should always test just +* pte_read/write/exec, on vma->vm_page_prot! 
This way is +* cumbersome. However, for now things should work for i386. */ + access_mask |= vma->vm_flags & VM_EXEC ? VM_EXEC : VM_READ; + goto handle_fault; + } /* * Something tried to access memory that isn't in our memory map.. * Fix it, but check if it's kernel or user first.. @@ -481,7 +497,7 @@ out_of_memory: if (tsk->pid == 1) { yield(); down_read(&mm->mmap_sem); - goto survive; + goto handle_fault; } printk("VM: killing process %s\n", tsk->comm); if (error_code & 4) diff -puN mm/memory.c~rfp-fault-
[patch 12/39] remap_file_pages protection support: enhance syscall interface and swapout code
From: Ingo Molnar <[EMAIL PROTECTED]>, Paolo 'Blaisorblade' Giarrusso <[EMAIL PROTECTED]> This is the "main" patch for the syscall code, containing the core of what was sent by Ingo Molnar, variously reworked. Differently from his patch, I've *not* added a new syscall, choosing to add a new flag (MAP_NOINHERIT) which the application must specify to get the new behavior (prot != 0 is accepted and prot == 0 means PROT_NONE). The changes to the page fault handler have been separated, even because that has required considerable amount of effort. Handle the possibility that remap_file_pages changes protections in various places. * Enable the 'prot' parameter for shared-writable mappings (the ones which are the primary target for remap_file_pages), without breaking up the vma * Use pte_file PTE's also when protections don't match, not only when the offset doesn't match; and add set_nonlinear_pte() for this testing * Save the current protection too when clearing a nonlinear PTE, by replacing pgoff_to_pte() uses with pgoff_prot_to_pte(). * Use the supplied protections on restore and on populate (partially uncomplete, fixed in subsequent patches) Signed-off-by: Paolo 'Blaisorblade' Giarrusso <[EMAIL PROTECTED]> --- linux-2.6.git-paolo/include/linux/pagemap.h | 19 ++ linux-2.6.git-paolo/mm/fremap.c | 50 +--- linux-2.6.git-paolo/mm/memory.c | 14 --- linux-2.6.git-paolo/mm/rmap.c |3 - 4 files changed, 60 insertions(+), 26 deletions(-) diff -puN include/linux/pagemap.h~rfp-enhance-syscall-and-swapout-code include/linux/pagemap.h --- linux-2.6.git/include/linux/pagemap.h~rfp-enhance-syscall-and-swapout-code 2005-08-11 22:59:47.0 +0200 +++ linux-2.6.git-paolo/include/linux/pagemap.h 2005-08-11 22:59:47.0 +0200 @@ -159,6 +159,25 @@ static inline pgoff_t linear_page_index( return pgoff >> (PAGE_CACHE_SHIFT - PAGE_SHIFT); } +/*** + * Checks if the PTE is nonlinear, and if yes sets it. 
+ * @vma: the VMA in which @addr is; we don't check if it's VM_NONLINEAR, just + * if this PTE is nonlinear. + * @addr: the addr which @pte refers to. + * @pte: the old PTE value (to read its protections. + * @ptep: the PTE pointer (for setting it). + * @mm: passed to set_pte_at. + * @page: the page which was installed (to read its ->index, i.e. the old + * offset inside the file. + */ +static inline void set_nonlinear_pte(pte_t pte, pte_t * ptep, struct vm_area_struct *vma, struct mm_struct *mm, struct page* page, unsigned long addr) +{ + pgprot_t pgprot = pte_to_pgprot(pte); + if(linear_page_index(vma, addr) != page->index || + pgprot_val(pgprot) != pgprot_val(vma->vm_page_prot)) + set_pte_at(mm, addr, ptep, pgoff_prot_to_pte(page->index, pgprot)); +} + extern void FASTCALL(__lock_page(struct page *page)); extern void FASTCALL(unlock_page(struct page *page)); diff -puN mm/fremap.c~rfp-enhance-syscall-and-swapout-code mm/fremap.c --- linux-2.6.git/mm/fremap.c~rfp-enhance-syscall-and-swapout-code 2005-08-11 22:59:47.0 +0200 +++ linux-2.6.git-paolo/mm/fremap.c 2005-08-11 23:01:14.0 +0200 @@ -54,7 +54,7 @@ static inline void zap_pte(struct mm_str * previously existing mapping. */ int install_page(struct mm_struct *mm, struct vm_area_struct *vma, - unsigned long addr, struct page *page, pgprot_t prot) + unsigned long addr, struct page *page, pgprot_t pgprot) { struct inode *inode; pgoff_t size; @@ -94,7 +94,7 @@ int install_page(struct mm_struct *mm, s inc_mm_counter(mm,rss); flush_icache_page(vma, page); - set_pte_at(mm, addr, pte, mk_pte(page, prot)); + set_pte_at(mm, addr, pte, mk_pte(page, pgprot)); page_add_file_rmap(page); pte_val = *pte; pte_unmap(pte); @@ -113,7 +113,7 @@ EXPORT_SYMBOL(install_page); * previously existing mapping. 
*/ int install_file_pte(struct mm_struct *mm, struct vm_area_struct *vma, - unsigned long addr, unsigned long pgoff, pgprot_t prot) + unsigned long addr, unsigned long pgoff, pgprot_t pgprot) { int err = -ENOMEM; pte_t *pte; @@ -139,7 +139,7 @@ int install_file_pte(struct mm_struct *m zap_pte(mm, vma, addr, pte); - set_pte_at(mm, addr, pte, pgoff_to_pte(pgoff)); + set_pte_at(mm, addr, pte, pgoff_prot_to_pte(pgoff, pgprot)); pte_val = *pte; pte_unmap(pte); update_mmu_cache(vma, addr, pte_val); @@ -157,31 +157,28 @@ err_unlock: *file within an existing vma. * @start: start of the remapped virtual memory range * @size: size of the remapped virtual memory range - * @prot: new protection bits of the range + * @prot: new protection bits of the range, must be 0 if not us
[patch 09/39] remap_file_pages protection support: improvement for UML bits
Recover one bit by additionally using _PAGE_NEWPROT. Since I wasn't sure this would work, I've split this out. We rely on the fact that pte_newprot always checks first if the PTE is marked present. Signed-off-by: Paolo 'Blaisorblade' Giarrusso <[EMAIL PROTECTED]> --- linux-2.6.git-paolo/include/asm-um/pgtable-2level.h | 10 +- 1 files changed, 5 insertions(+), 5 deletions(-) diff -puN include/asm-um/pgtable-2level.h~rfp-arch-uml-improv include/asm-um/pgtable-2level.h --- linux-2.6.git/include/asm-um/pgtable-2level.h~rfp-arch-uml-improv 2005-08-07 19:09:34.0 +0200 +++ linux-2.6.git-paolo/include/asm-um/pgtable-2level.h 2005-08-07 19:09:34.0 +0200 @@ -72,19 +72,19 @@ static inline void set_pte(pte_t *pteptr ((unsigned long) __va(pmd_val(pmd) & PAGE_MASK)) /* - * Bits 0 to 5 are taken, split up the 26 bits of offset + * Bits 0, 1, 3 to 5 are taken, split up the 27 bits of offset * into this range: */ -#define PTE_FILE_MAX_BITS 26 +#define PTE_FILE_MAX_BITS 27 -#define pte_to_pgoff(pte) (pte_val(pte) >> 6) +#define pte_to_pgoff(pte) (((pte_val(pte) >> 6) << 1) | ((pte_val(pte) >> 2) & 0x1)) #define pte_to_pgprot(pte) \ __pgprot((pte_val(pte) & (_PAGE_RW | _PAGE_PROTNONE)) \ | ((pte_val(pte) & _PAGE_PROTNONE) ? 0 : \ (_PAGE_USER | _PAGE_PRESENT)) | _PAGE_ACCESSED) #define pgoff_prot_to_pte(off, prot) \ - ((pte_t) { ((off) << 6) + \ -(pgprot_val(prot) & (_PAGE_RW | _PAGE_PROTNONE)) + _PAGE_FILE }) + __pteoff) >> 1) << 6) + (((off) & 0x1) << 2) + \ +(pgprot_val(prot) & (_PAGE_RW | _PAGE_PROTNONE)) + _PAGE_FILE) #endif _ - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[patch 07/39] uml: fault handler micro-cleanups
From: Paolo 'Blaisorblade' Giarrusso <[EMAIL PROTECTED]> Avoid chomping low bits of address for functions doing it by themselves, fix whitespace, add a correctness checking. I did this for remap-file-pages protection support, it was useful on its own too. Signed-off-by: Paolo 'Blaisorblade' Giarrusso <[EMAIL PROTECTED]> --- linux-2.6.git-paolo/arch/um/kernel/trap_kern.c | 28 +++-- 1 files changed, 13 insertions(+), 15 deletions(-) diff -puN arch/um/kernel/trap_kern.c~uml-fault-handler-changes arch/um/kernel/trap_kern.c --- linux-2.6.git/arch/um/kernel/trap_kern.c~uml-fault-handler-changes 2005-08-11 11:18:03.0 +0200 +++ linux-2.6.git-paolo/arch/um/kernel/trap_kern.c 2005-08-11 11:19:56.0 +0200 @@ -26,6 +26,7 @@ #include "mem.h" #include "mem_kern.h" +/* Note this is constrained to return 0, -EFAULT, -EACCESS, -ENOMEM by segv(). */ int handle_page_fault(unsigned long address, unsigned long ip, int is_write, int is_user, int *code_out) { @@ -35,7 +36,6 @@ int handle_page_fault(unsigned long addr pud_t *pud; pmd_t *pmd; pte_t *pte; - unsigned long page; int err = -EFAULT; *code_out = SEGV_MAPERR; @@ -52,7 +52,7 @@ int handle_page_fault(unsigned long addr else if(expand_stack(vma, address)) goto out; - good_area: +good_area: *code_out = SEGV_ACCERR; if(is_write && !(vma->vm_flags & VM_WRITE)) goto out; @@ -60,9 +60,8 @@ int handle_page_fault(unsigned long addr if(!(vma->vm_flags & (VM_READ | VM_EXEC))) goto out; - page = address & PAGE_MASK; do { - survive: +survive: switch (handle_mm_fault(mm, vma, address, is_write)){ case VM_FAULT_MINOR: current->min_flt++; @@ -79,16 +78,16 @@ int handle_page_fault(unsigned long addr default: BUG(); } - pgd = pgd_offset(mm, page); - pud = pud_offset(pgd, page); - pmd = pmd_offset(pud, page); - pte = pte_offset_kernel(pmd, page); + pgd = pgd_offset(mm, address); + pud = pud_offset(pgd, address); + pmd = pmd_offset(pud, address); + pte = pte_offset_kernel(pmd, address); } while(!pte_present(*pte)); err = 0; *pte = pte_mkyoung(*pte); 
if(pte_write(*pte)) *pte = pte_mkdirty(*pte); - flush_tlb_page(vma, page); - out: + flush_tlb_page(vma, address); +out: up_read(&mm->mmap_sem); return(err); @@ -144,19 +143,18 @@ unsigned long segv(struct faultinfo fi, panic("Kernel mode fault at addr 0x%lx, ip 0x%lx", address, ip); - if(err == -EACCES){ + if (err == -EACCES) { si.si_signo = SIGBUS; si.si_errno = 0; si.si_code = BUS_ADRERR; si.si_addr = (void *)address; current->thread.arch.faultinfo = fi; force_sig_info(SIGBUS, &si, current); - } - else if(err == -ENOMEM){ + } else if (err == -ENOMEM) { printk("VM: killing process %s\n", current->comm); do_exit(SIGKILL); - } - else { + } else { + BUG_ON(err != -EFAULT); si.si_signo = SIGSEGV; si.si_addr = (void *) address; current->thread.arch.faultinfo = fi; _ - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[patch 20/39] remap_file_pages protection support: optimize install_file_pte for MAP_POPULATE
From: Paolo 'Blaisorblade' Giarrusso <[EMAIL PROTECTED]> Add an optimization to install_file_pte: if the VMA is uniform and the PTE was null, it will be installed correctly at fault time if needed - we thus avoid touching the page tables, though we must still do the walk... I'd like to avoid the walk altogether when detecting that the VMA is uniform. Why might the PTE hold a wrong value at all? It could be a pte_file PTE installed by a previous MAP_POPULATE or remap_file_pages call with MAP_NONBLOCK, but that would either have been zapped (if we're handling MAP_POPULATE) or be correct (if called by remap_file_pages, which is unlikely since we're in a uniform VMA). The protections must be correct, or we'd detect it by seeing VM_NONUNIFORM; likewise the offset, or we'd see VM_NONLINEAR. Thus this path is effectively only used for MAP_POPULATE|MAP_NONBLOCK. Signed-off-by: Paolo 'Blaisorblade' Giarrusso <[EMAIL PROTECTED]> --- linux-2.6.git-paolo/mm/fremap.c |9 + 1 files changed, 9 insertions(+) diff -puN mm/fremap.c~rfp-linear-optim-v2 mm/fremap.c --- linux-2.6.git/mm/fremap.c~rfp-linear-optim-v2 2005-08-11 22:46:58.0 +0200 +++ linux-2.6.git-paolo/mm/fremap.c 2005-08-11 22:57:49.0 +0200 @@ -121,6 +121,9 @@ int install_file_pte(struct mm_struct *m pud_t *pud; pgd_t *pgd; pte_t pte_val; + int uniform = !(vma->vm_flags & (VM_NONUNIFORM | VM_NONLINEAR)); + + BUG_ON(!uniform && !(vma->vm_flags & VM_SHARED)); pgd = pgd_offset(mm, addr); spin_lock(&mm->page_table_lock); @@ -136,6 +139,12 @@ int install_file_pte(struct mm_struct *m pte = pte_alloc_map(mm, pmd, addr); if (!pte) goto err_unlock; + /* +* Skip uniform non-existent ptes: +*/ + err = 0; + if (uniform && pte_none(*pte)) + goto err_unlock; zap_pte(mm, vma, addr, pte); _
[patch 01/39] comment typo fix
From: Paolo 'Blaisorblade' Giarrusso <[EMAIL PROTECTED]> smp_entry_t -> swap_entry_t Too short changelog entry? Signed-off-by: Paolo 'Blaisorblade' Giarrusso <[EMAIL PROTECTED]> --- linux-2.6.git-paolo/include/linux/swapops.h |2 +- 1 files changed, 1 insertion(+), 1 deletion(-) diff -puN include/linux/swapops.h~fix-typo include/linux/swapops.h --- linux-2.6.git/include/linux/swapops.h~fix-typo 2005-08-11 11:12:23.0 +0200 +++ linux-2.6.git-paolo/include/linux/swapops.h 2005-08-11 11:12:24.0 +0200 @@ -4,7 +4,7 @@ * the low-order bits. * * We arrange the `type' and `offset' fields so that `type' is at the five - * high-order bits of the smp_entry_t and `offset' is right-aligned in the + * high-order bits of the swap_entry_t and `offset' is right-aligned in the * remaining bits. * * swp_entry_t's are *never* stored anywhere in their arch-dependent format. _
[patch 08/39] remap_file_pages protection support: uml bits
Update pte encoding macros for UML. Signed-off-by: Paolo 'Blaisorblade' Giarrusso <[EMAIL PROTECTED]> --- linux-2.6.git-paolo/include/asm-um/pgtable-2level.h | 15 ++ linux-2.6.git-paolo/include/asm-um/pgtable-3level.h | 21 +++- 2 files changed, 27 insertions(+), 9 deletions(-) diff -puN include/asm-um/pgtable-2level.h~rfp-arch-uml include/asm-um/pgtable-2level.h --- linux-2.6.git/include/asm-um/pgtable-2level.h~rfp-arch-uml 2005-08-11 11:23:21.0 +0200 +++ linux-2.6.git-paolo/include/asm-um/pgtable-2level.h 2005-08-11 11:23:21.0 +0200 @@ -72,12 +72,19 @@ static inline void set_pte(pte_t *pteptr ((unsigned long) __va(pmd_val(pmd) & PAGE_MASK)) /* - * Bits 0 through 3 are taken + * Bits 0 to 5 are taken, split up the 26 bits of offset + * into this range: */ -#define PTE_FILE_MAX_BITS 28 +#define PTE_FILE_MAX_BITS 26 -#define pte_to_pgoff(pte) (pte_val(pte) >> 4) +#define pte_to_pgoff(pte) (pte_val(pte) >> 6) +#define pte_to_pgprot(pte) \ + __pgprot((pte_val(pte) & (_PAGE_RW | _PAGE_PROTNONE)) \ + | ((pte_val(pte) & _PAGE_PROTNONE) ? 0 : \ + (_PAGE_USER | _PAGE_PRESENT)) | _PAGE_ACCESSED) -#define pgoff_to_pte(off) ((pte_t) { ((off) << 4) + _PAGE_FILE }) +#define pgoff_prot_to_pte(off, prot) \ + ((pte_t) { ((off) << 6) + \ +(pgprot_val(prot) & (_PAGE_RW | _PAGE_PROTNONE)) + _PAGE_FILE }) #endif diff -puN include/asm-um/pgtable-3level.h~rfp-arch-uml include/asm-um/pgtable-3level.h --- linux-2.6.git/include/asm-um/pgtable-3level.h~rfp-arch-uml 2005-08-11 11:23:21.0 +0200 +++ linux-2.6.git-paolo/include/asm-um/pgtable-3level.h 2005-08-11 11:23:21.0 +0200 @@ -140,25 +140,36 @@ static inline pmd_t pfn_pmd(pfn_t page_n } /* - * Bits 0 through 3 are taken in the low part of the pte, + * Bits 0 through 5 are taken in the low part of the pte, * put the 32 bits of offset into the high part. 
*/ #define PTE_FILE_MAX_BITS 32 + #ifdef CONFIG_64BIT #define pte_to_pgoff(p) ((p).pte >> 32) - -#define pgoff_to_pte(off) ((pte_t) { ((off) << 32) | _PAGE_FILE }) +#define pgoff_to_pte(off) ((pte_t) { ((off) << 32) | _PAGE_FILE | \ + (pgprot_val(prot) & (_PAGE_RW | _PAGE_PROTNONE)) }) +#define pte_flags(pte) pte_val(pte) #else #define pte_to_pgoff(pte) ((pte).pte_high) - -#define pgoff_to_pte(off) ((pte_t) { _PAGE_FILE, (off) }) +#define pgoff_prot_to_pte(off, prot) ((pte_t) { \ + (pgprot_val(prot) & (_PAGE_RW | _PAGE_PROTNONE)) | _PAGE_FILE, \ + (off) }) +/* Don't use pte_val below, useless to join the two halves */ +#define pte_flags(pte) ((pte).pte_low) #endif +#define pte_to_pgprot(pte) \ + __pgprot((pte_flags(pte) & (_PAGE_RW | _PAGE_PROTNONE)) \ + | ((pte_flags(pte) & _PAGE_PROTNONE) ? 0 : \ + (_PAGE_USER | _PAGE_PRESENT)) | _PAGE_ACCESSED) +#undef pte_flags + #endif /* _ - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[patch 05/39] remove stale comment from swapfile.c
From: Paolo 'Blaisorblade' Giarrusso <[EMAIL PROTECTED]> Seems like in 2.4.9.4 this comment got out of sync ;-) I'm not completely sure on what basis we no longer need to do what the comment suggests, but it seems that when faulting the same swap page in a second time, can_share_swap_page() returns false and we do an early COW break, so there's no need to write-protect the page. No idea why we don't defer the COW break. Reference commit from GIT version of BKCVS history: 5ee46c7964de4b1969fc5be036167eb2da0de4e2, BKRev 3c603c81PtWl2I1NnVuphvsItrD1hg (v2.4.9.3 -> v2.4.9.4). Signed-off-by: Paolo 'Blaisorblade' Giarrusso <[EMAIL PROTECTED]> --- linux-2.6.git-paolo/mm/swapfile.c |5 + 1 files changed, 1 insertion(+), 4 deletions(-) diff -puN mm/swapfile.c~remove-stale-comment-swap-file mm/swapfile.c --- linux-2.6.git/mm/swapfile.c~remove-stale-comment-swap-file 2005-08-11 11:13:18.0 +0200 +++ linux-2.6.git-paolo/mm/swapfile.c 2005-08-11 11:13:18.0 +0200 @@ -388,10 +388,7 @@ void free_swap_and_cache(swp_entry_t ent } /* - * Always set the resulting pte to be nowrite (the same as COW pages - * after one process has exited). We don't know just how many PTEs will - * share this swap entry, so be cautious and let do_wp_page work out - * what to do if a write is requested later. + * Since we're swapping it in, we mark it as old. * * vma->vm_mm->page_table_lock is held. */ _
[patch 13/39] remap_file_pages protection support: support private vma for MAP_POPULATE
From: Ingo Molnar <[EMAIL PROTECTED]> If we're not rearranging pages, support even PRIVATE vma. This is needed to make MAP_POPULATE|MAP_PRIVATE to work, since it calls remap_file_pages. Notes from: Paolo 'Blaisorblade' Giarrusso <[EMAIL PROTECTED]> We don't support private VMA because when they're swapped out we need to store the swap entry in the PTE, not the file offset and protections; so, I suppose that with remap-file-pages-prot, we must punt on private VMA even when we're just changing protections. This change is in a separate patch. Signed-off-by: Paolo 'Blaisorblade' Giarrusso <[EMAIL PROTECTED]> --- linux-2.6.git-paolo/mm/fremap.c | 55 linux-2.6.git-paolo/mm/mmap.c |4 ++ 2 files changed, 38 insertions(+), 21 deletions(-) diff -puN mm/fremap.c~rfp-private-vma mm/fremap.c --- linux-2.6.git/mm/fremap.c~rfp-private-vma 2005-08-11 23:02:45.0 +0200 +++ linux-2.6.git-paolo/mm/fremap.c 2005-08-11 23:02:45.0 +0200 @@ -218,34 +218,47 @@ retry: goto out_unlock; if (((prot & PROT_EXEC) && !(vma->vm_flags & VM_MAYEXEC))) goto out_unlock; + err = -EINVAL; pgprot = protection_map[calc_vm_prot_bits(prot) | VM_SHARED]; } else pgprot = vma->vm_page_prot; - if ((vma->vm_flags & VM_SHARED) && - (!vma->vm_private_data || - (vma->vm_flags & (VM_NONLINEAR|VM_RESERVED))) && - vma->vm_ops && vma->vm_ops->populate && - end > start && start >= vma->vm_start && - end <= vma->vm_end) { + if (!vma->vm_ops || !vma->vm_ops->populate || end <= start || start < + vma->vm_start || end > vma->vm_end) + goto out_unlock; + + if (!vma->vm_private_data || + (vma->vm_flags & (VM_NONLINEAR|VM_RESERVED))) { /* Must set VM_NONLINEAR before any pages are populated. 
*/ - if (pgoff != linear_page_index(vma, start) && - !(vma->vm_flags & VM_NONLINEAR)) { - if (!has_write_lock) { - up_read(&mm->mmap_sem); - down_write(&mm->mmap_sem); - has_write_lock = 1; - goto retry; + if (pgoff != linear_page_index(vma, start)) { + if (!(vma->vm_flags & VM_NONLINEAR)) { + if (!has_write_lock) { + up_read(&mm->mmap_sem); + down_write(&mm->mmap_sem); + has_write_lock = 1; + goto retry; + } + /* XXX: we check VM_SHARED after re-getting the +* (write) semaphore but I guess that we could +* check it earlier as we're not allowed to turn +* a VM_PRIVATE vma into a VM_SHARED one! */ + if (!(vma->vm_flags & VM_SHARED)) + goto out_unlock; + + mapping = vma->vm_file->f_mapping; + spin_lock(&mapping->i_mmap_lock); + flush_dcache_mmap_lock(mapping); + vma->vm_flags |= VM_NONLINEAR; + vma_prio_tree_remove(vma, &mapping->i_mmap); + vma_nonlinear_insert(vma, &mapping->i_mmap_nonlinear); + flush_dcache_mmap_unlock(mapping); + spin_unlock(&mapping->i_mmap_lock); + } else { + /* Won't drop the lock, check it here.*/ + if (!(vma->vm_flags & VM_SHARED)) + goto out_unlock; } - mapping = vma->vm_file->f_mapping; - spin_lock(&mapping->i_mmap_lock); - flush_dcache_mmap_lock(mapping); - vma->vm_flags |= VM_NONLINEAR; - vma_prio_tree_remove(vma, &mapping->i_mmap); - vma_nonlinear_insert(vma, &mapping->i_mmap_nonlinear); - flush_dcache_mmap_unlock(mapping); - spin_unlock(&mappi
[patch 16/39] remap_file_pages protection support: readd lock downgrading
From: Paolo 'Blaisorblade' Giarrusso <[EMAIL PROTECTED]> Even now, we'll sometimes take the write lock. In that case we can downgrade it; after a bit of thought, I've chosen to do that whenever we'll either do any I/O or alter a lot of PTEs. For how much "a lot" is, I've copied the values from this code in mm/memory.c: #ifdef CONFIG_PREEMPT # define ZAP_BLOCK_SIZE (8 * PAGE_SIZE) #else /* No preempt: go for improved straight-line efficiency */ # define ZAP_BLOCK_SIZE (1024 * PAGE_SIZE) #endif I'm not sure about the trade-offs: we used to have a plain down_write(), now we have a down_read() and a possible up_read()/down_write(), and with this patch the fast path still takes only down_read(), while the slow path does down_read(), down_write(), downgrade_write(). This increases the number of atomic operations but improves concurrency wrt mmap and similar operations - I don't know how much contention there is on that lock. Also, drop a busted comment: we cannot clear VM_NONLINEAR simply because code elsewhere is going to use it. At the very least, madvise_dontneed() relies on that flag being set (non-linear truncation also reads the nonlinear mapping list), and the list of users is probably longer and going to grow in the next patches of this series. Just in case this wasn't clear: this patch is not strictly related to protection support; I was just too lazy to move it up in the series. 
Signed-off-by: Paolo 'Blaisorblade' Giarrusso <[EMAIL PROTECTED]> --- linux-2.6.git-paolo/mm/fremap.c | 18 +- 1 files changed, 13 insertions(+), 5 deletions(-) diff -puN mm/fremap.c~rfp-downgrade-lock mm/fremap.c --- linux-2.6.git/mm/fremap.c~rfp-downgrade-lock2005-08-11 23:04:39.0 +0200 +++ linux-2.6.git-paolo/mm/fremap.c 2005-08-11 23:04:39.0 +0200 @@ -152,6 +152,13 @@ err_unlock: } +#ifdef CONFIG_PREEMPT +# define INSTALL_SIZE (8 * PAGE_SIZE) +#else +/* No preempt: go for improved straight-line efficiency */ +# define INSTALL_SIZE (1024 * PAGE_SIZE) +#endif + /*** * sys_remap_file_pages - remap arbitrary pages of a shared backing store *file within an existing vma. @@ -266,14 +273,15 @@ retry: } } + /* Do NOT hold the write lock while doing any I/O, nor when +* iterating over too many PTEs. Values might need tuning. */ + if (has_write_lock && (!(flags & MAP_NONBLOCK) || size > INSTALL_SIZE)) { + downgrade_write(&mm->mmap_sem); + has_write_lock = 0; + } err = vma->vm_ops->populate(vma, start, size, pgprot, pgoff, flags & MAP_NONBLOCK); - /* -* We can't clear VM_NONLINEAR because we'd have to do -* it after ->populate completes, and that would prevent -* downgrading the lock. (Locks can't be upgraded). -*/ } out_unlock: _ - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[patch 02/39] shmem_populate: avoid a useless check, and some comments
From: Paolo 'Blaisorblade' Giarrusso <[EMAIL PROTECTED]> Either shmem_getpage returns a failure, or it found a page, or it was told it couldn't do any I/O. So it's useless to check nonblock in the else branch. We could add a BUG() there but I preferred to comment the offending function. This was taken out from one Ingo Molnar's old patch I'm resurrecting. References: commit b103e8b204b317d52834671d5f09db95645523c2 of old-2.6-bkcvs, pointing to BKrev: 3f5ed0c1llm6NnNwNXtPv-Z0IYzkwA Signed-off-by: Paolo 'Blaisorblade' Giarrusso <[EMAIL PROTECTED]> --- linux-2.6.git-paolo/mm/filemap.c |7 +++ linux-2.6.git-paolo/mm/shmem.c |6 +- 2 files changed, 12 insertions(+), 1 deletion(-) diff -puN mm/shmem.c~mm-populate-optim-comment mm/shmem.c --- linux-2.6.git/mm/shmem.c~mm-populate-optim-comment 2005-08-11 11:12:39.0 +0200 +++ linux-2.6.git-paolo/mm/shmem.c 2005-08-11 11:12:39.0 +0200 @@ -1195,6 +1195,7 @@ static int shmem_populate(struct vm_area err = shmem_getpage(inode, pgoff, &page, sgp, NULL); if (err) return err; + /* Page may still be null, but only if nonblock was set. */ if (page) { mark_page_accessed(page); err = install_page(mm, vma, addr, page, prot); @@ -1202,7 +1203,10 @@ static int shmem_populate(struct vm_area page_cache_release(page); return err; } - } else if (nonblock) { + } else { + /* No page was found just because we can't read it in +* now (being here implies nonblock != 0), but the page +* may exist, so set the PTE to fault it in later. */ err = install_file_pte(mm, vma, addr, pgoff, prot); if (err) return err; diff -puN mm/filemap.c~mm-populate-optim-comment mm/filemap.c --- linux-2.6.git/mm/filemap.c~mm-populate-optim-comment2005-08-11 11:12:39.0 +0200 +++ linux-2.6.git-paolo/mm/filemap.c2005-08-11 11:12:39.0 +0200 @@ -1505,8 +1505,12 @@ repeat: return -EINVAL; page = filemap_getpage(file, pgoff, nonblock); + + /* XXX: This is wrong, a filesystem I/O error may have happened. 
Fix that as +* done in shmem_populate calling shmem_getpage */ if (!page && !nonblock) return -ENOMEM; + if (page) { err = install_page(mm, vma, addr, page, prot); if (err) { @@ -1514,6 +1518,9 @@ repeat: return err; } } else { + /* No page was found just because we can't read it in now (being +* here implies nonblock != 0), but the page may exist, so set +* the PTE to fault it in later. */ err = install_file_pte(mm, vma, addr, pgoff, prot); if (err) return err; _ - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[patch 18/39] remap_file_pages protection support: add VM_FAULT_SIGSEGV
From: Ingo Molnar <[EMAIL PROTECTED]> Since with remap_file_pages w/prot we may put PROT_NONE on a single PTE rather than on a whole VMA, we must handle that inside handle_mm_fault. The new return value must be handled in the arch-specific fault handlers, and this change must be ported to every arch in the world; since the new support is not in a separate syscall, this *must* be done unless we want stability / security issues (the *BUG()* on unknown return values of handle_mm_fault() is triggerable from userspace by calling remap_file_pages, and on other archs we'd get VM_FAULT_OOM, which is worse). However, I've alleviated this need via the previous "safety net" patch. This patch includes the arch-specific part for i386. Note, however, that _proper_ support is more intrusive: we may have to allow a write on a read-only VMA, but the arch fault handler currently stops that; it should instead test VM_NONUNIFORM and, if set, call handle_mm_fault(), doing all protection checks on its own. This is in the following patches. 
Signed-off-by: Paolo 'Blaisorblade' Giarrusso <[EMAIL PROTECTED]> --- linux-2.6.git-paolo/arch/i386/mm/fault.c |2 ++ linux-2.6.git-paolo/include/linux/mm.h |9 + linux-2.6.git-paolo/mm/memory.c | 12 3 files changed, 19 insertions(+), 4 deletions(-) diff -puN arch/i386/mm/fault.c~rfp-add-vm_fault_sigsegv arch/i386/mm/fault.c --- linux-2.6.git/arch/i386/mm/fault.c~rfp-add-vm_fault_sigsegv 2005-08-11 14:19:57.0 +0200 +++ linux-2.6.git-paolo/arch/i386/mm/fault.c2005-08-11 14:19:58.0 +0200 @@ -351,6 +351,8 @@ good_area: goto do_sigbus; case VM_FAULT_OOM: goto out_of_memory; + case VM_FAULT_SIGSEGV: + goto bad_area; default: BUG(); } diff -puN include/linux/mm.h~rfp-add-vm_fault_sigsegv include/linux/mm.h --- linux-2.6.git/include/linux/mm.h~rfp-add-vm_fault_sigsegv 2005-08-11 14:19:58.0 +0200 +++ linux-2.6.git-paolo/include/linux/mm.h 2005-08-11 14:19:58.0 +0200 @@ -632,10 +632,11 @@ static inline int page_mapped(struct pag * Used to decide whether a process gets delivered SIGBUS or * just gets major/minor fault counters bumped up. */ -#define VM_FAULT_OOM (-1) -#define VM_FAULT_SIGBUS0 -#define VM_FAULT_MINOR 1 -#define VM_FAULT_MAJOR 2 +#define VM_FAULT_OOM (-1) +#define VM_FAULT_SIGBUS0 +#define VM_FAULT_MINOR 1 +#define VM_FAULT_MAJOR 2 +#define VM_FAULT_SIGSEGV 3 #define offset_in_page(p) ((unsigned long)(p) & ~PAGE_MASK) diff -puN mm/memory.c~rfp-add-vm_fault_sigsegv mm/memory.c --- linux-2.6.git/mm/memory.c~rfp-add-vm_fault_sigsegv 2005-08-11 14:19:58.0 +0200 +++ linux-2.6.git-paolo/mm/memory.c 2005-08-11 14:19:58.0 +0200 @@ -1995,6 +1995,18 @@ static inline int handle_pte_fault(struc return do_swap_page(mm, vma, address, pte, pmd, entry, write_access); } + /* +* Generate a SIGSEGV if a PROT_NONE page is accessed; this is handled +* in arch-specific code if the whole VMA has PROT_NONE, and here if +* just this pte has PROT_NONE (which can be done only with +* remap_file_pages). 
+*/ + if (pgprot_val(pte_to_pgprot(entry)) == pgprot_val(__P000)) { + pte_unmap(pte); + spin_unlock(&mm->page_table_lock); + return VM_FAULT_SIGSEGV; + } + if (write_access) { if (!pte_write(entry)) return do_wp_page(mm, vma, address, pte, pmd, entry); _ - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[RFC] [patch 0/39] remap_file_pages protection support, try 2
Ok, I've spent the past two weeks learning the Linux VM well, understanding Ingo's remap_file_pages protection support and its various weaknesses (due to lack of time on his part), and splitting and finishing it. Here follows a series of 39 _little_ patches against git commit id 889371f61fd5bb914d0331268f12432590cf7e85, i.e. between 2.6.13-rc4 and -rc5. Actually, the first 7 are unrelated trivial cleanups which somehow got in the way of this work and can probably be merged even now (many are just comment fixes). Since I was a VM newbie until two weeks ago, I've separated my changes into many little patches. To avoid the noise, I'm CC:ing many people only on this message, while I'm sending the full patch series only to akpm, mingo and LKML. Or actually, I'm trying - my provider seems not to like me sending so many patches. I attached an exported tarball to this mail, since it's very small. I hope these changes can be included in -mm, but I guess they'll probably conflict with the page fault scalability patches, and that some of them are not completely polished. Still, the patch is IMHO in better shape, in many ways, than when it was last in -mm. I'll appreciate any comments.

== Changes from the 2.6.5-mm1/dropped version of the patches: ==

*) Actually implemented _real_ and _anal_ protection support, safe against swapout; programs get SIGSEGV *always* when they should. I've used the attached test program (an improved version of Ingo's) to check that. I tested just up to patch 25, on UML. The subsequent ones are either patches for foreign archs or proposed
*) Fixed many changes present in the patches.
*) Fixed UML bits
*) Added several headaches for arch ports. I've also included some patches which reduce this
*) No more use of a new syscall slot: to use the new interface, applications will use the new MAP_NOINHERIT flag I've added. I've still got the patches to use the old -mm ABI, if there's any reason they're needed.
*) Fixed a regression wrt using mprotect() against a remapped area (see patch 15)

== Still to do: ==

*) fix the mprotect vs. remap_file_pages(MAP_NOINHERIT) interaction - see the long discussion in the patch 15 changelog
*) ->populate flushes each TLB entry individually, instead of using mmu_gathers as it should; this was suggested even by Ingo when he sent the patch, but it seems he didn't get the time to finish it. It seems rewriting the kernel locking is quite a time-consuming task!

== Patch summaries ==

Each patch has an attached changelog, but I'm giving a summary here (sorry for using the patch numbers, but I found no other way). The first 7 are just generic cleanups (mostly of comments) which bugged me along the way; however, some of them are needed for the subsequent patches to apply. 08-11 are arch bits for some arches (the ones I have access to). 12 is the core change to generic code; 13-17 are various changes to the syscall code, as are 20, 21, 23, 35 and 36, to review individually. Most of those changes (except #23, which is a fix for try_to_unmap_one I missed initially) are just speedups, and it should be possible to drop them individually. 18, 19, 22, 32, 33 and 34 partially move the handling of protection checks from the arches' page fault handlers to the generic code, by introducing VM_FAULT_SIGSEGV. In fact, the VMA protections are not reliable for VM_NONUNIFORM areas. This aspect was only begun in Ingo's code, and was the weakest area of his patch. I must now pass the *full* kind of fault to the generic code, and test it against the PTE or possibly the VMA protections. However, in these patches it's done in a kludgy way, because we check the VMA protections against VM_READ/WRITE/EXEC with no consideration of the architecture-specific dependencies between them (like READ_IMPLIES_EXEC and so on), so arches have to work around this. This is fixed in patch 33, which is however untested. 24 and 25 are some fixes to the UML code, needed to make it work even with this change.
26-31 are other arches' compile fixes for the introduction of pte_to_pgoff. The last three (37-39) are not meant to be applied - they are possible changes I'm either really uncertain about, or which I'm sure are wrong in that form but express possibly correct ideas. 36 should be a fixed version of #37, but I wrote it in the past few minutes.
-- Inform me of my mistakes, so I can keep imitating Homer Simpson's "Doh!". Paolo Giarrusso, aka Blaisorblade (Skype ID "PaoloGiarrusso", ICQ 215621894) http://www.user-mode-linux.org/~blaisorblade
fremap-prot-complete-broken-out.tar.bz2 Description: application/tbz
fremap-test-complete.c.bz2 Description: BZip2 compressed data
Re: [patch 1/3] uml: share page bits handling between 2 and 3 level pagetables
On Saturday 30 July 2005 18:02, Jeff Dike wrote:
> On Thu, Jul 28, 2005 at 08:56:53PM +0200, [EMAIL PROTECTED] wrote:
> > As obvious, a "core code nice cleanup" is not a "stability-friendly
> > patch" so usual care applies.
> These look reasonable, as they are what we discussed in Ottawa.
> I'll put them in my tree and see if I see any problems. I would
> suggest sending these in early after 2.6.13 if they seem OK.
I've discovered that we're not the only ones to lack the dirty / accessed "hardware" bits: see include/asm-alpha/pgtable.h (they don't have the accessed bit). So maybe we could drop the "fault-on-access" thing. Also, note the comment before handle_pte_fault:

/*
 * These routines also need to handle stuff like marking pages dirty
 * and/or accessed for architectures that don't do it in hardware (most
 * RISC architectures). The early dirtying is also good on the i386.
 */

I'm not able to find where we clear the dirty bit on a PTE; however, it's not done only by pte_mkclean - there are some macros like ptep_clear... in asm-generic/pgtable.h
-- Inform me of my mistakes, so I can keep imitating Homer Simpson's "Doh!". Paolo Giarrusso, aka Blaisorblade (Skype ID "PaoloGiarrusso", ICQ 215621894) http://www.user-mode-linux.org/~blaisorblade
[patch 27/39] remap_file_pages protection support: fixups to ppc32 bits
From: Paul Mackerras <[EMAIL PROTECTED]> When I tried -mm4 on a ppc32 box, it hit a BUG because I hadn't excluded _PAGE_FILE from the bits used for swap entries. While looking at that I realised that the pte_to_pgoff and pgoff_prot_to_pte macros were wrong for 4xx and 8xx (embedded) PPC chips, since they use Signed-off-by: Paolo 'Blaisorblade' Giarrusso <[EMAIL PROTECTED]> --- linux-2.6.git-paolo/include/asm-ppc/pgtable.h | 48 +- 1 files changed, 39 insertions(+), 9 deletions(-) diff -puN include/asm-ppc/pgtable.h~rfp-arch-ppc32-pgtable-fixes include/asm-ppc/pgtable.h --- linux-2.6.git/include/asm-ppc/pgtable.h~rfp-arch-ppc32-pgtable-fixes 2005-08-12 18:18:44.0 +0200 +++ linux-2.6.git-paolo/include/asm-ppc/pgtable.h 2005-08-12 18:18:44.0 +0200 @@ -205,6 +205,7 @@ extern unsigned long ioremap_bot, iorema */ #define _PAGE_PRESENT 0x0001 /* S: PTE valid */ #define_PAGE_RW0x0002 /* S: Write permission */ +#define _PAGE_FILE 0x0004 /* S: nonlinear file mapping */ #define_PAGE_DIRTY 0x0004 /* S: Page dirty */ #define _PAGE_ACCESSED 0x0008 /* S: Page referenced */ #define _PAGE_HWWRITE 0x0010 /* H: Dirty & RW */ @@ -213,7 +214,6 @@ extern unsigned long ioremap_bot, iorema #define_PAGE_ENDIAN0x0080 /* H: E bit */ #define_PAGE_GUARDED 0x0100 /* H: G bit */ #define_PAGE_COHERENT 0x0200 /* H: M bit */ -#define _PAGE_FILE 0x0400 /* S: nonlinear file mapping */ #define_PAGE_NO_CACHE 0x0400 /* H: I bit */ #define_PAGE_WRITETHRU 0x0800 /* H: W bit */ @@ -724,20 +724,50 @@ extern void paging_init(void); #define __swp_type(entry) ((entry).val & 0x1f) #define __swp_offset(entry)((entry).val >> 5) #define __swp_entry(type, offset) ((swp_entry_t) { (type) | ((offset) << 5) }) + +#if defined(CONFIG_4xx) || defined(CONFIG_8xx) +/* _PAGE_FILE and _PAGE_PRESENT are in the bottom 3 bits on all these chips. 
*/ #define __pte_to_swp_entry(pte)((swp_entry_t) { pte_val(pte) >> 3 }) #define __swp_entry_to_pte(x) ((pte_t) { (x).val << 3 }) +#else /* Classic PPC */ +#define __pte_to_swp_entry(pte)\ +((swp_entry_t) { ((pte_val(pte) >> 3) & ~1) | ((pte_val(pte) >> 2) & 1) }) +#define __swp_entry_to_pte(x) \ +((pte_t) { (((x).val & ~1) << 3) | (((x).val & 1) << 2) }) +#endif /* Encode and decode a nonlinear file mapping entry */ -#define PTE_FILE_MAX_BITS 27 -#define pte_to_pgoff(pte) (((pte_val(pte) & ~0x7ff) >> 5) \ -| ((pte_val(pte) & 0x3f0) >> 4)) -#define pte_to_pgprot(pte) \ -__pgprot((pte_val(pte) & (_PAGE_USER|_PAGE_RW|_PAGE_PRESENT)) | _PAGE_ACCESSED) +/* We can't use any the _PAGE_PRESENT, _PAGE_FILE, _PAGE_USER, _PAGE_RW, + or _PAGE_HASHPTE bits for storing a page offset. */ +#if defined(CONFIG_40x) +/* 40x, avoid the 0x53 bits - to simplify things, avoid 0x73 */ */ +#define __pgoff_split(x) x) << 5) & ~0x7f) | (((x) << 2) & 0xc)) +#define __pgoff_glue(x)x) & ~0x7f) >> 5) | (((x) & 0xc) >> 2)) +#elif defined(CONFIG_44x) +/* 44x, avoid the 0x47 bits */ +#define __pgoff_split(x) x) << 4) & ~0x7f) | (((x) << 3) & 0x38)) +#define __pgoff_glue(x)x) & ~0x7f) >> 4) | (((x) & 0x38) >> 3)) +#elif defined(CONFIG_8xx) +/* 8xx, avoid the 0x843 bits */ +#define __pgoff_split(x) x) << 4) & ~0xfff) | (((x) << 3) & 0x780) \ +| (((x) << 2) & 0x3c)) +#define __pgoff_glue(x)x) & ~0xfff) >> 4) | (((x) & 0x780) >> 3))\ +| (((x) & 0x3c) >> 2)) +#else +/* classic PPC, avoid the 0x40f bits */ +#define __pgoff_split(x) x) << 5) & ~0x7ff) | (((x) << 4) & 0x3f0)) +#define __pgoff_glue(x)x) & ~0x7ff) >> 5) | (((x) & 0x3f0) >> 4)) +#endif +#define PTE_FILE_MAX_BITS 27 +#define pte_to_pgoff(pte) __pgoff_glue(pte_val(pte)) #define pgoff_prot_to_pte(off, prot) \ - ((pte_t) { (((off) << 5) & ~0x7ff) | (((off) << 4) & 0x3f0) \ - | (pgprot_val(prot) & (_PAGE_USER|_PAGE_RW)) \ - | _PAGE_FILE }) + ((pte_t) { __pgoff_split(off) | _PAGE_FILE |\ + (pgprot_val(prot) & (_PAGE_USER|_PAGE_RW)) }) + +#de
[patch 24/39] remap_file_pages protection support: adapt to uml peculiarities
From: Paolo 'Blaisorblade' Giarrusso <[EMAIL PROTECTED]> Uml is particular in respect with other architectures (and possibly this is to fix) in the fact that our arch fault handler handles indifferently both TLB and page faults. In particular, we may get to call handle_mm_fault() when the PTE is already correct, but simply it's not flushed. And rfp-fault-sigsegv-2 breaks this, because when getting a fault on a pte_present PTE and non-uniform VMA, it assumes the fault is due to a protection fault, and signals the caller a SIGSEGV must be sent. This isn't the final fix for UML, that's the next one. Signed-off-by: Paolo 'Blaisorblade' Giarrusso <[EMAIL PROTECTED]> --- linux-2.6.git-paolo/arch/um/kernel/trap_kern.c | 19 +++ 1 files changed, 15 insertions(+), 4 deletions(-) diff -puN arch/um/kernel/trap_kern.c~rfp-sigsegv-uml-3 arch/um/kernel/trap_kern.c --- linux-2.6.git/arch/um/kernel/trap_kern.c~rfp-sigsegv-uml-3 2005-08-11 23:13:06.0 +0200 +++ linux-2.6.git-paolo/arch/um/kernel/trap_kern.c 2005-08-11 23:14:26.0 +0200 @@ -75,8 +75,21 @@ handle_fault: err = -EACCES; goto out; case VM_FAULT_SIGSEGV: - err = -EFAULT; - goto out; + /* Duplicate this code here. */ + pgd = pgd_offset(mm, address); + pud = pud_offset(pgd, address); + pmd = pmd_offset(pud, address); + pte = pte_offset_kernel(pmd, address); + if (likely (pte_newpage(*pte) || pte_newprot(*pte))) { + /* This wasn't done by __handle_mm_fault(), and +* the page hadn't been flushed. 
*/ + *pte = pte_mkyoung(*pte); + if(pte_write(*pte)) *pte = pte_mkdirty(*pte); + break; + } else { + err = -EFAULT; + goto out; + } case VM_FAULT_OOM: err = -ENOMEM; goto out_of_memory; @@ -89,8 +102,6 @@ handle_fault: pte = pte_offset_kernel(pmd, address); } while(!pte_present(*pte)); err = 0; - *pte = pte_mkyoung(*pte); - if(pte_write(*pte)) *pte = pte_mkdirty(*pte); flush_tlb_page(vma, address); /* If the PTE is not present, the vma protection are not accurate if _ - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[patch 34/39] remap_file_pages protection support: restrict permission testing
From: Paolo 'Blaisorblade' Giarrusso <[EMAIL PROTECTED]> Yet to test. Currently we install a PTE when one is missing irrispective of the fault type, and if the access type is prohibited we'll get another fault and kill the process only then. With this, we check the access type on the 1st fault. We could also use this code for testing present PTE's, if the current assumption (fault on present PTE's in VM_NONUNIFORM vma's means access violation) proves problematic for architectures other than UML (which I already fixed), but I hope it's not needed. Signed-off-by: Paolo 'Blaisorblade' Giarrusso <[EMAIL PROTECTED]> --- linux-2.6.git-paolo/mm/memory.c | 16 1 files changed, 16 insertions(+) diff -puN mm/memory.c~rfp-fault-sigsegv-3 mm/memory.c --- linux-2.6.git/mm/memory.c~rfp-fault-sigsegv-3 2005-08-12 17:19:17.0 +0200 +++ linux-2.6.git-paolo/mm/memory.c 2005-08-12 17:19:17.0 +0200 @@ -1963,6 +1963,7 @@ static int do_file_page(struct mm_struct unsigned long pgoff; pgprot_t pgprot; int err; + pte_t test_entry; BUG_ON(!vma->vm_ops || !vma->vm_ops->nopage); /* @@ -1983,6 +1984,21 @@ static int do_file_page(struct mm_struct pgoff = pte_to_pgoff(*pte); pgprot = vma->vm_flags & VM_NONUNIFORM ? pte_to_pgprot(*pte): vma->vm_page_prot; + /* If this is not enabled, we'll get another fault after return next +* time, check we handle that one, and that this code works. 
*/ +#if 1 + /* We just want to test pte_{read,write,exec} */ + test_entry = mk_pte(0, pgprot); + if (unlikely(vma->vm_flags & VM_NONUNIFORM) && !pte_file(*pte)) { + if ((access_mask & VM_WRITE) && !pte_write(test_entry)) + goto out_segv; + if ((access_mask & VM_READ) && !pte_read(test_entry)) + goto out_segv; + if ((access_mask & VM_EXEC) && !pte_exec(test_entry)) + goto out_segv; + } +#endif + pte_unmap(pte); spin_unlock(&mm->page_table_lock); _ - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[patch 26/39] remap_file_pages protection support: ppc32 bits
From: Ingo Molnar <[EMAIL PROTECTED]> PPC32 bits of RFP - as in original patch. Signed-off-by: Paolo 'Blaisorblade' Giarrusso <[EMAIL PROTECTED]> --- linux-2.6.git-paolo/include/asm-ppc/pgtable.h | 15 +++ 1 files changed, 11 insertions(+), 4 deletions(-) diff -puN include/asm-ppc/pgtable.h~rfp-arch-ppc include/asm-ppc/pgtable.h --- linux-2.6.git/include/asm-ppc/pgtable.h~rfp-arch-ppc2005-08-12 18:18:43.0 +0200 +++ linux-2.6.git-paolo/include/asm-ppc/pgtable.h 2005-08-12 18:39:57.0 +0200 @@ -309,8 +309,8 @@ extern unsigned long ioremap_bot, iorema /* Definitions for 60x, 740/750, etc. */ #define _PAGE_PRESENT 0x001 /* software: pte contains a translation */ #define _PAGE_HASHPTE 0x002 /* hash_page has made an HPTE for this pte */ -#define _PAGE_FILE 0x004 /* when !present: nonlinear file mapping */ #define _PAGE_USER 0x004 /* usermode access allowed */ +#define _PAGE_FILE 0x008 /* when !present: nonlinear file mapping */ #define _PAGE_GUARDED 0x008 /* G: prohibit speculative access */ #define _PAGE_COHERENT 0x010 /* M: enforce memory coherence (SMP systems) */ #define _PAGE_NO_CACHE 0x020 /* I: cache inhibit */ @@ -728,9 +728,16 @@ extern void paging_init(void); #define __swp_entry_to_pte(x) ((pte_t) { (x).val << 3 }) /* Encode and decode a nonlinear file mapping entry */ -#define PTE_FILE_MAX_BITS 29 -#define pte_to_pgoff(pte) (pte_val(pte) >> 3) -#define pgoff_to_pte(off) ((pte_t) { ((off) << 3) | _PAGE_FILE }) +#define PTE_FILE_MAX_BITS 27 +#define pte_to_pgoff(pte) (((pte_val(pte) & ~0x7ff) >> 5) \ +| ((pte_val(pte) & 0x3f0) >> 4)) +#define pte_to_pgprot(pte) \ +__pgprot((pte_val(pte) & (_PAGE_USER|_PAGE_RW|_PAGE_PRESENT)) | _PAGE_ACCESSED) + +#define pgoff_prot_to_pte(off, prot) \ + ((pte_t) { (((off) << 5) & ~0x7ff) | (((off) << 4) & 0x3f0) \ + | (pgprot_val(prot) & (_PAGE_USER|_PAGE_RW)) \ + | _PAGE_FILE }) /* CONFIG_APUS */ /* For virtual address to physical address conversion */ _ - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in 
the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[patch 33/39] remap_file_pages protection support: VM_FAULT_SIGSEGV permission checking rework
From: Paolo 'Blaisorblade' Giarrusso <[EMAIL PROTECTED]> Simplify the generic arch permission checking: the previous one was clumsy, as it didn't account arch-specific implications (read implies exec, write implies read, and so on). Still to undo fixes for the archs (i386 and UML) which were modified for the previous scheme. Signed-off-by: Paolo 'Blaisorblade' Giarrusso <[EMAIL PROTECTED]> --- linux-2.6.git-paolo/mm/memory.c | 49 ++-- 1 files changed, 33 insertions(+), 16 deletions(-) diff -puN mm/memory.c~rfp-sigsegv-4 mm/memory.c --- linux-2.6.git/mm/memory.c~rfp-sigsegv-4 2005-08-12 17:18:55.0 +0200 +++ linux-2.6.git-paolo/mm/memory.c 2005-08-12 17:18:55.0 +0200 @@ -1923,6 +1923,35 @@ oom: goto out; } +static inline int check_perms(struct vm_area_struct * vma, int access_mask) { + if (unlikely(vm_flags & VM_NONUNIFORM)) { + /* we used to check protections in arch handler, but with +* VM_NONUNIFORM the check is skipped. */ +#if 0 + if ((access_mask & VM_WRITE) > (vm_flags & VM_WRITE)) + goto err; + if ((access_mask & VM_READ) > (vm_flags & VM_READ)) + goto err; + if ((access_mask & VM_EXEC) > (vm_flags & VM_EXEC)) + goto err; +#else + /* access_mask contains the type of the access, vm_flags are the +* declared protections, pte has the protection which will be +* given to the PTE's in that area. */ + //pte_t pte = pfn_pte(0UL, protection_map[vm_flags & 0x0f|VM_SHARED]); + pte_t pte = pfn_pte(0UL, vma->vm_page_prot); + if ((access_mask & VM_WRITE) && ! pte_write(pte)) + goto err; + if ((access_mask & VM_READ) && ! pte_read(pte)) + goto err; + if ((access_mask & VM_EXEC) && ! pte_exec(pte)) + goto err; +#endif + } + return 0; +err: + return -EPERM; +} /* * Fault of a previously existing named mapping. Repopulate the pte * from the encoded file_pte if possible. 
This enables swappable @@ -1944,14 +1973,8 @@ static int do_file_page(struct mm_struct ((access_mask & VM_WRITE) && !(vma->vm_flags & VM_SHARED))) { /* We're behaving as if pte_file was cleared, so check * protections like in handle_pte_fault. */ - if (unlikely(vma->vm_flags & VM_NONUNIFORM)) { - if ((access_mask & VM_WRITE) > (vma->vm_flags & VM_WRITE)) - goto out_segv; - if ((access_mask & VM_READ) > (vma->vm_flags & VM_READ)) - goto out_segv; - if ((access_mask & VM_EXEC) > (vma->vm_flags & VM_EXEC)) - goto out_segv; - } + if (check_perms(vma, access_mask)) + goto out_segv; pte_clear(mm, address, pte); return do_no_page(mm, vma, address, access_mask & VM_WRITE, pte, pmd); @@ -2007,14 +2030,8 @@ static inline int handle_pte_fault(struc /* when pte_file(), the VMA protections are useless. Otherwise, * we used to check protections in arch handler, but with * VM_NONUNIFORM the check is skipped. */ - if (unlikely(vma->vm_flags & VM_NONUNIFORM) && !pte_file(entry)) { - if ((access_mask & VM_WRITE) > (vma->vm_flags & VM_WRITE)) - goto out_segv; - if ((access_mask & VM_READ) > (vma->vm_flags & VM_READ)) - goto out_segv; - if ((access_mask & VM_EXEC) > (vma->vm_flags & VM_EXEC)) - goto out_segv; - } + if (!pte_file(entry) && check_perms(vma, access_mask)) + goto out_segv; /* * If it truly wasn't present, we know that kswapd _ - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[patch 30/39] remap_file_pages protection support: ia64 bits
From: Ingo Molnar <[EMAIL PROTECTED]> I've attached a 'blind' port of the prot bits of fremap to ia64. I've compiled it with a cross-compiler but otherwise it's untested. (and it's very likely i got the pte bits wrong - but it's roughly OK.) This should at least make ia64 compile. Signed-off-by: Paolo 'Blaisorblade' Giarrusso <[EMAIL PROTECTED]> --- linux-2.6.git-paolo/include/asm-ia64/pgtable.h | 17 + 1 files changed, 13 insertions(+), 4 deletions(-) diff -puN include/asm-ia64/pgtable.h~rfp-arch-ia64 include/asm-ia64/pgtable.h --- linux-2.6.git/include/asm-ia64/pgtable.h~rfp-arch-ia64 2005-08-12 19:27:03.0 +0200 +++ linux-2.6.git-paolo/include/asm-ia64/pgtable.h 2005-08-12 19:27:03.0 +0200 @@ -433,7 +433,8 @@ extern void paging_init (void); * Format of file pte: * bit 0 : present bit (must be zero) * bit 1 : _PAGE_FILE (must be one) - * bits 2-62: file_offset/PAGE_SIZE + * bit 2 : _PAGE_AR_RW + * bits 3-62: file_offset/PAGE_SIZE * bit 63 : _PAGE_PROTNONE bit */ #define __swp_type(entry) (((entry).val >> 2) & 0x7f) @@ -442,9 +443,17 @@ extern void paging_init (void); #define __pte_to_swp_entry(pte)((swp_entry_t) { pte_val(pte) }) #define __swp_entry_to_pte(x) ((pte_t) { (x).val }) -#define PTE_FILE_MAX_BITS 61 -#define pte_to_pgoff(pte) ((pte_val(pte) << 1) >> 3) -#define pgoff_to_pte(off) ((pte_t) { ((off) << 2) | _PAGE_FILE }) +#define PTE_FILE_MAX_BITS 59 +#define pte_to_pgoff(pte) ((pte_val(pte) << 1) >> 4) + +#define pte_to_pgprot(pte) \ + __pgprot((pte_val(pte) & (_PAGE_AR_RW | _PAGE_PROTNONE)) \ + | ((pte_val(pte) & _PAGE_PROTNONE) ? 0 : \ + (__ACCESS_BITS | _PAGE_PL_3))) + +#define pgoff_prot_to_pte(off, prot) \ + ((pte_t) { _PAGE_FILE + \ + (pgprot_val(prot) & (_PAGE_AR_RW | _PAGE_PROTNONE)) + (off) }) /* XXX is this right? 
*/ #define io_remap_page_range(vma, vaddr, paddr, size, prot) \ _ - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[patch 25/39] remap_file_pages protection support: fix unflushed TLB errors detection
From: Paolo 'Blaisorblade' Giarrusso <[EMAIL PROTECTED]> We got unflushed PTE's marked up-to-date, because they were protected to get dirtying / accessing faults. So, don't test the PTE for being up-to-date, but check directly the permission (since the PTE is not protected for that). Signed-off-by: Paolo 'Blaisorblade' Giarrusso <[EMAIL PROTECTED]> --- linux-2.6.git-paolo/arch/um/kernel/trap_kern.c | 28 +++-- 1 files changed, 22 insertions(+), 6 deletions(-) diff -puN arch/um/kernel/trap_kern.c~rfp-sigsegv-uml-3-fix arch/um/kernel/trap_kern.c --- linux-2.6.git/arch/um/kernel/trap_kern.c~rfp-sigsegv-uml-3-fix 2005-08-11 23:14:58.0 +0200 +++ linux-2.6.git-paolo/arch/um/kernel/trap_kern.c 2005-08-11 23:14:58.0 +0200 @@ -35,7 +35,7 @@ int handle_page_fault(unsigned long addr pgd_t *pgd; pud_t *pud; pmd_t *pmd; - pte_t *pte; + pte_t *pte, entry; int err = -EFAULT; int access_mask = 0; @@ -75,16 +75,32 @@ handle_fault: err = -EACCES; goto out; case VM_FAULT_SIGSEGV: + WARN_ON(!(vma->vm_flags & VM_NONUNIFORM)); /* Duplicate this code here. */ pgd = pgd_offset(mm, address); pud = pud_offset(pgd, address); pmd = pmd_offset(pud, address); pte = pte_offset_kernel(pmd, address); - if (likely (pte_newpage(*pte) || pte_newprot(*pte))) { - /* This wasn't done by __handle_mm_fault(), and -* the page hadn't been flushed. */ - *pte = pte_mkyoung(*pte); - if(pte_write(*pte)) *pte = pte_mkdirty(*pte); + if (likely (pte_newpage(*pte) || pte_newprot(*pte)) || + (is_write ? pte_write(*pte) : pte_read(*pte)) ) { + /* The page hadn't been flushed, or it had been +* flushed but without access to get a dirtying +* / accessing fault. */ + + /* __handle_mm_fault() didn't dirty / young this +* PTE, probably we won't get another fault for +* this page, so fix things now. */ + entry = *pte; + entry = pte_mkyoung(*pte); + if(pte_write(entry)) + entry = pte_mkdirty(entry); + /* Yes, this will set the page as NEWPAGE. We +* want this, otherwise things won't work. 
+* Indeed, the +* *pte = pte_mkyoung(*pte); +* we used to have (uselessly) didn't work at +* all! */ + set_pte(pte, entry); break; } else { err = -EFAULT; _ - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [uml-devel] Re: [RFC] [patch 0/39] remap_file_pages protection support, try 2
On Friday 12 August 2005 20:29, David S. Miller wrote:
> Please do not BOMB linux-kernel with 39 patches in one
> go, that will kill the list server.
> Try to consolidate your patch groups into smaller pieces,
> like so about 10 or 15 at a time. And send any that remain
> on some later date.
Whoops - unfortunately, "some later date" for me means either a week away or just some minutes away. I'm trying for the latter. However, I sent the initial tarball containing all of them, so I hope that will be useful.
-- Inform me of my mistakes, so I can keep imitating Homer Simpson's "Doh!". Paolo Giarrusso, aka Blaisorblade (Skype ID "PaoloGiarrusso", ICQ 215621894) http://www.user-mode-linux.org/~blaisorblade
[patch 31/39] remap_file_pages protection support: s390 bits
From: Martin Schwidefsky <[EMAIL PROTECTED]> s390 memory management changes for remap-file-pages-prot patch: - Add pgoff_prot_to_pte/pte_to_pgprot, remove pgoff_to_pte (required for 'prot' parameteter in shared-writeable mappings). - Handle VM_FAULT_SIGSEGV from handle_mm_fault in do_exception. Signed-off-by: Paolo 'Blaisorblade' Giarrusso <[EMAIL PROTECTED]> --- linux-2.6.git-paolo/arch/s390/mm/fault.c |2 linux-2.6.git-paolo/include/asm-s390/pgtable.h | 90 - 2 files changed, 60 insertions(+), 32 deletions(-) diff -puN arch/s390/mm/fault.c~rfp-arch-s390 arch/s390/mm/fault.c --- linux-2.6.git/arch/s390/mm/fault.c~rfp-arch-s3902005-08-12 19:27:58.0 +0200 +++ linux-2.6.git-paolo/arch/s390/mm/fault.c2005-08-12 19:27:58.0 +0200 @@ -260,6 +260,8 @@ survive: goto do_sigbus; case VM_FAULT_OOM: goto out_of_memory; + case VM_FAULT_SIGSEGV: + goto bad_area; default: BUG(); } diff -puN include/asm-s390/pgtable.h~rfp-arch-s390 include/asm-s390/pgtable.h --- linux-2.6.git/include/asm-s390/pgtable.h~rfp-arch-s390 2005-08-12 19:27:58.0 +0200 +++ linux-2.6.git-paolo/include/asm-s390/pgtable.h 2005-08-12 19:27:58.0 +0200 @@ -211,16 +211,41 @@ extern char empty_zero_page[PAGE_SIZE]; * C : changed bit */ -/* Hardware bits in the page table entry */ +/* Hardware bits in the page table entry. */ #define _PAGE_RO0x200 /* HW read-only */ #define _PAGE_INVALID 0x400 /* HW invalid */ -/* Mask and four different kinds of invalid pages. */ -#define _PAGE_INVALID_MASK 0x601 +/* Software bits in the page table entry. 
*/ +#define _PAGE_FILE 0x001 +#define _PAGE_PROTNONE 0x002 + +/* + * We have 8 different page "types", two valid types and 6 invalid types + * (p = page address, o = swap offset, t = swap type, f = file offset): + * 0 xxx 0IP0 yy NF + * valid rw: 0 <p> <--0-> 00 + * valid ro: 0 <p> 0010 <--0-> 00 + * invalid none: 0 <p> 0100 <--0-> 10 + * invalid empty: 0 <0> 0100 <--0-> 00 + * invalid swap: 0 <o> 0110 <--t-> 00 + * invalid file rw:0 <f> 0100 <--f-> 01 + * invalid file ro:0 <f> 0110 <--f-> 01 + * invaild file none: 0 <f> 0100 <--f-> 11 + * + * The format for 64 bit is almost identical, there isn't a leading zero + * and the number of bits in the page address part of the pte is 52 bits + * instead of 19. + */ + #define _PAGE_INVALID_EMPTY0x400 -#define _PAGE_INVALID_NONE 0x401 #define _PAGE_INVALID_SWAP 0x600 -#define _PAGE_INVALID_FILE 0x601 +#define _PAGE_INVALID_FILE 0x401 + +#define _PTE_IS_VALID(__pte) (!(pte_val(__pte) & _PAGE_INVALID)) +#define _PTE_IS_NONE(__pte)((pte_val(__pte) & 0x603) == 0x402) +#define _PTE_IS_EMPTY(__pte) ((pte_val(__pte) & 0x603) == 0x400) +#define _PTE_IS_SWAP(__pte)((pte_val(__pte) & 0x603) == 0x600) +#define _PTE_IS_FILE(__pte)((pte_val(__pte) & 0x401) == 0x401) #ifndef __s390x__ @@ -281,13 +306,11 @@ extern char empty_zero_page[PAGE_SIZE]; /* * No mapping available */ -#define PAGE_NONE_SHARED __pgprot(_PAGE_INVALID_NONE) -#define PAGE_NONE_PRIVATE __pgprot(_PAGE_INVALID_NONE) -#define PAGE_RO_SHARED __pgprot(_PAGE_RO) -#define PAGE_RO_PRIVATE __pgprot(_PAGE_RO) -#define PAGE_COPY__pgprot(_PAGE_RO) -#define PAGE_SHARED __pgprot(0) -#define PAGE_KERNEL __pgprot(0) +#define PAGE_NONE __pgprot(_PAGE_INVALID | _PAGE_PROTNONE) +#define PAGE_READONLY __pgprot(_PAGE_RO) +#define PAGE_COPY __pgprot(_PAGE_RO) +#define PAGE_SHARED__pgprot(0) +#define PAGE_KERNEL__pgprot(0) /* * The S390 can't do page protection for execute, and considers that the @@ -295,21 +318,21 @@ extern char empty_zero_page[PAGE_SIZE]; * the closest we can get.. 
*/ /*xwr*/ -#define __P000 PAGE_NONE_PRIVATE -#define __P001 PAGE_RO_PRIVATE +#define __P000 PAGE_NONE +#define __P001 PAGE_READONLY #define __P010 PAGE_COPY #define __P011 PAGE_COPY -#define __P100 PAGE_RO_PRIVATE -#define __P101 PAGE_RO_PRIVATE +#define __P100 PAGE_READONLY +#define __P101 PAGE_READONLY #define __P110 PAGE_COPY #define __P111 PAGE_COPY -#define __S000 PAGE_NONE_SHARED -#define __S001 PAGE_RO_SHARED +#define __S000 PAGE_NONE +#define __S001 PAGE_READONLY #define __S010 PAGE_SHARED #define __S011 PAGE_SHARED -#define __S100 PAGE_RO_SHARED -#define __S101 PAGE_RO_SHARED +#define __S100 PAGE_READONLY +#define __S101 PAGE_READONLY #define __S110 PAGE_
[patch 28/39] remap_file_pages protection support: sparc64 bits.
From: William Lee Irwin III <[EMAIL PROTECTED]>

Implement remap_file_pages-with-per-page-protections for sparc64. See
ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.4/2.6.4-mm1/broken-out/remap-file-pages-prot-2.6.4-rc1-mm1-A1.patch
and
ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.4/2.6.4-mm1/broken-out/remap-file-pages-prot-ia64-2.6.4-rc2-mm1-A0.patch

Signed-off-by: Paolo 'Blaisorblade' Giarrusso <[EMAIL PROTECTED]>
---

 linux-2.6.git-paolo/include/asm-sparc64/pgtable.h | 13 ++---
 1 files changed, 10 insertions(+), 3 deletions(-)

diff -puN include/asm-sparc64/pgtable.h~rfp-arch-sparc64 include/asm-sparc64/pgtable.h
--- linux-2.6.git/include/asm-sparc64/pgtable.h~rfp-arch-sparc64	2005-08-12 18:41:31.0 +0200
+++ linux-2.6.git-paolo/include/asm-sparc64/pgtable.h	2005-08-12 18:41:31.0 +0200
@@ -367,9 +367,16 @@ static inline pte_t mk_pte_io(unsigned l
 
 /* File offset in PTE support. */
 #define pte_file(pte)		(pte_val(pte) & _PAGE_FILE)
-#define pte_to_pgoff(pte)	(pte_val(pte) >> PAGE_SHIFT)
-#define pgoff_to_pte(off)	(__pte(((off) << PAGE_SHIFT) | _PAGE_FILE))
-#define PTE_FILE_MAX_BITS	(64UL - PAGE_SHIFT - 1UL)
+#define __pte_to_pgprot(pte) \
+	__pgprot(pte_val(pte) & (_PAGE_READ|_PAGE_WRITE))
+#define __file_pte_to_pgprot(pte) \
+	__pgprot(((pte_val(pte) >> PAGE_SHIFT) & 0x3UL) << 8)
+#define pte_to_pgprot(pte) \
+	(pte_file(pte) ? __file_pte_to_pgprot(pte) : __pte_to_pgprot(pte))
+#define pte_to_pgoff(pte)	(pte_val(pte) >> (PAGE_SHIFT+2))
+#define pgoff_prot_to_pte(off, prot) \
+	((__pte(((off) | ((pgprot_val(prot) >> 8) & 0x3UL << (PAGE_SHIFT+2) | _PAGE_FILE)
+#define PTE_FILE_MAX_BITS	(64UL - PAGE_SHIFT - 3UL)
 
 extern unsigned long prom_virt_to_phys(unsigned long, int *);
_
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
[patch 21/39] remap_file_pages protection support: use EOVERFLOW ret code
From: Paolo 'Blaisorblade' Giarrusso <[EMAIL PROTECTED]>

Use -EOVERFLOW ("Value too large for defined data type") rather than -EINVAL
when we cannot store the file offset in the PTE.

Signed-off-by: Paolo 'Blaisorblade' Giarrusso <[EMAIL PROTECTED]>
---

 linux-2.6.git-paolo/mm/fremap.c |    2 +-
 1 files changed, 1 insertion(+), 1 deletion(-)

diff -puN mm/fremap.c~rfp-ef2big-ret-code mm/fremap.c
--- linux-2.6.git/mm/fremap.c~rfp-ef2big-ret-code	2005-08-11 23:04:59.0 +0200
+++ linux-2.6.git-paolo/mm/fremap.c	2005-08-11 23:04:59.0 +0200
@@ -213,7 +213,7 @@ asmlinkage long sys_remap_file_pages(uns
 
 	/* Can we represent this offset inside this architecture's pte's? */
 #if PTE_FILE_MAX_BITS < BITS_PER_LONG
 	if (pgoff + (size >> PAGE_SHIFT) >= (1UL << PTE_FILE_MAX_BITS))
-		return err;
+		return -EOVERFLOW;
 #endif
 
 	/* We need down_write() to change vma->vm_flags. */
_
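For reference, the overflow condition being changed is easy to reproduce in userspace. The sketch below mirrors the check in sys_remap_file_pages() with made-up values: PAGE_SHIFT = 12 and a hypothetical 20-bit file-offset field, not any real architecture's PTE_FILE_MAX_BITS.

```c
/* Sketch of the representability check in sys_remap_file_pages(), in
 * userspace form. PAGE_SHIFT and PTE_FILE_MAX_BITS are illustrative
 * values, not any real architecture's. */
#include <assert.h>
#include <errno.h>

#define PAGE_SHIFT		12
#define PTE_FILE_MAX_BITS	20UL

/* Return -EOVERFLOW when the window's last page offset does not fit in
 * a file PTE, 0 when it does. */
static long check_pgoff(unsigned long pgoff, unsigned long size)
{
	if (pgoff + (size >> PAGE_SHIFT) >= (1UL << PTE_FILE_MAX_BITS))
		return -EOVERFLOW;
	return 0;
}
```

With these toy values, any remap whose window reaches page offset 2^20 or beyond is rejected with the new, more descriptive error code.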
[patch 29/39] remap_file_pages protection support: ppc64 bits
From: Paul Mackerras <[EMAIL PROTECTED]>

ppc64 bits for remap_file_pages w/prot (no syscall table).

Signed-off-by: Paolo 'Blaisorblade' Giarrusso <[EMAIL PROTECTED]>
---

 linux-2.6.git-paolo/include/asm-ppc64/pgtable.h | 12 +---
 1 files changed, 9 insertions(+), 3 deletions(-)

diff -puN include/asm-ppc64/pgtable.h~rfp-arch-ppc64 include/asm-ppc64/pgtable.h
--- linux-2.6.git/include/asm-ppc64/pgtable.h~rfp-arch-ppc64	2005-08-12 18:42:20.0 +0200
+++ linux-2.6.git-paolo/include/asm-ppc64/pgtable.h	2005-08-12 18:42:20.0 +0200
@@ -62,8 +62,8 @@
  */
 #define _PAGE_PRESENT	0x0001 /* software: pte contains a translation */
 #define _PAGE_USER	0x0002 /* matches one of the PP bits */
-#define _PAGE_FILE	0x0002 /* (!present only) software: pte holds file offset */
 #define _PAGE_EXEC	0x0004 /* No execute on POWER4 and newer (we invert) */
+#define _PAGE_FILE	0x0008 /* !present: pte holds file offset */
 #define _PAGE_GUARDED	0x0008
 #define _PAGE_COHERENT	0x0010 /* M: enforce memory coherence (SMP systems) */
 #define _PAGE_NO_CACHE	0x0020 /* I: cache inhibit */
@@ -492,9 +492,15 @@ extern void update_mmu_cache(struct vm_a
 #define __swp_entry(type, offset) ((swp_entry_t) { ((type) << 1) | ((offset) << 8) })
 #define __pte_to_swp_entry(pte)	((swp_entry_t) { pte_val(pte) >> PTE_SHIFT })
 #define __swp_entry_to_pte(x)	((pte_t) { (x).val << PTE_SHIFT })
-#define pte_to_pgoff(pte)	(pte_val(pte) >> PTE_SHIFT)
-#define pgoff_to_pte(off)	((pte_t) {((off) << PTE_SHIFT)|_PAGE_FILE})
+
 #define PTE_FILE_MAX_BITS	(BITS_PER_LONG - PTE_SHIFT)
+#define pte_to_pgoff(pte)	(pte_val(pte) >> PTE_SHIFT)
+#define pte_to_pgprot(pte)	\
+	__pgprot((pte_val(pte) & (_PAGE_USER|_PAGE_RW|_PAGE_PRESENT)) | _PAGE_ACCESSED)
+
+#define pgoff_prot_to_pte(off, prot) \
+	((pte_t) { ((off) << PTE_SHIFT) | _PAGE_FILE		\
+		 | (pgprot_val(prot) & (_PAGE_USER|_PAGE_RW)) })
 
 /*
  * kern_addr_valid is intended to indicate whether an address is a valid
_
[patch 22/39] remap_file_pages protection support: use FAULT_SIGSEGV for protection checking, uml bits
From: Paolo 'Blaisorblade' Giarrusso <[EMAIL PROTECTED]>

This adapts the changes to the i386 handler to the UML one. It isn't enough
to make UML work, however, because UML has some peculiarities. Subsequent
patches fix this.

Signed-off-by: Paolo 'Blaisorblade' Giarrusso <[EMAIL PROTECTED]>
---

 linux-2.6.git-paolo/arch/um/kernel/trap_kern.c | 32 +
 1 files changed, 27 insertions(+), 5 deletions(-)

diff -puN arch/um/kernel/trap_kern.c~rfp-fault-sigsegv-2-uml arch/um/kernel/trap_kern.c
--- linux-2.6.git/arch/um/kernel/trap_kern.c~rfp-fault-sigsegv-2-uml	2005-08-11 23:09:32.0 +0200
+++ linux-2.6.git-paolo/arch/um/kernel/trap_kern.c	2005-08-11 23:09:32.0 +0200
@@ -37,6 +37,7 @@ int handle_page_fault(unsigned long addr
 	pmd_t *pmd;
 	pte_t *pte;
 	int err = -EFAULT;
+	int access_mask = 0;
 
 	*code_out = SEGV_MAPERR;
 	down_read(&mm->mmap_sem);
@@ -55,14 +56,15 @@ int handle_page_fault(unsigned long addr
 good_area:
 	*code_out = SEGV_ACCERR;
 	if(is_write && !(vma->vm_flags & VM_WRITE))
-		goto out;
+		goto prot_bad;
 
 	if(!(vma->vm_flags & (VM_READ | VM_EXEC)))
-		goto out;
+		goto prot_bad;
 
+	access_mask = is_write ? VM_WRITE : 0;
 	do {
-survive:
-		switch (handle_mm_fault(mm, vma, address, is_write)){
+handle_fault:
+		switch (__handle_mm_fault(mm, vma, address, access_mask)) {
 		case VM_FAULT_MINOR:
 			current->min_flt++;
 			break;
@@ -72,6 +74,9 @@ survive:
 		case VM_FAULT_SIGBUS:
 			err = -EACCES;
 			goto out;
+		case VM_FAULT_SIGSEGV:
+			err = -EFAULT;
+			goto out;
 		case VM_FAULT_OOM:
 			err = -ENOMEM;
 			goto out_of_memory;
@@ -87,10 +92,27 @@ survive:
 	*pte = pte_mkyoung(*pte);
 	if(pte_write(*pte)) *pte = pte_mkdirty(*pte);
 	flush_tlb_page(vma, address);
+
+	/* If the PTE is not present, the vma protections are not accurate if
+	 * VM_NONUNIFORM; present PTE's are correct for VM_NONUNIFORM and were
+	 * already handled otherwise. */
 out:
 	up_read(&mm->mmap_sem);
 	return(err);
 
+prot_bad:
+	if (unlikely(vma->vm_flags & VM_NONUNIFORM)) {
+		access_mask = is_write ? VM_WRITE : 0;
+		/* Otherwise, on a legitimate read fault on a page mapped as
+		 * exec-only, we get problems. Probably, we should lower
+		 * requirements... we should always test just
+		 * pte_read/write/exec, on vma->vm_page_prot! This way is
+		 * cumbersome. However, for now things should work for UML. */
+		access_mask |= vma->vm_flags & VM_EXEC ? VM_EXEC : VM_READ;
+		goto handle_fault;
+	}
+	goto out;
+
 	/*
 	 * We ran out of memory, or some other thing happened to us that made
 	 * us unable to handle the page fault gracefully.
@@ -100,7 +122,7 @@ out_of_memory:
 		up_read(&mm->mmap_sem);
 		yield();
 		down_read(&mm->mmap_sem);
-		goto survive;
+		goto handle_fault;
 	}
 	goto out;
 }
_
[patch 23/39] remap_file_pages protection support: fix try_to_unmap_one for VM_NONUNIFORM vma's
From: Paolo 'Blaisorblade' Giarrusso <[EMAIL PROTECTED]>

When unmapping linear but non-uniform VMA's in try_to_unmap_one, we must
encode the prots in the PTE. However, we shouldn't use the generic
set_nonlinear_pte() function as it allows for nonlinear offsets, on which we
should instead BUG() in this code path.

Additionally, add a missing TLB flush in both locations. However, there is
some excess of flushes in these functions.

Signed-off-by: Paolo 'Blaisorblade' Giarrusso <[EMAIL PROTECTED]>
---

 linux-2.6.git-paolo/mm/rmap.c |    5 +
 1 files changed, 5 insertions(+)

diff -puN mm/rmap.c~rfp-fix-unmap-linear mm/rmap.c
--- linux-2.6.git/mm/rmap.c~rfp-fix-unmap-linear	2005-08-11 23:07:12.0 +0200
+++ linux-2.6.git-paolo/mm/rmap.c	2005-08-11 23:07:12.0 +0200
@@ -543,6 +543,10 @@ static int try_to_unmap_one(struct page
 	flush_cache_page(vma, address, page_to_pfn(page));
 	pteval = ptep_clear_flush(vma, address, pte);
 
+	/* If nonlinear, store the file page offset in the pte. */
+	set_nonlinear_pte(pteval, pte, vma, mm, page, address);
+	flush_tlb_page(vma, address);
+
 	/* Move the dirty bit to the physical page now the pte is gone. */
 	if (pte_dirty(pteval))
 		set_page_dirty(page);
@@ -661,6 +665,7 @@ static void try_to_unmap_cluster(unsigne
 
 		/* If nonlinear, store the file page offset in the pte. */
 		set_nonlinear_pte(pteval, pte, vma, mm, page, address);
+		flush_tlb_page(vma, address);
 
 		/* Move the dirty bit to the physical page now the pte is gone. */
 		if (pte_dirty(pteval))
_
[patch 36/39] remap_file_pages protection support: avoid lookup of pages for PROT_NONE remapping
From: Paolo 'Blaisorblade' Giarrusso <[EMAIL PROTECTED]>

This optimization avoids looking up pages for PROT_NONE mappings. The idea
was taken from the "wrong "historical" code for review - 1" patch (the next
one) from mingo, but I fixed it by adding another "detail" parameter. I've
also fixed the other callers to clear this parameter, and fixed
madvise_dontneed() to use memset(0) on its parameter - currently it's
probably a bug.

Not even compile-tested, just written off the top of my head.

Signed-off-by: Paolo 'Blaisorblade' Giarrusso <[EMAIL PROTECTED]>
---

 linux-2.6.git-paolo/include/linux/mm.h |    1 +
 linux-2.6.git-paolo/mm/filemap.c       |   18 ++
 linux-2.6.git-paolo/mm/madvise.c       |   10 ++
 linux-2.6.git-paolo/mm/memory.c        |   11 ---
 linux-2.6.git-paolo/mm/shmem.c         |   11 +++
 5 files changed, 44 insertions(+), 7 deletions(-)

diff -puN mm/filemap.c~rfp-avoid-lookup-pages-miss-mapping mm/filemap.c
--- linux-2.6.git/mm/filemap.c~rfp-avoid-lookup-pages-miss-mapping	2005-08-12 18:42:23.0 +0200
+++ linux-2.6.git-paolo/mm/filemap.c	2005-08-12 19:14:39.0 +0200
@@ -1495,6 +1495,24 @@ int filemap_populate(struct vm_area_stru
 	struct page *page;
 	int err;
 
+	/*
+	 * mapping-removal fastpath:
+	 */
+	if ((vma->vm_flags & VM_SHARED) &&
+	    (pgprot_val(prot) == pgprot_val(PAGE_NONE))) {
+		struct zap_details details;
+
+		/* Still do error-checking! */
+		size = (i_size_read(inode) + PAGE_CACHE_SIZE - 1) >> PAGE_CACHE_SHIFT;
+		if (pgoff + (len >> PAGE_CACHE_SHIFT) > size)
+			return -EINVAL;
+
+		memset(&details, 0, sizeof(details));
+		details.prot_none_ptes = 1;
+		zap_page_range(vma, addr, len, &details);
+		return 0;
+	}
+
 	if (!nonblock)
 		force_page_cache_readahead(mapping, vma->vm_file,
 					pgoff, len >> PAGE_CACHE_SHIFT);
diff -puN mm/shmem.c~rfp-avoid-lookup-pages-miss-mapping mm/shmem.c
--- linux-2.6.git/mm/shmem.c~rfp-avoid-lookup-pages-miss-mapping	2005-08-12 18:42:23.0 +0200
+++ linux-2.6.git-paolo/mm/shmem.c	2005-08-12 19:11:52.0 +0200
@@ -1186,6 +1186,17 @@ static int shmem_populate(struct vm_area
 	if (pgoff >= size || pgoff + (len >> PAGE_SHIFT) > size)
 		return -EINVAL;
 
+	/*
+	 * mapping-removal fastpath:
+	 */
+	if ((vma->vm_flags & VM_SHARED) &&
+	    (pgprot_val(prot) == pgprot_val(PAGE_NONE))) {
+		memset(&details, 0, sizeof(details));
+		details.prot_none_ptes = 1;
+		zap_page_range(vma, addr, len, &details);
+		return 0;
+	}
+
 	while ((long) len > 0) {
 		struct page *page = NULL;
 		int err;
diff -puN mm/memory.c~rfp-avoid-lookup-pages-miss-mapping mm/memory.c
--- linux-2.6.git/mm/memory.c~rfp-avoid-lookup-pages-miss-mapping	2005-08-12 18:44:29.0 +0200
+++ linux-2.6.git-paolo/mm/memory.c	2005-08-12 19:09:50.0 +0200
@@ -575,11 +575,14 @@ static void zap_pte_range(struct mmu_gat
 				 * If details->check_mapping, we leave swap entries;
 				 * if details->nonlinear_vma, we leave file entries.
 				 */
-				if (unlikely(details))
+				if (unlikely(details) && !details->prot_none_ptes)
 					continue;
 				if (!pte_file(ptent))
 					free_swap_and_cache(pte_to_swp_entry(ptent));
-			pte_clear(tlb->mm, addr, pte);
+			if (unlikely(details->prot_none_ptes))
+				set_pte_at(mm, addr, pte, pfn_pte(0, __S000));
+			else
+				pte_clear(tlb->mm, addr, pte);
 		} while (pte++, addr += PAGE_SIZE, addr != end);
 		pte_unmap(pte - 1);
 	}
@@ -623,7 +626,8 @@ static void unmap_page_range(struct mmu_
 	pgd_t *pgd;
 	unsigned long next;
 
-	if (details && !details->check_mapping && !details->nonlinear_vma)
+	if (details && !details->check_mapping && !details->nonlinear_vma &&
+	    !details->prot_none_ptes)
 		details = NULL;
 
 	BUG_ON(addr >= end);
@@ -1499,6 +1503,7 @@ void unmap_mapping_range(struct address_
 	if (details.last_index < details.first_index)
 		details.last_index = ULONG_MAX;
 	details.i_mmap_lock = &mapping->i_mmap_lock;
+	details.prot_none_ptes = 0;
 
 	spin_lock(&mapping->i_mmap_lock);
 
diff -puN include/lin
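The fastpath condition added to both populate routines boils down to a small predicate; this userspace sketch models it with stand-in values (VM_SHARED and the pgprot constants below are illustrative, not the kernel's definitions):

```c
/* Userspace model of the fastpath test in filemap_populate() and
 * shmem_populate() above. VM_SHARED and the pgprot values are
 * stand-ins, not the kernel's definitions. */
#include <assert.h>

#define VM_SHARED	0x08UL
#define PGPROT_NONE	0x0UL	/* pretend pgprot_val(PAGE_NONE) */
#define PGPROT_RO	0x1UL	/* pretend pgprot_val(PAGE_READONLY) */

/* Non-zero when a PROT_NONE remap of a shared mapping can skip the
 * page lookup and simply zap the range. */
static int use_zap_fastpath(unsigned long vm_flags, unsigned long prot)
{
	return (vm_flags & VM_SHARED) && prot == PGPROT_NONE;
}
```

Private mappings and any protection other than PROT_NONE fall through to the normal page-lookup loop.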
[patch 37/39] remap_file_pages protection support: wrong "historical" code for review - 1
From: Ingo Molnar <[EMAIL PROTECTED]>, Paolo 'Blaisorblade' Giarrusso <[EMAIL PROTECTED]>

This "fast-path" was contained in the original
remap-file-pages-prot-2.6.4-rc1-mm1-A1.patch from Ingo Molnar*; I think this
code is wrong, but I'm sending it for review anyway, because I'm unsure (and
in fact, in the end I found the reason for this).

What I think is that this patch (done only for filemap_populate, not for
shmem_populate) calls zap_page_range() when installing mappings with
PROT_NONE protection. The purpose is to avoid a useless page lookup; but the
PTE's will simply be marked as absent, not as _PAGE_NONE. So, with this
fastpath, pages would be remapped again in their "default" position.

In this case, probably a possible fix is to add yet another param in
"zap_details" to mark all PTE's as PROT_NONE ones. Using
details->nonlinear_vma has the inconvenience of using
details->{first,last}_index and of leaving file entries unchanged.

* available at
http://www.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.5/2.6.5-mm1/dropped/remap-file-pages-prot-2.6.4-rc1-mm1-A1.patch

Signed-off-by: Paolo 'Blaisorblade' Giarrusso <[EMAIL PROTECTED]>
---

 linux-2.6.git-paolo/mm/filemap.c |    9 +
 1 files changed, 9 insertions(+)

diff -puN mm/filemap.c~rfp-wrong2 mm/filemap.c
--- linux-2.6.git/mm/filemap.c~rfp-wrong2	2005-08-12 18:31:32.0 +0200
+++ linux-2.6.git-paolo/mm/filemap.c	2005-08-12 18:31:32.0 +0200
@@ -1495,6 +1495,15 @@ int filemap_populate(struct vm_area_stru
 	struct page *page;
 	int err;
 
+	/*
+	 * mapping-removal fastpath:
+	 */
+	if ((vma->vm_flags & VM_SHARED) &&
+	    (pgprot_val(prot) == pgprot_val(PAGE_NONE))) {
+		zap_page_range(vma, addr, len, NULL);
+		return 0;
+	}
+
 	if (!nonblock)
 		force_page_cache_readahead(mapping, vma->vm_file,
 					pgoff, len >> PAGE_CACHE_SHIFT);
_
[patch 38/39] [RFC] remap_file_pages protection support: avoid dirtying on read faults for NONUNIFORM pages
From: Paolo 'Blaisorblade' Giarrusso <[EMAIL PROTECTED]>

When installing pages on non-uniform VMA's, even for read faults we must
install them writable if the VMA is writable (we won't have a chance to fix
that). Normally, on write faults, we install the PTE as dirty (there's a
comment about 80386 on this), but maybe it's not needed here on read faults.

I've looked for more info about that comment - unfortunately, it's there
almost unchanged since 2.4.0, so I've found no info. However, UML does
depend on the old behaviour currently (trivial to cure, anyway). And if
other arch's don't have a hardware "dirty" bit, they'll depend on this too.

Signed-off-by: Paolo 'Blaisorblade' Giarrusso <[EMAIL PROTECTED]>
---

 linux-2.6.git-paolo/mm/memory.c |    4 +++-
 1 files changed, 3 insertions(+), 1 deletion(-)

diff -puN mm/memory.c~rfp-fault-optim-risky mm/memory.c
--- linux-2.6.git/mm/memory.c~rfp-fault-optim-risky	2005-08-12 19:25:16.0 +0200
+++ linux-2.6.git-paolo/mm/memory.c	2005-08-12 19:25:16.0 +0200
@@ -1899,8 +1899,10 @@ retry:
 	 * been set (we can have a writeable VMA with a read-only PTE),
 	 * so we must set the *exact* permission on fault, and avoid
 	 * calling do_wp_page on write faults. */
-	if (write_access || unlikely(vma->vm_flags & VM_NONUNIFORM))
+	if (write_access)
 		entry = maybe_mkwrite(pte_mkdirty(entry), vma);
+	else if (unlikely(vma->vm_flags & VM_NONUNIFORM))
+		entry = maybe_mkwrite(entry, vma);
 	set_pte_at(mm, address, page_table, entry);
 	if (anon) {
 		lru_cache_add_active(new_page);
_
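The new permission logic in the patched hunk can be modelled in userspace: write faults get a dirty (and, via maybe_mkwrite, possibly writable) PTE, while read faults on a VM_NONUNIFORM vma get a writable but *clean* PTE. The flag and PTE bit values below are illustrative stand-ins, and maybe_mkwrite() is reduced to the VM_WRITE test it performs.

```c
/* Userspace model of the patched hunk in do_no_page(). Bit values are
 * illustrative; maybe_mkwrite() is reduced to its VM_WRITE test. */
#include <assert.h>

#define VM_WRITE	0x2UL
#define VM_NONUNIFORM	0x100UL
#define PTE_WRITE	0x1UL
#define PTE_DIRTY	0x2UL

static unsigned long maybe_mkwrite(unsigned long entry, unsigned long vm_flags)
{
	if (vm_flags & VM_WRITE)
		entry |= PTE_WRITE;
	return entry;
}

/* Bits of the freshly-installed PTE, per fault type and vma flags. */
static unsigned long new_pte_bits(unsigned long vm_flags, int write_access)
{
	unsigned long entry = 0;

	if (write_access)
		entry = maybe_mkwrite(entry | PTE_DIRTY, vm_flags);
	else if (vm_flags & VM_NONUNIFORM)
		entry = maybe_mkwrite(entry, vm_flags);
	return entry;
}
```

The key difference from the old code is the read-fault VM_NONUNIFORM case: writable, but no longer dirtied.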
[patch 35/39] remap_file_pages protection support: avoid redundant pte_file PTE's
From: Paolo 'Blaisorblade' Giarrusso <[EMAIL PROTECTED]>

For linear VMA's, there is no need to install pte_file PTEs to remember the
offset. We could probably go as far as checking directly the address and
protection like in include/linux/pagemap.h:set_nonlinear_pte(), instead of
vma->vm_flags.

Also add some warnings on the path which used to cope with such PTE's.

Untested yet.

Signed-off-by: Paolo 'Blaisorblade' Giarrusso <[EMAIL PROTECTED]>
---

 linux-2.6.git-paolo/mm/fremap.c | 12 ++--
 linux-2.6.git-paolo/mm/memory.c |    5 +
 2 files changed, 11 insertions(+), 6 deletions(-)

diff -puN mm/fremap.c~rfp-linear-optim-v3 mm/fremap.c
--- linux-2.6.git/mm/fremap.c~rfp-linear-optim-v3	2005-08-11 23:20:09.0 +0200
+++ linux-2.6.git-paolo/mm/fremap.c	2005-08-11 23:20:09.0 +0200
@@ -125,6 +125,12 @@ int install_file_pte(struct mm_struct *m
 
 	BUG_ON(!uniform && !(vma->vm_flags & VM_SHARED));
 
+	/* We're being called by mmap(MAP_NONBLOCK|MAP_POPULATE) on a uniform
+	 * VMA. So we don't need to take the lock, nor to install a PTE for a
+	 * page we'd fault in anyway. */
+	if (uniform)
+		return 0;
+
 	pgd = pgd_offset(mm, addr);
 	spin_lock(&mm->page_table_lock);
 
@@ -139,12 +145,6 @@ int install_file_pte(struct mm_struct *m
 	pte = pte_alloc_map(mm, pmd, addr);
 	if (!pte)
 		goto err_unlock;
-	/*
-	 * Skip uniform non-existent ptes:
-	 */
-	err = 0;
-	if (uniform && pte_none(*pte))
-		goto err_unlock;
 
 	zap_pte(mm, vma, addr, pte);
 
diff -puN mm/memory.c~rfp-linear-optim-v3 mm/memory.c
--- linux-2.6.git/mm/memory.c~rfp-linear-optim-v3	2005-08-11 23:20:09.0 +0200
+++ linux-2.6.git-paolo/mm/memory.c	2005-08-11 23:20:09.0 +0200
@@ -1969,9 +1969,14 @@ static int do_file_page(struct mm_struct
 	/*
 	 * Fall back to the linear mapping if the fs does not support
 	 * ->populate; in this case do the protection checks.
+	 * Could have been installed by install_file_pte, for a MAP_NONBLOCK
+	 * pagetable population.
 	 */
 	if (!vma->vm_ops->populate ||
 	    ((access_mask & VM_WRITE) && !(vma->vm_flags & VM_SHARED))) {
+		/* remap_file_pages should disallow this, now that
+		 * install_file_pte skips linear ones. */
+		WARN_ON(1);
 		/* We're behaving as if pte_file was cleared, so check
 		 * protections like in handle_pte_fault. */
 		if (check_perms(vma, access_mask))
_
[patch 32/39] remap_file_pages protection support: fix i386 handler
From: Paolo 'Blaisorblade' Giarrusso <[EMAIL PROTECTED]>

Actually, with the current model, we should get a failure with VMA's mapped
with only PROT_WRITE (even if I wasn't able to verify that in UML, which has
similar code). To test!

Signed-off-by: Paolo 'Blaisorblade' Giarrusso <[EMAIL PROTECTED]>
---

 linux-2.6.git-paolo/arch/i386/mm/fault.c |    3 ++-
 1 files changed, 2 insertions(+), 1 deletion(-)

diff -puN arch/i386/mm/fault.c~rfp-fault-sigsegv-3-i386 arch/i386/mm/fault.c
--- linux-2.6.git/arch/i386/mm/fault.c~rfp-fault-sigsegv-3-i386	2005-08-12 17:12:51.0 +0200
+++ linux-2.6.git-paolo/arch/i386/mm/fault.c	2005-08-12 17:12:51.0 +0200
@@ -381,7 +381,8 @@ bad_area_prot:
 		 * requirements... we should always test just
 		 * pte_read/write/exec, on vma->vm_page_prot! This way is
 		 * cumbersome. However, for now things should work for i386. */
-		access_mask |= vma->vm_flags & VM_EXEC ? VM_EXEC : VM_READ;
+		access_mask |= vma->vm_flags & VM_EXEC ? VM_EXEC :
+			       (vma->vm_flags & VM_READ ? VM_READ : VM_WRITE);
 		goto handle_fault;
 	}
 	/*
_
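The corrected fallback can be checked in isolation: pick one access right the vma actually has (exec preferred, then read, then write), so a PROT_WRITE-only mapping no longer fails the check. Again a userspace sketch with illustrative flag values:

```c
/* Userspace model of the corrected access_mask fallback above.
 * Flag values are illustrative, not the kernel's. */
#include <assert.h>

#define VM_READ		0x1UL
#define VM_WRITE	0x2UL
#define VM_EXEC		0x4UL

static unsigned long nonuniform_access_mask(unsigned long vm_flags, int is_write)
{
	unsigned long access_mask = is_write ? VM_WRITE : 0;

	/* Prefer exec, then read, then write - mirroring the patch. */
	access_mask |= vm_flags & VM_EXEC ? VM_EXEC :
		       (vm_flags & VM_READ ? VM_READ : VM_WRITE);
	return access_mask;
}
```

With the old code, a write-only vma would have had VM_READ forced into the mask and the retried fault would still fail.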
[patch 39/39] remap_file_pages protection support: wrong "historical" code for review - 2
From: Ingo Molnar <[EMAIL PROTECTED]>, Paolo 'Blaisorblade' Giarrusso <[EMAIL PROTECTED]>

This "fast-path" was contained in the original
remap-file-pages-prot-2.6.4-rc1-mm1-A1.patch from Ingo Molnar; I think this
code is wrong, but I'm sending it for review anyway, because I'm unsure (and
in fact, in the end I found the reason for this).

I guess this code is intended for when we're called by sys_remap_file_pages,
without altering pgoff or protections (otherwise we'd refuse operation on a
private mapping). This cannot happen with mmap(MAP_POPULATE) because we
clear old mappings. And the code makes sense only if we COW'ed a page,
because otherwise the old mapping is already correct. I'm not sure whether
we should fail here - maybe skipping the PTE would be more appropriate. Or
we could anyway turn the nonblock param into a bitmask and pass O_TRUNC
there.

However, this is wrong because both routines can be called from within
do_file_page, which is called when !pte_present(pte) && !pte_none(pte) &&
pte_file(pte). I.e. the pte is not zeroed, so it has been used, but the page
has been swapped out, or the page hasn't been loaded in the first place (for
instance for MAP_NONBLOCK). More accurately, in that situation ->populate is
called with nonblock == 0, so only install_page can be called there. If
->populate fails, the faulting process will get an inappropriate SIGBUS.

Signed-off-by: Paolo 'Blaisorblade' Giarrusso <[EMAIL PROTECTED]>
---

 linux-2.6.git-paolo/mm/fremap.c | 15 +++
 1 files changed, 15 insertions(+)

diff -puN mm/fremap.c~rfp-wrong mm/fremap.c
--- linux-2.6.git/mm/fremap.c~rfp-wrong	2005-08-12 18:42:23.0 +0200
+++ linux-2.6.git-paolo/mm/fremap.c	2005-08-12 18:42:23.0 +0200
@@ -90,6 +90,14 @@ int install_page(struct mm_struct *mm, s
 	if (!page->mapping || page->index >= size)
 		goto err_unlock;
 
+	/*
+	 * Only install a new page for a non-shared mapping if it's
+	 * not existent yet:
+	 */
+	err = -EEXIST;
+	if (!pte_none(*pte) && !(vma->vm_flags & VM_SHARED))
+		goto err_unlock;
+
 	zap_pte(mm, vma, addr, pte);
 
 	inc_mm_counter(mm,rss);
@@ -145,6 +153,13 @@ int install_file_pte(struct mm_struct *m
 	pte = pte_alloc_map(mm, pmd, addr);
 	if (!pte)
 		goto err_unlock;
+	/*
+	 * Only install a new page for a non-shared mapping if it's
+	 * not existent yet:
+	 */
+	err = -EEXIST;
+	if (!pte_none(*pte) && !(vma->vm_flags & VM_SHARED))
+		goto err_unlock;
 
 	zap_pte(mm, vma, addr, pte);
 
_
Re: [patch 4/8] irq code: Add coherence test for PREEMPT_ACTIVE
On Friday 27 May 2005 02:38, [EMAIL PROTECTED] wrote:
> After porting this fixlet to UML:
>
> http://linux.bkbits.net:8080/linux-2.5/[EMAIL PROTECTED]
>
> , I've also added a warning which should refuse compilation with insane
> values for PREEMPT_ACTIVE... maybe we should simply move PREEMPT_ACTIVE out
> of architectures using GENERIC_IRQS.

Ok, a grep shows that the possible culprits (i.e. giving success to
grep GENERIC_HARDIRQS arch/*/Kconfig, and using 0x400 as PREEMPT_ACTIVE, as
given by grep PREEMPT_ACTIVE include/asm-*/thread_info.h) are (at a first
glance): frv, sh, sh64.

After a bit of checking, I also verified whether they had overridden the
value of HARDIRQ_BITS. Which they haven't (it seems it's defined exactly
where CONFIG_HARDIRQS is not used, i.e. nobody is currently using the
ability to override it).

This was not a very deep investigation, however, so feel free to verify this
better.

-- 
Paolo Giarrusso, aka Blaisorblade
Skype user "PaoloGiarrusso"
Linux registered user n. 292729
http://www.user-mode-linux.org/~blaisorblade

___ 
Yahoo! Mail: gratis 1GB per i messaggi e allegati da 10MB
http://mail.yahoo.it
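For the record, the coherence test in question amounts to requiring that PREEMPT_ACTIVE sit above the whole preempt/softirq/hardirq count field of preempt_count, or incrementing the counters would corrupt it. A userspace sketch, with illustrative bit widths (not any particular architecture's):

```c
/* Userspace model of the PREEMPT_ACTIVE sanity check discussed above.
 * The field widths are illustrative defaults, not any particular
 * architecture's real values. */
#include <assert.h>

#define PREEMPT_BITS	8
#define SOFTIRQ_BITS	8
#define HARDIRQ_BITS	4

#define PREEMPT_SHIFT	0
#define SOFTIRQ_SHIFT	(PREEMPT_SHIFT + PREEMPT_BITS)
#define HARDIRQ_SHIFT	(SOFTIRQ_SHIFT + SOFTIRQ_BITS)

/* A sane PREEMPT_ACTIVE must lie above every count bit. */
static int preempt_active_ok(unsigned long preempt_active)
{
	return preempt_active >= (1UL << (HARDIRQ_SHIFT + HARDIRQ_BITS));
}
```

With these widths, the 0x400 value found in the frv/sh/sh64 headers fails the check (it lands inside the softirq count field), which is exactly what the compile-time warning is meant to catch.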
Re: [patch 4/8] irq code: Add coherence test for PREEMPT_ACTIVE
On Friday 27 May 2005 15:33, David Howells wrote:
> Blaisorblade <[EMAIL PROTECTED]> wrote:
> > Ok, a grep shows that possible culprits (i.e. giving success to
> > grep GENERIC_HARDIRQS arch/*/Kconfig, and using 0x400 as
> > PREEMPT_ACTIVE, as given by grep PREEMPT_ACTIVE
> > include/asm-*/thread_info.h) are (at a first glance): frv, sh, sh64.
>
> For FRV that's simply because it got copied from the parent arch along with
> other stuff. Feel free to move it... Do you want me to make you up a patch
> to do so?

Sorry, but please fix that yourself; otherwise there's a chance I'll forget,
since I'm quite busy. Thanks a lot for your attention.

-- 
Paolo Giarrusso, aka Blaisorblade
Skype user "PaoloGiarrusso"
Linux registered user n. 292729
http://www.user-mode-linux.org/~blaisorblade
Re: [patch 4/8] irq code: Add coherence test for PREEMPT_ACTIVE
On Friday 27 May 2005 05:31, Paul Mundt wrote:
> On Fri, May 27, 2005 at 03:06:09AM +0200, Blaisorblade wrote:
> > On Friday 27 May 2005 02:38, [EMAIL PROTECTED] wrote:
> > Ok, a grep shows that possible culprits (i.e. giving success to
> > grep GENERIC_HARDIRQS arch/*/Kconfig, and using 0x400 as
> > PREEMPT_ACTIVE, as given by grep PREEMPT_ACTIVE
> > include/asm-*/thread_info.h) are (at a first glance): frv, sh, sh64.
>
> Yeah, that's bogus for sh and sh64 anyways, this should do it.
>
> It would be nice to move PREEMPT_ACTIVE so it isn't per-arch anymore,
> there's not many users that use a different value (at least for the ones
> using generic hardirqs, ia64 seems to be the only one?).

Then in the generic headers

#ifndef PREEMPT_ACTIVE
#define PREEMPT_ACTIVE
#else
#endif

Would be ok, right?

-- 
Paolo Giarrusso, aka Blaisorblade
Skype user "PaoloGiarrusso"
Linux registered user n. 292729
http://www.user-mode-linux.org/~blaisorblade