Re: [uml-devel] [PATCH 3/6] UML - Userspace files should call libc directly

2007-08-26 Thread Blaisorblade
On Friday 17 August 2007, Jeff Dike wrote:
> A number of files that were changed in the recent removal of tt mode
> are userspace files which call the os_* wrappers instead of calling
> libc directly.  A few other files were affected by this, through

> os_print_error has no remaining callers, so it is deleted.

Actually, os_print_error() (or some other interface to perror()) should be
reintroduced and used more widely for error messages (and strsignal() should
be used too, for "Kernel mode signal N"). Debugging problems is hard, but there
is no reason to make it harder (or rather, not to make it simpler for users).

Btw, the inlined abs() call is not very nice; on the other hand, it's a simple
way to make the code robust, and we do not need to be extra-optimal on these
debug code paths.

Bye
-- 
"Doh!" (cit.), I've made another mistake!
Paolo Giarrusso, aka Blaisorblade
Linux registered user n. 292729
http://www.user-mode-linux.org/~blaisorblade


signature.asc
Description: This is a digitally signed message part.


Re: [uml-devel] [PATCH 6/6] UML - Fix hostfs style

2007-08-24 Thread Blaisorblade
On Friday 24 August 2007, Jeff Dike wrote:
> On Thu, Aug 23, 2007 at 04:54:59PM +0200, Blaisorblade wrote:
> > > actually. Personally I'd prefer:
> > >
> > >   else
> > >       type = OS_TYPE_DIR;
> >
> > I strongly agree with this style; beyond style itself, one strong reason
> > is that joining statements hinder singlestepping through function code
> > (it's easy to run gdb on UML, and anyway kgdb exists).
>
> How does that help?  gdb should stop as easily on a "else foo;" line as on
>   else
>       foo;
> right?
Sorry, a better example is:

if (bar)
    foo;
where the test and foo are two distinct parts. One step is "I execute the if",
another (possible) step is "I perform foo" - and it is not easy to tell the two
apart if foo is not on a different line.
-- 
"Doh!" (cit.), I've made another mistake!
Paolo Giarrusso, aka Blaisorblade
Linux registered user n. 292729
http://www.user-mode-linux.org/~blaisorblade




[PATCH] Script to check for undefined Kconfig symbols - v2

2007-08-24 Thread Paolo 'Blaisorblade' Giarrusso
In this version, I've updated the scripts to search for "\<$symb_bare\>" instead
of $symb_bare in Kconfig files. Please ignore my previous message.

To avoid having to look manually for used but undefined Kconfig variables, I've
written a script which tries to do this efficiently, in case all other attention
fails. It accounts for the _MODULE suffix and for the UML_ prefix of Kconfig variables,
but otherwise looks for exact matches (i.e. \
---

 scripts/checkunknowndefines.sh |   59 
 1 files changed, 59 insertions(+), 0 deletions(-)

diff --git a/scripts/checkunknowndefines.sh b/scripts/checkunknowndefines.sh
new file mode 100755
index 000..dbb5cef
--- /dev/null
+++ b/scripts/checkunknowndefines.sh
@@ -0,0 +1,59 @@
+#!/bin/sh
+# Find Kconfig variables used in source code but never defined in Kconfig
+# Copyright (C) 2007, Paolo 'Blaisorblade' Giarrusso <[EMAIL PROTECTED]>
+
+# Tested with dash.
+paths="$@"
+[ -z "$paths" ] && paths=.
+
+# Doing this once at the beginning saves a lot of time, on a cache-hot tree.
+Kconfigs="`find . -name 'Kconfig' -o -name 'Kconfig*[^~]'`"
+
+echo "File list \tundefined symbol used"
+find $paths -name '*.[chS]' -o -name 'Makefile' -o -name 'Makefile*[^~]'| while read i
+do
+   # Output the bare Kconfig variable and the filename; the _MODULE part at
+   # the end is not removed here (would need perl an not-hungry regexp for that).
+   sed -ne 's!^.*\<\(UML_\)\?CONFIG_\([0-9A-Z_]\+\).*!\2 '$i'!p' < $i
+done | \
+# Smart "sort|uniq" implemented in awk and tuned to collect the names of all
+# files which use a given symbol
+awk '{map[$1, count[$1]++] = $2; }
+END {
+   for (combIdx in map) {
+      split(combIdx, separate, SUBSEP);
+      # The value may have been removed.
+      if (! ( (separate[1], separate[2]) in map ) )
+         continue;
+      symb=separate[1];
+      printf "%s ", symb;
+      #Use gawk extension to delete the names vector
+      delete names;
+      #Portably delete the names vector
+      #split("", names);
+      for (i=0; i < count[symb]; i++) {
+         names[map[symb, i]] = 1;
+         # Unfortunately, we may still encounter symb, i in the
+         # outside iteration.
+         delete map[symb, i];
+      }
+      i=0;
+      for (name in names) {
+         if (i > 0)
+            printf ", %s", name;
+         else
+            printf "%s", name;
+         i++;
+      }
+      printf "\n";
+   }
+}' |
+while read symb files; do
+   # Remove the _MODULE suffix when checking the variable name. This should
+   # be done only on tristate symbols, actually, but Kconfig parsing is
+   # beyond the purpose of this script.
+   symb_bare=`echo $symb | sed -e 's/_MODULE//'`
+   if ! grep -q "\<$symb_bare\>" $Kconfigs; then
+      echo "$files: \t$symb"
+   fi
+done|sort
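
A minimal sketch of why the whole-word match introduced in this version matters.
The helper names are hypothetical and not part of the patch, and grep's -w
option is assumed here as a stand-in for the \<...\> anchors used in the
script. With a plain substring match, a stale symbol such as USB_OHCI still
looks defined, because the defined symbol USB_OHCI_HCD contains it.

```sh
#!/bin/sh
# Hypothetical helpers for illustration only; not part of checkunknowndefines.sh.

# Substring match: succeeds if $1 appears anywhere in the text $2.
defined_substring() {
    printf '%s\n' "$2" | grep -q -e "$1"
}

# Whole-word match: succeeds only if $1 appears as a complete word in $2.
defined_word() {
    printf '%s\n' "$2" | grep -q -w -e "$1"
}

kconfig_text='config USB_OHCI_HCD'

if defined_substring USB_OHCI "$kconfig_text"; then
    echo "substring match: USB_OHCI looks defined (false positive)"
fi
if ! defined_word USB_OHCI "$kconfig_text"; then
    echo "whole-word match: USB_OHCI is reported as undefined"
fi
```

The whole-word form still accepts USB_OHCI_HCD itself, so correctly defined
symbols are not flagged.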

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH 2/2] Script to check for undefined Kconfig symbols

2007-08-24 Thread Paolo 'Blaisorblade' Giarrusso
To avoid having to look manually for used but undefined Kconfig variables, I've
written a script which tries to do this efficiently, in case all other attention
fails. It accounts for the _MODULE suffix and for the UML_ prefix of Kconfig variables,
but otherwise looks for exact matches (i.e. \
---

 scripts/checkunknowndefines.sh |   59 
 1 files changed, 59 insertions(+), 0 deletions(-)

diff --git a/scripts/checkunknowndefines.sh b/scripts/checkunknowndefines.sh
new file mode 100755
index 000..d15950e
--- /dev/null
+++ b/scripts/checkunknowndefines.sh
@@ -0,0 +1,59 @@
+#!/bin/sh
+# Find Kconfig variables used in source code but never defined in Kconfig
+# Copyright (C) 2007, Paolo 'Blaisorblade' Giarrusso <[EMAIL PROTECTED]>
+
+# Tested with dash.
+paths="$@"
+[ -z "$paths" ] && paths=.
+
+# Doing this once at the beginning saves a lot of time, on a cache-hot tree.
+Kconfigs="`find . -name 'Kconfig' -o -name 'Kconfig*[^~]'`"
+
+echo "File list \tundefined symbol used"
+find $paths -name '*.[chS]' -o -name 'Makefile' -o -name 'Makefile*[^~]'| while read i
+do
+   # Output the bare Kconfig variable and the filename; the _MODULE part at
+   # the end is not removed here (would need perl an not-hungry regexp for that).
+   sed -ne 's!^.*\<\(UML_\)\?CONFIG_\([0-9A-Z_]\+\).*!\2 '$i'!p' < $i
+done | \
+# Smart "sort|uniq" implemented in awk and tuned to collect the names of all
+# files which use a given symbol
+awk '{map[$1, count[$1]++] = $2; }
+END {
+   for (combIdx in map) {
+      split(combIdx, separate, SUBSEP);
+      # The value may have been removed.
+      if (! ( (separate[1], separate[2]) in map ) )
+         continue;
+      symb=separate[1];
+      printf "%s ", symb;
+      #Use gawk extension to delete the names vector
+      delete names;
+      #Portably delete the names vector
+      #split("", names);
+      for (i=0; i < count[symb]; i++) {
+         names[map[symb, i]] = 1;
+         # Unfortunately, we may still encounter symb, i in the
+         # outside iteration.
+         delete map[symb, i];
+      }
+      i=0;
+      for (name in names) {
+         if (i > 0)
+            printf ", %s", name;
+         else
+            printf "%s", name;
+         i++;
+      }
+      printf "\n";
+   }
+}' |
+while read symb files; do
+   # Remove the _MODULE suffix when checking the variable name. This should
+   # be done only on tristate symbols, actually, but Kconfig parsing is
+   # beyond the purpose of this script.
+   symb_bare=`echo $symb | sed -e 's/_MODULE//'`
+   if ! grep -q $symb_bare $Kconfigs; then
+      echo "$files: \t$symb"
+   fi
+done|sort
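
One detail of the suffix handling above: the sed expression 's/_MODULE//'
removes the first _MODULE it finds anywhere in the symbol, not only at the end.
The sketch below contrasts it with an anchored variant. The function names and
the FOO_MODULE_BAR symbol are made up for illustration; this is a possible
refinement, not something the posted script does.

```sh
#!/bin/sh
# Hypothetical helpers for illustration; not part of the posted script.

# Same expression as in the script: drops the first "_MODULE" anywhere.
strip_module_unanchored() {
    printf '%s\n' "$1" | sed -e 's/_MODULE//'
}

# Anchored variant: drops "_MODULE" only when it is a trailing suffix.
strip_module_suffix() {
    printf '%s\n' "$1" | sed -e 's/_MODULE$//'
}

# FOO_MODULE_BAR is an invented symbol with _MODULE in the middle.
for symb in USB_OHCI_HCD_MODULE FOO_MODULE_BAR; do
    echo "$symb: unanchored=$(strip_module_unanchored "$symb")" \
         "anchored=$(strip_module_suffix "$symb")"
done
```

Both forms agree on a real trailing suffix; they differ only when _MODULE
occurs inside the name.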



[PATCH 1/2] Replace CONFIG_USB_OHCI with CONFIG_USB_OHCI_HCD in a few overlooked files

2007-08-24 Thread Paolo 'Blaisorblade' Giarrusso
Finish the rename of CONFIG_USB_OHCI to CONFIG_USB_OHCI_HCD, which started
in 2005 (before 2.6.12-rc2). The patch in the message linked below has not been applied yet;
moreover, it is not something to fix afterwards. I've verified that no more
instances of 'CONFIG_USB_[UOE]HCI\>' exist in the source tree.
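
A check of this kind could be scripted roughly as follows. This is only a
sketch, not the command actually used: find_stale_hci_refs is a hypothetical
name, and the trailing \> is spelled as an explicit extended-regex alternative
(a non-identifier character or end of line) on the assumption that this is
easier to run with different grep implementations.

```sh
#!/bin/sh
# Hypothetical helper: list lines that still use a pre-rename symbol such as
# CONFIG_USB_OHCI, without matching the renamed CONFIG_USB_OHCI_HCD.
find_stale_hci_refs() {
    # $1: directory to search recursively
    grep -rnE 'CONFIG_USB_[UOE]HCI([^0-9A-Za-z_]|$)' "$1"
}

# Small self-contained demonstration on a scratch directory.
demo_dir=$(mktemp -d)
printf '#ifdef CONFIG_USB_OHCI\n' > "$demo_dir/old.c"
printf '#ifdef CONFIG_USB_OHCI_HCD\n' > "$demo_dir/new.c"
find_stale_hci_refs "$demo_dir" || echo "no stale references"
rm -rf "$demo_dir"
```

Only old.c is listed in the demonstration; an empty result on a real tree is
what the verification above relies on.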

http://www.linux-mips.org/archives/linux-mips/2005-06/msg00060.html

I'm also sending a script to detect undefined Kconfig variables in the next patch.

Thanks to my colleague Giuseppe Patanè for the original report: he discovered
the original mail (above) and verified that the fix had not yet been
applied.

Cc: Giuseppe Patanè <[EMAIL PROTECTED]>
Cc: Ralf Baechle <[EMAIL PROTECTED]>
Cc: <[EMAIL PROTECTED]>
Signed-off-by: Paolo 'Blaisorblade' Giarrusso <[EMAIL PROTECTED]>
---

 arch/mips/au1000/mtx-1/board_setup.c  |4 ++--
 arch/mips/au1000/pb1000/board_setup.c |6 +++---
 arch/mips/au1000/pb1100/board_setup.c |4 ++--
 arch/mips/au1000/pb1500/board_setup.c |6 +++---
 4 files changed, 10 insertions(+), 10 deletions(-)

diff --git a/arch/mips/au1000/mtx-1/board_setup.c b/arch/mips/au1000/mtx-1/board_setup.c
index 7bc5af8..1688ca9 100644
--- a/arch/mips/au1000/mtx-1/board_setup.c
+++ b/arch/mips/au1000/mtx-1/board_setup.c
@@ -54,11 +54,11 @@ void board_reset (void)
 
 void __init board_setup(void)
 {
-#ifdef CONFIG_USB_OHCI
+#ifdef CONFIG_USB_OHCI_HCD
// enable USB power switch
au_writel( au_readl(GPIO2_DIR) | 0x10, GPIO2_DIR );
au_writel( 0x10, GPIO2_OUTPUT );
-#endif // defined (CONFIG_USB_OHCI)
+#endif // defined (CONFIG_USB_OHCI_HCD)
 
 #ifdef CONFIG_PCI
 #if defined(__MIPSEB__)
diff --git a/arch/mips/au1000/pb1000/board_setup.c b/arch/mips/au1000/pb1000/board_setup.c
index 824cfaf..f25b38f 100644
--- a/arch/mips/au1000/pb1000/board_setup.c
+++ b/arch/mips/au1000/pb1000/board_setup.c
@@ -54,7 +54,7 @@ void __init board_setup(void)
au_writel(0, SYS_PINSTATERD);
udelay(100);
 
-#ifdef CONFIG_USB_OHCI
+#ifdef CONFIG_USB_OHCI_HCD
/* zero and disable FREQ2 */
sys_freqctrl = au_readl(SYS_FREQCTRL0);
sys_freqctrl &= ~0xFFF0;
@@ -102,7 +102,7 @@ void __init board_setup(void)
/*
 * Route 48MHz FREQ2 into USB Host and/or Device
 */
-#ifdef CONFIG_USB_OHCI
+#ifdef CONFIG_USB_OHCI_HCD
sys_clksrc |= ((4<<12) | (0<<11) | (0<<10));
 #endif
au_writel(sys_clksrc, SYS_CLKSRC);
@@ -116,7 +116,7 @@ void __init board_setup(void)
au_writel(pin_func, SYS_PINFUNC);
au_writel(0x2800, SYS_TRIOUTCLR);
au_writel(0x0030, SYS_OUTPUTCLR);
-#endif // defined (CONFIG_USB_OHCI)
+#endif // defined (CONFIG_USB_OHCI_HCD)
 
// make gpio 15 an input (for interrupt line)
pin_func = au_readl(SYS_PINFUNC) & (u32)(~0x100);
diff --git a/arch/mips/au1000/pb1100/board_setup.c b/arch/mips/au1000/pb1100/board_setup.c
index 6bc1f8e..3205f88 100644
--- a/arch/mips/au1000/pb1100/board_setup.c
+++ b/arch/mips/au1000/pb1100/board_setup.c
@@ -54,7 +54,7 @@ void __init board_setup(void)
au_writel(0, SYS_PININPUTEN);
udelay(100);
 
-#ifdef CONFIG_USB_OHCI
+#ifdef CONFIG_USB_OHCI_HCD
{
u32 pin_func, sys_freqctrl, sys_clksrc;
 
@@ -98,7 +98,7 @@ void __init board_setup(void)
pin_func |= 0x8000;
au_writel(pin_func, SYS_PINFUNC);
}
-#endif // defined (CONFIG_USB_OHCI)
+#endif // defined (CONFIG_USB_OHCI_HCD)
 
/* Enable sys bus clock divider when IDLE state or no bus activity. */
au_writel(au_readl(SYS_POWERCTRL) | (0x3 << 5), SYS_POWERCTRL);
diff --git a/arch/mips/au1000/pb1500/board_setup.c b/arch/mips/au1000/pb1500/board_setup.c
index c9b6556..118e32a 100644
--- a/arch/mips/au1000/pb1500/board_setup.c
+++ b/arch/mips/au1000/pb1500/board_setup.c
@@ -56,7 +56,7 @@ void __init board_setup(void)
au_writel(0, SYS_PINSTATERD);
udelay(100);
 
-#ifdef CONFIG_USB_OHCI
+#ifdef CONFIG_USB_OHCI_HCD
 
/* GPIO201 is input for PCMCIA card detect */
/* GPIO203 is input for PCMCIA interrupt request */
@@ -85,7 +85,7 @@ void __init board_setup(void)
/*
 * Route 48MHz FREQ2 into USB Host and/or Device
 */
-#ifdef CONFIG_USB_OHCI
+#ifdef CONFIG_USB_OHCI_HCD
sys_clksrc |= ((4<<12) | (0<<11) | (0<<10));
 #endif
au_writel(sys_clksrc, SYS_CLKSRC);
@@ -95,7 +95,7 @@ void __init board_setup(void)
// 2nd USB port is USB host
pin_func |= 0x8000;
au_writel(pin_func, SYS_PINFUNC);
-#endif // defined (CONFIG_USB_OHCI)
+#endif // defined (CONFIG_USB_OHCI_HCD)
 
 
 



[PATCH] usbmon doc update - mention new wildcard ('0') bus

2007-08-24 Thread Paolo 'Blaisorblade' Giarrusso
Update usbmon documentation, mentioning the "zero" (wildcard) bus.
Possibly the 'either ... or ...' in my first hunk should be rephrased a bit to
read better.

Signed-off-by: Paolo 'Blaisorblade' Giarrusso <[EMAIL PROTECTED]>
Cc: USB development list <[EMAIL PROTECTED]>
Cc: Pete Zaitcev <[EMAIL PROTECTED]>
Cc: Alan Stern <[EMAIL PROTECTED]>
---

 Documentation/usb/usbmon.txt |9 -
 1 files changed, 8 insertions(+), 1 deletions(-)

diff --git a/Documentation/usb/usbmon.txt b/Documentation/usb/usbmon.txt
index 53ae866..2917ce4 100644
--- a/Documentation/usb/usbmon.txt
+++ b/Documentation/usb/usbmon.txt
@@ -34,9 +34,12 @@ if usbmon is built into the kernel.
 Verify that bus sockets are present.
 
 # ls /sys/kernel/debug/usbmon
-1s  1t  1u  2s  2t  2u  3s  3t  3u  4s  4t  4u
+0s  0t  0u  1s  1t  1u  2s  2t  2u  3s  3t  3u  4s  4t  4u
 #
 
+Now you can choose to either use the sockets numbered '0' (to capture packets on
+all buses), and skip to step #3, or find the bus used by your device with step #2.
+
 2. Find which bus connects to the desired device
 
 Run "cat /proc/bus/usb/devices", and find the T-line which corresponds to
@@ -56,6 +59,10 @@ Bus=03 means it's bus 3.
 
 # cat /sys/kernel/debug/usbmon/3u > /tmp/1.mon.out
 
+to listen on a single bus, otherwise, to listen on all buses, type:
+
+# cat /sys/kernel/debug/usbmon/0u > /tmp/1.mon.out
+
 This process will be reading until killed. Naturally, the output can be
 redirected to a desirable location. This is preferred, because it is going
 to be quite long.
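
The choice between a single bus and the wildcard bus can be wrapped in a small
helper. This is a sketch under stated assumptions: usbmon_socket is a
hypothetical function name, and only the /sys/kernel/debug/usbmon/<bus>u paths
come from the documentation being patched.

```sh
#!/bin/sh
# Hypothetical helper: print the usbmon 'u' socket path for a bus number.
# Bus 0 is the wildcard (all buses) described in the patch; it is the default.
usbmon_socket() {
    printf '/sys/kernel/debug/usbmon/%su\n' "${1:-0}"
}

echo "all buses: $(usbmon_socket)"
echo "bus 3:     $(usbmon_socket 3)"

# Actual capture needs root and a mounted debugfs, so it is left commented out:
# cat "$(usbmon_socket)" > /tmp/1.mon.out      # all buses
# cat "$(usbmon_socket 3)" > /tmp/1.mon.out    # bus 3 only
```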



Re: [uml-devel] [PATCH 6/6] UML - Fix hostfs style

2007-08-23 Thread Blaisorblade
On Saturday 18 August 2007, Satyam Sharma wrote:
> On Fri, 17 Aug 2007, Jeff Dike wrote:
> > Style fixes in hostfs.

> > @@ -328,17 +326,17 @@ int hostfs_readdir(struct file *file, vo
> > [...]
> > -   if(error) break;
> > +   if (error) break;
>
>   if (error)
>       break;
>
> > @@ -522,28 +523,28 @@ static int init_inode(struct inode *inod
> > [...]
> > else type = OS_TYPE_DIR;
>
> I wonder what's the generally accepted / followed coding style for this,
> actually. Personally I'd prefer:
>
>   else
>       type = OS_TYPE_DIR;

I strongly agree with this style; beyond style itself, one strong reason is
that joining statements hinders single-stepping through function code (it's
easy to run gdb on UML, and anyway kgdb exists).

Bye
-- 
"Doh!" (cit.), I've made another mistake!
Paolo Giarrusso, aka Blaisorblade
Linux registered user n. 292729
http://www.user-mode-linux.org/~blaisorblade




Re: [uml-devel] [PATCH] UML - Add a .note.SuSE section

2007-08-23 Thread Blaisorblade
On Wednesday 22 August 2007, Jeff Dike wrote:
> On Tue, Aug 21, 2007 at 07:05:53PM +0200, Blaisorblade wrote:
> > It's not the first time we hit effects of such bugs, is it?
>
> I don't remember seeing this before.
>
> > The .note.ABI-tag fix, time ago, may be about the same problem.
>
> Are you referring to
> http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=c35e584c087381aaa5f1ed40a28b978535c18fb2;hp=a5bd1786fb30abe663b904f6d79bba413e9ba883?

Yes.

> If so, I never understood that - it just came in saying "this
> fixes static building", so I sent it along.

In this case, I'm referring to the patch which had a typo, which is yours:

http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=7632fc8f809a97f9d82ce125e8e3e579390ce2e5
Description follows:
"During a static link, ld has started putting a .note section in the
.uml.setup.init section.  This has the result that the UML setups begin
with 32 bytes of garbage and UML crashes immediately on boot.

This patch creates a specific .note section for ld to drop this stuff
into."

My patch only made your change work for real - IIRC you had fixed that exact 
typo too, but you forgot to run quilt refresh before sending the patch (btw, 
quilt pop -a will force you to refresh all patches to succeed - I do it 
frequently).

> BTW, that commit was singled 
> out by git-bisect as "causing" this particular problem.
>
> > Can you
> > double-check all UML linker scripts for more instances of this bug?
>
> I did, I have a patch, and it's been verified to fix the problem.

In this case, we _may_ want to remove the .note section altogether - even if 
it is likely to shake out more problems.

Good bye!
-- 
"Doh!" (cit.), I've made another another mistake!
Paolo Giarrusso, aka Blaisorblade
Linux registered user n. 292729
http://www.user-mode-linux.org/~blaisorblade




Re: [uml-devel] [PATCH] UML - Add a .note.SuSE section

2007-08-21 Thread Blaisorblade
On Thursday 16 August 2007, Jeff Dike wrote:
> On Thu, Aug 16, 2007 at 10:04:55PM +0200, Sam Ravnborg wrote:
> > On Thu, Aug 16, 2007 at 03:26:39PM -0400, Jeff Dike wrote:
> > > The crash is in this section:
> > >
> > >   __uml_setup_start = .;
> > >   .uml.setup.init : { *(.uml.setup.init) }
> > >   __uml_setup_end = .;
> >
> > This looks like a classic bug.
> > You wanted this:
> > .uml.setup.init : {
> > __uml_setup_start = .;
> > *(.uml.setup.init)
> > __uml_setup_end = .;
> > }
>
> Ooh, this sounds promising, thanks.

It's not the first time we hit effects of such bugs, is it? The .note.ABI-tag
fix, some time ago, may have been about the same problem. Can you double-check
all UML linker scripts for more instances of this bug?

Bye
-- 
"Doh!" (cit.), I've made another another mistake!
Paolo Giarrusso, aka Blaisorblade
Linux registered user n. 292729
http://www.user-mode-linux.org/~blaisorblade




Re: [uml-devel] [PATCH 4/5] UML - Simplify helper stack handling

2007-07-03 Thread Blaisorblade
On Thursday 28 June 2007, Andrew Morton wrote:
> So I'm running the generic version of this on i386 with 8k stacks (below),
> with a quick LTP run.
>
> Holy cow, either we use a _lot_ of stack or these numbers are off:
>
> vmm:/home/akpm> dmesg -s 100|grep 'bytes left'
> khelper used greatest stack depth: 7176 bytes left
> khelper used greatest stack depth: 7064 bytes left
> khelper used greatest stack depth: 6840 bytes left
> khelper used greatest stack depth: 6812 bytes left
> hostname used greatest stack depth: 6636 bytes left
> uname used greatest stack depth: 6592 bytes left
> uname used greatest stack depth: 6284 bytes left
> hotplug used greatest stack depth: 5568 bytes left
> rpc.nfsd used greatest stack depth: 5136 bytes left
> chown02 used greatest stack depth: 4956 bytes left
> fchown01 used greatest stack depth: 4892 bytes left

> That's the sum of process stack and interrupt stack, but I doubt if this
> little box is using much interrupt stack space.
>
> No wonder people are still getting stack overflows with 4k stacks...

First, those numbers claim to be _unused_ stack space.

Well, UML tends to use more stack space than the rest of the kernel. Apart from
having a bit more layering (even if less than in the past), we must use libc's
functions too, and they're not written to be executed on an 8k stack.

We've reimplemented libc's printf() in terms of kernel sprintf() because it 
used 32K of stack.
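
To read the quoted figures as used rather than unused stack, the subtraction
can be scripted. This is a rough sketch: stack_used is a hypothetical helper,
and the 8192-byte stack size is an assumption taken from the "8k stacks"
remark in the quoted message.

```sh
#!/bin/sh
# Hypothetical helper: turn "bytes left" log lines into bytes used.
# $1 is the stack size in bytes; log lines are read from standard input.
stack_used() {
    awk -v size="$1" '/used greatest stack depth:/ {
        # The number of unused bytes is the field right after "depth:".
        for (i = 1; i <= NF; i++)
            if ($i == "depth:")
                left = $(i + 1)
        printf "%s %d\n", $1, size - left
    }'
}

# Two of the lines quoted above, with the "> " quoting removed.
printf '%s\n' \
    'khelper used greatest stack depth: 7176 bytes left' \
    'fchown01 used greatest stack depth: 4892 bytes left' |
    stack_used 8192
```

For the deepest case quoted, 4892 bytes left out of 8192 corresponds to 3300
bytes in use.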
-- 
Inform me of my mistakes, so I can add them to my list!
Paolo Giarrusso, aka Blaisorblade
http://www.user-mode-linux.org/~blaisorblade




Re: [uml-devel] [PATCH 2/2] UML - Add stack usage monitoring

2007-06-20 Thread Blaisorblade
On Wednesday 20 June 2007, Jeff Dike wrote:
> On Wed, Jun 20, 2007 at 04:06:58PM +0200, Blaisorblade wrote:
> > Oh, it's exactly what CONFIG_DEBUG_STACK_USAGE does for i386... (not sure
> > if you were still wondering...).
>
> Where?  The only usage in i386 that I see is thread_info.h zeroing stacks
> as they are allocated.

I only looked at docs. But Andrew Morton said:

"Your new code should really be generic, utilising the
stack-page-zeroing which CONFIG_DEBUG_STACK_USAGE enables."

In fact, the other reference is in kernel/sched.c. You may (or may not) want to
merge the two stack walks (I would) and make the descriptions match a bit.

Personally, I'd put the Kconfig option in lib/Kconfig.debug and have a Kconfig
flag named DEBUG_STACK_USAGE_SUPPORT, much like LOCKDEP_SUPPORT (defined only
by architectures supporting the option), but I have no time right now.

Bye
-- 
Inform me of my mistakes, so I can add them to my list!
Paolo Giarrusso, aka Blaisorblade
http://www.user-mode-linux.org/~blaisorblade


Re: [uml-devel] [PATCH 2/2] UML - Add stack usage monitoring

2007-06-20 Thread Blaisorblade
On Tuesday 19 June 2007, Andrew Morton wrote:
> On Tue, 19 Jun 2007 14:42:45 -0400
>
> Jeff Dike <[EMAIL PROTECTED]> wrote:
> > Add a machanism to see how much of a kernel stack is used.  This
> > allocates zeroed stacks and sees where the lowest non-zero byte is on
> > process exit.  It keeps track of the lowest value and logs values as
> > they get lower.
>
> remind us again why the generic code is unsuitable?
>
> > +   for(p = stack; p < end; p++){
> > +   if(*p != 0)
> > +   if(left < lowest_to_date){
>
> Are there any plans to fix UML coding style?
In Italy we say "habits die hard"...

In an (unanswered) thread titled "[RFC] Auto-fixups for CodingStyle against
major UML violations" from 31/3/2007, also CC'ed to you, Jeff and LKML, I
published a script (reattached here) which integrates with quilt and kbuild
to fix all sources for these violations. It is not indent-based, so it does
not do any of the damage that indent would do.

Plus, with just a couple of tiny changes (for substitutions implying the use 
of '^'), it can also be run on unified diffs.

The only problem is just coordinating to run it together on a source tree and 
on the patch set applying on it. Otherwise the patchset manager would get 
hard-to-fix rejects.

In the end: are you interested in this stuff? I'm busy right now but can work
on applying these changes after this Thursday (i.e. tomorrow). I'd need to get
Jeff's patchset to fix it.

Please let me know.

Bye!
-- 
Inform me of my mistakes, so I can add them to my list!
Paolo Giarrusso, aka Blaisorblade
http://www.user-mode-linux.org/~blaisorblade


do-src-style-fix
Description: application/shellscript


Re: [uml-devel] [PATCH 2/2] UML - Add stack usage monitoring

2007-06-20 Thread Blaisorblade
On Tuesday 19 June 2007, Andrew Morton wrote:
> On Tue, 19 Jun 2007 15:50:03 -0400
>
> Jeff Dike <[EMAIL PROTECTED]> wrote:
> > On Tue, Jun 19, 2007 at 11:54:22AM -0700, Andrew Morton wrote:
> > > On Tue, 19 Jun 2007 14:42:45 -0400
> > >
> > > Jeff Dike <[EMAIL PROTECTED]> wrote:
> > > > Add a machanism to see how much of a kernel stack is used.  This
> > > > allocates zeroed stacks and sees where the lowest non-zero byte is on
> > > > process exit.  It keeps track of the lowest value and logs values as
> > > > they get lower.
> > >
> > > remind us again why the generic code is unsuitable?
> >
> > It does something different - it will tell you the greatest stack
> > usage of any currently running process.  What I want to be able to do
> > is run a workload and come back a few days later and see how close
> > anything came to running out of stack.
>
> 
>
> wth?  I'm _sure_ we used to have code in there which would, within
> do_exit(), work out the maximum amount of kernel stack which a task had
> used and if that was max-since-boot, drop a printk.
>
> Maybe I dreamed it, but I don't think so.
>
> I wonder where it went?

Oh, it's exactly what CONFIG_DEBUG_STACK_USAGE does for i386... (not sure if 
you were still wondering...).

> Oh well.  Your new code should really be generic, utilising the
> stack-page-zeroing which CONFIG_DEBUG_STACK_USAGE enables.  There's nothing
> UML-specific about it.

> low_water_lock and lowest_to_date should be static to check_stack_usage(),
> btw..

-- 
Inform me of my mistakes, so I can add them to my list!
Paolo Giarrusso, aka Blaisorblade
http://www.user-mode-linux.org/~blaisorblade


Re: [PATCH 5/6] UML - Network and pcap cleanup

2007-05-03 Thread Blaisorblade
On Tuesday 1 May 2007, Jeff Dike wrote:
> [ Paolo - could you eyeball the globally valid MAC piece of this and
> see if you think it's OK? ]

Done, the patch can be accepted (I've not looked at the PCAP part). I've a 
note on the other fix there (the additional return).

> Some network device cleanup.
>
> When setup_etheraddr found a globally valid MAC being assigned to an
> interface, it went ahead and used it rather than assigning a random
> MAC like the other cases do.  This isn't really an error like the
> others, but it seems consistent to make it behave the same.
Fine, agreed. For this, you can add my Acked-by. Probably at that time MAC 
randomization wasn't implemented.

> We were getting some duplicate kfree() in the error case in
> eth_configure because platform_device_unregister frees buffers that
> the error cases following tried to free again.

This is due to the patch:
"uml: drivers get release methods"
It could be useful to check whether other such changes are needed, by
grepping for platform_device_unregister in exit paths, also for the ubd driver.

That patch only fixed net_remove() and ubd_remove().
-- 
Inform me of my mistakes, so I can add them to my list!
Paolo Giarrusso, aka Blaisorblade
http://www.user-mode-linux.org/~blaisorblade


Re: [uml-devel] [PATCH 2/4] UML - tidy process.c

2007-04-03 Thread Blaisorblade
On Monday 2 April 2007, Jeff Dike wrote:
> Clean up arch/um/kernel/process.c -
>   lots of return(x); -> return x; conversions
>   a number of the small functions are either unused, in which
> case they are gone, along any declarations in a header, or could be
> made static.

>   current_pid is ifdefed on CONFIG_MODE_TT and its declaration
> is ifdefed on both CONFIG_MODE_TT and UML_CONFIG_MODE_TT because we
> don't know whether it's being used in a userspace or kernel file.

Please, simply include uml-config.h and use just UML_CONFIG_MODE_TT.

> Index: linux-2.6.21-mm/arch/um/include/kern_util.h
> ===
> --- linux-2.6.21-mm.orig/arch/um/include/kern_util.h  2007-03-30 16:01:21.0 -0400
> +++ linux-2.6.21-mm/arch/um/include/kern_util.h   2007-04-02 12:07:48.0 -0400
> @@ -33,7 +33,9 @@ extern int nsyscalls;
>   UML_ROUND_DOWN(((unsigned long) addr) + PAGE_SIZE - 1)
>
>  extern int kernel_fork(unsigned long flags, int (*fn)(void *), void * arg);
> +#if defined(CONFIG_MODE_TT) || defined(UML_CONFIG_MODE_TT)
>  extern unsigned long stack_sp(unsigned long page);
> +#endif
>  extern int kernel_thread_proc(void *data);
>  extern void syscall_segv(int sig);
>  extern int current_pid(void);



-- 
Inform me of my mistakes, so I can add them to my list!
Paolo Giarrusso, aka Blaisorblade
http://www.user-mode-linux.org/~blaisorblade


Re: [uml-devel] [RFC] UML kernel & rootfs bundle with every kernel release ?

2007-04-03 Thread Blaisorblade
On Monday 2 April 2007, Antoine Martin wrote:
> Jeff Dike wrote:
> > On Sun, Apr 01, 2007 at 08:58:45PM +0100, Antoine Martin wrote:
> >> I reckon that one critical thing which could drastically increase the
> >> user base would be to have a working virtual framebuffer implementation.
> >
> > Why?  I've never understood what a framebuffer gives you that you
> > don't have now.
>
> Just like the network auto-configuration via dhcp,
Hmm... for that to be completely plug-and-play you need to make sure a dhcp 
server on the host exists.

Vmware runs a separate DHCP server exactly for this, even if we should avoid 
that as much as possible.

> it would allow users 
> to download images+kernel and run them like appliances without
> understanding anything about X or UML, just click and run.

> We are all capable of setting up Xvfb here, but most users are not,
> which is why they download ready-made images.

What about installing and pre-configuring Xnest on the image? With a suitable 
script calling xhost on the host, it just works.

This project did it:
http://umlbuilder.sourceforge.net/

although it stopped working for me ages ago (probably because of some UML bug). I
built a Mandrake image (which I have since lost) with Xnest configured. With a
script on the host which passes the host IP and calls xhost, it should work
easily. And btw, we need a standard startup script anyway.

> It would also make it a lot easier to focus on writing a management UI,
> hell if there isn't one shortly after, I'll do one myself!

Why not one management UI running from the host, a-la vmware? Possibly, with 
as much code as possible in scripting languages, for better transparency.

> Think of a UML browser image (running IE via wine in a limited image
> with just X + wine + IE - I would much prefer that to having wine+IE
> installed locally), testing framebuffer apps like gtk-fb/cairo-fb
> without risking your dev environment, etc...
>
> Antoine



-- 
Inform me of my mistakes, so I can add them to my list!
Paolo Giarrusso, aka Blaisorblade
http://www.user-mode-linux.org/~blaisorblade


Re: [RFC] UML kernel & rootfs bundle with every kernel release ?

2007-04-01 Thread Blaisorblade
On Sunday 1 April 2007, [EMAIL PROTECTED] wrote:
> Hello !
>
> i`m not very much into UML for the last months, but while playing around
> with dm-loop i just got one idea i`d like to share.
>
> Whenever you want to test some new kernel (feature), you may put you main
> system at risk, exactly know what you`re doing - or - use UserModeLinux.
>
> The "problem" with UML is:
>
> - you needs to compile an UML kernel first.
> - you needs some "basic" knowlege about UML to get things running.
> - you need to create an appropriate filesystem image for UML - or find some
> for download 

> - you need to copy appropriate kernel modules inside 

Well, you usually just build monolithic kernels, so you skip this problem.
Another idea, which netkit likely uses, is to take the modules through a hostfs
mount (or at least, that would be an easier setup). But you need the modules to
have uid 0...

> - you need to put kernel sources inside, have compiler..
> - you may need appropriate modle-init-tools,initrd, kernel specific tools
> (updated dmsetup, updatedwhatever)

You found 2.4 images, but 2.6 ones do exist.

> in short:
> it`s quite some work to be done to have your uml 2.6.21 with root-fs up and
> running and working cleanly. whenever i search the net for some appropriate
> UML fs image, those i find are very often old and outdated...

Hmm... I think we need a configuration wizard. Plus some distro-like 
work for some specific issues - if I want to deploy a VM with hostname x, 
network config y, and with Xnest running, I need an easier way to do that.

You can add settings on the kernel command line and parse them inside UML - we 
need standard packaged utilities for that (one of the rootfs builders 
installed such utilities).
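
As an illustration of the command-line idea, here is a minimal sketch of a guest-side helper that extracts a key=value setting from a kernel-style command line (as found in /proc/cmdline). The function name cmdline_get and the key uml_hostname are hypothetical, chosen only for this example; no existing utility is implied.

```c
#include <string.h>

/*
 * Look up "key=value" in a space-separated command line and copy the
 * value into out. Returns 0 on success, -1 if the key is absent or the
 * value does not fit. Key names such as "uml_hostname" are hypothetical.
 */
static int cmdline_get(const char *cmdline, const char *key,
                       char *out, size_t outlen)
{
    size_t klen = strlen(key);
    const char *p = cmdline;

    while (*p) {
        size_t wlen;

        /* Skip separators between words. */
        while (*p == ' ' || *p == '\n')
            p++;
        if (!*p)
            break;

        wlen = strcspn(p, " \n");
        /* A match is the key immediately followed by '='. */
        if (wlen > klen && strncmp(p, key, klen) == 0 && p[klen] == '=') {
            size_t vlen = wlen - klen - 1;

            if (vlen >= outlen)
                return -1;
            memcpy(out, p + klen + 1, vlen);
            out[vlen] = '\0';
            return 0;
        }
        p += wlen;
    }
    return -1;
}
```

A startup script inside the guest could call such a helper once per setting and then configure hostname, network and so on from the returned values.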

> is there a project/website which is offering such ready to run "UML
> kernel+rootfs release bundles" for download (i.e. new kernel,generic 
> root-fs, modules inside, sources inside, compiler inside - in sync with the
> latest stable vanilla) , or , would it make sense to establish such project
> ? i.e. besides releasing the kernel, also releasing sort of a kernel
> "runtime kit" and/or "devkit" ?

The runtime kit is there on nagafix.co.uk. The devkit is a key idea - most of 
the work is putting something on the UML wiki and marketing the idea well - 
creating such an image would be easy. But it's not clear to me what you're talking 
about - kernel development (why sources inside?) or userspace development?

Also integrating all possible debug stuff would be useful, but I don't know 
what's needed.

> i think this could be very helpful for linux-kernel, because it could be
> tested by more people more quickly, more easily and thus, more often. just
> download, do few steps for setup, start up that virtual machine and there
> you go testing, hacking into the sources, do all that things you never
> would do on your main system,  whatever
>
> it would probably also add benefit to UML itself.

We need three things:
a) more performance
b) more users

c) more developers

a) leads to c), and b) too.

> does this sound dumb? i don`t know, so please comment.

No, it's not dumb. I'd even like to have a "VMware-like" interface. Or at 
least standard scripts for guest management.

> regards
> roland
>
> PS:
> ok, this would be some 500M to 1G download, but there`s lot`s of bandwidth
> today - and P2P/Bittorrent.

uml.nagafix.co.uk has some good kernels + images. With compression, they're 
even as little as 50 MB.

However, making UML easier to use, and marketing it for more applications, is 
very important. Various projects do exist but they're not integrated, and they 
do not try to be (netkit is for network experimentation, but is also better as 
a VM management tool).

In short, we'd really need somebody helping out with the website (there is the 
wiki but you must request an account via email, and it's not the main 
website), with uml-utilities, and with new uml-utilities (you know 
dm-snapshot is a faster COW? Something to set it up automatically would be 
good).

There would be more to say on this, but I can't right now (I've other stuff to 
do).

-- 
Inform me of my mistakes, so I can add them to my list!
Paolo Giarrusso, aka Blaisorblade
http://www.user-mode-linux.org/~blaisorblade


[RFC] Auto-fixups for CodingStyle against major UML violations

2007-03-31 Thread Blaisorblade
Have you got sick of fixing your sources' CodingStyle by hand? Are you 
reintroducing violations because you've always programmed in a certain style 
and those kernel hackers have dictated an insane one which you'll never learn?

Stop that, the spamful company "BlaisorBlade Inc. " has the right solution for 
you, and this spam letter is going to explain ;-) !

Without using lindent, and with just a few sed/vim substitutions, it fixes 
most of the problems we keep having. I wrote most of it with vim, and I 
discovered it has another advantage: most of the substitutions also work on 
patches (well, not so straightforward, but anyway good).

Also, it calls quilt to create a patch for all this stuff, and can optionally 
do binary comparison to verify substitutions are safe (this is only coded, 
not tested).

The only exception is the one to move labels to the first column - a slightly 
different sub would be needed:

sed -e 's/\<return\>(\(.*\))/return \1/' \
 -e 's/\<if\> \?(\(.*\)){/if (\1) {/'  \
 -e 's/\<if\>(\(.*\))/if (\1)/' \
 -e 's/\<for\> \?(\(.*\)){/for (\1) {/' \
 -e 's/\<for\>(\(.*\))/for (\1)/' \
 -e 's/\<while\> \?(\(.*\)){/while (\1) {/' \
 -e 's/\<while\>(\(.*\))/while (\1)/' \
 -e 's/^ \([a-z_]*:\)/\1/' \

This:
 -e 's/^ \([a-z_]*:\)/\1/'
would become this:
 -e 's/^\([ +-]\) \([a-z_]*:\)/\1\2/'

Still to be tested well:
- spaces to tabs (easy)
- binary comparison
Missing features:
- break if (foo) bar(); on two lines (probably won't do this one)
- do the work on patches
- have a sane cmd line interface (most of config is inside it).

Results: 
*) in less than 10 seconds (cache-hot) it generates a 416k patch on arch/um and 
include/asm-um:

$ diffstat $(quilt top)|tail -n 1
 147 files changed, 2360 insertions(+), 2360 deletions(-)

*) doesn't clutter the source tree nor temp directories, if you have quilt 
installed.

I attach this with no guarantee at all, however! Bye!
-- 
Inform me of my mistakes, so I can add them to my list!
Paolo Giarrusso, aka Blaisorblade
http://www.user-mode-linux.org/~blaisorblade


do-src-style-fix
Description: application/shellscript


Re: [uml-devel] [PATCH] UML - fix I/O hang when multiple devices are in use

2007-03-31 Thread Blaisorblade
On giovedì 29 marzo 2007, Jeff Dike wrote:
> On Thu, Mar 29, 2007 at 02:36:43AM +0200, Blaisorblade wrote:
> > > Sometimes you need to. I'd probably just remove the do_ubd check and
> > > always recall the request function when handling completions, it's
> > > easier and safe.
>
> If I'm understanding this correctly, this is what happens now.  There
> is still the flag check and return if the queue is being run, but I
> don't see the advantage of removing that.
Possibly he just didn't understand what do_ubd was for; maybe there's some 
technical reason.

> That's a lot of mapping and unmapping though.  I wonder if just
> calling mmap would cause the COWed page to be dropped...
Yes, it would.
-- 
Inform me of my mistakes, so I can add them to my list!
Paolo Giarrusso, aka Blaisorblade
http://www.user-mode-linux.org/~blaisorblade


Re: [uml-devel] [patch 06/37] UML - Fix static linking

2007-03-30 Thread Blaisorblade
On venerdì 30 marzo 2007, Greg KH wrote:
> -stable review patch.  If anyone has any objections, please let us know.

I have one objection, the fix has a typo! This is the additional fix 
(note '.note' instead of 'note'):

--- linux-2.6.git.orig/include/asm-um/common.lds.S
+++ linux-2.6.git/include/asm-um/common.lds.S
@@ -15,7 +15,7 @@
   PROVIDE (_unprotected_end = .);

   . = ALIGN(4096);
-  .note : { *(note.*) }
+  .note : { *(.note.*) }
   __start___ex_table = .;
   __ex_table : { *(__ex_table) }
   __stop___ex_table = .;

With this, the fix should be merged - I just re-hit this bug and rechecked 
everything, now it's ok.
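
To see why the missing dot matters, here is a small sketch that uses fnmatch() as a stand-in for the glob-style matching of input section names in a linker script. That ld behaves like fnmatch() here is an assumption made for illustration, and .note.ABI-tag is just one example of a section name starting with ".note.".

```c
#include <fnmatch.h>

/* Returns 1 if the section name matches the glob pattern, else 0. */
static int section_matches(const char *pattern, const char *section)
{
    return fnmatch(pattern, section, 0) == 0;
}
```

With this helper, section_matches(".note.*", ".note.ABI-tag") is true while section_matches("note.*", ".note.ABI-tag") is false, since the pattern without the leading dot can never match a name that begins with one.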

> --
> From: Jeff Dike <[EMAIL PROTECTED]>
>
> During a static link, ld has started putting a .note section in the
> .uml.setup.init section.  This has the result that the UML setups
> begin with 32 bytes of garbage and UML crashes immediately on boot.
>
> This patch creates a specific .note section for ld to drop this stuff
> into.
>
> Signed-off-by: Jeff Dike <[EMAIL PROTECTED]>
> Signed-off-by: Greg Kroah-Hartman <[EMAIL PROTECTED]>
>
> ---
>  include/asm-um/common.lds.S |1 +
>  1 file changed, 1 insertion(+)
>
> --- a/include/asm-um/common.lds.S
> +++ b/include/asm-um/common.lds.S
> @@ -15,6 +15,7 @@
>PROVIDE (_unprotected_end = .);
>
>. = ALIGN(4096);
> +  .note : { *(note.*) }
>__start___ex_table = .;
>__ex_table : { *(__ex_table) }
>__stop___ex_table = .;



-- 
Inform me of my mistakes, so I can add them to my list!
Paolo Giarrusso, aka Blaisorblade
http://www.user-mode-linux.org/~blaisorblade


[PATCH] [PATCH] uml: fix static linking for real

2007-03-30 Thread Paolo 'Blaisorblade' Giarrusso
There was a typo in commit 7632fc8f809a97f9d82ce125e8e3e579390ce2e5, preventing
it from working - 32bit binaries crashed hopelessly before the below fix and
work perfectly now.
Merge for 2.6.21, please.

Signed-off-by: Paolo 'Blaisorblade' Giarrusso <[EMAIL PROTECTED]>
---

 include/asm-um/common.lds.S |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/include/asm-um/common.lds.S b/include/asm-um/common.lds.S
index b16222b..f5de80c 100644
--- a/include/asm-um/common.lds.S
+++ b/include/asm-um/common.lds.S
@@ -15,7 +15,7 @@
   PROVIDE (_unprotected_end = .);
 
   . = ALIGN(4096);
-  .note : { *(note.*) }
+  .note : { *(.note.*) }
   __start___ex_table = .;
   __ex_table : { *(__ex_table) }
   __stop___ex_table = .;





[PATCH 3/3] slab: avoid __initdata warning (may be a bogus one)

2007-03-30 Thread Paolo 'Blaisorblade' Giarrusso
set_up_list3s is not __init and references initkmem_list3.

Also, kmem_cache_create calls setup_cpu_cache which calls set_up_list3s. The
state machine _may_ prevent the code from accessing this data after freeing
initdata (it makes sure it's used only up to boot), so this warning may be a
false positive.

Signed-off-by: Paolo 'Blaisorblade' Giarrusso <[EMAIL PROTECTED]>
---

 mm/slab.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/mm/slab.c b/mm/slab.c
index 0934f8d..0772faf 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -305,7 +305,7 @@ struct kmem_list3 {
  * Need this for bootstrapping a per node allocator.
  */
 #define NUM_INIT_LISTS (2 * MAX_NUMNODES + 1)
-struct kmem_list3 __initdata initkmem_list3[NUM_INIT_LISTS];
+struct kmem_list3 initkmem_list3[NUM_INIT_LISTS];
 #define CACHE_CACHE 0
 #define SIZE_AC 1
 #define SIZE_L3 (1 + MAX_NUMNODES)





[PATCH 1/3] utrace - uml: make UML compile with utrace enabled

2007-03-30 Thread Paolo 'Blaisorblade' Giarrusso
* The prototype of arch_ptrace doesn't match the one in include/linux/ptrace.h.
* utrace_um_native is referred to by utrace_native_view but never defined.

Cc: Jeff Dike <[EMAIL PROTECTED]>
Cc: Roland McGrath <[EMAIL PROTECTED]>
Signed-off-by: Paolo 'Blaisorblade' Giarrusso <[EMAIL PROTECTED]>
---

 arch/um/kernel/ptrace.c |7 ++-
 1 files changed, 6 insertions(+), 1 deletions(-)

diff --git a/arch/um/kernel/ptrace.c b/arch/um/kernel/ptrace.c
index f66d01c..a42caf3 100644
--- a/arch/um/kernel/ptrace.c
+++ b/arch/um/kernel/ptrace.c
@@ -16,7 +16,12 @@ void ptrace_disable(struct task_struct *child)
 { 
 }
 
-long arch_ptrace(struct task_struct *child, long request, long addr, long data)
+const struct utrace_regset_view utrace_um_native;
+
+int arch_ptrace(long *request, struct task_struct *child,
+  struct utrace_attached_engine *engine,
+  unsigned long addr, unsigned long data,
+  long *retval)
 {
return -ENOSYS;
 }





Re: [uml-devel] [PATCH 1/2] UML - Fix umid in xterm titles

2007-03-30 Thread Blaisorblade
On venerdì 30 marzo 2007, Jeff Dike wrote:
> From: Davide Brini <[EMAIL PROTECTED]>
>
> Calls lines_init() *after* xterm_title is modified to include umid.
>
> Signed-off-by: Davide Brini <[EMAIL PROTECTED]>
> Signed-off-by: Jeff Dike <[EMAIL PROTECTED]>

Acked-by: Paolo 'Blaisorblade' Giarrusso <[EMAIL PROTECTED]>

> --
>  arch/um/drivers/ssl.c   |4 ++--
>  arch/um/drivers/stdio_console.c |4 ++--
>  2 files changed, 4 insertions(+), 4 deletions(-)
>
> Index: linux-2.6.21-mm/arch/um/drivers/ssl.c
> ===
> --- linux-2.6.21-mm.orig/arch/um/drivers/ssl.c 2007-03-30 10:11:01.0 -0400
> +++ linux-2.6.21-mm/arch/um/drivers/ssl.c 2007-03-30 10:28:51.0 -0400
> @@ -191,12 +191,12 @@ static int ssl_init(void)
>   ssl_driver = register_lines(&driver, &ssl_ops, serial_lines,
>   ARRAY_SIZE(serial_lines));
>
> - lines_init(serial_lines, ARRAY_SIZE(serial_lines), &opts);
> -
>   new_title = add_xterm_umid(opts.xterm_title);
>   if (new_title != NULL)
>   opts.xterm_title = new_title;
>
> + lines_init(serial_lines, ARRAY_SIZE(serial_lines), &opts);
> +
>   ssl_init_done = 1;
>   register_console(&ssl_cons);
>   return 0;
> Index: linux-2.6.21-mm/arch/um/drivers/stdio_console.c
> ===
> --- linux-2.6.21-mm.orig/arch/um/drivers/stdio_console.c 2007-03-30 10:11:01.0 -0400
> +++ linux-2.6.21-mm/arch/um/drivers/stdio_console.c 2007-03-30 10:28:51.0 -0400
> @@ -166,12 +166,12 @@ int stdio_init(void)
>   return -1;
>   printk(KERN_INFO "Initialized stdio console driver\n");
>
> - lines_init(vts, ARRAY_SIZE(vts), &opts);
> -
>   new_title = add_xterm_umid(opts.xterm_title);
>   if(new_title != NULL)
>   opts.xterm_title = new_title;
>
> + lines_init(vts, ARRAY_SIZE(vts), &opts);
> +
>   con_init_done = 1;
>   register_console(&stdiocons);
>   return 0;
>



-- 
Inform me of my mistakes, so I can add them to my list!
Paolo Giarrusso, aka Blaisorblade
http://www.user-mode-linux.org/~blaisorblade


[PATCH 2/3] sys_futex64-allows-64bit-futexes-workaround for uml

2007-03-30 Thread Paolo 'Blaisorblade' Giarrusso
Copy sys_futex64-allows-64bit-futexes-workaround.patch to UML (to unbreak the
UML build). Note however that in include/asm-generic/futex.h we have:

static inline int
futex_atomic_cmpxchg_inatomic(int __user *uaddr, int oldval, int newval)
{
return -ENOSYS;
}

Which is a better solution. Pierre Peiffer, please consider that.
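
A small sketch of why a stub returning -ENOSYS is easier to work with than one returning 0: only the negative errno lets a caller tell "not implemented" apart from a real result. The stub and helper names below are hypothetical and stand in for an architecture hook.

```c
#include <errno.h>

/* Hypothetical stand-ins for an arch hook that is not implemented. */
static int stub_returning_zero(void)
{
    return 0;
}

static int stub_returning_enosys(void)
{
    return -ENOSYS;
}

/* Caller-side check: only -ENOSYS reveals that support is missing. */
static int op_is_supported(int (*op)(void))
{
    return op() != -ENOSYS;
}
```

With the zero-returning stub, op_is_supported() reports support that does not exist, which is the situation the generic header avoids.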

Cc: Pierre Peiffer <[EMAIL PROTECTED]>
Signed-off-by: Paolo 'Blaisorblade' Giarrusso <[EMAIL PROTECTED]>
---

 include/asm-um/futex.h |   13 +
 1 files changed, 13 insertions(+), 0 deletions(-)

diff --git a/include/asm-um/futex.h b/include/asm-um/futex.h
index 6a332a9..e875d3e 100644
--- a/include/asm-um/futex.h
+++ b/include/asm-um/futex.h
@@ -3,4 +3,17 @@
 
 #include 
 
+static inline u64
+futex_atomic_cmpxchg_inatomic64(u64 __user *uaddr, u64 oldval, u64 newval)
+{
+   return 0;
+}
+
+static inline int
+futex_atomic_op_inuser64 (int encoded_op, u64 __user *uaddr)
+{
+   return 0;
+}
+
+
 #endif





[PATCH 0/3] mm-only patches

2007-03-30 Thread Paolo 'Blaisorblade' Giarrusso
Workarounds for mm-only compile errors/warnings, found on 2.6.21-rc5-mm2; they
still apply on 2.6.21-rc5-mm3.
-- 
Inform me of my mistakes, so I can add them to my list!
Paolo Giarrusso, aka Blaisorblade
http://www.user-mode-linux.org/~blaisorblade




Re: [uml-devel] [PATCH 2/2] UML - Speed up exec

2007-03-30 Thread Blaisorblade
On venerdì 30 marzo 2007, Jeff Dike wrote:
> flush_thread doesn't need to do a full page table walk in order to
> clear the address space.  It knows what the end result needs to be, so
> it can call unmap directly.
>
> This results in a 10-20% speedup in an exec from bash.

Oh, yeah!
When porting part of Ingo's work, I realized that a similar thing can be done 
for fork().

If the whole address space is unmapped in init_new_context_skas(), the first 
fix_range_common() call won't need to call unmap at all. Ingo did this with 
remap_file_pages(), where init_new_context_skas() must "unmap" everything 
anyway.

This gives some speedup in lmbench (5% better in fork proc, 2% better in 
exec proc), but the results are still inconclusive: there is one benchmark 
(called 'mmap latency') with a 2% slowdown.

In a loop, it maps, touches a byte per page and unmaps a region with growing 
size (up to 32MB).

However, since results aren't yet stable for some other benchmarks (the context 
switching benchmark is crazy), I'm still studying this.

> Signed-off-by: Jeff Dike <[EMAIL PROTECTED]>
> --
>  arch/um/kernel/skas/exec.c |   12 +++-
>  1 file changed, 11 insertions(+), 1 deletion(-)
>
> Index: linux-2.6.21-mm/arch/um/kernel/skas/exec.c
> ===
> --- linux-2.6.21-mm.orig/arch/um/kernel/skas/exec.c 2007-03-30 10:28:24.0 -0400
> +++ linux-2.6.21-mm/arch/um/kernel/skas/exec.c 2007-03-30 10:30:15.0 -0400
> @@ -17,7 +17,17 @@
>
>  void flush_thread_skas(void)
>  {
> - force_flush_all();
> + void *data = NULL;
> + unsigned long end = proc_mm ? task_size : CONFIG_STUB_START;
> + int ret;
> +
> + ret = unmap(&current->mm->context.skas.id, 0, end, 1, &data);
> + if(ret){
> + printk("flush_thread_skas - clearing address space failed, "
> +"err = %d\n", ret);
> + force_sig(SIGKILL, current);
> + }
> +
>   switch_mm_skas(&current->mm->context.skas.id);
>  }


-- 
Inform me of my mistakes, so I can add them to my list!
Paolo Giarrusso, aka Blaisorblade
http://www.user-mode-linux.org/~blaisorblade
Index: linux-2.6.git/arch/um/include/skas/mmu-skas.h
===
--- linux-2.6.git.orig/arch/um/include/skas/mmu-skas.h
+++ linux-2.6.git/arch/um/include/skas/mmu-skas.h
@@ -16,6 +16,7 @@ struct mmu_context_skas {
 	unsigned long last_pmd;
 #endif
 	uml_ldt_t ldt;
+	int first_flush;
 };
 
 extern void switch_mm_skas(struct mm_id * mm_idp);
Index: linux-2.6.git/arch/um/kernel/skas/mmu.c
===
--- linux-2.6.git.orig/arch/um/kernel/skas/mmu.c
+++ linux-2.6.git/arch/um/kernel/skas/mmu.c
@@ -77,6 +77,7 @@ int init_new_context_skas(struct task_st
 	struct mmu_context_skas *to_mm = &mm->context.skas;
 	unsigned long stack = 0;
 	int ret = -ENOMEM;
+	void *unused = NULL;
 
 	if(skas_needs_stub){
 		stack = get_zeroed_page(GFP_KERNEL);
@@ -121,6 +122,14 @@ int init_new_context_skas(struct task_st
 		else to_mm->id.u.pid = start_userspace(stack);
 	}
 
+	mm->context.skas.first_flush = 1;
+	ret = unmap(&mm->context.skas.id, 0, TASK_SIZE, 1, &unused);
+	if (ret < 0) {
+		printk("init_new_context_skas - unmap failed, "
+		   "errno = %d; continuing\n", ret);
+		mm->context.skas.first_flush = 0;
+	}
+
 	ret = init_new_ldt(to_mm, from_mm);
 	if(ret < 0){
 		printk("init_new_context_skas - init_ldt"
Index: linux-2.6.git/arch/um/kernel/tlb.c
===
--- linux-2.6.git.orig/arch/um/kernel/tlb.c
+++ linux-2.6.git/arch/um/kernel/tlb.c
@@ -139,10 +139,17 @@ void fix_range_common(struct mm_struct *
 	void *flush = NULL;
 	int op_index = -1, last_op = ARRAY_SIZE(ops) - 1;
 	int ret = 0;
+	int first_flush;
 
 	if(mm == NULL)
 		return;
 
+	/* Nothing is mapped in this address space, so no call to add_munmap()
+	 * must be done */
+	first_flush = mm->context.skas.first_flush;
+
+	mm->context.skas.first_flush = 0;
+
 	ops[0].type = NONE;
 	for(addr = start_addr; addr < end_addr && !ret;){
 		npgd = pgd_offset(mm, addr);
@@ -151,9 +158,10 @@ void fix_range_common(struct mm_struct *
 			if(end > end_addr)
 				end = end_addr;
 			if(force || pgd_newpage(*npgd)){
-				ret = add_munmap(addr, end - addr, ops,
-						 &op_index, last_op, mmu,
-						 &flush, do_ops);
+				if (!first_flush)
+					ret = add_munmap(addr, end - addr, ops,
+							 &op_index, last_op, mmu,
+							 &flush, do_ops);
 				pgd_mkuptodate(*npgd);
 			}
 			addr = end;
@@ -166,9 +174,10 @@ void fix_range_common(struct mm_struct *
 			if(end > end_addr)
 				end = end_addr;
 			if

Re: [uml-devel] [PATCH] UML - fix I/O hang when multiple devices are in use

2007-03-30 Thread Blaisorblade
On giovedì 29 marzo 2007, Jeff Dike wrote:
> On Thu, Mar 29, 2007 at 02:36:43AM +0200, Blaisorblade wrote:
> > > Sometimes you need to. I'd probably just remove the do_ubd check and
> > > always recall the request function when handling completions, it's
> > > easier and safe.
>
> If I'm understanding this correctly, this is what happens now.  There
> is still the flag check and return if the queue is being run, but I
> don't see the advantage of removing that.
>
> > Anyway, the main speedups to do on the UBD driver are:
> > * implement write barriers (so much less fsync) - this is performance
> > killer n.1
>
> You mean preventing the upper layers from calling fsync?

No. Since we don't know when the upper layers (including the journaling layer) 
want to fsync, we call it every time. But they do pass this information. Chris 
Lightfoot implemented write barriers just before the API was changed, 
together with much of the other stuff I'm talking about.

It's impressive to check his original mail - in the scenario that creates a 32M 
file and then deletes it, the delete takes a minute on vanilla and 1 second with his 
patched code. I've downloaded the patch for future reference, even if I don't 
know when I'll have time to look at it.
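
A toy model of the difference, under stated assumptions: struct write_req and its is_barrier flag are hypothetical simplifications, and the functions only count how many host syncs each policy would issue rather than doing real I/O.

```c
#include <stddef.h>

/* Hypothetical, simplified write request: only the barrier flag matters. */
struct write_req {
    int is_barrier;     /* set by upper layers when ordering is required */
};

/* Policy 1: sync after every write, since ordering needs are unknown. */
static size_t syncs_always(const struct write_req *reqs, size_t n)
{
    (void)reqs;
    return n;
}

/* Policy 2: sync only when the upper layers marked a barrier. */
static size_t syncs_on_barrier(const struct write_req *reqs, size_t n)
{
    size_t count = 0;
    size_t i;

    for (i = 0; i < n; i++)
        if (reqs[i].is_barrier)
            count++;
    return count;
}
```

For a long run of plain writes closed by a single barrier, the second policy issues one sync where the first issues one per write, which is the kind of gap the quoted numbers suggest.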

> > * possibly to use the new 2.6 request layout with scatter/gather I/O, and
> > vectorized I/O on the host
>
> Yeah, this is something I've thought about on occasion but never
> done.
>
> > * while at vectorizing I/O using async I/O
>
> I have that, but haven't merged it since I see no performance benefit
> for some reason.
>
> > * to avoid passing requests on pipes (n.2) - on fast disk I/O becomes
> > cpu-bound.
>
> Right - I cooked up a scheme a while ago that had the requests on a
> list, being removed from one end and added to the other, with some
> minimal number of bytes going across the pipe to ensure a wakeup if
> the other side was possibly asleep.  But I never implemented it.
>
> > * using futexes instead of pipes for synchronization (required for
> > previous one).
>
> Yup - for this, we either need to test the host for futexes and use
> pipes as a fallback or give up on 2.4 as the host.
>
> > I forgot one thing: remember ubd=mmap? Something like that could have
> > been done using MAP_PRIVATE, so that write had still to be called
> > explicitly but unchanged data was shared with the host.
> >
> > Once a page gets dirty but is then cleaned, sharing it back is
> > difficult - but even without that good savings could be
> > achievable. That's to explore for the very future though.
>
> Interesting idea.  That does avoid the formerly fatal mmap problem.
> If you unmap it, the private copy goes away because it lost its last
> reference, and if you map it again, you get the shared version.
>
> That's a lot of mapping and unmapping though.  I wonder if just
> calling mmap would cause the COWed page to be dropped...
>
>   Jeff



-- 
Inform me of my mistakes, so I can add them to my list!
Paolo Giarrusso, aka Blaisorblade
http://www.user-mode-linux.org/~blaisorblade


Re: [uml-devel] [PATCH] UML - fix I/O hang when multiple devices are in use

2007-03-28 Thread Blaisorblade
On giovedì 29 marzo 2007, Blaisorblade wrote:
> On mercoledì 28 marzo 2007, Jeff Dike wrote:
> > [ This patch needs to get into 2.6.21, as it fixes a serious bug
> > introduced soon after 2.6.20 ]
> >
> > Commit 62f96cb01e8de7a5daee472e540f726db2801499 introduced per-devices
> > queues and locks, which was fine as far as it went, but left in place
> > a global which controlled access to submitting requests to the host.
> > This should have been made per-device as well, since it causes I/O
> > hangs when multiple block devices are in use.
> >
> > This patch fixes that by replacing the global with an activity flag in
> > the device structure in order to tell whether the queue is currently
> > being run.
>
> Finally that variable has an understandable name. However in a mail from
> Jens Axboe, titled:
> "Re: [uml-devel] [PATCH 06/11] uml ubd driver: ubd_io_lock usage fixup" ,
> with Date: Mon, 30 Oct 2006 09:26:48 +0100, he suggested removing this flag
>
> altogether, so we may explore this for the future:
> > > Add some comments about requirements for ubd_io_lock and expand its
> > > use.
> > >
> > > When an irq signals that the "controller" (i.e. another thread on the
> > > host, which does the actual requests and is the only one blocked on I/O
> > > on the host) has done some work, we call again the request function
> > > ourselves (do_ubd_request).

> > > We now do that with ubd_io_lock held - that's useful to protect against
> > > concurrent calls to elv_next_request and so on.

> > Not only useful, required, as I think I complained about a year or more
> > ago :-)

> > > XXX: Maybe we shouldn't call at all the request function. Input needed
> > > on this. Are we supposed to plug and unplug the queue? That code
> > > "indirectly" does that by setting a flag, called do_ubd, which makes
> > > the request function return (it's a residual of 2.4 block layer
> > > interface).

> > Sometimes you need to. I'd probably just remove the do_ubd check and
> > always recall the request function when handling completions, it's
> > easier and safe.

> Anyway, the main speedups to do on the UBD driver are:
> * implement write barriers (so much less fsync) - this is performance
> killer n.1

> * possibly to use the new 2.6 request layout with scatter/gather I/O, and
> vectorized I/O on the host
> * while at vectorizing I/O using async I/O

> * to avoid passing requests on pipes (n.2) - on fast disk I/O becomes
> cpu-bound.
> To make a different but related example, with a SpeedScale laptop, it's
> interesting to double the CPU frequency and observe tuntap speed double too
> (at 1GHz I get TCP numbers like 150 Mbit/s - 100 Mbit/s, depending on
> whether UML transmits or receives data; at 2GHz, double those rates).
> Update: I now get 150 Mbit/s / 200 Mbit/s (UML receives/UML sends) at 1GHz,
> and still double that at 2GHz.
> This is a different UML though.

> * using futexes instead of pipes for synchronization (required for previous
> one).

I forgot one thing: remember ubd=mmap? Something like that could have been 
done using MAP_PRIVATE, so that write would still have to be called explicitly but 
unchanged data would be shared with the host.

Once a page gets dirty but is then cleaned, sharing it back is difficult - but 
even without that, good savings could be achievable. That's something to explore in the 
far future though.
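
The MAP_PRIVATE behaviour this relies on can be checked in userspace with a sketch like the following. It is only an illustration on a scratch file (the /tmp path template and the function name are assumptions), not UBD code: a store through the private mapping stays out of the file, and mapping the file again over the same address discards the private copy.

```c
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Returns 0 if both expectations hold, -1 otherwise:
 *  1. a store through a MAP_PRIVATE mapping does not reach the file;
 *  2. mapping the file again at the same address drops the private copy.
 */
static int private_mapping_demo(void)
{
    char path[] = "/tmp/ubd-demo-XXXXXX";   /* hypothetical scratch file */
    long page = sysconf(_SC_PAGESIZE);
    char on_disk = 0;
    char *p;
    int ok = -1;
    int fd = mkstemp(path);

    if (fd < 0)
        return -1;
    unlink(path);                           /* file lives until close() */
    if (ftruncate(fd, page) != 0 || pwrite(fd, "A", 1, 0) != 1)
        goto out;

    p = mmap(NULL, page, PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, 0);
    if (p == MAP_FAILED)
        goto out;

    p[0] = 'B';                             /* triggers copy-on-write */
    if (pread(fd, &on_disk, 1, 0) != 1 || on_disk != 'A')
        goto unmap;                         /* the file must be unchanged */

    /* Map the file again over the same address: the COWed page goes away. */
    if (mmap(p, page, PROT_READ | PROT_WRITE,
             MAP_PRIVATE | MAP_FIXED, fd, 0) != (void *)p)
        goto unmap;
    if (p[0] == 'A')
        ok = 0;
unmap:
    munmap(p, page);
out:
    close(fd);
    return ok;
}
```

So getting a clean page shared again costs one mmap call per page (or per range), which is the mapping overhead mentioned in the thread.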
-- 
Inform me of my mistakes, so I can add them to my list!
Paolo Giarrusso, aka Blaisorblade
http://www.user-mode-linux.org/~blaisorblade


Re: [uml-devel] [PATCH] UML - fix I/O hang when multiple devices are in use

2007-03-28 Thread Blaisorblade
On mercoledì 28 marzo 2007, Jeff Dike wrote:
> [ This patch needs to get into 2.6.21, as it fixes a serious bug
> introduced soon after 2.6.20 ]
>
> Commit 62f96cb01e8de7a5daee472e540f726db2801499 introduced per-devices
> queues and locks, which was fine as far as it went, but left in place
> a global which controlled access to submitting requests to the host.
> This should have been made per-device as well, since it causes I/O
> hangs when multiple block devices are in use.
>
> This patch fixes that by replacing the global with an activity flag in
> the device structure in order to tell whether the queue is currently
> being run.
Finally that variable has an understandable name. However, in a mail from Jens 
Axboe, titled:
"Re: [uml-devel] [PATCH 06/11] uml ubd driver: ubd_io_lock usage fixup" , with 
Date: Mon, 30 Oct 2006 09:26:48 +0100, he suggested removing this flag 
altogether, so we may explore this in the future:

> > Add some comments about requirements for ubd_io_lock and expand its use.
> >
> > When an irq signals that the "controller" (i.e. another thread on the
> > host, which does the actual requests and is the only one blocked on I/O
> > on the host) has done some work, we call again the request function
> > ourselves (do_ubd_request).
> >
> > We now do that with ubd_io_lock held - that's useful to protect against
> > concurrent calls to elv_next_request and so on.
>
> Not only useful, required, as I think I complained about a year or more
> ago :-)
>
> > XXX: Maybe we shouldn't call at all the request function. Input needed on
> >  this. Are we supposed to plug and unplug the queue? That code
> > "indirectly" does that by setting a flag, called do_ubd, which makes the
> > request function return (it's a residual of 2.4 block layer interface).
>
> Sometimes you need to. I'd probably just remove the do_ubd check and
> always recall the request function when handling completions, it's
> easier and safe.

Anyway, the main speedups to do on the UBD driver are:
* implement write barriers (so much less fsync) - this is performance killer 
n.1

* possibly to use the new 2.6 request layout with scatter/gather I/O, and 
vectorized I/O on the host
* while at vectorizing I/O using async I/O

* to avoid passing requests on pipes (n.2) - on fast disk I/O becomes 
cpu-bound.
To make a different but related example, with a SpeedScale laptop, it's 
interesting to double the CPU frequency and observe tuntap speed double too 
(at 1GHz I get TCP numbers like 150 Mbit/s - 100 Mbit/s, depending on 
whether UML transmits or receives data; at 2GHz, double those rates).
Update: I now get 150 Mbit/s / 200 Mbit/s (UML receives/UML sends) at 1GHz, and 
still double that at 2GHz.
This is a different UML though.

* using futexes instead of pipes for synchronization (required for previous 
one).

-- 
Inform me of my mistakes, so I can add them to my list!
Paolo Giarrusso, aka Blaisorblade
http://www.user-mode-linux.org/~blaisorblade


[PATCH][2.6.21] uml: fix unreasonably long udelay

2007-03-28 Thread Paolo 'Blaisorblade' Giarrusso
Currently we have a confused udelay implementation.

* __const_udelay does not accept usecs but xloops in i386 and x86_64
* our implementation requires usecs as arg
* it gets a xloops count when called by asm/arch/delay.h

Bugs related to this (extremely long shutdown times) were reported by some
x86_64 users, especially those using Device Mapper.

To hit this bug, a compile-time constant time parameter must be passed - that's
why UML seems to work most times.
Fix this with a simple udelay implementation.
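
To make the unit mismatch above concrete, here is a small userspace sketch (not the kernel code). It assumes, as far as I recall, that the i386 and x86_64 headers convert a constant microsecond count into "xloops" by multiplying it by 0x10c7 (2^32 / 10^6, rounded up) before calling __const_udelay(). The helper names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed scaling in the i386/x86_64 headers: 2^32 / 1000000, rounded up. */
#define XLOOPS_PER_USEC 0x10c7ULL

/* What the arch header would hand to __const_udelay() for a constant delay. */
static uint64_t usecs_to_xloops(uint64_t usecs)
{
	return usecs * XLOOPS_PER_USEC;
}

/* The old UML __const_udelay() read its argument as microseconds, so the
 * delay it actually waited for is the xloops value taken as usecs. */
static uint64_t misread_delay_usecs(uint64_t requested_usecs)
{
	return usecs_to_xloops(requested_usecs);
}
```

Under that assumption, a constant udelay(100) would turn into roughly 0.43 seconds of busy waiting, which would be consistent with the very long shutdown times reported.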

Signed-off-by: Paolo 'Blaisorblade' Giarrusso <[EMAIL PROTECTED]>
---

 arch/um/sys-i386/delay.c   |   11 ---
 arch/um/sys-x86_64/delay.c |   11 ---
 include/asm-um/delay.h |   17 ++---
 3 files changed, 14 insertions(+), 25 deletions(-)

diff --git a/arch/um/sys-i386/delay.c b/arch/um/sys-i386/delay.c
index 2c11b97..d623e07 100644
--- a/arch/um/sys-i386/delay.c
+++ b/arch/um/sys-i386/delay.c
@@ -27,14 +27,3 @@ void __udelay(unsigned long usecs)
 }
 
 EXPORT_SYMBOL(__udelay);
-
-void __const_udelay(unsigned long usecs)
-{
-   int i, n;
-
-   n = (loops_per_jiffy * HZ * usecs) / MILLION;
-for(i=0;i[...]
+[...] 2) ? \
+   __bad_udelay() : __udelay(n))
+
+/* It appears that ndelay is not used at all for UML, and has never been
+ * implemented. */
+extern void __unimplemented_ndelay(void);
+#define ndelay(n) __unimplemented_ndelay()
+
 #endif





Re: + uml-fix-pte-bit-collision.patch added to -mm tree

2007-03-28 Thread Blaisorblade
On martedì 27 marzo 2007, [EMAIL PROTECTED] wrote:
> The patch titled
>  uml: fix pte bit collision
> has been added to the -mm tree.  Its filename is
>  uml-fix-pte-bit-collision.patch

ACK from me:

Acked-by: Paolo 'Blaisorblade' Giarrusso <[EMAIL PROTECTED]>

> *** Remember to use Documentation/SubmitChecklist when testing your code
> ***
>
> See http://www.zip.com.au/~akpm/linux/patches/stuff/added-to-mm.txt to find
> out what to do about this

> --
> Subject: uml: fix pte bit collision
> From: Miklos Szeredi <[EMAIL PROTECTED]>
>
> _PAGE_PROTNONE conflicts with the lowest bit of pgoff.  This causes all
> sorts of weirdness when nonlinear mappings are used.

Hmm, I had unit-tested to death the code with a custom test-program...
The interesting thing is that I only tested my remap_file_pages changes (which 
are unaffected by this), not the existing code...
-- 
Inform me of my mistakes, so I can add them to my list!
Paolo Giarrusso, aka Blaisorblade
http://www.user-mode-linux.org/~blaisorblade


Re: + uml-create-archh.patch added to -mm tree

2007-03-22 Thread Blaisorblade
On Thursday 22 March 2007 22:44, [EMAIL PROTECTED] wrote:
> The patch titled
>  uml: mreate arch.h
   ^
> has been added to the -mm tree.  Its filename is
>  uml-create-archh.patch
mreate? I've also seen this in all other patches of this batch (examples 
below), and both Jeff's original mails and patch filenames are correct. What 
are your scripts doing here?

> The patch titled
>  uml: mreate as-layout.h
...
> The patch titled
>  uml: memove user_util.h
...
> The patch titled
>  uml: mdd missing __init declarations
...

Bye
-- 
Inform me of my mistakes, so I can add them to my list!
Paolo Giarrusso, aka Blaisorblade
http://www.user-mode-linux.org/~blaisorblade


Re: [uml-devel] [ PATCH 4/7 ] UML - create as-layout.h

2007-03-22 Thread Blaisorblade
On Thursday 22 March 2007 17:06, Jeff Dike wrote:
> This patch moves all the symbols defined in um_arch.c, which are
> mostly boundaries between different parts of the UML kernel address
> space, to a new header, as-layout.h.  There are also a few things here
> which aren't really related to address space layout, but which don't
> really have a better place to go.

Hey, I do like _these_ patches! A nice picture could then be added to that
header (in the far future ;-) ), but at least now one can see how many of
these symbols there are. And user_util.h is no more!

;-)

Bye!
-- 
Inform me of my mistakes, so I can add them to my list!
Paolo Giarrusso, aka Blaisorblade
http://www.user-mode-linux.org/~blaisorblade


Re: [patch 4/6] mm: merge populate and nopage into fault (fixes nonlinear)

2007-03-21 Thread Blaisorblade
On Tuesday 20 March 2007 07:00, Nick Piggin wrote:
> On Mon, Mar 19, 2007 at 09:44:28PM +0100, Blaisorblade wrote:
> > On Sunday 18 March 2007 03:50, Nick Piggin wrote:
> > > > > Yes, I believe that is the case, however I wonder if that is going
> > > > > to be a problem for you to distinguish between write faults for
> > > > > clean writable ptes, and write faults for readonly ptes?

> > > > I wouldn't be able to distinguish them, but am I going to get write
> > > > faults for clean ptes when vma_wants_writenotify() is false (as seems
> > > > to be for tmpfs)? I guess not.

> > > > For tmpfs pages, clean writable PTEs are mapped as writable so they
> > > > won't give any problem, since vma_wants_writenotify() is false for
> > > > tmpfs. Correct?

> > > Yes, that should be the case. So would this mean that nonlinear
> > > protections don't work on regular files?

> > They still work in most cases (including for UML), but if the initial
> > mmap() specified PROT_WRITE, that is ignored, for pages which are not
> > remapped via remap_file_pages(). UML uses PROT_NONE for the initial mmap,
> > so that's no problem.

> But how are you going to distinguish a write fault on a readonly pte for
> dirty page accounting vs a read-only nonlinear protection?

Hmm... I was only thinking of PTEs which hadn't been remapped via
remap_file_pages, but had just been faulted in with the initial mmap()
protection.

For the other PTEs, however, I overlooked that the current code ignores
vma_wants_writenotify(), i.e. it breaks dirty page accounting for them, and I
refused to even consider this possibility, without even knowing the purpose
of dirty page accounting (I have since found the commits explaining it).

> You can't store any more data in a present pte AFAIK, so you'd have to
> have some out of band data. At which point, you may as well just forget
> about vma_wants_writenotify vmas, considering that everybody is using
> shmem/ramfs.

I was going to do that anyway. I'd guess that I should just disallow in 
remap_file_pages() the VM_MANYPROTS (i.e. MAP_CHGPROT in flags) && 
vma_wants_writenotify() combination, right? Ok, trivial (shouldn't even have 
pointed this out).
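
A minimal sketch of the check I have in mind, written as standalone C so it can be compiled outside the kernel. Everything here is a hypothetical stand-in: the flag value, the struct and the function name are made up, and the real test would use vma_wants_writenotify() on the real VMA.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for the MAP_CHGPROT flag; the value is invented. */
#define MAP_CHGPROT_SKETCH 0x1UL

/* Hypothetical stand-in for a VMA: only the property we care about. */
struct vma_sketch {
	bool wants_writenotify;	/* what vma_wants_writenotify() would return */
};

/* Refuse per-page protections on mappings that need write notification,
 * since the two cannot be told apart on a write fault. */
static int check_remap_flags(const struct vma_sketch *vma, unsigned long flags)
{
	if ((flags & MAP_CHGPROT_SKETCH) && vma->wants_writenotify)
		return -EINVAL;
	return 0;
}
```

With this, a tmpfs-like mapping (no write notification) would still be allowed to use per-page protections, while a regular file mapping would get -EINVAL.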
-- 
Inform me of my mistakes, so I can add them to my list!
Paolo Giarrusso, aka Blaisorblade
http://www.user-mode-linux.org/~blaisorblade


Re: [patch 4/6] mm: merge populate and nopage into fault (fixes nonlinear)

2007-03-19 Thread Blaisorblade
On Sunday 18 March 2007 03:50, Nick Piggin wrote:
> On Sat, Mar 17, 2007 at 01:17:00PM +0100, Blaisorblade wrote:
> > On Tuesday 13 March 2007 02:19, Nick Piggin wrote:
> > > On Tue, Mar 13, 2007 at 12:01:13AM +0100, Blaisorblade wrote:
> > > > On Wednesday 07 March 2007 11:02, Nick Piggin wrote:
> > > > > > Yeah, tmpfs/shm segs are what I was thinking about. If UML can
> > > > > > live with that as well, then I think it might be a good option.
> > > > >
> > > > > Oh, hmm if you can truncate these things then you still need to
> > > > > force unmap so you still need i_mmap_nonlinear.
> > > >
> > > > Well, we don't need truncate(), but MADV_REMOVE for memory hotunplug,
> > > > which is way similar I guess.
> > > >
> > > > About the restriction to tmpfs, I have just discovered
> > > > '[PATCH] mm: tracking shared dirty pages' (commit
> > > > d08b3851da41d0ee60851f2c75b118e1f7a5fc89), which already partially
> > > > conflicts with remap_file_pages for file-based mmaps (and that's
> > > > fully fine, for now).
> > > >
> > > > Even if UML does not need it, till now if there is a VMA protection
> > > > and a page hasn't been remapped with remap_file_pages, the VMA
> > > > protection is used (just because it makes sense).
> > > >
> > > > However, it is only used when the PTE is first created - we can never
> > > > change protections on a VMA  - so it vma_wants_writenotify() is true
> > > > (on all file-based and on no shmfs based mapping, right?), and we
> > > > write-protect the VMA, it will always be write-protected.
> > >
> > > Yes, I believe that is the case, however I wonder if that is going to
> > > be a problem for you to distinguish between write faults for clean
> > > writable ptes, and write faults for readonly ptes?
> >
> > I wouldn't be able to distinguish them, but am I going to get write
> > faults for clean ptes when vma_wants_writenotify() is false (as seems to
> > be for tmpfs)? I guess not.
> >
> > For tmpfs pages, clean writable PTEs are mapped as writable so they won't
> > give any problem, since vma_wants_writenotify() is false for tmpfs.
> > Correct?
>
> Yes, that should be the case. So would this mean that nonlinear protections
> don't work on regular files?

They still work in most cases (including for UML), but if the initial mmap() 
specified PROT_WRITE, that is ignored, for pages which are not remapped via 
remap_file_pages(). UML uses PROT_NONE for the initial mmap, so that's no 
problem.

> I guess that's OK if Oracle and UML both use 
> tmpfs/shm?

-- 
Inform me of my mistakes, so I can add them to my list!
Paolo Giarrusso, aka Blaisorblade
http://www.user-mode-linux.org/~blaisorblade


Re: [patch 4/6] mm: merge populate and nopage into fault (fixes nonlinear)

2007-03-17 Thread Blaisorblade
On Tuesday 13 March 2007 02:19, Nick Piggin wrote:
> On Tue, Mar 13, 2007 at 12:01:13AM +0100, Blaisorblade wrote:
> > On Wednesday 07 March 2007 11:02, Nick Piggin wrote:
> > > > Yeah, tmpfs/shm segs are what I was thinking about. If UML can live
> > > > with that as well, then I think it might be a good option.
> > >
> > > Oh, hmm if you can truncate these things then you still need to
> > > force unmap so you still need i_mmap_nonlinear.
> >
> > Well, we don't need truncate(), but MADV_REMOVE for memory hotunplug,
> > which is way similar I guess.
> >
> > About the restriction to tmpfs, I have just discovered
> > '[PATCH] mm: tracking shared dirty pages' (commit
> > d08b3851da41d0ee60851f2c75b118e1f7a5fc89), which already partially
> > conflicts with remap_file_pages for file-based mmaps (and that's fully
> > fine, for now).
> >
> > Even if UML does not need it, till now if there is a VMA protection and a
> > page hasn't been remapped with remap_file_pages, the VMA protection is
> > used (just because it makes sense).
> >
> > However, it is only used when the PTE is first created - we can never
> > change protections on a VMA  - so it vma_wants_writenotify() is true (on
> > all file-based and on no shmfs based mapping, right?), and we
> > write-protect the VMA, it will always be write-protected.
>
> Yes, I believe that is the case, however I wonder if that is going to be
> a problem for you to distinguish between write faults for clean writable
> ptes, and write faults for readonly ptes?
I wouldn't be able to distinguish them, but am I going to get write faults for 
clean ptes when vma_wants_writenotify() is false (as seems to be for tmpfs)? 
I guess not.

For tmpfs pages, clean writable PTEs are mapped as writable so they won't give 
any problem, since vma_wants_writenotify() is false for tmpfs. Correct?

> > Also, I'm curious. Since my patches are already changing
> > remap_file_pages() code, should they be absolutely merged after yours?
>
> Is there a big clash? I don't think I did a great deal to fremap.c (mainly
> just removing stuff)...
Hopefully, we just both modify sys_remap_file_pages(), I'll see soon.
-- 
Inform me of my mistakes, so I can add them to my list!
Paolo Giarrusso, aka Blaisorblade
http://www.user-mode-linux.org/~blaisorblade


Re: [patch 4/6] mm: merge populate and nopage into fault (fixes nonlinear)

2007-03-12 Thread Blaisorblade
On Wednesday 07 March 2007 11:02, Nick Piggin wrote:
> On Wed, Mar 07, 2007 at 10:49:47AM +0100, Nick Piggin wrote:
> > On Wed, Mar 07, 2007 at 01:44:20AM -0800, Bill Irwin wrote:
> > > On Wed, Mar 07, 2007 at 10:28:21AM +0100, Nick Piggin wrote:
> > > > Depending on whether anyone wants it, and what features they want, we
> > > > could emulate the old syscall, and make a new restricted one which is
> > > > much less intrusive.
> > > > For example, if we can operate only on MAP_ANONYMOUS memory and
> > > > specify that nonlinear mappings effectively mlock the pages, then we
> > > > can get rid of all the objrmap and unmap_mapping_range handling,
> > > > forget about the writeout and msync problems...
> > >
> > > Anonymous-only would make it a doorstop for Oracle, since its entire
> > > motive for using it is to window into objects larger than user virtual
> >
> > Uh, duh yes I don't mean MAP_ANONYMOUS, I was just thinking of the shmem
> > inode that sits behind MAP_ANONYMOUS|MAP_SHARED. Of course if you don't
> > have a file descriptor to get a pgoff, then remap_file_pages is a
> > doorstop for everyone ;)
> >
> > > address spaces (this likely also applies to UML, though they should
> > > really chime in to confirm). Restrictions to tmpfs and/or ramfs would
> > > likely be liveable, though I suspect some things might want to do it to
> > > shm segments (I'll ask about that one). There's definitely no need for
> > > a persistent backing store for the object to be remapped in Oracle's
> > > case, in any event. It's largely the in-core destination and source of
> > > IO, not something saved on-disk itself.
> >
> > Yeah, tmpfs/shm segs are what I was thinking about. If UML can live with
> > that as well, then I think it might be a good option.
>
> Oh, hmm if you can truncate these things then you still need to
> force unmap so you still need i_mmap_nonlinear.

Well, we don't need truncate(), but we do need MADV_REMOVE for memory
hotunplug, which is quite similar, I guess.

About the restriction to tmpfs, I have just discovered
'[PATCH] mm: tracking shared dirty pages' (commit
d08b3851da41d0ee60851f2c75b118e1f7a5fc89), which already partially conflicts
with remap_file_pages for file-based mmaps (and that's fully fine, for now).

Even if UML does not need it, until now, if there is a VMA protection and a
page hasn't been remapped with remap_file_pages, the VMA protection is used
(just because it makes sense).

However, it is only used when the PTE is first created - we can never change
protections on a VMA - so if vma_wants_writenotify() is true (on all
file-based and on no shmfs-based mappings, right?) and we write-protect the
VMA, it will always be write-protected.

That's no problem for UML, but it is for any other user (I guess I'll have to
prevent callers from trying such stuff - I started from a pretty generic
patch).

> But come to think of it, I still don't think nonlinear mappings are
> too bad as they are ;)

Btw, I really like removing ->populate and merging the common code together. 
filemap_populate and shmem_populate are so obnoxiously different that I 
already wanted to do that (after merging remap_file_pages() core).

Also, I'm curious. Since my patches are already changing remap_file_pages() 
code, should they be absolutely merged after yours?
-- 
Inform me of my mistakes, so I can add them to my list!
Paolo Giarrusso, aka Blaisorblade
http://www.user-mode-linux.org/~blaisorblade


Lockdep report against pktcdvd

2007-03-09 Thread Blaisorblade
When booting my laptop with a 2.6.20.1 kernel with lockdep enabled to test my
code, I got a lockdep warning on pktcdvd during setup. It seems that do_open
on a pktcdvd device causes another do_open on the underlying device, and that
mutex_lock_nested is called with the same subclass (the for_part argument to
do_open). So this may be a false positive after all, but I'll let you
decide.

I've installed and configured Ubuntu udftools so that pktcdvd0 is linked
to /dev/cdrw, i.e. /dev/sr0, on my system.

This is an extract from my /proc/config.gz - it shows that both LOCKDEP and
FRAME_POINTER are enabled, so the stack trace below ought to be correct.

#
# Kernel hacking
#
CONFIG_DEBUG_KERNEL=y
CONFIG_PROVE_LOCKING=y
CONFIG_LOCKDEP=y
# CONFIG_DEBUG_LOCKDEP is not set
CONFIG_TRACE_IRQFLAGS=y
CONFIG_DEBUG_SPINLOCK_SLEEP=y
CONFIG_DEBUG_LOCKING_API_SELFTESTS=y
CONFIG_STACKTRACE=y
CONFIG_FRAME_POINTER=y
CONFIG_DEBUG_RODATA=y
CONFIG_DEBUG_STACKOVERFLOW=y
~

[   56.517353] pktcdvd: writer pktcdvd0 mapped to sr0
[   56.525469] 
[   56.525471] =
[   56.525476] [ INFO: possible recursive locking detected ]
[   56.525479] 2.6.20.1-rfp+skas-v9-pre9+skas-dbg #3
[   56.525498] -
[   56.525501] vol_id/4536 is trying to acquire lock:
[   56.525503]  (&bdev->bd_mutex){--..}, at: [] 
do_open+0x7b/0x2c4
[   56.525515] 
[   56.525521] but task is already holding lock:
[   56.525536]  (&bdev->bd_mutex){--..}, at: [] 
do_open+0x7b/0x2c4
[   56.525560] 
[   56.525561] other info that might help us debug this:
[   56.525579] 2 locks held by vol_id/4536:
[   56.525593]  #0:  (&bdev->bd_mutex){--..}, at: [] 
do_open+0x7b/0x2c4
[   56.525610]  #1:  (&ctl_mutex#2){--..}, at: [] 
mutex_lock+0x22/0x26
[   56.525634] 
[   56.525634] stack backtrace:
[   56.525646] 
[   56.525652] Call Trace:
[   56.525666]  [] __lock_acquire+0x137/0xa62
[   56.525687]  [] __mutex_unlock_slowpath+0x129/0x14f
[   56.525712]  [] lock_acquire+0x4d/0x69
[   56.525732]  [] do_open+0x7b/0x2c4
[   56.525750]  [] mutex_lock_nested+0x106/0x2cd
[   56.525774]  [] do_open+0x7b/0x2c4
[   56.525795]  [] __blkdev_get+0x7b/0x8d
[   56.525830]  [] blkdev_get+0xb/0xd
[   56.525853]  [] :pktcdvd:pkt_open+0xb5/0xd52
[   56.525876]  [] __d_lookup+0x116/0x142
[   56.525897]  [] debug_check_no_locks_freed+0x12b/0x13a
[   56.525922]  [] trace_hardirqs_on+0x11a/0x13e
[   56.525944]  [] lockdep_init_map+0xa6/0x326
[   56.525968]  [] __mutex_lock_slowpath+0x281/0x2b4
[   56.525990]  [] mark_held_locks+0x53/0x71
[   56.526010]  [] __mutex_lock_slowpath+0x281/0x2b4
[   56.526034]  [] __mutex_unlock_slowpath+0x129/0x14f
[   56.526054]  [] mutex_lock_nested+0x298/0x2cd
[   56.526075]  [] mark_held_locks+0x53/0x71
[   56.526095]  [] mutex_lock_nested+0x298/0x2cd
[   56.526117]  [] debug_mutex_free_waiter+0x58/0x5c
[   56.526141]  [] mutex_lock_nested+0x2be/0x2cd
[   56.526165]  [] do_open+0xae/0x2c4
[   56.526184]  [] _spin_unlock+0x2d/0x4b
[   56.526205]  [] blkdev_open+0x0/0x6b
[   56.526225]  [] blkdev_open+0x34/0x6b
[   56.526247]  [] __dentry_open+0x128/0x201
[   56.526270]  [] nameidata_to_filp+0x2a/0x3c
[   56.526291]  [] do_filp_open+0x3d/0x4f
[   56.526315]  [] _spin_unlock+0x2d/0x4b
[   56.526335]  [] get_unused_fd+0xfa/0x10b
[   56.526356]  [] do_sys_open+0x4d/0xd5
[   56.526377]  [] sys_open+0x1b/0x1d
[   56.526396]  [] system_call+0x7e/0x83
[   56.526417] 
-- 
Inform me of my mistakes, so I can add them to my list!
Paolo Giarrusso, aka Blaisorblade
http://www.user-mode-linux.org/~blaisorblade


Re: [patch 4/6] mm: merge populate and nopage into fault (fixes nonlinear)

2007-03-08 Thread Blaisorblade
On Wednesday 07 March 2007 10:44, Bill Irwin wrote:
> On Wed, Mar 07, 2007 at 10:28:21AM +0100, Nick Piggin wrote:
> > Depending on whether anyone wants it, and what features they want, we
> > could emulate the old syscall, and make a new restricted one which is
> > much less intrusive.
> > For example, if we can operate only on MAP_ANONYMOUS memory and specify
> > that nonlinear mappings effectively mlock the pages, then we can get
> > rid of all the objrmap and unmap_mapping_range handling, forget about
> > the writeout and msync problems...
>
> Anonymous-only would make it a doorstop for Oracle, since its entire
> motive for using it is to window into objects larger than user virtual
> address spaces (this likely also applies to UML, though they should
> really chime in to confirm).

We need it for shared file mappings (for tmpfs only).

Our scenario is:
RAM is implemented through a shared mapped file, kept on tmpfs (except by dumb 
users); various processes share an fd for this file (it's opened and 
immediately deleted).

We maintain page tables in x86 style, and TLB flush is implemented through 
mmap()/munmap()/mprotect().
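
The scenario above can be sketched in userspace roughly as follows. This is only an illustration, not UML's code: the helper names, the /tmp path (which may or may not be tmpfs on a given host) and the fixed 4K page size are assumptions of the sketch.

```c
#ifndef _XOPEN_SOURCE
#define _XOPEN_SOURCE 700	/* mkstemp() and ftruncate() under strict -std= modes */
#endif
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

#define SKETCH_PAGE 4096	/* assumed page size for the sketch */

/* Create the backing file, delete its name right away, and size it.
 * Returns the fd, or -1 on error. */
static int open_backing_file(size_t size)
{
	char path[] = "/tmp/uml-sketch-XXXXXX";
	int fd = mkstemp(path);

	if (fd < 0)
		return -1;
	unlink(path);			/* the open fd keeps the file alive */
	if (ftruncate(fd, (off_t)size) < 0) {
		close(fd);
		return -1;
	}
	return fd;
}

/* Map the same file page at two addresses and check that a write through
 * one mapping shows up in the other. Returns 1 if it does, 0 if it does
 * not, -1 on error. */
static int shared_page_is_visible(void)
{
	int fd = open_backing_file(SKETCH_PAGE);
	char *rw, *ro;
	int ok = -1;

	if (fd < 0)
		return -1;
	rw = mmap(NULL, SKETCH_PAGE, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	ro = mmap(NULL, SKETCH_PAGE, PROT_READ, MAP_SHARED, fd, 0);
	if (rw != MAP_FAILED && ro != MAP_FAILED) {
		strcpy(rw, "guest RAM");
		ok = strcmp(ro, "guest RAM") == 0;
	}
	if (rw != MAP_FAILED)
		munmap(rw, SKETCH_PAGE);
	if (ro != MAP_FAILED)
		munmap(ro, SKETCH_PAGE);
	close(fd);
	return ok;
}
```

Each such mmap() of a single page creates its own VMA on the host, which is where the cost discussed next comes from.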

Having one VMA for each 4K page is not the intended VMA usage: for instance,
the default /proc/sys/vm/max_map_count (64K) is saturated by a UML process
with 64K * 4K = 256M of resident memory.

> Restrictions to tmpfs and/or ramfs would 
> likely be liveable, though I suspect some things might want to do it to
> shm segments (I'll ask about that one).

> There's definitely no need for a 
> persistent backing store for the object to be remapped in Oracle's case,
> in any event. It's largely the in-core destination and source of IO, not
> something saved on-disk itself.
>
>
> -- wli

-- 
Inform me of my mistakes, so I can add them to my list!
Paolo Giarrusso, aka Blaisorblade
http://www.user-mode-linux.org/~blaisorblade


Re: [patch 027/101] Kconfig: FAULT_INJECTION can be selected only if LOCKDEP is enabled.

2007-03-08 Thread Blaisorblade
On Wednesday 07 March 2007 18:11, Greg KH wrote:
> From: "Paolo 'Blaisorblade' Giarrusso" <[EMAIL PROTECTED]>
>
> There is no prompt for STACKTRACE, so it is enabled only when 'select'ed.
> FAULT_INJECTION depends on it, while LOCKDEP selects it. So FAULT_INJECTION
> becomes visible in Kconfig only when LOCKDEP is enabled.

Please replace with the attached patch, sorry for being late (I thought it had 
been dropped). Otherwise a regression would be caused for archs like ia64 on 
allyesconfig; the change is needed, as discussed with Roman Zippel.

> Signed-off-by: Paolo 'Blaisorblade' Giarrusso <[EMAIL PROTECTED]>
> Signed-off-by: Greg Kroah-Hartman <[EMAIL PROTECTED]>
>
> ---
>  lib/Kconfig.debug |2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> --- linux-2.6.20.1.orig/lib/Kconfig.debug
> +++ linux-2.6.20.1/lib/Kconfig.debug
> @@ -400,7 +400,7 @@ config LKDTM
>  config FAULT_INJECTION
>   bool "Fault-injection framework"
>   depends on DEBUG_KERNEL
> - depends on STACKTRACE
> + select STACKTRACE
>   select FRAME_POINTER
>   help
> Provide fault-injection framework.
>
> --

-- 
Inform me of my mistakes, so I can add them to my list!
Paolo Giarrusso, aka Blaisorblade
http://www.user-mode-linux.org/~blaisorblade
[PATCH] Kconfig: FAULT_INJECTION can be selected only if LOCKDEP is enabled.

There is no prompt for STACKTRACE, so it is enabled only when 'select'ed.
FAULT_INJECTION depends on it, while LOCKDEP selects it. So FAULT_INJECTION
becomes visible in Kconfig only when LOCKDEP is enabled.

Update: fixed for architectures not supporting STACKTRACE_SUPPORT.

Signed-off-by: Paolo 'Blaisorblade' Giarrusso <[EMAIL PROTECTED]>
Index: linux-2.6.git/lib/Kconfig.debug
===
--- linux-2.6.git.orig/lib/Kconfig.debug
+++ linux-2.6.git/lib/Kconfig.debug
@@ -399,8 +399,8 @@ config LKDTM
 
 config FAULT_INJECTION
 	bool "Fault-injection framework"
-	depends on DEBUG_KERNEL
-	depends on STACKTRACE
+	depends on DEBUG_KERNEL && STACKTRACE_SUPPORT
+	select STACKTRACE
 	select FRAME_POINTER
 	help
 	  Provide fault-injection framework.


Re: [uml-devel] [PATCH 4/5] UML - driver formatting fixes

2007-03-06 Thread Blaisorblade
On Tuesday 06 March 2007 19:32, Jeff Dike wrote:
> Fix a bunch of formatting violations in the drivers:
>   return(n) -> return n
>   whitespace fixes
>   emacs formatting comment removal
>   breaking if(foo) return(n) into two lines
>
> There are also a couple of errno use bugs:
>   using errno in a printk when the failure put errno into a local variable
>   saving errno after a printk, which can change it
>
> Signed-off-by: Jeff Dike <[EMAIL PROTECTED]>

> Index: test/arch/um/drivers/chan_user.c
> ===
> --- test.orig/arch/um/drivers/chan_user.c 2007-03-06 12:09:47.0 -0500
> +++ test/arch/um/drivers/chan_user.c 2007-03-06 12:10:12.0 -0500
> @@ -158,7 +158,7 @@ static int winch_tramp(int fd, struct tt
>*/
>   err = run_helper_thread(winch_thread, &data, CLONE_FILES, &stack, 0);
>   if(err < 0){
> - printk("fork of winch_thread failed - errno = %d\n", errno);
> + printk("fork of winch_thread failed - errno = %d\n", err);
>   goto out_close;
>   }

The added line should print -err rather than err: err is negative here (see
the err < 0 check), so printing it as is shows a negative "errno".
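
A small standalone sketch of the convention, for illustration only. The helper below is hypothetical and merely imitates a function that returns 0 or a negative errno value; it is not run_helper_thread().

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper using the kernel-style convention: 0 on success,
 * a negative errno value on failure. */
static int start_helper_sketch(int should_fail)
{
	return should_fail ? -ECHILD : 0;
}

/* Format the failure message with -err, so that the positive errno value
 * is shown. Returns that positive value, or 0 if nothing failed. */
static int report_helper_error(char *buf, size_t len)
{
	int err = start_helper_sketch(1);

	if (err < 0) {
		snprintf(buf, len, "fork of winch_thread failed - errno = %d (%s)",
			 -err, strerror(-err));
		return -err;
	}
	return 0;
}
```

Negating once at the point of printing also lets strerror() be used on the same value.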
-- 
Inform me of my mistakes, so I can add them to my list!
Paolo Giarrusso, aka Blaisorblade
http://www.user-mode-linux.org/~blaisorblade


[PATCH 1/2] uml: make sigio_lock() irq-safe

2007-03-05 Thread Paolo 'Blaisorblade' Giarrusso
From: Paolo 'Blaisorblade' Giarrusso <[EMAIL PROTECTED]>

sigio_lock is taken both from process context and from interrupt context. So we
*must* use irqsave.

Then, remove irq disabling from update_thread(), as it's called with
sigio_lock() held (yes, set_signals(0) is local_irq_save).

In fact, I've seen this causing frequent hangs with spinlock debugging enabled
(I've verified well that the cause was an interrupt causing re-acquiring of
this lock); however, now it's causing hangs as interrupt disabling causes
some sleep-inside-spinlock checks to trigger - and then printk deadlocks.

I've tested this for a long time and it is stable.
I've also verified that nothing can sleep within this lock; to this purpose,
I've had to verify everything inside um_request_irq; since it calls again
write_sigio_workaround(), I've had to make atomic the allocation inside
setup_initial_poll.

HOWEVER, request_irq() can sleep, and in write_sigio_irq() thanks to
IRQF_SAMPLE_RANDOM it _does_ sleep. So a separate patch makes write_sigio_irq()
be called outside of sigio_lock().

Actually, I'm also quite dubious that an interrupt caused by other interrupts is
a reliable entropy source, but that is another thing completely.

Signed-off-by: Paolo 'Blaisorblade' Giarrusso <[EMAIL PROTECTED]>
---

 arch/um/include/sigio.h  |4 ++--
 arch/um/kernel/sigio.c   |   12 
 arch/um/os-Linux/sigio.c |   36 
 3 files changed, 30 insertions(+), 22 deletions(-)

diff --git a/arch/um/include/sigio.h b/arch/um/include/sigio.h
index 434f1a9..a58bc1d 100644
--- a/arch/um/include/sigio.h
+++ b/arch/um/include/sigio.h
@@ -8,7 +8,7 @@
 
 extern int write_sigio_irq(int fd);
 extern int register_sigio_fd(int fd);
-extern void sigio_lock(void);
-extern void sigio_unlock(void);
+extern unsigned long sigio_lock(void);
+extern void sigio_unlock(unsigned long flags);
 
 #endif
diff --git a/arch/um/kernel/sigio.c b/arch/um/kernel/sigio.c
index 89f9866..cc54db7 100644
--- a/arch/um/kernel/sigio.c
+++ b/arch/um/kernel/sigio.c
@@ -45,12 +45,16 @@ int write_sigio_irq(int fd)
 /* These are called from os-Linux/sigio.c to protect its pollfds arrays. */
 static DEFINE_SPINLOCK(sigio_spinlock);
 
-void sigio_lock(void)
+unsigned long sigio_lock(void)
 {
-   spin_lock(&sigio_spinlock);
+   unsigned long flags;
+
+   spin_lock_irqsave(&sigio_spinlock, flags);
+
+   return flags;
 }
 
-void sigio_unlock(void)
+void sigio_unlock(unsigned long flags)
 {
-   spin_unlock(&sigio_spinlock);
+   spin_unlock_irqrestore(&sigio_spinlock, flags);
 }
diff --git a/arch/um/os-Linux/sigio.c b/arch/um/os-Linux/sigio.c
index 3fc43b3..88988fb 100644
--- a/arch/um/os-Linux/sigio.c
+++ b/arch/um/os-Linux/sigio.c
@@ -21,6 +21,10 @@
 #include "os.h"
 #include "um_malloc.h"
 
+/* Nothing in this file can sleep. I've verified each and every function. The
+ * only "exception" is write_sigio_thread, which runs in a host thread, so it
+ * has no chance of sleeping. */
+
 /* Protected by sigio_lock(), also used by sigio_cleanup, which is an
  * exitcall.
  */
@@ -121,11 +125,9 @@ static int need_poll(struct pollfds *polls, int n)
  */
 static void update_thread(void)
 {
-   unsigned long flags;
int n;
char c;
 
-   flags = set_signals(0);
n = os_write_file(sigio_private[0], &c, sizeof(c));
if(n != sizeof(c)){
printk("update_thread : write failed, err = %d\n", -n);
@@ -138,7 +140,6 @@ static void update_thread(void)
goto fail;
}
 
-   set_signals(flags);
return;
  fail:
/* Critical section start */
@@ -150,15 +151,15 @@ static void update_thread(void)
close(write_sigio_fds[0]);
close(write_sigio_fds[1]);
/* Critical section end */
-   set_signals(flags);
 }
 
 int add_sigio_fd(int fd)
 {
struct pollfd *p;
int err = 0, i, n;
+   unsigned long flags;
 
-   sigio_lock();
+   flags = sigio_lock();
for(i = 0; i < all_sigio_fds.used; i++){
if(all_sigio_fds.poll[i].fd == fd)
break;
@@ -184,7 +185,7 @@ int add_sigio_fd(int fd)
next_poll.used = n + 1;
update_thread();
  out:
-   sigio_unlock();
+   sigio_unlock(flags);
return err;
 }
 
@@ -192,6 +193,7 @@ int ignore_sigio_fd(int fd)
 {
struct pollfd *p;
int err = 0, i, n = 0;
+   unsigned long flags;
 
/* This is called from exitcalls elsewhere in UML - if
 * sigio_cleanup has already run, then update_thread will hang
@@ -200,7 +202,7 @@ int ignore_sigio_fd(int fd)
if(write_sigio_pid == -1)
return -EIO;
 
-   sigio_lock();
+   flags = sigio_lock();
for(i = 0; i < current_poll.used; i++){
if(current_poll.poll[i].fd ==

[PATCH 2/2] uml: avoid calling request_irq in atomic context

2007-03-05 Thread Paolo 'Blaisorblade' Giarrusso
From: Paolo 'Blaisorblade' Giarrusso <[EMAIL PROTECTED]>

To avoid calling request_irq under a spinlock, and to simplify the surrounding
code, implement a state machine that allows safely dropping and retaking
sigio_lock during initialization. The state variable is protected by a spinlock
together with much other state (so there is no reason to use atomic_t). See the
long comment for further details.

Signed-off-by: Paolo 'Blaisorblade' Giarrusso <[EMAIL PROTECTED]>
---

 arch/um/os-Linux/sigio.c |   74 +-
 1 files changed, 46 insertions(+), 28 deletions(-)

diff --git a/arch/um/os-Linux/sigio.c b/arch/um/os-Linux/sigio.c
index 88988fb..44a2d1f 100644
--- a/arch/um/os-Linux/sigio.c
+++ b/arch/um/os-Linux/sigio.c
@@ -29,6 +29,24 @@
  * exitcall.
  */
 static int write_sigio_pid = -1;
+/* State machine:
+ *
+ * write_sigio_workaround transitions from INACTIVE to STARTING; all methods
+ * exit when they detect the STARTING state, so it's effectively a lock without
+ * wait possibility. After initialization completes it goes to ACTIVE.
+ *
+ * Errors in initialization or in use (say the thread does not answer) move us
+ * to the BROKEN terminal state; this is a change (we went to INACTIVE), and it
+ * could be reverted so that we go back to INACTIVE, if this case is handled
+ * correctly without leaks.
+ *
+ * closing the device moves us to the CLOSED terminal state.
+ *
+ * The state must be read or written only with sigio_lock() held, even when we
+ * are in STARTING, to ensure global visibility across CPUs.
+ */
+enum {INACTIVE, STARTING, ACTIVE, BROKEN, CLOSED};
+static int write_sigio_state = INACTIVE;
 
 /* These arrays are initialized before the sigio thread is started, and
  * the descriptors closed after it is killed.  So, it can't see them change.
@@ -121,7 +139,7 @@ static int need_poll(struct pollfds *polls, int n)
 }
 
 /* Must be called with sigio_lock held, because it's needed by the marked
- * critical section.
+ * critical section. The lock could be moved inside, but it is not needed.
  */
 static void update_thread(void)
 {
@@ -146,6 +164,9 @@ static void update_thread(void)
if(write_sigio_pid != -1)
os_kill_process(write_sigio_pid, 1);
write_sigio_pid = -1;
+   /* Since the deinit is not complete, we cannot move into INACTIVE state.
+* Or can we? */
+   write_sigio_state = BROKEN;
close(sigio_private[0]);
close(sigio_private[1]);
close(write_sigio_fds[0]);
@@ -199,7 +220,7 @@ int ignore_sigio_fd(int fd)
 * sigio_cleanup has already run, then update_thread will hang
 * or fail because the thread is no longer running.
 */
-   if(write_sigio_pid == -1)
+   if(write_sigio_state != ACTIVE)
return -EIO;
 
flags = sigio_lock();
@@ -246,59 +267,53 @@ static void write_sigio_workaround(void)
unsigned long stack;
struct pollfd *p;
int err;
-   int l_write_sigio_fds[2];
-   int l_sigio_private[2];
-   int l_write_sigio_pid;
unsigned long flags;
 
/* We call this *tons* of times - and most ones we must just fail. */
flags = sigio_lock();
-   l_write_sigio_pid = write_sigio_pid;
-   sigio_unlock(flags);
-
-   if (l_write_sigio_pid != -1)
+   if (write_sigio_state != INACTIVE) {
+   sigio_unlock(flags);
return;
+   } else {
+   write_sigio_state = STARTING;
+   }
+   sigio_unlock(flags);
 
-   err = os_pipe(l_write_sigio_fds, 1, 1);
+   err = os_pipe(write_sigio_fds, 1, 1);
if(err < 0){
printk("write_sigio_workaround - os_pipe 1 failed, "
   "err = %d\n", -err);
return;
}
-   err = os_pipe(l_sigio_private, 1, 1);
+   err = os_pipe(sigio_private, 1, 1);
if(err < 0){
printk("write_sigio_workaround - os_pipe 2 failed, "
   "err = %d\n", -err);
goto out_close1;
}
 
-   p = setup_initial_poll(l_sigio_private[1]);
+   p = setup_initial_poll(sigio_private[1]);
if(!p)
goto out_close2;
 
-   flags = sigio_lock();
-
-   /* Did we race? Don't try to optimize this, please, it's not so likely
-* to happen, and no more than once at the boot. */
-   if(write_sigio_pid != -1)
-   goto out_free;
+   if (current_poll.poll)
+   kfree(current_poll.poll);
 
current_poll = ((struct pollfds) { .poll= p,
   .used= 1,
   .size= 1 });
 
-   if (write_sigio_irq(l_write_sigio_fds[0]))
+   if (write_sigio_irq(write_sigio_fds[0]))
goto out_clear_poll;
 
-   memcpy(write_sig
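
The state machine described in the long comment can be illustrated in userspace C. This is only a sketch under stated assumptions, not the UML code: a pthread mutex stands in for sigio_lock(), and the function names try_begin_start(), finish_start() and is_active() are hypothetical.

```c
#include <pthread.h>

enum sigio_state { INACTIVE, STARTING, ACTIVE, BROKEN, CLOSED };

/* The mutex plays the role of sigio_lock() in this sketch. */
static pthread_mutex_t state_lock = PTHREAD_MUTEX_INITIALIZER;
static enum sigio_state state = INACTIVE;

/*
 * Claim the right to run initialization. Exactly one caller sees
 * INACTIVE and moves to STARTING; everyone else gets 0 and backs off,
 * so STARTING behaves like a lock that cannot be waited on.
 */
int try_begin_start(void)
{
	int claimed = 0;

	pthread_mutex_lock(&state_lock);
	if (state == INACTIVE) {
		state = STARTING;
		claimed = 1;
	}
	pthread_mutex_unlock(&state_lock);
	return claimed;
}

/* End of initialization: ok != 0 moves to ACTIVE, otherwise to BROKEN. */
void finish_start(int ok)
{
	pthread_mutex_lock(&state_lock);
	state = ok ? ACTIVE : BROKEN;
	pthread_mutex_unlock(&state_lock);
}

/* The state is only ever read with the lock held. */
int is_active(void)
{
	int active;

	pthread_mutex_lock(&state_lock);
	active = (state == ACTIVE);
	pthread_mutex_unlock(&state_lock);
	return active;
}
```

Because the caller that claimed STARTING is the only one allowed to proceed, it can drop the lock while doing slow or sleeping work and retake it only to publish the final state.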

Re: [uml-devel] [PATCH 04/11] uml - hostfs: avoid possible escapes from hostfs=.

2007-03-05 Thread Blaisorblade
On Monday 05 March 2007 23:03, Jeff Dike wrote:
> On Mon, Mar 05, 2007 at 09:49:02PM +0100, Paolo 'Blaisorblade' Giarrusso wrote:
> > From: Paolo 'Blaisorblade' Giarrusso <[EMAIL PROTECTED]>
> >
> > Avoid accepting things like -o .., -o dir/../../dir2, -o dir/../.. .
> > This may be considered useless, but YMMV. I consider that this has a
> > limited security value, exactly like disabling module support (in many
> > cases it is useful).
>
> Two comments on this one:
> > +   return  strstr(path, "/../") != NULL ||
> > +   strcmp(path, "..") == 0 ||
> > +   strncmp(path, "../", strlen("../")) == 0 ||
> > +   str_ends_with(path, "/..");
>
> Minor style point - I'd be happier with more parens:
>
> + return  (strstr(path, "/../") != NULL) ||
> + (strcmp(path, "..") == 0) ||
> + (strncmp(path, "../", strlen("../")) == 0) ||
> + str_ends_with(path, "/..");

Hmm. Personally I prefer the earlier style, but I don't have the last word on 
this.

> C gets operator precedence wrong in one or two cases, so I just put parens
> any place it might matter.

Indeed. For instance, the following change is wrong; I once made this mistake in 
a patch, and I saw it another time in current Ubuntu kernels:

-   a + b / 4
+   a + b >> 2

This is instead needed:

+   a + (b >> 2)
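
The difference can be checked with a few lines of C. This is an illustrative sketch and the function names are made up for the example. In C, binary + binds tighter than >>, so `a + b >> 2` is parsed as `(a + b) >> 2`.

```c
/* What the original expression computes. */
unsigned int with_division(unsigned int a, unsigned int b)
{
	return a + b / 4;
}

/*
 * How the compiler reads "a + b >> 2": the addition happens first.
 * The parentheses are written out here only to make that parse visible.
 */
unsigned int broken_shift(unsigned int a, unsigned int b)
{
	return (a + b) >> 2;
}

/* The intended replacement for the division by 4. */
unsigned int fixed_shift(unsigned int a, unsigned int b)
{
	return a + (b >> 2);
}
```

With a = 8 and b = 8, the division form and the fixed shift both give 10, while the broken shift gives 4.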

> Second, there is code in externfs which does the same thing without
> parsing paths which you might consider instead.

I must, indeed - your comment points out the symlink issue, which I didn't 
think of, and which makes my patch moot.

> It sees whether the 
> requested directory is outside the jail by cd-ing to it and then
> repeatedly cd .. until it either reaches / or the jail root.
>
> A copy is below for your reading pleasure.

I had a look, and it's nice. Except that maybe "escapes_jail" would be a 
clearer name (there's confusion about the subject of "escaping").

Also, what about concurrent UML threads that depend on the current directory? I 
know that without SMP/preemption we can't have this problem, but when I see this 
I expect a future bug unless it is checked (I think I reason mathematically 
about correctness, even if not formally).

And I don't have the time to check this; I think your version could be merged 
anyway, but I'm not sure.

>   Jeff

-- 
Inform me of my mistakes, so I can add them to my list!
Paolo Giarrusso, aka Blaisorblade
http://www.user-mode-linux.org/~blaisorblade

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH 03/11] uml - hostfs: make hostfs= option work as a jail, as intended.

2007-03-05 Thread Paolo 'Blaisorblade' Giarrusso
From: Paolo 'Blaisorblade' Giarrusso <[EMAIL PROTECTED]>

When a host directory is specified both with hostfs=path1 and with the mount
option -o path2, we should give access to path1/path2, but this does not
happen. Fix that in the simplest way.

Also, root_ino can be the empty string, since we use %s/%s as format.

Signed-off-by: Paolo 'Blaisorblade' Giarrusso <[EMAIL PROTECTED]>
---

 fs/hostfs/hostfs_kern.c |   12 +++-
 1 files changed, 7 insertions(+), 5 deletions(-)

diff --git a/fs/hostfs/hostfs_kern.c b/fs/hostfs/hostfs_kern.c
index 6f10e43..9baf697 100644
--- a/fs/hostfs/hostfs_kern.c
+++ b/fs/hostfs/hostfs_kern.c
@@ -47,7 +47,7 @@ struct dentry_operations hostfs_dentry_ops = {
 };
 
 /* Changed in hostfs_args before the kernel starts running */
-static char *root_ino = "/";
+static char *root_ino = "";
 static int append = 0;
 
 #define HOSTFS_SUPER_MAGIC 0x00c0ffee
@@ -947,15 +947,17 @@ static int hostfs_fill_sb_common(struct super_block *sb, void *d, int silent)
sb->s_magic = HOSTFS_SUPER_MAGIC;
sb->s_op = &hostfs_sbops;
 
-   if((data == NULL) || (*data == '\0'))
-   data = root_ino;
+   /* NULL is printed as <NULL> by sprintf: avoid that. */
+   if (data == NULL)
+   data = "";
 
err = -ENOMEM;
-   name = kmalloc(strlen(data) + 1, GFP_KERNEL);
+   name = kmalloc(strlen(root_ino) + 1
+   + strlen(data) + 1, GFP_KERNEL);
if(name == NULL)
goto out;
 
-   strcpy(name, data);
+   sprintf(name, "%s/%s", root_ino, data);
 
root_inode = iget(sb, 0);
if(root_inode == NULL)
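
The buffer sizing used for the "%s/%s" format can be sketched in userspace C. This is a hedged illustration rather than the hostfs code: join_root() is a hypothetical helper, malloc() stands in for kmalloc(), and snprintf() is used here where the patch uses sprintf().

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Build "<root>/<sub>" in a freshly allocated buffer.
 * Returns NULL if the allocation fails; the caller frees the result.
 */
char *join_root(const char *root, const char *sub)
{
	size_t size;
	char *path;

	if (sub == NULL)	/* no -o option was given */
		sub = "";

	/* root + '/' + sub + terminating NUL */
	size = strlen(root) + 1 + strlen(sub) + 1;
	path = malloc(size);
	if (path == NULL)
		return NULL;

	snprintf(path, size, "%s/%s", root, sub);
	return path;
}
```

With an empty root and no sub-path the result is "/", which is why the default for root_ino can become the empty string once the separator is part of the format.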




[PATCH 08/11] uml: mark both consoles as CON_ANYTIME

2007-03-05 Thread Paolo 'Blaisorblade' Giarrusso
From: Paolo 'Blaisorblade' Giarrusso <[EMAIL PROTECTED]>

Since neither UML console uses percpu variables, they may be called while the
CPU is still offline, so they may be marked CON_ANYTIME (this is documented in
kernel/printk.c; grep for CON_ANYTIME to find mentions of it).

Works well in testing done with lock debugging enabled; it should be safe, but
it is not needed for the next release.

This would probably also help stderr_console.c, but that is yet to be tested.

Signed-off-by: Paolo 'Blaisorblade' Giarrusso <[EMAIL PROTECTED]>
---

 arch/um/drivers/ssl.c   |2 +-
 arch/um/drivers/stdio_console.c |2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/um/drivers/ssl.c b/arch/um/drivers/ssl.c
index fc22b9b..4b382a6 100644
--- a/arch/um/drivers/ssl.c
+++ b/arch/um/drivers/ssl.c
@@ -179,7 +179,7 @@ static struct console ssl_cons = {
.write  = ssl_console_write,
.device = ssl_console_device,
.setup  = ssl_console_setup,
-   .flags  = CON_PRINTBUFFER,
+   .flags  = CON_PRINTBUFFER|CON_ANYTIME,
.index  = -1,
 };
 
diff --git a/arch/um/drivers/stdio_console.c b/arch/um/drivers/stdio_console.c
index 7ff0b0f..76d1f1c 100644
--- a/arch/um/drivers/stdio_console.c
+++ b/arch/um/drivers/stdio_console.c
@@ -153,7 +153,7 @@ static struct console stdiocons = {
.write  = uml_console_write,
.device = uml_console_device,
.setup  = uml_console_setup,
-   .flags  = CON_PRINTBUFFER,
+   .flags  = CON_PRINTBUFFER|CON_ANYTIME,
.index  = -1,
 };
 




[PATCH 05/11] hostfs: rename some vars for clarity

2007-03-05 Thread Paolo 'Blaisorblade' Giarrusso
From: Paolo 'Blaisorblade' Giarrusso <[EMAIL PROTECTED]>

* rename name to host_root_path
* rename data to req_root.

Signed-off-by: Paolo 'Blaisorblade' Giarrusso <[EMAIL PROTECTED]>
---

 fs/hostfs/hostfs_kern.c |   26 +-
 1 files changed, 13 insertions(+), 13 deletions(-)

diff --git a/fs/hostfs/hostfs_kern.c b/fs/hostfs/hostfs_kern.c
index 0bcf7ac..b4e127b 100644
--- a/fs/hostfs/hostfs_kern.c
+++ b/fs/hostfs/hostfs_kern.c
@@ -961,7 +961,7 @@ static int contains_dotdot(const char* path)
 static int hostfs_fill_sb_common(struct super_block *sb, void *d, int silent)
 {
struct inode *root_inode;
-   char *name, *data = d;
+   char *host_root_path, *req_root = d;
int err;
 
sb->s_blocksize = 1024;
@@ -970,20 +970,20 @@ static int hostfs_fill_sb_common(struct super_block *sb, void *d, int silent)
sb->s_op = &hostfs_sbops;
 
    /* NULL is printed as <NULL> by sprintf: avoid that. */
-   if (data == NULL)
-   data = "";
+   if (req_root == NULL)
+   req_root = "";
 
err = -EINVAL;
-   if (unlikely(contains_dotdot(data)))
+   if (unlikely(contains_dotdot(req_root)))
goto out;
 
err = -ENOMEM;
-   name = kmalloc(strlen(root_ino) + 1
-   + strlen(data) + 1, GFP_KERNEL);
-   if(name == NULL)
+   host_root_path = kmalloc(strlen(root_ino) + 1
+   + strlen(req_root) + 1, GFP_KERNEL);
+   if(host_root_path == NULL)
goto out;
 
-   sprintf(name, "%s/%s", root_ino, data);
+   sprintf(host_root_path, "%s/%s", root_ino, req_root);
 
root_inode = iget(sb, 0);
if(root_inode == NULL)
@@ -993,10 +993,10 @@ static int hostfs_fill_sb_common(struct super_block *sb, void *d, int silent)
if(err)
goto out_put;
 
-   HOSTFS_I(root_inode)->host_filename = name;
-   /* Avoid that in the error path, iput(root_inode) frees again name through
-* hostfs_destroy_inode! */
-   name = NULL;
+   HOSTFS_I(root_inode)->host_filename = host_root_path;
+   /* Avoid that in the error path, iput(root_inode) frees again
+* host_root_path through hostfs_destroy_inode! */
+   host_root_path = NULL;
 
err = -ENOMEM;
sb->s_root = d_alloc_root(root_inode);
@@ -1016,7 +1016,7 @@ static int hostfs_fill_sb_common(struct super_block *sb, void *d, int silent)
  out_put:
 iput(root_inode);
  out_free:
-   kfree(name);
+   kfree(host_root_path);
  out:
return(err);
 }




[PATCH 11/11] uml: fix errno usage

2007-03-05 Thread Paolo 'Blaisorblade' Giarrusso
From: Paolo 'Blaisorblade' Giarrusso <[EMAIL PROTECTED]>

Avoid reading userspace errno twice - it can be cleared by libc code anywhere
(in particular, printk() does clear it in my setup).

Signed-off-by: Paolo 'Blaisorblade' Giarrusso <[EMAIL PROTECTED]>
---

 arch/um/drivers/daemon_user.c |   17 +
 1 files changed, 9 insertions(+), 8 deletions(-)

diff --git a/arch/um/drivers/daemon_user.c b/arch/um/drivers/daemon_user.c
index 310af0f..021b82c 100644
--- a/arch/um/drivers/daemon_user.c
+++ b/arch/um/drivers/daemon_user.c
@@ -56,30 +56,31 @@ static int connect_to_switch(struct daemon_data *pri)
 
pri->control = socket(AF_UNIX, SOCK_STREAM, 0);
if(pri->control < 0){
+   err = -errno;
printk("daemon_open : control socket failed, errno = %d\n", 
-  errno);  
-   return(-errno);
+  -err);
+   return err;
}
 
if(connect(pri->control, (struct sockaddr *) ctl_addr, 
   sizeof(*ctl_addr)) < 0){
-   printk("daemon_open : control connect failed, errno = %d\n",
-  errno);
err = -errno;
+   printk("daemon_open : control connect failed, errno = %d\n",
+  -err);
goto out;
}
 
fd = socket(AF_UNIX, SOCK_DGRAM, 0);
if(fd < 0){
-   printk("daemon_open : data socket failed, errno = %d\n", 
-  errno);
err = -errno;
+   printk("daemon_open : data socket failed, errno = %d\n",
+  -err);
goto out;
}
if(bind(fd, (struct sockaddr *) local_addr, sizeof(*local_addr)) < 0){
-   printk("daemon_open : data bind failed, errno = %d\n", 
-  errno);
err = -errno;
+   printk("daemon_open : data bind failed, errno = %d\n",
+  -err);
goto out_close;
}
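
The pattern applied throughout this patch is to copy errno into a local variable immediately after the failing call and to use only the copy afterwards. A minimal userspace sketch follows; log_failure() and open_checked() are hypothetical names, and the logger clobbers errno on purpose to mimic the behaviour described above.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>

/* Hypothetical stand-in for printk(): it resets errno as a side effect. */
void log_failure(const char *what, int err)
{
	fprintf(stderr, "%s failed, errno = %d\n", what, err);
	errno = 0;
}

/* Returns an open fd, or a negative errno value on failure. */
int open_checked(const char *path)
{
	int fd, err;

	fd = open(path, O_RDONLY);
	if (fd < 0) {
		err = -errno;	/* capture errno before any other call */
		log_failure("open", -err);
		return err;	/* still correct although errno is now 0 */
	}
	return fd;
}
```

Had the function returned -errno after logging, the caller would have received 0 and treated the failure as a valid file descriptor.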
 




[PATCH 07/11] uml: remove dead code about os_usr1_signal() and os_usr1_process()

2007-03-05 Thread Paolo 'Blaisorblade' Giarrusso
From: Paolo 'Blaisorblade' Giarrusso <[EMAIL PROTECTED]>

os_usr1_signal() is totally unused, os_usr1_process() is used only by TT mode.

Signed-off-by: Paolo 'Blaisorblade' Giarrusso <[EMAIL PROTECTED]>
---

 arch/um/include/os.h   |3 ++-
 arch/um/os-Linux/process.c |3 +++
 arch/um/os-Linux/signal.c  |5 -
 3 files changed, 5 insertions(+), 6 deletions(-)

diff --git a/arch/um/include/os.h b/arch/um/include/os.h
index 8629bd1..5c74da4 100644
--- a/arch/um/include/os.h
+++ b/arch/um/include/os.h
@@ -192,7 +192,9 @@ extern int os_process_parent(int pid);
 extern void os_stop_process(int pid);
 extern void os_kill_process(int pid, int reap_child);
 extern void os_kill_ptraced_process(int pid, int reap_child);
+#ifdef UML_CONFIG_MODE_TT
 extern void os_usr1_process(int pid);
+#endif
 extern long os_ptrace_ldt(long pid, long addr, long data);
 
 extern int os_getpid(void);
@@ -261,7 +263,6 @@ extern void block_signals(void);
 extern void unblock_signals(void);
 extern int get_signals(void);
 extern int set_signals(int enable);
-extern void os_usr1_signal(int on);
 
 /* trap.c */
 extern void os_fill_handlinfo(struct kern_handlers h);
diff --git a/arch/um/os-Linux/process.c b/arch/um/os-Linux/process.c
index c692a19..76bdd67 100644
--- a/arch/um/os-Linux/process.c
+++ b/arch/um/os-Linux/process.c
@@ -21,6 +21,7 @@
 #include "longjmp.h"
 #include "skas_ptrace.h"
 #include "kern_constants.h"
+#include "uml-config.h"
 
 #define ARBITRARY_ADDR -1
 #define FAILURE_PID-1
@@ -131,10 +132,12 @@ void os_kill_ptraced_process(int pid, int reap_child)
CATCH_EINTR(waitpid(pid, NULL, 0));
 }
 
+#ifdef UML_CONFIG_MODE_TT
 void os_usr1_process(int pid)
 {
kill(pid, SIGUSR1);
 }
+#endif
 
 /* Don't use the glibc version, which caches the result in TLS. It misses some
  * syscalls, and also breaks with clone(), which does not unshare the TLS.
diff --git a/arch/um/os-Linux/signal.c b/arch/um/os-Linux/signal.c
index b897e85..2667686 100644
--- a/arch/um/os-Linux/signal.c
+++ b/arch/um/os-Linux/signal.c
@@ -243,8 +243,3 @@ int set_signals(int enable)
 
return ret;
 }
-
-void os_usr1_signal(int on)
-{
-   change_sig(SIGUSR1, on);
-}




[PATCH 00/11] Uml simple fixes

2007-03-05 Thread Paolo 'Blaisorblade' Giarrusso
Some tested UML fixes - should be applied for 2.6.21.
-- 
Paolo Giarrusso, aka Blaisorblade
http://www.user-mode-linux.org/~blaisorblade




[PATCH 01/11] uml: code convention cleanup of a file

2007-03-05 Thread Paolo 'Blaisorblade' Giarrusso
From: Paolo 'Blaisorblade' Giarrusso <[EMAIL PROTECTED]>

Fix coding convention violations in arch/um/os-Linux/helper.c.

Signed-off-by: Paolo 'Blaisorblade' Giarrusso <[EMAIL PROTECTED]>
---

 arch/um/os-Linux/helper.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/arch/um/os-Linux/helper.c b/arch/um/os-Linux/helper.c
index c7ad630..7954357 100644
--- a/arch/um/os-Linux/helper.c
+++ b/arch/um/os-Linux/helper.c
@@ -38,7 +38,7 @@ static int helper_child(void *arg)
char **argv = data->argv;
int errval;
 
-   if (helper_pause){
+   if (helper_pause) {
signal(SIGHUP, helper_hup);
pause();
}




[PATCH 04/11] uml - hostfs: avoid possible escapes from hostfs=.

2007-03-05 Thread Paolo 'Blaisorblade' Giarrusso
From: Paolo 'Blaisorblade' Giarrusso <[EMAIL PROTECTED]>

Avoid accepting things like -o .., -o dir/../../dir2, -o dir/../.. .
This may be considered useless, but YMMV. I consider that this has a limited
security value, exactly like disabling module support (in many cases it is
useful).

Signed-off-by: Paolo 'Blaisorblade' Giarrusso <[EMAIL PROTECTED]>
---

 fs/hostfs/hostfs_kern.c |   26 ++
 1 files changed, 26 insertions(+), 0 deletions(-)

diff --git a/fs/hostfs/hostfs_kern.c b/fs/hostfs/hostfs_kern.c
index 9baf697..0bcf7ac 100644
--- a/fs/hostfs/hostfs_kern.c
+++ b/fs/hostfs/hostfs_kern.c
@@ -936,6 +936,28 @@ static const struct address_space_operations hostfs_link_aops = {
.readpage   = hostfs_link_readpage,
 };
 
+static inline int str_ends_with(const char * str, const char* suffix)
+{
+   size_t len = strlen(str), suffix_len = strlen(suffix);
+   return strcmp(str + len - suffix_len, suffix) == 0;
+}
+
+static int contains_dotdot(const char* path)
+{
+   /*
+* Prevent escaping from hostfs=folder, even if this is not useful to
+* jail the UML superuser.
+* Since foo..bar is a valid name, we must look for /../ in the string,
+* or for ../ at the beginning, /.. at the end, or check whether '..' is
+* the complete string.
+*/
+
+   return  strstr(path, "/../") != NULL ||
+   strcmp(path, "..") == 0 ||
+   strncmp(path, "../", strlen("../")) == 0 ||
+   str_ends_with(path, "/..");
+}
+
 static int hostfs_fill_sb_common(struct super_block *sb, void *d, int silent)
 {
struct inode *root_inode;
@@ -951,6 +973,10 @@ static int hostfs_fill_sb_common(struct super_block *sb, void *d, int silent)
if (data == NULL)
data = "";
 
+   err = -EINVAL;
+   if (unlikely(contains_dotdot(data)))
+   goto out;
+
err = -ENOMEM;
name = kmalloc(strlen(root_ino) + 1
+ strlen(data) + 1, GFP_KERNEL);
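
The string checks above can be exercised in userspace. The sketch below copies the logic of the two helpers with one deliberate difference, which is an addition of this example and not part of the patch: str_ends_with() first checks that the string is at least as long as the suffix, since otherwise `str + len - suffix_len` would point before the start of the buffer (for instance with the empty string). As noted in the review of this patch, a purely textual check does not cover symlinks.

```c
#include <string.h>

/* Returns 1 if str ends with suffix, 0 otherwise. */
int str_ends_with(const char *str, const char *suffix)
{
	size_t len = strlen(str), suffix_len = strlen(suffix);

	/* Guard added in this sketch: avoid indexing before the buffer. */
	if (len < suffix_len)
		return 0;
	return strcmp(str + len - suffix_len, suffix) == 0;
}

/*
 * Returns 1 if any path component is exactly "..".
 * "foo..bar" is a valid file name and must not match.
 */
int contains_dotdot(const char *path)
{
	return strstr(path, "/../") != NULL ||
	       strcmp(path, "..") == 0 ||
	       strncmp(path, "../", strlen("../")) == 0 ||
	       str_ends_with(path, "/..");
}
```

The three mount options listed in the commit message are all rejected by this check, while names that merely contain two dots are accepted.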




[PATCH 09/11] uml: fix confusion irq early reenabling

2007-03-05 Thread Paolo 'Blaisorblade' Giarrusso
From: Paolo 'Blaisorblade' Giarrusso <[EMAIL PROTECTED]>

Fix confusion about call context - comments and code are inconsistent and plain
wrong, my fault.

Signed-off-by: Paolo 'Blaisorblade' Giarrusso <[EMAIL PROTECTED]>
---

 arch/um/drivers/line.c |6 +++---
 1 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/arch/um/drivers/line.c b/arch/um/drivers/line.c
index 01d4ab6..f75d7b0 100644
--- a/arch/um/drivers/line.c
+++ b/arch/um/drivers/line.c
@@ -370,10 +370,10 @@ static irqreturn_t line_write_interrupt(int irq, void *data)
struct tty_struct *tty = line->tty;
int err;
 
-   /* Interrupts are enabled here because we registered the interrupt with
+   /* Interrupts are disabled here because we registered the interrupt with
 * IRQF_DISABLED (see line_setup_irq).*/
 
-   spin_lock_irq(&line->lock);
+   spin_lock(&line->lock);
err = flush_buffer(line);
if (err == 0) {
return IRQ_NONE;
@@ -381,7 +381,7 @@ static irqreturn_t line_write_interrupt(int irq, void *data)
line->head = line->buffer;
line->tail = line->buffer;
}
-   spin_unlock_irq(&line->lock);
+   spin_unlock(&line->lock);
 
if(tty == NULL)
return IRQ_NONE;




[PATCH 06/11] uml: fix a memory leak in the multicast driver

2007-03-05 Thread Paolo 'Blaisorblade' Giarrusso
From: Paolo 'Blaisorblade' Giarrusso <[EMAIL PROTECTED]>

Memory allocated by mcast_user_init must be freed in the matching mcast_remove.

Signed-off-by: Paolo 'Blaisorblade' Giarrusso <[EMAIL PROTECTED]>
---

 arch/um/drivers/mcast_user.c |   10 +-
 1 files changed, 9 insertions(+), 1 deletions(-)

diff --git a/arch/um/drivers/mcast_user.c b/arch/um/drivers/mcast_user.c
index 8138f5e..b827e82 100644
--- a/arch/um/drivers/mcast_user.c
+++ b/arch/um/drivers/mcast_user.c
@@ -50,6 +50,14 @@ static void mcast_user_init(void *data, void *dev)
pri->dev = dev;
 }
 
+static void mcast_remove(void *data)
+{
+   struct mcast_data *pri = data;
+
+   kfree(pri->mcast_addr);
+   pri->mcast_addr = NULL;
+}
+
 static int mcast_open(void *data)
 {
struct mcast_data *pri = data;
@@ -157,7 +165,7 @@ const struct net_user_info mcast_user_info = {
.init   = mcast_user_init,
.open   = mcast_open,
.close  = mcast_close,
-   .remove = NULL,
+   .remove = mcast_remove,
.set_mtu= mcast_set_mtu,
.add_address= NULL,
.delete_address = NULL,




[PATCH 02/11] uml - hostfs: fix double free

2007-03-05 Thread Paolo 'Blaisorblade' Giarrusso
From: Paolo 'Blaisorblade' Giarrusso <[EMAIL PROTECTED]>

Fix double free in the error path - when name is assigned into root_inode we do
not own it any more and we must not kfree() it - see patch for details.

Thanks to William Stearns for the initial report.

CC: William Stearns <[EMAIL PROTECTED]>
Signed-off-by: Paolo 'Blaisorblade' Giarrusso <[EMAIL PROTECTED]>
---

 fs/hostfs/hostfs_kern.c |5 -
 1 files changed, 4 insertions(+), 1 deletions(-)

diff --git a/fs/hostfs/hostfs_kern.c b/fs/hostfs/hostfs_kern.c
index e965eb1..6f10e43 100644
--- a/fs/hostfs/hostfs_kern.c
+++ b/fs/hostfs/hostfs_kern.c
@@ -966,6 +966,9 @@ static int hostfs_fill_sb_common(struct super_block *sb, void *d, int silent)
goto out_put;
 
HOSTFS_I(root_inode)->host_filename = name;
+   /* Avoid that in the error path, iput(root_inode) frees again name through
+* hostfs_destroy_inode! */
+   name = NULL;
 
err = -ENOMEM;
sb->s_root = d_alloc_root(root_inode);
@@ -977,7 +980,7 @@ static int hostfs_fill_sb_common(struct super_block *sb, void *d, int silent)
 /* No iput in this case because the dput does that for us */
 dput(sb->s_root);
 sb->s_root = NULL;
-   goto out_free;
+   goto out;
 }
 
return(0);
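
The ownership rule behind this fix can be sketched in userspace C. This is an illustration under stated assumptions, not the hostfs code: struct node stands in for the inode, node_destroy() for the iput() path, and the fail_late parameter exists only to force the late error path.

```c
#include <stdlib.h>
#include <string.h>

struct node {
	char *filename;		/* owned by the node once assigned */
};

/* Frees the node together with the name it owns. */
void node_destroy(struct node *n)
{
	free(n->filename);
	free(n);
}

/* Returns 0 and stores the node in *out, or returns -1 on failure. */
int build(const char *src, int fail_late, struct node **out)
{
	struct node *n;
	char *name;
	int err = -1;

	name = malloc(strlen(src) + 1);
	if (name == NULL)
		goto out;
	strcpy(name, src);

	n = calloc(1, sizeof(*n));
	if (n == NULL)
		goto out_free;

	n->filename = name;
	name = NULL;	/* ownership moved: free(name) below becomes a no-op */

	if (fail_late)
		goto out_put;

	*out = n;
	return 0;

out_put:
	node_destroy(n);	/* releases the name exactly once */
out_free:
	free(name);
out:
	return err;
}
```

Clearing the local pointer right after the hand-off keeps the shared cleanup labels safe, because freeing a NULL pointer does nothing.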




[PATCH 10/11] uml - activate_fd: return ENOMEM only when appropriate

2007-03-05 Thread Paolo 'Blaisorblade' Giarrusso
From: Paolo 'Blaisorblade' Giarrusso <[EMAIL PROTECTED]>

Avoid returning ENOMEM in case of a duplicate IRQ - ENOMEM was saved into err
earlier.

Signed-off-by: Paolo 'Blaisorblade' Giarrusso <[EMAIL PROTECTED]>
---

 arch/um/kernel/irq.c |1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/arch/um/kernel/irq.c b/arch/um/kernel/irq.c
index 50a288b..dbf2f5b 100644
--- a/arch/um/kernel/irq.c
+++ b/arch/um/kernel/irq.c
@@ -142,6 +142,7 @@ int activate_fd(int irq, int fd, int type, void *dev_id)
 .events= events,
 .current_events= 0 } );
 
+   err = -EBUSY;
spin_lock_irqsave(&irq_lock, flags);
for (irq_fd = active_fds; irq_fd != NULL; irq_fd = irq_fd->next) {
if ((irq_fd->fd == fd) && (irq_fd->type == type)) {




Re: [uml-devel] [PATCH] UML - Fix 2.6.20 hang

2007-02-21 Thread Blaisorblade
On Thursday 15 February 2007 18:07, Jeff Dike wrote:
> Signed-off-by: Jeff Dike <[EMAIL PROTECTED]>
[...]
> @@ -331,10 +334,9 @@ void maybe_sigio_broken(int fd, int read
>
>   sigio_lock();
>   err = need_poll(&all_sigio_fds, all_sigio_fds.used + 1);
> - if(err){
> - printk("maybe_sigio_broken - failed to add pollfd\n");
> + if(err)
>   goto out;
> - }
> +
>   all_sigio_fds.poll[all_sigio_fds.used++] =
>   ((struct pollfd) { .fd  = fd,
>  .events  = read ? POLLIN : POLLOUT,
>
Was that removal intentional, or did it happen by mistake? That way, err is 
completely lost.
-- 
Inform me of my mistakes, so I can add them to my list!
Paolo Giarrusso, aka Blaisorblade
http://www.user-mode-linux.org/~blaisorblade


Re: [uml-devel] [PATCH 2/3] UML - x86_64 thread fixes

2007-02-21 Thread Blaisorblade
On Thursday 08 February 2007 22:57, Jeff Dike wrote:
> x86_64 needs some TLS fixes.  What was missing was remembering the
> child thread id during clone and stuffing it into the child during
> each context switch.
>
> The %fs value is stored separately in the thread structure since the
> host controls what effect it has on the actual register file.  The
> host also needs to store it in its own thread struct, so we need the
> value kept outside the register file.
Is there any reason not to give %gs the same treatment, apart from the fact 
that TLS code usually does not need it, even though the API allows for it? I 
_do_ believe this patch fixes a bug that can be tested (I have not verified 
everything), though I wonder why you didn't look at the patch I sent you some 
time ago (in truth I hadn't finished it, but most of the work was there; it had 
problems I couldn't debug at the time).
-- 
Inform me of my mistakes, so I can add them to my list!
Paolo Giarrusso, aka Blaisorblade
http://www.user-mode-linux.org/~blaisorblade


Re: [uml-devel] [PATCH 4/4] UML - lock host ldt retrieval

2007-02-21 Thread Blaisorblade
On Wednesday 21 February 2007 21:25, Jeff Dike wrote:
> Add some locking to host_ldt_entries to prevent racing when reading
> LDT information from the host.

Please remove the GFP_KERNEL allocation under spin_lock; the simplest way is to 
use a mutex instead, unless this path matters for performance.
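
Besides switching to a mutex, another common way to avoid a possibly sleeping allocation inside a spinlock is to allocate with the lock dropped and install the result under the lock. The sketch below shows that alternative in userspace C; it is not what the quoted patch does, a pthread mutex stands in for the spinlock, calloc() for the kernel allocator, and get_table() is a hypothetical name.

```c
#include <pthread.h>
#include <stdlib.h>

static pthread_mutex_t table_lock = PTHREAD_MUTEX_INITIALIZER;
static short *table;	/* built lazily, then never replaced */

/* Returns the shared table, building it on first use; NULL on failure. */
short *get_table(size_t n)
{
	short *fresh, *ret;

	/* Fast path: the table already exists. */
	pthread_mutex_lock(&table_lock);
	ret = table;
	pthread_mutex_unlock(&table_lock);
	if (ret != NULL)
		return ret;

	/* The allocation may block, so it is done without the lock. */
	fresh = calloc(n, sizeof(*fresh));
	if (fresh == NULL)
		return NULL;

	/* Recheck under the lock: another thread may have won the race. */
	pthread_mutex_lock(&table_lock);
	if (table == NULL) {
		table = fresh;
		fresh = NULL;
	}
	ret = table;
	pthread_mutex_unlock(&table_lock);

	free(fresh);	/* no-op unless this thread lost the race */
	return ret;
}
```

The cost of this approach is an occasional wasted allocation when two threads race, which is why a plain mutex remains the simpler choice when the path is not performance critical.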

> @@ -386,15 +387,20 @@ static long do_modify_ldt_skas(int func,
>   return ret;
>  }
>
> -short dummy_list[9] = {0, -1};
> -short * host_ldt_entries = NULL;
> +static DEFINE_SPINLOCK(host_ldt_lock);
> +static short dummy_list[9] = {0, -1};
> +static short * host_ldt_entries = NULL;
>
> -void ldt_get_host_info(void)
> +static void ldt_get_host_info(void)
>  {
>   long ret;
>   struct ldt_entry * ldt;
>   int i, size, k, order;
>
> + spin_lock(&host_ldt_lock);
> + if(host_ldt_entries != NULL)
> + goto out_unlock;
> +
>   host_ldt_entries = dummy_list+1;
>
>   for(i = LDT_PAGES_MAX-1, order=0; i; i>>=1, order++);
> @@ -402,8 +408,9 @@ void ldt_get_host_info(void)
>   ldt = (struct ldt_entry *)
> __get_free_pages(GFP_KERNEL|__GFP_ZERO, order);
>   if(ldt == NULL) {
> - printk("ldt_get_host_info: couldn't allocate buffer for host ldt\n");
> - return;
> + printk("ldt_get_host_info: couldn't allocate buffer for host "
> +"ldt\n");
> + goto out_unlock;
>   }
>
>   ret = modify_ldt(0, ldt, (1<<order)*PAGE_SIZE);
> [...]
> @@ -428,7 +435,8 @@ void ldt_get_host_info(void)
>   size = (size + 1) * sizeof(dummy_list[0]);
>   host_ldt_entries = kmalloc(size, GFP_KERNEL);
>   if(host_ldt_entries == NULL) {
> - printk("ldt_get_host_info: couldn't allocate host ldt list\n");
> + printk("ldt_get_host_info: couldn't allocate host ldt "
> +"list\n");
>   goto out_free;
>   }
>   }
> @@ -442,6 +450,8 @@ void ldt_get_host_info(void)
>
>  out_free:
>   free_pages((unsigned long)ldt, order);
> +out_unlock:
> + spin_unlock(&host_ldt_lock);
>  }
>
>  long init_new_ldt(struct mmu_context_skas * new_mm,
> @@ -480,8 +490,7 @@ long init_new_ldt(struct mmu_context_ska
>* inherited from the host. All ldt-entries found
>* will be reset in the following loop
>*/
> - if(host_ldt_entries == NULL)
> - ldt_get_host_info();
> + ldt_get_host_info();
>   for(num_p=host_ldt_entries; *num_p != -1; num_p++){
>   desc.entry_number = *num_p;
>   err = write_ldt_entry(&new_mm->id, 1, &desc,
> @@ -560,6 +569,6 @@ void free_ldt(struct mmu_context_skas *
>
>  int sys_modify_ldt(int func, void __user *ptr, unsigned long bytecount)
>  {
> - return(CHOOSE_MODE_PROC(do_modify_ldt_tt, do_modify_ldt_skas, func,
> - ptr, bytecount));
> + return CHOOSE_MODE_PROC(do_modify_ldt_tt, do_modify_ldt_skas, func,
> + ptr, bytecount);
>  }
>
>

-- 
Inform me of my mistakes, so I can add them to my list!
Paolo Giarrusso, aka Blaisorblade
http://www.user-mode-linux.org/~blaisorblade

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: patch x86_64-fix-2.6.18-regression-ptrace_oldsetoptions-should-be-accepted.patch queued to -stable tree

2007-02-21 Thread Blaisorblade
On Wednesday 21 February 2007 00:41, [EMAIL PROTECTED] wrote:
> This is a note to let you know that we have just queued up the patch titled
>
>  Subject: x86_64: fix 2.6.18 regression - PTRACE_OLDSETOPTIONS should
> be accepted
>
> to the 2.6.18-stable tree.  Its filename is

Since you are still maintaining 2.6.18, I've just sent another patch for that, 
i.e. the backport of commit 14679eb3c50897889ba62f9a37e3bcd8a205b5e7.
Could you still merge it in this release, especially since this is the last 
2.6.18-stable you are doing?
This patch should also be merged in 2.6.20, but I saw no mail about that, so 
I wanted to make sure it's heading there too.

> x86_64-fix-2.6.18-regression-ptrace_oldsetoptions-should-be-accepted.patch
>
> A git repo of this tree can be found at
>
> http://www.kernel.org/git/?p=linux/kernel/git/gregkh/stable-queue.git;a=summary

Hmm, this should be (note the missing gregkh in the path):
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=summary
-- 
Inform me of my mistakes, so I can add them to my list!
Paolo Giarrusso, aka Blaisorblade
http://www.user-mode-linux.org/~blaisorblade


x86_64: PTRACE_[GS]ET_THREAD_AREA should be accepted

2007-02-21 Thread Paolo 'Blaisorblade' Giarrusso
This patch backports from 2.6.19 a fix to a 2.6.18 regression.

As with PTRACE_OLDSETOPTIONS, sys32_ptrace should also accept
PTRACE_[GS]ET_THREAD_AREA. This was already done for 2.6.19, so this patch is
for 2.6.18-stable only.
This was tested with UML/32bit as API consumer, both before and after this
patch.

Cc: Andi Kleen <[EMAIL PROTECTED]>
Signed-off-by: Paolo 'Blaisorblade' Giarrusso <[EMAIL PROTECTED]>
Index: linux-2.6.18/arch/x86_64/ia32/ptrace32.c
===
--- linux-2.6.18.orig/arch/x86_64/ia32/ptrace32.c
+++ linux-2.6.18/arch/x86_64/ia32/ptrace32.c
@@ -241,6 +241,8 @@ asmlinkage long sys32_ptrace(long reques
case PTRACE_SYSCALL:
case PTRACE_OLDSETOPTIONS:
case PTRACE_SETOPTIONS:
+   case PTRACE_SET_THREAD_AREA:
+   case PTRACE_GET_THREAD_AREA:
return sys_ptrace(request, pid, addr, data); 
 
default:




Re: [uml-devel] x86_64: fix 2.6.18 regression - PTRACE_OLDSETOPTIONS should be accepted

2007-02-16 Thread Blaisorblade
On Friday 16 February 2007 20:02, Jeff Dike wrote:
> On Thu, Feb 15, 2007 at 08:05:56PM +0100, Blaisorblade wrote:
> > Jeff, I verified my patch is _almost_ enough for 2.6.18 for fully booting
> > a 32bit UML; on 2.6.18 I had to also add PTRACE_GET/SET_THREAD_AREA (this
> > fix was merged in 2.6.19) to avoid tons of TLS errors.
>
> I'm not seeing that.  With the current set of patches, I have 32-bit UMLs
> happily booting on x86_64.
Which kernel? I've not yet tested 2.6.20. I'll try debugging this later.
-- 
Inform me of my mistakes, so I can add them to my list!
Paolo Giarrusso, aka Blaisorblade
http://www.user-mode-linux.org/~blaisorblade


Re: [uml-devel] x86_64: fix 2.6.18 regression - PTRACE_OLDSETOPTIONS should be accepted

2007-02-16 Thread Blaisorblade
On Thursday 15 February 2007 18:01, Jeff Dike wrote:
> On Wed, Feb 14, 2007 at 09:51:23PM -0800, Andrew Morton wrote:
> > Whatever happens, please ensure that the final fix makes it into -stable
> > as well.  Jeff's version of this patch wasn't cc'ed to [EMAIL PROTECTED]
>
> Paolo's patch was sent to -stable.  His should be used everywhere, and mine
> should be dropped.

Jeff, I verified my patch is _almost_ enough for 2.6.18 for fully booting a 
32bit UML; on 2.6.18 I had to also add PTRACE_GET/SET_THREAD_AREA (this fix 
was merged in 2.6.19) to avoid tons of TLS errors.

On 2.6.19, the crash at boot is gone (btw, that crash printed no message; I 
hope that with your fatal/nonfatal/etc. introduction I would get one), but 
another crash happens when starting init. I'll test 2.6.20 ASAP.

Bye
-- 
Inform me of my mistakes, so I can add them to my list!
Paolo Giarrusso, aka Blaisorblade
http://www.user-mode-linux.org/~blaisorblade



Re: [uml-devel] x86_64: fix 2.6.18 regression - PTRACE_OLDSETOPTIONS should be accepted

2007-02-14 Thread Blaisorblade
On Thursday 15 February 2007 03:54, Jeff Dike wrote:
> On Thu, Feb 15, 2007 at 03:34:23AM +0100, Paolo 'Blaisorblade' Giarrusso wrote:
> > Index: linux-2.6.git/arch/x86_64/ia32/ptrace32.c
> > ===
> > --- linux-2.6.git.orig/arch/x86_64/ia32/ptrace32.c
> > +++ linux-2.6.git/arch/x86_64/ia32/ptrace32.c
> > @@ -246,6 +246,7 @@ asmlinkage long sys32_ptrace(long reques
> > case PTRACE_SINGLESTEP:
> > case PTRACE_DETACH:
> > case PTRACE_SYSCALL:
> > +   case PTRACE_OLDSETOPTIONS:
> > case PTRACE_SETOPTIONS:
> > case PTRACE_SET_THREAD_AREA:
> > case PTRACE_GET_THREAD_AREA:
>
> I sent an equivalent patch in earlier today:
Doh! Interesting timing...

> Index: linux-2.6/arch/x86_64/ia32/ptrace32.c
> ===
> --- linux-2.6.orig/arch/x86_64/ia32/ptrace32.c
> +++ linux-2.6/arch/x86_64/ia32/ptrace32.c
> @@ -239,6 +239,8 @@ asmlinkage long sys32_ptrace(long reques
>   __u32 val;
>
>   switch (request) {
> + case PTRACE_OLDSETOPTIONS:
> + request = PTRACE_SETOPTIONS;
>   case PTRACE_TRACEME:
>   case PTRACE_ATTACH:
>   case PTRACE_KILL:
>
> I change the request so that PTRACE_OLDSETOPTIONS doesn't need to
> propagate any further.  However, it is present in include/asm-x86_64,
> so I guess that counts as being part of the x86_64 ABI.  That being
> the case, I guess my patch can be dropped in favor of this one.

It is handled in ptrace_request, unless there are include problems. I'm going 
to reboot and test mine for any remaining problem.
-- 
Inform me of my mistakes, so I can add them to my list!
Paolo Giarrusso, aka Blaisorblade
http://www.user-mode-linux.org/~blaisorblade


x86_64: fix 2.6.18 regression - PTRACE_OLDSETOPTIONS should be accepted

2007-02-14 Thread Paolo 'Blaisorblade' Giarrusso
PTRACE_OLDSETOPTIONS should also be accepted, as kernel/ptrace.c does and as
binary compatibility requires. UML/32bit breaks because of this, since it is
wise enough to use PTRACE_OLDSETOPTIONS to stay binary compatible with 2.4 host
kernels.

Until 2.6.17 (commit f0f2d6536e3515b5b1b7ae97dc8f176860c8c2ce) we had:

	default:
		return sys_ptrace(request, pid, addr, data);

Instead here we have:
	case PTRACE_GET_THREAD_AREA:
	case ...:
		return sys_ptrace(request, pid, addr, data);

	default:
		return -EINVAL;

That was a style change: every request now has to be listed explicitly, so each
one must be tested when it is added. In this case, not enough testing was done.
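
The difference between the two forms can be sketched in plain C. This is only
an illustration under stated assumptions: the REQ_* values are arbitrary
placeholders rather than the real ptrace request numbers, and generic_handler()
is a hypothetical stand-in for sys_ptrace().

```c
#include <errno.h>

/* Arbitrary placeholder request numbers, not the real ptrace values. */
enum {
	REQ_SETOPTIONS = 1,
	REQ_OLDSETOPTIONS,
	REQ_GET_THREAD_AREA,
	REQ_SET_THREAD_AREA,
};

/* Stand-in for sys_ptrace(): pretend every forwarded request succeeds. */
static long generic_handler(long request)
{
	(void)request;
	return 0;
}

/* Old style: anything not special-cased falls through to the generic code. */
long dispatch_passthrough(long request)
{
	switch (request) {
	default:
		return generic_handler(request);
	}
}

/* New style: only listed requests are forwarded, the rest get -EINVAL. */
long dispatch_whitelist(long request)
{
	switch (request) {
	case REQ_SETOPTIONS:
	case REQ_SET_THREAD_AREA:
	case REQ_GET_THREAD_AREA:
		return generic_handler(request);

	default:
		return -EINVAL;
	}
}
```

With the whitelist form, a request that was forgotten in the list (here
REQ_OLDSETOPTIONS) silently starts failing with -EINVAL, which is the kind of
regression this patch fixes.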

Cc: Andi Kleen <[EMAIL PROTECTED]>
Signed-off-by: Paolo 'Blaisorblade' Giarrusso <[EMAIL PROTECTED]>
Index: linux-2.6.git/arch/x86_64/ia32/ptrace32.c
===
--- linux-2.6.git.orig/arch/x86_64/ia32/ptrace32.c
+++ linux-2.6.git/arch/x86_64/ia32/ptrace32.c
@@ -246,6 +246,7 @@ asmlinkage long sys32_ptrace(long reques
case PTRACE_SINGLESTEP:
case PTRACE_DETACH:
case PTRACE_SYSCALL:
+   case PTRACE_OLDSETOPTIONS:
case PTRACE_SETOPTIONS:
case PTRACE_SET_THREAD_AREA:
case PTRACE_GET_THREAD_AREA:


[PATCH] Kconfig: FAULT_INJECTION can be selected only if LOCKDEP is enabled.

2007-02-12 Thread Paolo 'Blaisorblade' Giarrusso
There is no prompt for STACKTRACE, so it is enabled only when 'select'ed.
FAULT_INJECTION depends on it, while LOCKDEP selects it. So FAULT_INJECTION
becomes visible in Kconfig only when LOCKDEP is enabled. Fix this by making
FAULT_INJECTION select STACKTRACE instead of depending on it.

Signed-off-by: Paolo 'Blaisorblade' Giarrusso <[EMAIL PROTECTED]>
Index: linux-2.6.git/lib/Kconfig.debug
===
--- linux-2.6.git.orig/lib/Kconfig.debug
+++ linux-2.6.git/lib/Kconfig.debug
@@ -400,7 +400,7 @@ config LKDTM
 config FAULT_INJECTION
bool "Fault-injection framework"
depends on DEBUG_KERNEL
-   depends on STACKTRACE
+   select STACKTRACE
select FRAME_POINTER
help
  Provide fault-injection framework.


Re: [uml-devel] [PATCH] [UML] fix mknod

2007-01-26 Thread Blaisorblade
On Thursday 25 January 2007 05:31, Andrew Morton wrote:
> On Tue, 23 Jan 2007 23:33:29 +0100
>
> Blaisorblade <[EMAIL PROTECTED]> wrote:
> > On Tuesday 23 January 2007 14:17, Johannes Stezenbach wrote:
> > > On Tue, Jan 23, 2007 at 09:02:30AM +0100, Blaisorblade wrote:
> > > > On Monday 22 January 2007 21:13, Johannes Stezenbach wrote:
> > > > > I was playing with user-mode Linux and found that mknod creates
> > > > > devices node in hostfs with wrong major/minor numbers.
> > > > > The patch below fixes it for me.
> > > >
> > > > Hmpf. Still having this bug on hostfs is quite bad. Thanks for
> > > > reporting.
> >
> > I've now seen - we never fixed this one, we fixed the analogous problem
> > on 'ls' output and friends (in init_inode, which is used in many places).
> >
> > > > It should be hostfs_user.c to take major and minor and to combine
> > > > them correctly - it can use libc's macros.
> > >
> > > Right, below is a better patch.
> >
> > Exactly what I meant, thanks!
> > I'd say:
> > Acked-by: Paolo 'Blaisorblade' Giarrusso <[EMAIL PROTECTED]>
> > This can go to 2.6.20, and possibly even to -stable (after either me or
> > Jeff tests it once).

> So..  did you test it?

Just tested on 2.6.18: I could reproduce the bug and verify that the fix works. 
For me that is enough to merge it, considering that hostfs has changed little 
and that static review already makes it clear that this fix is needed.

It should also be merged in 2.6.16-stable.
-- 
Inform me of my mistakes, so I can add them to my list!
Paolo Giarrusso, aka Blaisorblade
http://www.user-mode-linux.org/~blaisorblade


Re: [uml-devel] [PATCH] [UML] fix mknod

2007-01-23 Thread Blaisorblade
On Tuesday 23 January 2007 14:17, Johannes Stezenbach wrote:
> On Tue, Jan 23, 2007 at 09:02:30AM +0100, Blaisorblade wrote:
> > On Monday 22 January 2007 21:13, Johannes Stezenbach wrote:
> > > I was playing with user-mode Linux and found that mknod creates
> > > devices node in hostfs with wrong major/minor numbers.
> > > The patch below fixes it for me.
> >
> > Hmpf. Still having this bug on hostfs is quite bad. Thanks for reporting.

I see now: we never fixed this one; we only fixed the analogous problem 
in 'ls' output and friends (in init_inode, which is used in many places).

> > It should be hostfs_user.c to take major and minor and to combine them
> > correctly - it can use libc's macros.
>
> Right, below is a better patch.

Exactly what I meant, thanks!
I'd say:
Acked-by: Paolo 'Blaisorblade' Giarrusso <[EMAIL PROTECTED]>
This can go to 2.6.20, and possibly even to -stable (after either me or Jeff 
tests it once).

> ---
> Fix UML hostfs mknod(): userspace has different
> dev_t size and encoding than kernel, so extract major/minor
> and reencode using glibc makedev() macro.
>
> Signed-off-by: Johannes Stezenbach <[EMAIL PROTECTED]>
>
> diff -rup linux-2.6.19.orig/fs/hostfs/hostfs.h linux-2.6.19/fs/hostfs/hostfs.h
> --- linux-2.6.19.orig/fs/hostfs/hostfs.h  2006-11-29 22:57:37.0 +0100
> +++ linux-2.6.19/fs/hostfs/hostfs.h   2007-01-23 14:11:03.0 +0100
> @@ -76,7 +76,7 @@ extern int make_symlink(const char *from
>  extern int unlink_file(const char *file);
>  extern int do_mkdir(const char *file, int mode);
>  extern int do_rmdir(const char *file);
> -extern int do_mknod(const char *file, int mode, int dev);
> +extern int do_mknod(const char *file, int mode, unsigned int major, unsigned int minor);
>  extern int link_file(const char *from, const char *to);
>  extern int do_readlink(char *file, char *buf, int size);
>  extern int rename_file(char *from, char *to);
> diff -rup linux-2.6.19.orig/fs/hostfs/hostfs_kern.c linux-2.6.19/fs/hostfs/hostfs_kern.c
> --- linux-2.6.19.orig/fs/hostfs/hostfs_kern.c 2006-11-29 22:57:37.0 +0100
> +++ linux-2.6.19/fs/hostfs/hostfs_kern.c 2007-01-23 14:11:20.0 +0100
> @@ -755,7 +755,7 @@ int hostfs_mknod(struct inode *dir, stru
>   goto out_put;
>
>   init_special_inode(inode, mode, dev);
> - err = do_mknod(name, mode, dev);
> + err = do_mknod(name, mode, MAJOR(dev), MINOR(dev));
>   if(err)
>   goto out_free;
>
> diff -rup linux-2.6.19.orig/fs/hostfs/hostfs_user.c linux-2.6.19/fs/hostfs/hostfs_user.c
> --- linux-2.6.19.orig/fs/hostfs/hostfs_user.c 2006-11-29 22:57:37.0 +0100
> +++ linux-2.6.19/fs/hostfs/hostfs_user.c 2007-01-23 14:11:39.0 +0100
> @@ -295,11 +295,11 @@ int do_rmdir(const char *file)
>   return(0);
>  }
>
> -int do_mknod(const char *file, int mode, int dev)
> +int do_mknod(const char *file, int mode, unsigned int major, unsigned int minor)
>  {
>   int err;
>
> - err = mknod(file, mode, dev);
> + err = mknod(file, mode, makedev(major, minor));
>   if(err) return(-errno);
>   return(0);
>  }

-- 
Inform me of my mistakes, so I can add them to my list!
Paolo Giarrusso, aka Blaisorblade
http://www.user-mode-linux.org/~blaisorblade


Re: [uml-devel] [PATCH] [UML] fix mknod

2007-01-23 Thread Blaisorblade
On Monday 22 January 2007 21:13, Johannes Stezenbach wrote:
> Hi,
>
> I was playing with user-mode Linux and found that mknod creates
> devices node in hostfs with wrong major/minor numbers.
> The patch below fixes it for me.
>
> Johannes

Hmpf. Still having this bug on hostfs is quite bad. Thanks for reporting.

hostfs_user.c should be the one to take major and minor and combine them 
correctly, since it can use libc's macros.
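
As a rough sketch of what that means (assuming a glibc-style
<sys/sysmacros.h>; the kdev_* helper names are made up for this example), the
kernel-internal dev_t keeps the major number above 20 bits of minor, while
userspace must go through makedev() and must not assume that layout:

```c
#include <sys/types.h>
#include <sys/sysmacros.h>	/* makedev(), major(), minor() */

/* Kernel-internal encoding: major above 20 bits of minor. */
#define KDEV_MINORBITS	20
#define KDEV_MINORMASK	((1u << KDEV_MINORBITS) - 1)

/* Hypothetical helpers mirroring the kernel's MAJOR()/MINOR(). */
static unsigned int kdev_major(unsigned int kdev)
{
	return kdev >> KDEV_MINORBITS;
}

static unsigned int kdev_minor(unsigned int kdev)
{
	return kdev & KDEV_MINORMASK;
}

/*
 * Re-encode a kernel-internal device number for a userspace mknod():
 * split it into major/minor and let libc build its own dev_t.
 */
dev_t kdev_to_user_dev(unsigned int kdev)
{
	return makedev(kdev_major(kdev), kdev_minor(kdev));
}
```

Passing the kernel value straight to libc's mknod() would make libc decode it
with its own layout, which is how the wrong major/minor numbers show up.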

> Signed-off-by: Johannes Stezenbach <[EMAIL PROTECTED]>
>
>
> diff -rup linux-2.6.19.orig/fs/hostfs/hostfs.h linux-2.6.19/fs/hostfs/hostfs.h
> --- linux-2.6.19.orig/fs/hostfs/hostfs.h  2006-11-29 22:57:37.0 +0100
> +++ linux-2.6.19/fs/hostfs/hostfs.h   2007-01-22 20:53:23.0 +0100
> @@ -76,7 +76,17 @@ extern int make_symlink(const char *from
>  extern int unlink_file(const char *file);
>  extern int do_mkdir(const char *file, int mode);
>  extern int do_rmdir(const char *file);
> -extern int do_mknod(const char *file, int mode, int dev);
> +
> +/* gnu_dev_makedev from glibc's sys/sysmacros.h */
> +static inline unsigned long long makedev(unsigned int major, unsigned int minor)
> +{
> + return ((minor & 0xff) | ((major & 0xfff) << 8)
> + | (((unsigned long long int) (minor & ~0xff)) << 12)
> + | (((unsigned long long int) (major & ~0xfff)) << 32));
> +
> +}
> +
> +extern int do_mknod(const char *file, int mode, unsigned long long dev);
>  extern int link_file(const char *from, const char *to);
>  extern int do_readlink(char *file, char *buf, int size);
>  extern int rename_file(char *from, char *to);
> diff -rup linux-2.6.19.orig/fs/hostfs/hostfs_kern.c linux-2.6.19/fs/hostfs/hostfs_kern.c
> --- linux-2.6.19.orig/fs/hostfs/hostfs_kern.c 2006-11-29 22:57:37.0 +0100
> +++ linux-2.6.19/fs/hostfs/hostfs_kern.c 2007-01-22 20:57:58.0 +0100
> @@ -740,6 +740,7 @@ int hostfs_mknod(struct inode *dir, stru
>   struct inode *inode;
>   char *name;
>   int err = -ENOMEM;
> + unsigned long long udev;
>
>   inode = iget(dir->i_sb, 0);
>   if(inode == NULL)
> @@ -755,7 +756,9 @@ int hostfs_mknod(struct inode *dir, stru
>   goto out_put;
>
>   init_special_inode(inode, mode, dev);
> - err = do_mknod(name, mode, dev);
> + /* userspace has different dev_t encoding than kernel... */
> + udev = makedev(MAJOR(dev), MINOR(dev));
> + err = do_mknod(name, mode, udev);
>   if(err)
>   goto out_free;
>
> diff -rup linux-2.6.19.orig/fs/hostfs/hostfs_user.c linux-2.6.19/fs/hostfs/hostfs_user.c
> --- linux-2.6.19.orig/fs/hostfs/hostfs_user.c 2006-11-29 22:57:37.0 +0100
> +++ linux-2.6.19/fs/hostfs/hostfs_user.c 2007-01-22 20:54:37.0 +0100
> @@ -295,7 +295,7 @@ int do_rmdir(const char *file)
>   return(0);
>  }
>
> -int do_mknod(const char *file, int mode, int dev)
> +int do_mknod(const char *file, int mode, unsigned long long dev)
>  {
>   int err;
>
>

-- 
Inform me of my mistakes, so I can add them to my list!
Paolo Giarrusso, aka Blaisorblade
http://www.user-mode-linux.org/~blaisorblade


Re: [uml-devel] [PATCH 1/6] UML - Console locking fixes

2007-01-03 Thread Blaisorblade
On Saturday 30 December 2006 00:41, Jeff Dike wrote:
> Clean up the console driver locking.  There are various problems here,
> including sleeping under a spinlock and spinlock recursion, some of
> which are fixed here.  This patch deals with the locking involved with
> opens and closes.  The problem is that an mconsole request to change a
> console's configuration can race with an open.  Changing a
> configuration should only be done when a console isn't opened.  Also,
> an open must be looking at a stable configuration.  In addition, a get
> configuration request must observe the same locking since it must also
> see a stable configuration.  With the old locking, it was possible for
> this to hang indefinitely in some cases because open would block for a
> long time waiting for a connection from the host while holding the
> lock needed by the mconsole request.
>
> As explained in the long comment, this is fixed by adding a spinlock
> for the use count and configuration and a mutex for the actual open
> and close.
>
> Signed-off-by: Jeff Dike <[EMAIL PROTECTED]>

> +
>  int line_open(struct line *lines, struct tty_struct *tty)
>  {
> - struct line *line;
> + struct line *line = &lines[tty->index];
>   int err = -ENODEV;
>
> - line = &lines[tty->index];
> - tty->driver_data = line;
> + spin_lock(&line->count_lock);
> + if(!line->valid)
> + goto out_unlock;
> +
> + err = 0;
> + if(tty->count > 1)
> + goto out_unlock;
>
> - /* The IRQ which takes this lock is not yet enabled and won't be run
> -  * before the end, so we don't need to use spin_lock_irq.*/
> - spin_lock(&line->lock);
> + mutex_lock(&line->open_mutex);
> + spin_unlock(&line->count_lock);

This is an obnoxious thing to do unless you specifically prove otherwise. You 
cannot take a mutex (and possibly sleep) while holding a spinlock.

You must have either:
+   spin_unlock(&line->count_lock);
+   mutex_lock(&line->open_mutex);

or take count_lock inside open_mutex (which looks correct here).

With the first solution, you can create an OPENING flag (via a state variable), 
and add the rule that (unlike the count) nobody but the original setter is 
allowed to change it, and that whoever finds it set (say, a concurrent open) 
must return without touching it.

The state diagram is like:
CLOSED -> OPENING -> OPEN
(only the function which triggered the transition from CLOSED to OPENING can 
trigger the transition from OPENING to OPEN). It can probably be simplified 
to OPENING <-> ! OPENING.
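
A minimal userspace sketch of that state variable, under stated assumptions:
a pthread mutex stands in for the short-held spinlock, and struct line_sketch
and the line_* function names are hypothetical, not the UML driver's.

```c
#include <pthread.h>

enum line_state { LINE_CLOSED, LINE_OPENING, LINE_OPEN };

struct line_sketch {
	pthread_mutex_t state_lock;	/* stands in for the spinlock */
	enum line_state state;
};

void line_sketch_init(struct line_sketch *l)
{
	pthread_mutex_init(&l->state_lock, NULL);
	l->state = LINE_CLOSED;
}

/*
 * Try to claim the CLOSED -> OPENING transition.
 * Returns 1 if the caller now owns the open, 0 if someone else is
 * opening (or has already opened) the line; in that case the caller
 * must not touch the state.
 */
int line_begin_open(struct line_sketch *l)
{
	int claimed = 0;

	pthread_mutex_lock(&l->state_lock);
	if (l->state == LINE_CLOSED) {
		l->state = LINE_OPENING;
		claimed = 1;
	}
	pthread_mutex_unlock(&l->state_lock);
	return claimed;
}

/*
 * Only the caller that got 1 from line_begin_open() may call this:
 * OPENING -> OPEN on success, back to CLOSED on failure.
 */
void line_finish_open(struct line_sketch *l, int ok)
{
	pthread_mutex_lock(&l->state_lock);
	l->state = ok ? LINE_OPEN : LINE_CLOSED;
	pthread_mutex_unlock(&l->state_lock);
}
```

The slow part of the open, which may sleep, runs between line_begin_open() and
line_finish_open() with state_lock released, so nothing sleeps while the short
lock is held.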
-- 
Inform me of my mistakes, so I can add them to my list!
Paolo Giarrusso, aka Blaisorblade
http://www.user-mode-linux.org/~blaisorblade



Re: [PATCH 1/2] Make x86_64 udelay() round up instead of down - try2

2006-11-17 Thread Blaisorblade
On Friday 17 November 2006 23:00, Andrew Morton wrote:
> On Fri, 17 Nov 2006 20:30:47 +0100
>
> "Paolo 'Blaisorblade' Giarrusso" <[EMAIL PROTECTED]> wrote:
> > Port two patches from i386 to x86_64 delay.c to make sure all rounding is
> > done upward instead of downward.
>
> Andi already has a patch in his tree, only it's different.
>
> ftp://ftp.firstfloor.org/pub/ak/x86_64/quilt-current/patches/make-x86_64-udelay-round-up-instead-of-down.
OK, that is a fixed-up version of what I sent. I implemented Pavel's 
suggestion, but the choice is just a matter of taste.
-- 
Inform me of my mistakes, so I can add them to my list!
Paolo Giarrusso, aka Blaisorblade
http://www.user-mode-linux.org/~blaisorblade


TTY layer locking design

2006-11-17 Thread Blaisorblade
I've started cleaning up locking in the UML TTY drivers, and I've found it 
difficult to make it work cleanly.

I started looking closely at the TTY layer locking and its design, and at a 
first (and cursory) look I found it difficult to follow. Recent changes to 
tty refcounting seem to indicate that much work needs to be done, as one can 
guess after looking at the code (I wasn't sure whether the problem was in the 
code or in the reader ;-), however).

So I have some questions before starting really to delve here:

*) who is maintaining this part of the code? The only name found in MAINTAINERS 
and CREDITS is James Simmons.
*) would the locking need to be redesigned?
*) is the current design considered solid? I'm not only talking about the big 
kernel lock, but also about whether drivers need to reinvent (incorrectly) 
the wheel for their locking. UML drivers are very bad at this, but I've had 
difficulty both reading the code and finding documentation.
*) Documentation/tty.txt is quite lacking.

*) there is no generic way to handle ttys which are also consoles, except 
drivers/char/vt.c, and that code is written as if it were the only case where 
that applies. UML drivers are an exception to this, since UML cannot use 
virtual terminals.
Having a generic console driver using tty methods appears to be a cleaner 
design (think, in filesystem writing, of page cache methods based 
on ->readpage and ->writepage).

I'm trying to establish whether it is possible, for instance, for ->close to 
be called in parallel with ->write and such; in other driver layers this is 
impossible, either because refcounts are used (normal files, char & block 
devices) or, in the network layer, where refcount usage is impossible, because 
of state machines.

It seems not to happen for the console layer - is this true?
Also, since ->write must use a spinlock to protect against interrupt races, 
and ->open cannot, must we use both a mutex and a spinlock in ->write and 
similar methods? This can be avoided in other drivers.
-- 
Inform me of my mistakes, so I can keep imitating Homer Simpson's "Doh!".
Paolo Giarrusso, aka Blaisorblade
http://www.user-mode-linux.org/~blaisorblade

 


We're still coping with GCC < 3.0

2006-11-17 Thread Blaisorblade
(CC me on replies as I'm not subscribed)
In arch/i386/kernel/irq.c (current git head) I found this comment:

/*
 * These should really be __section__(".bss.page_aligned") as well, but
 * gcc's 3.0 and earlier don't handle that correctly.
 */
static char softirq_stack[NR_CPUS * THREAD_SIZE]
__attribute__((__aligned__(THREAD_SIZE)));

static char hardirq_stack[NR_CPUS * THREAD_SIZE]
__attribute__((__aligned__(THREAD_SIZE)));

That should be fixed now that we require GCC 3.0, shouldn't it?

Btw, there are other such comments, like in include/asm-i386/semaphore.h: 
sema_init (for GCC 2.7!). That one might not be worth fixing because of the 
increased stack usage.

I've seen other similar tests around, so I thought that it'd be useful to 
centralize all tests for GCC versions in headers like include/compiler.h, so 
that they're promptly removed when old compilers are deprecated.

What do you think?
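
Roughly what I have in mind, as a sketch only: every macro name below is
invented for this mail and none is an existing kernel macro, and the 3.1
threshold is merely inferred from the comment quoted above.

```c
/* Hypothetical contents of a central compiler header. */
#define VERSION_CODE(major, minor, patch) \
	((major) * 10000 + (minor) * 100 + (patch))

#if defined(__GNUC__)
#define COMPILER_GCC_VERSION \
	VERSION_CODE(__GNUC__, __GNUC_MINOR__, __GNUC_PATCHLEVEL__)
#else
#define COMPILER_GCC_VERSION 0	/* not a GCC-compatible compiler */
#endif

/* The single place to ask "is the compiler at least GCC x.y?" */
#define GCC_AT_LEAST(major, minor) \
	(COMPILER_GCC_VERSION >= VERSION_CODE(major, minor, 0))

/*
 * Example user: enable the section attribute only where it is assumed
 * to work ("3.0 and earlier don't handle that correctly").
 */
#if GCC_AT_LEAST(3, 1)
#define PAGE_ALIGNED_BSS __attribute__((__section__(".bss.page_aligned")))
#else
#define PAGE_ALIGNED_BSS
#endif
```

When the minimum supported compiler is raised, grepping for GCC_AT_LEAST would
then find every workaround that can be dropped.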
-- 
Inform me of my mistakes, so I can keep imitating Homer Simpson's "Doh!".
Paolo Giarrusso, aka Blaisorblade (Skype ID "PaoloGiarrusso", ICQ 215621894)
http://www.user-mode-linux.org/~blaisorblade



[PATCH 1/2] Make x86_64 udelay() round up instead of down - try2

2006-11-17 Thread Paolo 'Blaisorblade' Giarrusso
From: Paolo 'Blaisorblade' Giarrusso <[EMAIL PROTECTED]>

Port two patches from i386 to x86_64 delay.c to make sure all rounding is done
upward instead of downward.

There is no sign in commit messages that the mismatch was done on purpose, and
"delay() guarantees sleeping at least for the specified time" is still a valid
rule IMHO.

The original x86 patches are both from pre-GIT era, i.e.:

"[PATCH] round up  in __udelay()" in commit
54c7e1f5cc6771ff644d7bc21a2b829308bd126f

"[PATCH] add 1 in __const_udelay()" in commit
42c77a9801b8877d8b90f65f75db758822a0bccc

(both commits are from the BK repository converted to git).

Changes from try1:
* fixed the code, compile tested against warnings;
* now it is a real round up rather than "round down and add 1"
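
The rounding arithmetic can be checked in isolation. The following is a
standalone sketch, not the kernel code; the function and macro names are made
up for illustration. It shows why adding 2^32 - 1 before shifting gives a real
ceiling, and that 0x10c7 is 2^32 / 10^6 rounded up.

```c
#include <stdint.h>

/* ceil(2^32 / 10^6): loops scale per microsecond in 32.32 fixed point. */
#define USEC_TO_XLOOPS_UP	UINT64_C(0x10c7)
/* floor(2^32 / 10^6): the old constant, which rounds down. */
#define USEC_TO_XLOOPS_DOWN	UINT64_C(0x10c6)

/* Drop the 32 fractional bits by truncation (rounds down). */
uint64_t fixed32_floor(uint64_t x)
{
	return x >> 32;
}

/*
 * Drop the 32 fractional bits, rounding up: adding 2^32 - 1 first bumps
 * every value with a nonzero fraction to the next integer.
 * Assumes x is small enough that the addition cannot wrap around.
 */
uint64_t fixed32_ceil(uint64_t x)
{
	return (x + (UINT64_C(1) << 32) - 1) >> 32;
}

/* Integer ceiling division, used here only to check the constant. */
uint64_t ceil_div(uint64_t a, uint64_t b)
{
	return (a + b - 1) / b;
}
```

Unlike "round down and add 1", fixed32_ceil() leaves values that are already
exact multiples of 2^32 unchanged.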

Signed-off-by: Paolo 'Blaisorblade' Giarrusso <[EMAIL PROTECTED]>
---

 arch/x86_64/lib/delay.c|4 ++--
 include/asm-x86_64/delay.h |2 +-
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/arch/x86_64/lib/delay.c b/arch/x86_64/lib/delay.c
index 50be909..b488743 100644
--- a/arch/x86_64/lib/delay.c
+++ b/arch/x86_64/lib/delay.c
@@ -40,13 +40,13 @@ EXPORT_SYMBOL(__delay);
 
 inline void __const_udelay(unsigned long xloops)
 {
-   __delay((xloops * HZ * cpu_data[raw_smp_processor_id()].loops_per_jiffy) >> 32);
+   __delay((xloops * HZ * cpu_data[raw_smp_processor_id()].loops_per_jiffy 
+ (1UL << 32) - 1) >> 32);
 }
 EXPORT_SYMBOL(__const_udelay);
 
 void __udelay(unsigned long usecs)
 {
-   __const_udelay(usecs * 0x10c6);  /* 2**32 / 1000000 */
+   __const_udelay(usecs * 0x10c7);  /* 2**32 / 1000000 (rounded up) */
 }
 EXPORT_SYMBOL(__udelay);
 
diff --git a/include/asm-x86_64/delay.h b/include/asm-x86_64/delay.h
index 65f64ac..40146f6 100644
--- a/include/asm-x86_64/delay.h
+++ b/include/asm-x86_64/delay.h
@@ -16,7 +16,7 @@ extern void __const_udelay(unsigned long
 extern void __delay(unsigned long loops);
 
 #define udelay(n) (__builtin_constant_p(n) ? \
-   ((n) > 20000 ? __bad_udelay() : __const_udelay((n) * 0x10c6ul)) : \
+   ((n) > 20000 ? __bad_udelay() : __const_udelay((n) * 0x10c7ul)) : \
__udelay(n))
 
 #define ndelay(n) (__builtin_constant_p(n) ? \


[PATCH 2/2] i386, x86_64: comment magic constants in delay.h

2006-11-17 Thread Paolo 'Blaisorblade' Giarrusso
From: Paolo 'Blaisorblade' Giarrusso <[EMAIL PROTECTED]>

For both i386 and x86_64, copy from arch/$ARCH/lib/delay.c comments about the
used magic constants, plus a few other niceties.

Signed-off-by: Paolo 'Blaisorblade' Giarrusso <[EMAIL PROTECTED]>
---

 include/asm-i386/delay.h   |5 -
 include/asm-x86_64/delay.h |5 -
 2 files changed, 8 insertions(+), 2 deletions(-)

diff --git a/include/asm-i386/delay.h b/include/asm-i386/delay.h
index b1c7650..9ae5e37 100644
--- a/include/asm-i386/delay.h
+++ b/include/asm-i386/delay.h
@@ -7,6 +7,7 @@
  * Delay routines calling functions in arch/i386/lib/delay.c
  */
  
+/* Undefined functions to get compile-time errors */
 extern void __bad_udelay(void);
 extern void __bad_ndelay(void);
 
@@ -15,10 +16,12 @@ extern void __ndelay(unsigned long nsecs
 extern void __const_udelay(unsigned long usecs);
 extern void __delay(unsigned long loops);
 
+/* 0x10c7 is 2**32 / 1000000 (rounded up) */
 #define udelay(n) (__builtin_constant_p(n) ? \
((n) > 2 ? __bad_udelay() : __const_udelay((n) * 0x10c7ul)) : \
__udelay(n))
-   
+
+/* 0x5 is 2**32 / 1000000000 (rounded up) */
 #define ndelay(n) (__builtin_constant_p(n) ? \
((n) > 2 ? __bad_ndelay() : __const_udelay((n) * 5ul)) : \
__ndelay(n))
diff --git a/include/asm-x86_64/delay.h b/include/asm-x86_64/delay.h
index 40146f6..c2669f1 100644
--- a/include/asm-x86_64/delay.h
+++ b/include/asm-x86_64/delay.h
@@ -7,18 +7,21 @@
  * Delay routines calling functions in arch/x86_64/lib/delay.c
  */
  
+/* Undefined functions to get compile-time errors */
 extern void __bad_udelay(void);
 extern void __bad_ndelay(void);
 
 extern void __udelay(unsigned long usecs);
-extern void __ndelay(unsigned long usecs);
+extern void __ndelay(unsigned long nsecs);
 extern void __const_udelay(unsigned long usecs);
 extern void __delay(unsigned long loops);
 
+/* 0x10c7 is 2**32 / 1000000 (rounded up) */
 #define udelay(n) (__builtin_constant_p(n) ? \
((n) > 2 ? __bad_udelay() : __const_udelay((n) * 0x10c7ul)) : \
__udelay(n))
 
+/* 0x5 is 2**32 / 1000000000 (rounded up) */
 #define ndelay(n) (__builtin_constant_p(n) ? \
((n) > 2 ? __bad_ndelay() : __const_udelay((n) * 5ul)) : \
__ndelay(n))


Re: [uml-devel] Re: [RFC] [patch 0/18] remap_file_pages protection support (for UML), try 3

2005-09-04 Thread Blaisorblade
On Friday 02 September 2005 23:02, Hugh Dickins wrote:
> On Fri, 26 Aug 2005, Blaisorblade wrote:
> > * The first 2 patches modify the PTE encoding macros and start preparing
> > the VM for the new situation (i.e. VMA which have variable protections,
> > which are called VM_NONUNIFORM. I dropped the early VM_MANYPROTS name).

> What a pity: please revert.  VM_NONUNIFORM sounds impressive, but might
> mean all kinds of things, maybe to do with NUMA.  VM_MANYPROTS is good,
> it says what it means.
Ok. Btw, before I forget: I assume I should redo the patches rather than fix 
what you point out on top of mine (at least when not changing behaviour), right?
> > * Patch 11 is a big simplification. Since we must encode the PTE's on
> > swapout like in VM_NONLINEAR vmas, the simplest way to reuse the existing
> > code is to make sure that VM_NONUNIFORM vmas are also marked as
> > VM_NONLINEAR.

> In some places you seem to say that you (UML) only need VM_MANYPROTS vmas
> linear, in other places you seem to say that your VM_MANYPROTS vmas will
> be nonlinear.  I've no idea which way round it is.  Perhaps the "non"
> sometimes goes missing (another reason to avoid NONUNIFORM).

> I wrote that yesterday.  Thanks, you've cleared it up in private mail:
> the VM_MANYPROTS vmas that UML wants are VM_NONLINEAR anyway.

> Yes, I see your dilemma there: you rightly want to avoid adding bloat
> by distinguishing cases that you don't need distinguished; but equally
> rightly fear that someone somewhere will start using the VM_MANYPROTS
> for other reasons, and hit the inefficiencies of VM_NONLINEAR
> unnecessarily.  I share your uncertainty, I don't have an immediate
> feel for the right direction on that.
We'll see later if we can cater to this case without messing up zap_pte_range 
as I did in the last patch (that is the only one with which I was able to break 
something - not in the version I sent, however).
> > *) No more usage of a new syscall slot: to use the new interface,
> > application will use the new MAP_NOINHERIT flag I've added. I've still
> > the patches to use the old -mm ABI, if there's any reason they're needed.

> I'm glad you've scrapped the new syscall slot, that really put me off
> the old patch (though I was probably being silly about it).  This way
> is much better, but again I quarrel with your naming.

> "Inherit" is about parents and children, this is not; and furthermore,
> some UNIXes had a MAP_INHERIT (see asm-alpha/mman.h) which was about
> passing an mmap across exec.  Your MAP_NOINHERIT has nothing to do
> with that.  MAP_MANYPROTS would help us to follow the trail more
> easily (though it's true that you can't actually pass many prots
> in to a single remap_file_pages call).
MAP_CHGPROT? MAP_CHANGEPROT? MAP_REPROT?
VM_MANYPROTS is an internal name, so there's no reason for the flag to have the 
same name either.
> > Subject: [patch 01/18] remap_file_pages protection support: uml, i386,
> > x64 bits
> >
> > Update pte encoding macros for UML, i386 and x86-64.
> > Also, add the MAP_NOINHERIT flag to arch headers.
>
> Well, I don't find your patch division very helpful, since you introduce
> these without us seeing what use is made of them.  And the MAP_NOINHERIT
> additions cover a different subset of arches (ppc, ppc64, s390 in there):
> those should be in some other patch.
For this patch, I joined up everything because people get scared when they see 
39 patches (and I've not really removed code, apart from things which were 
introduced and later rewritten; I just changed the presentation between the two 
sends).
> Usually we just do the i386 arch first, and supply some other patch(es)
> for all the others.  But you've good reason to start with UML too, and
> it makes sense to include x86_64 along too if you're happy to do so.

> But it'll probably waste your time and mine to go on discussing patch
> division, let's leave it at that.

> > *** remap_file_pages protection support: improvement for UML bits

> > Recover one bit by additionally using _PAGE_NEWPROT. Since I wasn't sure
> > this would work, I've split this out, but it has worked well. We rely on
> > the fact that pte_newprot always checks first if the PTE is marked
> > present.

> And we never hear of _PAGE_NEWPROT or pte_newprot again.  Ah, they're
> already defined in and peculiar to UML, I see.  Well, if this is some
> UML improvement change, please put that in a separate UML patch.
As above, I joined several patches together to reduce noise. And after proper 
unit testing and checking, it was safe to join it anyway.

> > -#define pte_to_pgoff(pte) (pte_val(pte) >> 4)
> > +#def

Re: [uml-devel] Re: [patch 1/3] uml: share page bits handling between 2 and 3 level pagetables

2005-09-04 Thread Blaisorblade
On Friday 02 September 2005 22:17, Jeff Dike wrote:
> On Wed, Aug 10, 2005 at 09:37:28PM +0200, Blaisorblade wrote:
> > Also look, on the "set_pte" theme, at the attached patch.

> +   WARN_ON(!pte_young(*pte) || pte_write(*pte) && !pte_dirty(*pte));

> This one has been firing on me, and I decided to figure out why.  The
> culprit is this code in do_no_page:

>   if (pte_none(*page_table)) {
>   if (!PageReserved(new_page))
>   inc_mm_counter(mm, rss);
>
>   flush_icache_page(vma, new_page);
>   entry = mk_pte(new_page, vma->vm_page_prot);
>   if (write_access)
>   entry = maybe_mkwrite(pte_mkdirty(entry), vma);
>   set_pte_at(mm, address, page_table, entry);
>
> The first mk_pte immediately sets the pte to the protection limits of
> the VMA, regardless of the access type.

> So, if it's a read access on 
> a writeable page, we get a writeable, but not dirty pte, since the
> mkdirty never happens.  This exercises the warning you added.
Thanks for noticing - I really had this doubt when writing some code (in the 
patch, I've added dirty PTEs on read accesses because I was unsure, and 
also because of my warning).

> This seems somewhat bogus to me.  If we set the pte protection to its
> limits, then the maybe_mkwrite is unnecessary.

> This doesn't seem to harm our dirty bit emulation.  fix_range_common
> checks the dirty and accessed bits and disables read and write
> protection as appropriate.

> So, it seems like the warning could be dropped, or perhaps made more
> selective, like checking for is_write == 0 and VM_WRITE, but then the
> test is getting complicated.

No, just replace pte_write() with is_write, as below. They might not coincide, 
but if on a write fault we return with a clean PTE, we'll loop indefinitely 
(experienced while hacking on remap_f_p), so the warning above is definitely 
correct.

   WARN_ON(!pte_young(*pte) || is_write && !pte_dirty(*pte));
>   Jeff
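
The difference between the two forms of the warning can be illustrated with
a small user-space model. The struct and function names below are
hypothetical and only capture the three PTE bits that the check reads; they
are not the UML pgtable API.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified PTE state: only the bits the check looks at. */
struct pte_bits {
    bool young;     /* accessed bit */
    bool writable;  /* write permission */
    bool dirty;     /* dirty bit */
};

/* Original check: fires for any writable-but-clean PTE. */
static bool warn_original(struct pte_bits pte)
{
    return !pte.young || (pte.writable && !pte.dirty);
}

/* Proposed check: only a write fault must leave a dirty PTE behind. */
static bool warn_proposed(struct pte_bits pte, bool is_write)
{
    return !pte.young || (is_write && !pte.dirty);
}

/* The do_no_page case from the thread: read fault on a writable mapping. */
static void check_read_fault_case(void)
{
    struct pte_bits pte = { .young = true, .writable = true, .dirty = false };

    assert(warn_original(pte));          /* spurious warning */
    assert(!warn_proposed(pte, false));  /* read fault: quiet */
    assert(warn_proposed(pte, true));    /* write fault: still caught */
}
```

With is_write in place of pte_write(), the read-fault case stays quiet while
a write fault that returns with a clean PTE is still reported.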

-- 
Inform me of my mistakes, so I can keep imitating Homer Simpson's "Doh!".
Paolo Giarrusso, aka Blaisorblade (Skype ID "PaoloGiarrusso", ICQ 215621894)
http://www.user-mode-linux.org/~blaisorblade




Re: [uml-devel] [PATCH 10/12] UML - Allow host capability usage to be disabled

2005-09-02 Thread Blaisorblade
On Friday 02 September 2005 00:17, Jeff Dike wrote:
> From: Bodo Stroesser <[EMAIL PROTECTED]>
>
> Add new cmdline setups:
>   - noprocmm
>   - noptracefaultinfo
> In case of testing, they can be used to switch off usage of
> /proc/mm and PTRACE_FAULTINFO independently.
Is the "skas0" cmd line option preserved?
> Signed-off-by: Bodo Stroesser <[EMAIL PROTECTED]>
> Signed-off-by: Jeff Dike <[EMAIL PROTECTED]>
>
> Index: test/arch/um/os-Linux/start_up.c
> ===
> --- test.orig/arch/um/os-Linux/start_up.c 2005-09-01 16:42:42.0 -0400
> +++ test/arch/um/os-Linux/start_up.c 2005-09-01 16:51:23.0 -0400
> @@ -275,6 +275,30 @@
>   check_ptrace();
>  }
>
> +static int __init noprocmm_cmd_param(char *str, int* add)
> +{
> + proc_mm = 0;
> + return 0;
> +}
> +
> +__uml_setup("noprocmm", noprocmm_cmd_param,
> +"noprocmm\n"
> +"Turns off usage of /proc/mm, even if host supports it.\n"
> +"To support /proc/mm, the host needs to be patched using\n"
> +"the current skas3 patch.\n\n");
> +
> +static int __init noptracefaultinfo_cmd_param(char *str, int* add)
> +{
> + ptrace_faultinfo = 0;
> + return 0;
> +}
> +
> +__uml_setup("noptracefaultinfo", noptracefaultinfo_cmd_param,
> +"noptracefaultinfo\n"
> +"Turns off usage of PTRACE_FAULTINFO, even if host supports\n"
> +"it. To support PTRACE_FAULTINFO, the host needs to be patched\n"
> +"using the current skas3 patch.\n\n");
> +
>  #ifdef UML_CONFIG_MODE_SKAS
>  static inline void check_skas3_ptrace_support(void)
>  {
>
>
>

-- 
Inform me of my mistakes, so I can keep imitating Homer Simpson's "Doh!".
Paolo Giarrusso, aka Blaisorblade (Skype ID "PaoloGiarrusso", ICQ 215621894)
http://www.user-mode-linux.org/~blaisorblade







Re: [patch 1/1] Ptrace - i386: fix "syscall audit" interaction with singlestep

2005-09-01 Thread Blaisorblade
On Wednesday 31 August 2005 04:02, Andrew Morton wrote:
> [EMAIL PROTECTED] wrote:
> > From: Bodo Stroesser <[EMAIL PROTECTED]>, Paolo
> > 'Blaisorblade' Giarrusso <[EMAIL PROTECTED]> CC: Roland McGrath
> > <[EMAIL PROTECTED]>
> >
> > Avoid giving two traps for singlestep instead of one, when syscall
> > auditing is enabled.
> >
> > In fact no singlestep trap is sent on syscall entry, only on syscall
> > exit, as can be seen in entry.S:
> >
> > # Note that in this mask _TIF_SINGLESTEP is not tested !!! <<<<<<<<<<<<<<
> > testb
> > $(_TIF_SYSCALL_TRACE|_TIF_SYSCALL_AUDIT|_TIF_SECCOMP),TI_flags(%ebp) jnz
> > syscall_trace_entry
> > ...
> > syscall_trace_entry:
> > ...
> > call do_syscall_trace
> >
> > But auditing a SINGLESTEP'ed process causes do_syscall_trace to be
> > called, so the tracer will get one more trap on the syscall entry path,
> > which it shouldn't.
> >
> > This does not affect (to my knowledge) UML, nor is critical, so this
> > shouldn't IMHO go in 2.6.13.
> >
> > Signed-off-by: Paolo 'Blaisorblade' Giarrusso <[EMAIL PROTECTED]>
> > ---
> >
> >  linux-2.6.git-paolo/arch/i386/kernel/ptrace.c |   15 +--
> >  1 files changed, 13 insertions(+), 2 deletions(-)
> >
> > diff -puN arch/i386/kernel/ptrace.c~sysaudit-singlestep-non-umlhost
> > arch/i386/kernel/ptrace.c ---
> > linux-2.6.git/arch/i386/kernel/ptrace.c~sysaudit-singlestep-non-umlhost 
> > 2
> >005-07-26 20:22:40.0 +0200 +++
> > linux-2.6.git-paolo/arch/i386/kernel/ptrace.c   2005-07-26
> > 20:23:44.0 +0200 @@ -683,8 +683,19 @@ void
> > do_syscall_trace(struct pt_regs *re
> > /* do the secure computing check first */
> > secure_computing(regs->orig_eax);
> >
> > -   if (unlikely(current->audit_context) && entryexit)
> > -   audit_syscall_exit(current, AUDITSC_RESULT(regs->eax), 
> > regs->eax);
> > +   if (unlikely(current->audit_context)) {
> > +   if (entryexit)
> > +   audit_syscall_exit(current, AUDITSC_RESULT(regs->eax), 
> > regs->eax);
> > +
> > +   /* Debug traps, when using PTRACE_SINGLESTEP, must be sent only
> > +* on the syscall exit path. Normally, when TIF_SYSCALL_AUDIT is
> > +* not used, entry.S will call us only on syscall exit, not
> > +* entry ; so when TIF_SYSCALL_AUDIT is used we must avoid
> > +* calling send_sigtrap() on syscall entry.
> > +*/
> > +   else if (is_singlestep)
> > +   goto out;
> > +   }

> This appears to be a UML patch,
No, absolutely not.
> applied to x86, which has no 
> `is_singlestep'.

It is an x86 patch; is_singlestep just comes from later patches (in fact -mm 
has built because that variable is created in later patches of mine, about 
SYSEMU).

I took this from one of your mail notices:

ptrace-i386-fix-syscall-audit-interaction-with-singlestep.patch
uml-support-ptrace-adds-the-host-sysemu-support-for-uml-and-general-usage.patch
uml-support-reorganize-ptrace_sysemu-support.patch
uml-support-add-ptrace_sysemu_singlestep-option-to-i386.patch
sysemu-fix-sysaudit--singlestep-interaction.patch

Note in particular that the last one 
(sysemu-fix-sysaudit--singlestep-interaction.patch) is the original version 
of the patch you're talking about (i.e. this fix was first made against the 
SYSEMU patch, even if it's of general interest).

Just use test_thread_flag(TIF_SINGLESTEP), but leave later patches as-is: they 
need the current 
   int is_singlestep = !is_sysemu && test_thread_flag(TIF_SINGLESTEP);
to be left there, and is_singlestep to be used in that check.
-- 
Inform me of my mistakes, so I can keep imitating Homer Simpson's "Doh!".
Paolo Giarrusso, aka Blaisorblade (Skype ID "PaoloGiarrusso", ICQ 215621894)
http://www.user-mode-linux.org/~blaisorblade







Memory reclaim: permanently pinned dentries (aka libfs/sysfs) and the blunderbuss effect

2005-09-01 Thread Blaisorblade
Martin J. Bligh described at OLS the "blunderbuss effect", i.e. the 
inefficiency of the dentry cache shrinker at freeing whole pages, since we 
could leave (worst-case) one dentry per page because it's at the end of the 
LRU list.

Pinned dentries (in the first place libfs ones, but he also includes directory 
ones - I think those are just hard to free, not really pinned) are allocated 
from the common dentry_cache, i.e. mixed with normal ones - why don't we fix 
that?

It seems that adding an (optional) flag to a new __d_alloc (with d_alloc 
becoming its wrapper) would be enough, since dentries are always allocated 
directly by filesystems (either on lookup or on creation of the pinned 
dentry). Or call it d_alloc_lively().

Also, it seems that the slab allocator will allocate objects at fixed 
locations inside a page (even with page colouring, colour_offset is fixed 
per-slab and saved)*, once that slab has been allocated... so if we add a 
"DCACHE_FREED" flag and zero the slab's content on alloc (at least for this 
slab), we could maybe enumerate all dentries in a page and try to free them, to 
finally free the whole slab.

* Otherwise this problem could probably be fixed some way.
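
A toy user-space model of the idea, only to make the argument concrete: the
types, the slot count and the helper names are all hypothetical, and the
real slab allocator and dcache are far more involved. Each page holds
objects at fixed slots; a page can be given back only if every object in it
can be freed, so a single pinned object keeps the whole page alive.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SLOTS_PER_PAGE 4  /* hypothetical; real slabs hold more objects */

struct slot {
    bool in_use;  /* an object currently lives in this slot */
    bool pinned;  /* the object can never be freed (e.g. a libfs dentry) */
};

struct slab_page {
    struct slot slots[SLOTS_PER_PAGE];
};

/* Try to free every unpinned object; return true if the page is now empty. */
static bool try_free_page(struct slab_page *page)
{
    bool empty = true;

    for (size_t i = 0; i < SLOTS_PER_PAGE; i++) {
        struct slot *s = &page->slots[i];

        if (s->in_use && !s->pinned)
            s->in_use = false;  /* stands in for pruning the dentry */
        if (s->in_use)
            empty = false;
    }
    return empty;
}

/* Build a page from bitmasks (bit i describes slot i) and try to free it. */
static bool reclaimable_with(unsigned used_mask, unsigned pinned_mask)
{
    struct slab_page page;

    for (size_t i = 0; i < SLOTS_PER_PAGE; i++) {
        page.slots[i].in_use = (used_mask >> i) & 1u;
        page.slots[i].pinned = (pinned_mask >> i) & 1u;
    }
    return try_free_page(&page);
}

/* One pinned object among normal ones is enough to keep the page. */
static void check_mixed_page(void)
{
    assert(reclaimable_with(0xF, 0x0));
    assert(!reclaimable_with(0xF, 0x2));
}
```

In this model, allocating pinned objects from a separate cache would mean
that pages of normal dentries never contain a slot with pinned set, so
walking a page and freeing its objects could always succeed in principle.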
-- 
Inform me of my mistakes, so I can keep imitating Homer Simpson's "Doh!".
Paolo Giarrusso, aka Blaisorblade (Skype ID "PaoloGiarrusso", ICQ 215621894)
http://www.user-mode-linux.org/~blaisorblade








Re: [patch 1/2] Fixup symlink function pointers for hppfs [for 2.6.13]

2005-08-26 Thread Blaisorblade
On Friday 26 August 2005 21:03, Al Viro wrote:
> On Fri, Aug 26, 2005 at 04:57:44PM +0200, [EMAIL PROTECTED] wrote:
> > From: Paolo 'Blaisorblade' Giarrusso <[EMAIL PROTECTED]>

> > Update hppfs for the symlink functions prototype change.
> > Should be trivial, but please verify it's correct.

> > Yes, I know the code I leave there is still _bogus_, see next patch for
> > this.

About what it's doing: hppfs (HoneyPot Proc FS) is a wrapper for procfs, which 
must be able to hide part of the content to prevent a hacker inside UML from 
realizing he's hacking a virtual machine; it's normally mounted on /proc, 
if used.

> Assuming that the next patch was "hppfs: fix symlink error path",
Yes.
> you've still left BS in there -
BullShit? Thanks for improving my acronym dictionary!
> > proc_file = dentry_open(dget(proc_dentry), NULL, O_RDONLY);

> is obviously wrong; at the very least you need vfsmount in there.
And beyond that, what? I cannot even think what the rest is *. And "obvious" 
doesn't hold with me.

I'm _not_ a VFS hacker, I don't go beyond Documentation/filesystems/vfs.txt, 
so I'd better leave fixing that to you.

At least, the parts you don't mention. I'll fix vfsmount in the next few days 
(if you want to do it yourself, I've put together the needed info below).

I had to check the dentry_open prototype to realize you're referring to the 
NULL there and not to dget.

And the dentries you see are all descendants of the root one, taken in 
hppfs_fill_super() from 

   err = init_inode(root_inode, proc_sb->s_root);

I guess the current hack could be replaced with reading 
fs/proc/inode.c:proc_mnt... I wouldn't pass proc_mnt directly because we 
don't know we took _that_ mount inside hppfs_fill_super(), but I like 
replacing

list_entry(get_fs_type("proc")->fs_supers.next,...) 

with proc_mnt->mnt_sb (assuming it's always filled in - IIRC I already checked 
the initialization order).

* I've verified that there's no missing dput() in failure case as that's 
handled by dentry_open().
-- 
Inform me of my mistakes, so I can keep imitating Homer Simpson's "Doh!".
Paolo Giarrusso, aka Blaisorblade (Skype ID "PaoloGiarrusso", ICQ 215621894)
http://www.user-mode-linux.org/~blaisorblade




Re: [uml-devel] Re: [RFC] [patch 0/18] remap_file_pages protection support (for UML), try 3

2005-08-26 Thread Blaisorblade
On Friday 26 August 2005 21:11, Hugh Dickins wrote:
> On Fri, 26 Aug 2005, Blaisorblade wrote:
> > This is a followup to my post of last week (Aug 12) about
> > remap_file_pages protection support. I've improved and consolidated the
> > patches and updated them against 2.6.13-rc6/rc7 (the same patches apply
> > against both versions). I'm sending the full patch series only to akpm,
> > mingo and LKML.
> >
> > I've also reduced them to only 18, and made the splitting more
> > significant. I'm not resending all the patches for foreign architectures,
> > because they're almost unchanged since last time (there's just a trivial
> > reject from ppc32, because one change has already been done after -rc4).
> >
> > I'm working on this to provide support for UML, which currently easily
> > creates more than 64K (the default limit) vma's for a single process.
> > Actually, it needs one VMA per each page. So, with this patch and
> > specific UML support, which Ingo wrote and which I'm porting to recent
> > UMLs.
>
> I'll try to take a look sometime next week - or, if I wait until
> next Friday, can we expect it to have come down to 9 patches ;-?
:-) Don't think so, unless you just want me to join patches together. However, 
there are still some oneliners, so the patchset is not so huge.

Well, diffstat seems to contradict me (on the joined-up patch):

 Documentation/feature-removal-schedule.txt |   12 +
 arch/i386/mm/fault.c   |   19 ++
 arch/um/kernel/trap_kern.c |   52 ++-
 arch/x86_64/mm/fault.c |6
 include/asm-i386/mman.h|1
 include/asm-i386/pgtable-2level.h  |   15 +-
 include/asm-i386/pgtable-3level.h  |   11 +
 include/asm-i386/pgtable.h |3
 include/asm-ia64/mman.h|1
 include/asm-ppc/mman.h |1
 include/asm-ppc64/mman.h   |1
 include/asm-s390/mman.h|1
 include/asm-um/pgtable-2level.h|   15 +-
 include/asm-um/pgtable-3level.h|   21 ++-
 include/asm-um/pgtable.h   |3
 include/asm-x86_64/mman.h  |1
 include/asm-x86_64/pgtable.h   |   12 +
 include/linux/mm.h |   40 --
 include/linux/pagemap.h|   32 
 mm/filemap.c   |   18 ++
 mm/fremap.c|  135 +++-
 mm/madvise.c   |2
 mm/memory.c|  193 ++---
 mm/mmap.c  |   14 +-
 mm/mprotect.c  |3
 mm/rmap.c  |6
 mm/shmem.c |   13 +

> I should say, my initial reaction is very much like Andi's last week.

> sys_remap_file_pages solves a real problem, but it does so by breaking
> lots of rules.  For more than a year after it came in, almost every
> development we tried in mm would come up against "but then what do we
> do about the nonlinear mappings?".

Nonuniform mappings are much less of a problem. Really. The reason nonlinear 
mappings reached mainline before nonuniform ones (and I don't know if they 
ever will) is not simplicity, but the lack of users until now. And also the 
fact that Ingo didn't have the time to finish it.

In fact, I've been playing a lot with the GIT history during this month of 
development, particularly with objrmap, so I've come across those problems 
quite a lot, but what I noticed is that you mostly don't care whether the VMA 
is uniform, just whether it is linear or not (because nonlinear VMAs break 
objrmap).

This patch, in comparison, is just:

*) check permissions in the generic VM when faulting in pages, if and only if 
the vma is nonuniform (yes, nontrivial at all).

*) be anally picky to save the PTE protections together with the rest, and do 
it *everywhere*; but if you say "nonuniform implies linear", you lose this 
problem almost completely.

The only exception is that when you remap an address range with PROT_NONE 
(thus effectively unmapping those addresses), you can't clear the PTEs but 
you must use pfn_pte(0, __S000) to fill them in (this is done in the 
optimization-fixup patch #15).

> That has settled down now, but I don't look forward to extending it.
> On the other hand, UML does deserve better support.

> Hugh

-- 
Inform me of my mistakes, so I can keep imitating Homer Simpson's "Doh!".
Paolo Giarrusso, aka Blaisorblade (Skype ID "PaoloGiarrusso", ICQ 215621894)
http://www.user-mode-linux.org/~blaisorblade




   

[RFC] [patch 0/18] remap_file_pages protection support (for UML), try 3

2005-08-26 Thread Blaisorblade
, 
or with patch 18.

==
Still todo
==
*) ->populate flushes each TLB individually, instead of using mmu_gathers as 
it should; this was suggested even by Ingo when sending the patch, but it 
seems he didn't get the time to finish this. And I'm now wondering how would 
that relate with I/O... at each I/O point we should finish and regather the 
mmu_gather, as in zap_page_range. But here we are reading pages, not the 
reverse!

Seems rewriting the kernel locking is a quite time-consuming task!
-- 
Inform me of my mistakes, so I can keep imitating Homer Simpson's "Doh!".
Paolo Giarrusso, aka Blaisorblade (Skype ID "PaoloGiarrusso", ICQ 215621894)
http://www.user-mode-linux.org/~blaisorblade


fremap-test-complete.c.bz2
Description: BZip2 compressed data


Re: remove-stale-comment-from-swapfilec.patch added to -mm tree

2005-08-26 Thread Blaisorblade
On Wednesday 24 August 2005 15:26, Hugh Dickins wrote:
> On Wed, 17 Aug 2005 [EMAIL PROTECTED] wrote:
> > From: Paolo 'Blaisorblade' Giarrusso <[EMAIL PROTECTED]>

> > Seems like on 2.4.9.4 this comment got out of sync ;-)

> Not at all.  That comment (not mine) was valid before and after 2.4.9,
> though agreed may be mystifying since there's no visible pte_wrprotect.
> That's because vm_page_prot must not include write permission there,
> so the pte being constructed is automatically write-protected.
Ahhh! Thanks.
> (But there are several other places in the mm source, I'll not look
> them up right now, where there is an explicit pte_wrprotect, despite
> that vm_page_prot cannot have write permission in that place.
> I think do_anonymous_page gives an example of that.  Inconsistent.)
Ok, a shared writable mapping is made shmfs-based, so nopage is used there. A 
bit tricky to notice.
> > I'm not completely sure on which basis we don't need any more to do as
> > the comment suggests,

> There has been no change regarding write protection of the pte there,
> not since before 2.4.0 anyway.

> > but it seems that when faulting in a second time the same
> > swap page,  can_share_swap_page() returns false, and we do an early COW
> > break, so there's no need to write-protect the page.
> >
> > No idea why we don't defer the COW break.

> I don't understand what's being asserted there.

> If do_swap_page gets 
> a write fault, it either determines it can go ahead and use the swap
> page, or if it can't, gets do_wp_page to Copy-On-Write for it (that's
> a call I added in 2.6.7, as an optimization, and as a necessity for
> correct behaviour of ptrace's get_user_pages; the latter has just in
> 2.6.13-rc been made more resilient, so we could remove do_swap_page's
> call to do_wp_page now - though I'm inclined to let it stay as an
> optimization, avoiding the second fault which would follow).
get_user_pages() can still get two faults there, because VM_FAULT_WRITE is not 
returned by do_swap_page(). And faults can be very expensive (for UML a fault 
is given by a SIGSEGV delivery).

> If do_swap_page gets a read fault, it doesn't COW at all.

> I don't know what the "early" COW break referred to is: the write_access
> call to do_wp_page could be deferred, yes, but it's hardly early.
The idea in my mind is that after loading the page from swap the first time 
there's no need to copy the page to give a private copy to the process, if 
the page is kept on swap.

We COW it anyway to break the sharing, to keep the original copy in the 
swapcache, instead of reading it again from the disk. This is *early*. 

> Usually that's a reason to mark it as young (recently referenced).

> Yes, it was me who added that pte_mkold to unuse_pte in 2.4.10:
> because the user process is not faulting the page in (referencing it),
> we're just bringing it in because we're forced to empty out swap.
> Mark it as old because it hasn't been referenced by the process.

> But in 2.6.8, amusingly, Andrew introduced an activate_page there:
> because people were irritated by the way in which a swapoff followed
> by a swapon (which should "clean out" swap) led to the pages which
> had been swapped out before, quickly being swapped out again.

> My pte_mkold, and Andrew's activate_page, both have good justifications,
> but work right against each other.  Perhaps Andrew should have just
> removed my pte_mkold to get the effect he wanted.

Removing pte_mkold() and leaving the _PAGE_ACCESSED in vma->vm_page_prot would 
just result in one call to mark_page_accessed(), I guess, i.e. from 
inactive, unreferenced to inactive, referenced, while activate_page makes two 
steps...

> Oh, and now we 
> have another in unuse_mm - I thought I'd moved the unuse_pte one
> there, but there's two now.
> Amusing, but not very important. 
swapoff() and swapon() are not a real workload, agreed. But taking twice the 
spinlocks is bad, no?
> Andrew, please drop remove-stale-comment-from-swapfilec.patch.
> It was a good way to prod me into writing about a few things,
> but the comments are just wrong in too many ways.
Sorry for that and thanks for the insights!
-- 
Inform me of my mistakes, so I can keep imitating Homer Simpson's "Doh!".
Paolo Giarrusso, aka Blaisorblade (Skype ID "PaoloGiarrusso", ICQ 215621894)
http://www.user-mode-linux.org/~blaisorblade







[patch 05/18] remap_file_pages protection support: enhance syscall interface

2005-08-26 Thread blaisorblade

From: Ingo Molnar <[EMAIL PROTECTED]>, Paolo 'Blaisorblade' Giarrusso <[EMAIL PROTECTED]>

This contains simply the changes to the syscall code, based on Ingo's patch.
Unlike his, I've *not* added a new syscall, choosing instead to add a
new flag (MAP_NOINHERIT) which the application must specify to get the new
behavior (prot != 0 is accepted and prot == 0 means PROT_NONE).

Enable the 'prot' parameter for shared-writable mappings (the ones which are
the primary target for remap_file_pages), without breaking up the vma.

*** remap_file_pages protection support: use EOVERFLOW ret code

Use -EOVERFLOW ("Value too large for defined data type") rather than -EINVAL
when we cannot store the file offset in the PTE.
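
The argument checks this patch introduces can be sketched in user space as
follows. This is only a model: every SK_* constant is a hypothetical
stand-in rather than the kernel's value, vma_may stands in for the
VM_MAYREAD/VM_MAYWRITE/VM_MAYEXEC bits of vma->vm_flags, and returning
-EINVAL for a non-zero prot without the flag is an assumption about the
default error code of the syscall.

```c
#include <assert.h>
#include <errno.h>

/* All values below are hypothetical stand-ins, not the kernel's. */
#define SK_PROT_READ          0x1UL
#define SK_PROT_WRITE         0x2UL
#define SK_PROT_EXEC          0x4UL
#define SK_MAP_NOINHERIT      0x1UL
#define SK_MAY_READ           0x1UL
#define SK_MAY_WRITE          0x2UL
#define SK_MAY_EXEC           0x4UL
#define SK_PAGE_SHIFT         12
#define SK_PTE_FILE_MAX_BITS  29

/* Returns 0 if the arguments pass the checks, a negative errno otherwise. */
static long check_remap_args(unsigned long size, unsigned long prot,
                             unsigned long pgoff, unsigned long flags,
                             unsigned long vma_may)
{
    /* prot is only honoured when the caller opts in with the new flag. */
    if (prot && !(flags & SK_MAP_NOINHERIT))
        return -EINVAL;

    /* The file offset must fit in the bits a PTE can spare for it. */
    if (pgoff + (size >> SK_PAGE_SHIFT) >= (1UL << SK_PTE_FILE_MAX_BITS))
        return -EOVERFLOW;

    /* Requested protections may not exceed what the vma allows. */
    if (flags & SK_MAP_NOINHERIT) {
        if ((prot & SK_PROT_READ) && !(vma_may & SK_MAY_READ))
            return -EPERM;
        if ((prot & SK_PROT_WRITE) && !(vma_may & SK_MAY_WRITE))
            return -EPERM;
        if ((prot & SK_PROT_EXEC) && !(vma_may & SK_MAY_EXEC))
            return -EPERM;
    }
    return 0;
}

/* Asking for write access on a read-only-capable vma is refused. */
static void check_write_on_readonly_vma(void)
{
    assert(check_remap_args(4096, SK_PROT_WRITE, 0,
                            SK_MAP_NOINHERIT, SK_MAY_READ) == -EPERM);
}
```

Old callers that pass prot == 0 and no flag are unaffected by the first and
third checks, which is how the patch avoids needing a new syscall slot.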

Signed-off-by: Paolo 'Blaisorblade' Giarrusso <[EMAIL PROTECTED]>
---

 linux-2.6.git-paolo/mm/fremap.c |   40 +++-
 1 files changed, 31 insertions(+), 9 deletions(-)

diff -puN mm/fremap.c~rfp-enhance-syscall mm/fremap.c
--- linux-2.6.git/mm/fremap.c~rfp-enhance-syscall   2005-08-24 20:55:35.0 +0200
+++ linux-2.6.git-paolo/mm/fremap.c 2005-08-24 20:56:49.0 +0200
@@ -4,6 +4,10 @@
  * Explicit pagetable population and nonlinear (random) mappings support.
  *
  * started by Ingo Molnar, Copyright (C) 2002, 2003
+ * 
+ * support of nonuniform remappings:
+ * Copyright (C) 2004 Ingo Molnar
+ * Copyright (C) 2005 Paolo 'Blaisorblade' Giarrusso
  */
 
 #include 
@@ -164,18 +168,14 @@ err_unlock:
  *file within an existing vma.
  * @start: start of the remapped virtual memory range
  * @size: size of the remapped virtual memory range
- * @prot: new protection bits of the range
+ * @prot: new protection bits of the range, must be 0 if not using MAP_NOINHERIT
  * @pgoff: to be mapped page of the backing store file
- * @flags: 0 or MAP_NONBLOCKED - the later will cause no IO.
+ * @flags: bits MAP_NOINHERIT or MAP_NONBLOCKED - the later will cause no IO.
  *
  * this syscall works purely via pagetables, so it's the most efficient
  * way to map the same (large) file into a given virtual window. Unlike
  * mmap()/mremap() it does not create any new vmas. The new mappings are
  * also safe across swapout.
- *
- * NOTE: the 'prot' parameter right now is ignored, and the vma's default
- * protection is used. Arbitrary protections might be implemented in the
- * future.
  */
 asmlinkage long sys_remap_file_pages(unsigned long start, unsigned long size,
unsigned long prot, unsigned long pgoff, unsigned long flags)
@@ -188,7 +188,7 @@ asmlinkage long sys_remap_file_pages(uns
int has_write_lock = 0;
pgprot_t pgprot;
 
-   if (prot)
+   if (prot && !(flags & MAP_NOINHERIT))
goto out;
/*
 * Sanitize the syscall parameters:
@@ -203,7 +203,7 @@ asmlinkage long sys_remap_file_pages(uns
/* Can we represent this offset inside this architecture's pte's? */
 #if PTE_FILE_MAX_BITS < BITS_PER_LONG
if (pgoff + (size >> PAGE_SHIFT) >= (1UL << PTE_FILE_MAX_BITS))
-   return err;
+   return -EOVERFLOW;
 #endif
 
/* We need down_write() to change vma->vm_flags. */
@@ -228,7 +228,18 @@ retry:
vma->vm_start || end > vma->vm_end)
goto out_unlock;
 
-   pgprot = vma->vm_page_prot;
+   if (flags & MAP_NOINHERIT) {
+   err = -EPERM;
+   if (((prot & PROT_READ) && !(vma->vm_flags & VM_MAYREAD)))
+   goto out_unlock;
+   if (((prot & PROT_WRITE) && !(vma->vm_flags & VM_MAYWRITE)))
+   goto out_unlock;
+   if (((prot & PROT_EXEC) && !(vma->vm_flags & VM_MAYEXEC)))
+   goto out_unlock;
+   err = -EINVAL;
+   pgprot = protection_map[calc_vm_prot_bits(prot) | VM_SHARED];
+   } else 
+   pgprot = vma->vm_page_prot;
 
if (!vma->vm_private_data ||
(vma->vm_flags & (VM_NONLINEAR|VM_RESERVED))) {
@@ -251,6 +262,17 @@ retry:
spin_unlock(&mapping->i_mmap_lock);
}
 
+   if (pgprot_val(pgprot) != pgprot_val(vma->vm_page_prot) &&
+   !(vma->vm_flags & VM_NONUNIFORM)) {
+   if (!has_write_lock) {
+   up_read(&mm->mmap_sem);
+   down_write(&mm->mmap_sem);
+   has_write_lock = 1;
+   goto retry;
+   }
+   vma->vm_flags |= VM_NONUNIFORM;
+   }
+
/* Do NOT hold the write lock while doing any I/O,

[patch 18/18] remap_file_pages linear nonuniform support: (2) fix truncation on nonuniform VMA

2005-08-26 Thread blaisorblade

From: Paolo 'Blaisorblade' Giarrusso <[EMAIL PROTECTED]>

Since we aren't going to support nonuniform linear VMAs, this is probably not
needed. Otherwise, you may want to look at this patch, but I must first note
that it's a bit intrusive.

We must save the protections when truncating pages in a linear nonuniform
VMA.  I was able to verify this failure: protections were reset on a linear
nonuniform VMA by a madvise(MADV_DONTNEED) or by a truncate(), while they
weren't when the VMA was also nonlinear.

Also, we can have pte_file PTEs even when details is clear. This used to
happen only for MAP_POPULATE|MAP_NONBLOCK, where clearing the PTE is allowed;
now it can also happen on nonuniform VMAs, where we must not clear them. We
therefore add vma unconditionally to the zap_pte_range parameters. So, I've
turned details->nonlinear_vma into a bitfield, and I added another bitfield:
 ->must_unmap.

In fact, in the case we're being called by do_munmap() or exit_mmap() (rather
than truncating its pagecache), we must still clear mappings (leaving them in
place would create problems if we map another file in the same area, since we
would reuse the stored offset and protections).
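
As a rough sketch of how the two bitfields interact, the per-page filtering in
zap_pte_range only applies when details are passed and neither prot_none_ptes
nor must_unmap is set. The struct below is a cut-down stand-in holding only
the one-bit fields, and the helper name is hypothetical:

```c
#include <stddef.h>

/* Cut-down stand-in for struct zap_details: only the one-bit flags. */
struct demo_zap_details {
    unsigned prot_none_ptes : 1;
    unsigned nonlinear_vma  : 1;  /* was a vma pointer before this patch */
    unsigned must_unmap     : 1;  /* real unmap, not a truncation */
};

/*
 * Returns 1 when the page->mapping / page->index checks should be applied
 * to this page, 0 when the page is zapped unconditionally.
 */
static int demo_apply_page_filters(const struct demo_zap_details *details,
                                   int have_page)
{
    return details && !details->prot_none_ptes &&
           !details->must_unmap && have_page;
}
```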

Signed-off-by: Paolo 'Blaisorblade' Giarrusso <[EMAIL PROTECTED]>
---

 linux-2.6.git-paolo/include/linux/mm.h |4 +-
 linux-2.6.git-paolo/mm/madvise.c   |2 -
 linux-2.6.git-paolo/mm/memory.c|   55 +++--
 linux-2.6.git-paolo/mm/mmap.c  |   10 --
 4 files changed, 45 insertions(+), 26 deletions(-)

diff -puN include/linux/mm.h~rfp-fix-linear-nonunif-truncation include/linux/mm.h
--- linux-2.6.git/include/linux/mm.h~rfp-fix-linear-nonunif-truncation  2005-08-25 14:03:59.0 +0200
+++ linux-2.6.git-paolo/include/linux/mm.h  2005-08-25 14:03:59.0 +0200
@@ -680,7 +680,6 @@ extern void user_shm_unlock(size_t, stru
  * Parameter block passed down to zap_pte_range in exceptional cases.
  */
 struct zap_details {
-   struct vm_area_struct *nonlinear_vma;   /* Check page->index if set */
struct address_space *check_mapping;/* Check page->mapping if set */
pgoff_t first_index;/* Lowest page->index to unmap */
pgoff_t last_index; /* Highest page->index to unmap */
@@ -689,6 +688,9 @@ struct zap_details {
unsigned prot_none_ptes : 1;/* If 1, set all PTE's to
   PROT_NONE ones, and all other
   fields must be clear */
+   unsigned nonlinear_vma : 1; /* Check page->index if set */
+   unsigned must_unmap : 1;/* Totally zap page tables, it's
+  an unmap not a truncation. */
 };
 
 unsigned long zap_page_range(struct vm_area_struct *vma, unsigned long address,
diff -puN mm/madvise.c~rfp-fix-linear-nonunif-truncation mm/madvise.c
--- linux-2.6.git/mm/madvise.c~rfp-fix-linear-nonunif-truncation  2005-08-25 14:03:59.0 +0200
+++ linux-2.6.git-paolo/mm/madvise.c2005-08-25 14:03:59.0 +0200
@@ -128,7 +128,7 @@ static long madvise_dontneed(struct vm_a
 
if (unlikely(vma->vm_flags & VM_NONLINEAR)) {
struct zap_details details = {
-   .nonlinear_vma = vma,
+   .nonlinear_vma = 1,
.last_index = ULONG_MAX,
};
zap_page_range(vma, start, end - start, &details);
diff -puN mm/memory.c~rfp-fix-linear-nonunif-truncation mm/memory.c
--- linux-2.6.git/mm/memory.c~rfp-fix-linear-nonunif-truncation 2005-08-25 14:03:59.0 +0200
+++ linux-2.6.git-paolo/mm/memory.c 2005-08-25 14:03:59.0 +0200
@@ -514,9 +514,9 @@ int copy_page_range(struct mm_struct *ds
return 0;
 }
 
-static void zap_pte_range(struct mmu_gather *tlb, pmd_t *pmd,
-   unsigned long addr, unsigned long end,
-   struct zap_details *details)
+static void zap_pte_range(struct mmu_gather *tlb, struct vm_area_struct *vma,
+   pmd_t *pmd, unsigned long addr,
+   unsigned long end, struct zap_details *details)
 {
pte_t *pte;
 
@@ -536,8 +536,8 @@ static void zap_pte_range(struct mmu_gat
if (PageReserved(page))
page = NULL;
}
-   if (unlikely(details && !details->prot_none_ptes) &&
-   page) {
+   if (unlikely(details && !details->prot_none_ptes &&
+   !details->must_unmap) && page) {

[patch 01/18] remap_file_pages protection support: uml, i386, x64 bits

2005-08-26 Thread blaisorblade

From: Paolo 'Blaisorblade' Giarrusso <[EMAIL PROTECTED]>, Ingo Molnar <[EMAIL PROTECTED]>

Update pte encoding macros for UML, i386 and x86-64. Also, add the
MAP_NOINHERIT flag to arch headers.

*** remap_file_pages protection support: improvement for UML bits

Recover one bit by additionally using _PAGE_NEWPROT. Since I wasn't sure this
would work, I've split this out, but it has worked well. We rely on the fact
that pte_newprot always checks first if the PTE is marked present.
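
For illustration, the UML 2-level layout (27 offset bits split around the flag
bits) can be checked with a small round trip in plain C. The DEMO_* bit values
are assumptions chosen to match the "bits 0, 1, 3 to 5 are taken" comment in
the diff below; the real definitions live in the UML headers.

```c
#include <stdint.h>

/* Assumed flag bits, matching "bits 0, 1, 3 to 5 are taken". */
#define DEMO_PAGE_FILE      0x008u  /* bit 3 */
#define DEMO_PAGE_PROTNONE  0x010u  /* bit 4 */
#define DEMO_PAGE_RW        0x020u  /* bit 5 */

/* Pack a 27-bit file offset and the RW/PROTNONE bits into a 32-bit PTE. */
static uint32_t demo_pgoff_prot_to_pte(uint32_t off, uint32_t prot)
{
    return ((off >> 1) << 6)        /* upper 26 offset bits -> bits 6..31 */
         + ((off & 0x1u) << 2)      /* lowest offset bit -> bit 2 */
         + (prot & (DEMO_PAGE_RW | DEMO_PAGE_PROTNONE))
         + DEMO_PAGE_FILE;
}

/* Recover the file offset from a PTE built by the function above. */
static uint32_t demo_pte_to_pgoff(uint32_t pte)
{
    return ((pte >> 6) << 1) | ((pte >> 2) & 0x1u);
}

/* Recover the protection bits stored next to the offset. */
static uint32_t demo_pte_to_prot_bits(uint32_t pte)
{
    return pte & (DEMO_PAGE_RW | DEMO_PAGE_PROTNONE);
}
```

Bit 2 is the slot recovered from _PAGE_NEWPROT; bits 0 and 1 stay clear, so
the encoded PTE is never mistaken for a present one.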


Signed-off-by: Paolo 'Blaisorblade' Giarrusso <[EMAIL PROTECTED]>
---

 linux-2.6.git-paolo/include/asm-i386/mman.h   |1 
 linux-2.6.git-paolo/include/asm-i386/pgtable-2level.h |   15 
 linux-2.6.git-paolo/include/asm-i386/pgtable-3level.h |   11 -
 linux-2.6.git-paolo/include/asm-ia64/mman.h   |1 
 linux-2.6.git-paolo/include/asm-ppc/mman.h|1 
 linux-2.6.git-paolo/include/asm-ppc64/mman.h  |1 
 linux-2.6.git-paolo/include/asm-s390/mman.h   |1 
 linux-2.6.git-paolo/include/asm-um/pgtable-2level.h   |   15 +---
 linux-2.6.git-paolo/include/asm-um/pgtable-3level.h   |   21 +-
 linux-2.6.git-paolo/include/asm-x86_64/mman.h |1 
 linux-2.6.git-paolo/include/asm-x86_64/pgtable.h  |   12 +-
 11 files changed, 64 insertions(+), 16 deletions(-)

diff -puN include/asm-um/pgtable-2level.h~rfp-arch-uml include/asm-um/pgtable-2level.h
--- linux-2.6.git/include/asm-um/pgtable-2level.h~rfp-arch-uml  2005-08-21 21:09:42.0 +0200
+++ linux-2.6.git-paolo/include/asm-um/pgtable-2level.h 2005-08-21 21:09:42.0 +0200
@@ -72,12 +72,19 @@ static inline void set_pte(pte_t *pteptr
((unsigned long) __va(pmd_val(pmd) & PAGE_MASK))
 
 /*
- * Bits 0 through 3 are taken
+ * Bits 0, 1, 3 to 5 are taken, split up the 27 bits of offset
+ * into this range:
  */
-#define PTE_FILE_MAX_BITS  28
+#define PTE_FILE_MAX_BITS  27
 
-#define pte_to_pgoff(pte) (pte_val(pte) >> 4)
+#define pte_to_pgoff(pte) (((pte_val(pte) >> 6) << 1) | ((pte_val(pte) >> 2) & 0x1))
+#define pte_to_pgprot(pte) \
+   __pgprot((pte_val(pte) & (_PAGE_RW | _PAGE_PROTNONE)) \
+   | ((pte_val(pte) & _PAGE_PROTNONE) ? 0 : \
+   (_PAGE_USER | _PAGE_PRESENT)) | _PAGE_ACCESSED)
 
-#define pgoff_to_pte(off) ((pte_t) { ((off) << 4) + _PAGE_FILE })
+#define pgoff_prot_to_pte(off, prot) \
+   __pte((((off) >> 1) << 6) + (((off) & 0x1) << 2) + \
+(pgprot_val(prot) & (_PAGE_RW | _PAGE_PROTNONE)) + _PAGE_FILE)
 
 #endif
diff -puN include/asm-um/pgtable-3level.h~rfp-arch-uml include/asm-um/pgtable-3level.h
--- linux-2.6.git/include/asm-um/pgtable-3level.h~rfp-arch-uml  2005-08-21 21:09:42.0 +0200
+++ linux-2.6.git-paolo/include/asm-um/pgtable-3level.h 2005-08-21 21:09:42.0 +0200
@@ -140,25 +140,36 @@ static inline pmd_t pfn_pmd(pfn_t page_n
 }
 
 /*
- * Bits 0 through 3 are taken in the low part of the pte,
+ * Bits 0 through 5 are taken in the low part of the pte,
  * put the 32 bits of offset into the high part.
  */
 #define PTE_FILE_MAX_BITS  32
 
+
 #ifdef CONFIG_64BIT
 
 #define pte_to_pgoff(p) ((p).pte >> 32)
-
-#define pgoff_to_pte(off) ((pte_t) { ((off) << 32) | _PAGE_FILE })
+#define pgoff_to_pte(off) ((pte_t) { ((off) << 32) | _PAGE_FILE | \
+   (pgprot_val(prot) & (_PAGE_RW | _PAGE_PROTNONE)) })
+#define pte_flags(pte) pte_val(pte)
 
 #else
 
 #define pte_to_pgoff(pte) ((pte).pte_high)
-
-#define pgoff_to_pte(off) ((pte_t) { _PAGE_FILE, (off) })
+#define pgoff_prot_to_pte(off, prot) ((pte_t) { \
+   (pgprot_val(prot) & (_PAGE_RW | _PAGE_PROTNONE)) | _PAGE_FILE, \
+   (off) })
+/* Don't use pte_val below, useless to join the two halves */
+#define pte_flags(pte) ((pte).pte_low)
 
 #endif
 
+#define pte_to_pgprot(pte) \
+   __pgprot((pte_flags(pte) & (_PAGE_RW | _PAGE_PROTNONE)) \
+   | ((pte_flags(pte) & _PAGE_PROTNONE) ? 0 : \
+   (_PAGE_USER | _PAGE_PRESENT)) | _PAGE_ACCESSED)
+#undef pte_flags
+
 #endif
 
 /*
diff -puN include/asm-i386/pgtable-2level.h~rfp-arch-uml include/asm-i386/pgtable-2level.h
--- linux-2.6.git/include/asm-i386/pgtable-2level.h~rfp-arch-uml  2005-08-21 21:09:53.0 +0200
+++ linux-2.6.git-paolo/include/asm-i386/pgtable-2level.h   2005-08-21 21:09:53.0 +0200
@@ -48,16 +48,21 @@ static inline int pte_exec_kernel(pte_t 
 }
 
 /*
- * Bits 0, 6 and 7 are taken, split up the 29 bits of offset
+ * Bits 0, 1, 6 and 7 are taken, split up the 28 bits of offset
  * into this range:
  */
-#define PTE_FILE_MAX_BITS  29
+#define PTE_FILE_MAX_BITS  28
 
 #define pte_to_pgoff(pte) \
-   ((((pte).pte_low >> 1) & 0x1f ) + (((pte).pte_low >> 8) << 5 ))
+   (

[patch 11/18] remap_file_pages protection support: also set VM_NONLINEAR on nonuniform VMAs

2005-08-26 Thread blaisorblade

From: Paolo 'Blaisorblade' Giarrusso <[EMAIL PROTECTED]>

To simplify the VM code, and to reflect expected application usage, we decide
to also set VM_NONLINEAR when setting VM_NONUNIFORM. Otherwise, we'd have to
possibly save nonlinear PTEs even on paths which cope with linear VMAs. It's
possible, but intrusive (it's done in one of the next patches).

Obviously, this has a performance cost, since we potentially have to handle a
linear VMA with nonlinear handling code. But I don't know of any application
which might have this usage.
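
The new trigger for converting a VMA to nonlinear can be sketched as a small
predicate. The struct is a minimal stand-in carrying only the fields the test
needs, DEMO_PAGE_SHIFT and the flag value are illustrative assumptions, and
protections are compared as plain integers rather than through pgprot_val():

```c
#define DEMO_PAGE_SHIFT    12
#define DEMO_VM_NONLINEAR  0x00800000UL

/* Minimal stand-in for the fields of vm_area_struct used here. */
struct demo_vma {
    unsigned long vm_start;      /* first address of the mapping */
    unsigned long vm_pgoff;      /* file offset of vm_start, in pages */
    unsigned long vm_flags;
    unsigned long vm_page_prot;  /* default protection of the VMA */
};

/* File page that a purely linear mapping would place at addr. */
static unsigned long demo_linear_page_index(const struct demo_vma *vma,
                                            unsigned long addr)
{
    return ((addr - vma->vm_start) >> DEMO_PAGE_SHIFT) + vma->vm_pgoff;
}

/*
 * With this patch, a different protection is enough to require
 * VM_NONLINEAR, not just a different file offset.
 */
static int demo_must_set_nonlinear(const struct demo_vma *vma,
                                   unsigned long start, unsigned long pgoff,
                                   unsigned long pgprot)
{
    return !(vma->vm_flags & DEMO_VM_NONLINEAR) &&
           (pgoff != demo_linear_page_index(vma, start) ||
            pgprot != vma->vm_page_prot);
}
```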

Signed-off-by: Paolo 'Blaisorblade' Giarrusso <[EMAIL PROTECTED]>
---

 linux-2.6.git-paolo/mm/fremap.c |   27 ++-
 1 files changed, 14 insertions(+), 13 deletions(-)

diff -puN mm/fremap.c~rfp-nonuniform-implies-nonlinear mm/fremap.c
--- linux-2.6.git/mm/fremap.c~rfp-nonuniform-implies-nonlinear  2005-08-25 12:50:00.0 +0200
+++ linux-2.6.git-paolo/mm/fremap.c 2005-08-25 12:50:00.0 +0200
@@ -246,8 +246,9 @@ retry:
if (!vma->vm_private_data ||
(vma->vm_flags & (VM_NONLINEAR|VM_RESERVED))) {
/* Must set VM_NONLINEAR before any pages are populated. */
-   if (pgoff != linear_page_index(vma, start) &&
-   !(vma->vm_flags & VM_NONLINEAR)) {
+   if (!(vma->vm_flags & VM_NONLINEAR) &&
+   (pgoff != linear_page_index(vma, start) ||
+   pgprot_val(pgprot) != pgprot_val(vma->vm_page_prot))) {
if (!(vma->vm_flags & VM_SHARED))
goto out_unlock;
if (!has_write_lock) {
@@ -264,19 +265,19 @@ retry:
vma_nonlinear_insert(vma, &mapping->i_mmap_nonlinear);
flush_dcache_mmap_unlock(mapping);
spin_unlock(&mapping->i_mmap_lock);
-   }
 
-   if (pgprot_val(pgprot) != pgprot_val(vma->vm_page_prot) &&
-   !(vma->vm_flags & VM_NONUNIFORM)) {
-   if (!(vma->vm_flags & VM_SHARED))
-   goto out_unlock;
-   if (!has_write_lock) {
-   up_read(&mm->mmap_sem);
-   down_write(&mm->mmap_sem);
-   has_write_lock = 1;
-   goto retry;
+   if (!(vma->vm_flags & VM_NONUNIFORM) &&
+   pgprot_val(pgprot) != pgprot_val(vma->vm_page_prot)) {
+   if (!(vma->vm_flags & VM_SHARED))
+   goto out_unlock;
+   if (!has_write_lock) {
+   up_read(&mm->mmap_sem);
+   down_write(&mm->mmap_sem);
+   has_write_lock = 1;
+   goto retry;
+   }
+   vma->vm_flags |= VM_NONUNIFORM;
}
-   vma->vm_flags |= VM_NONUNIFORM;
}
 
/* Do NOT hold the write lock while doing any I/O, nor when
_


[patch 17/18] remap_file_pages linear nonuniform support: (1) try_to_unmap_one fix

2005-08-26 Thread blaisorblade

From: Paolo 'Blaisorblade' Giarrusso <[EMAIL PROTECTED]>

Since we probably will not support linear nonuniform VMAs, this will probably
not be needed. However, I'm sending it just in case.

Fix try_to_unmap_one for linear VM_NONUNIFORM vma's.

When unmapping linear but non uniform VMA's in try_to_unmap_one, we must
encode the prots in the PTE.

However, we don't use the generic save_nonlinear_pte() function as it allows
for nonlinear offsets, on which we instead BUG() in this code path, by using
save_nonuniform_pte().

I've not added any TLB flush because PTE's have already been cleared and
flushed in both cases, and (I assume from existing practice and common sense,
but I don't trust CPU architects on having the latter ;-) ) TLB won't need to
know about changes in the "software" part of absent PTEs.

Signed-off-by: Paolo 'Blaisorblade' Giarrusso <[EMAIL PROTECTED]>
---

 linux-2.6.git-paolo/include/linux/pagemap.h |   11 +++
 linux-2.6.git-paolo/mm/rmap.c   |3 +++
 2 files changed, 14 insertions(+)

diff -puN include/linux/pagemap.h~rfp-linear-nonuniform-1 include/linux/pagemap.h
--- linux-2.6.git/include/linux/pagemap.h~rfp-linear-nonuniform-1   2005-08-25 12:46:20.0 +0200
+++ linux-2.6.git-paolo/include/linux/pagemap.h 2005-08-25 12:46:20.0 +0200
@@ -180,6 +180,17 @@ static inline void save_nonlinear_pte(pt
set_pte_at(mm, addr, ptep, pgoff_prot_to_pte(page->index, pgprot));
 }
 
+/* For linear but nonuniform VMA's*/
+static inline void save_nonuniform_pte(pte_t pte, pte_t * ptep, struct
+   vm_area_struct *vma, struct mm_struct *mm, struct page* page,
+   unsigned long addr)
+{
+   pgprot_t pgprot = pte_to_pgprot(pte);
+   BUG_ON(linear_page_index(vma, addr) != page->index);
+   if (pgprot_val(pgprot) != pgprot_val(vma->vm_page_prot))
+   set_pte_at(mm, addr, ptep, pgoff_prot_to_pte(page->index, pgprot));
+}
+
 extern void FASTCALL(__lock_page(struct page *page));
 extern void FASTCALL(unlock_page(struct page *page));
 
diff -puN mm/rmap.c~rfp-linear-nonuniform-1 mm/rmap.c
--- linux-2.6.git/mm/rmap.c~rfp-linear-nonuniform-1 2005-08-25 12:46:20.0 +0200
+++ linux-2.6.git-paolo/mm/rmap.c   2005-08-25 12:46:20.0 +0200
@@ -543,6 +543,9 @@ static int try_to_unmap_one(struct page 
flush_cache_page(vma, address, page_to_pfn(page));
pteval = ptep_clear_flush(vma, address, pte);
 
+   /* If nonlinear, store the file page offset in the pte. */
+   save_nonuniform_pte(pteval, pte, vma, mm, page, address);
+
/* Move the dirty bit to the physical page now the pte is gone. */
if (pte_dirty(pteval))
set_page_dirty(page);
_


[patch 16/18] remap_file_pages protection support: avoid truncating COW PTEs

2005-08-26 Thread blaisorblade

From: Paolo 'Blaisorblade' Giarrusso <[EMAIL PROTECTED]>, Ingo Molnar <[EMAIL PROTECTED]>

This patch may or may not be wanted. I took this from Ingo's original patch
and improved it, but we probably want to keep this bug. However, I'm not
sure what Ingo wanted to do.

If on a private writable mapping we call remap_file_pages() without
altering the file offset or the protections, after writing on the page to
create a COW mapping, with this patch we refuse reinstalling the old page, as
we should.

However, I'm not sure there's a point for an app to do this.

It is possible that we return an error even if the present page is already the
same one; however, that shouldn't be a big problem. In fact, the main purpose
of supporting private VMAs in remap_file_pages is allowing mmap(MAP_PRIVATE |
MAP_POPULATE) to work, and for that case existing mappings have already been
cleared and this patch is unneeded.

Note that this patch *needs* testing on each existing arch - I already got
subtle failures on i386 but not on UML with this patch (I had forgotten to
test pte_present(), and pte_file() returned true because _PAGE_DIRTY and
_PAGE_FILE share the same slot).
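
To make the i386 pitfall concrete, here is a small model of the check in plain
C. The DEMO_* bit values are assumptions modelled on the i386 2-level layout,
where the file bit reuses the dirty slot and is only meaningful for
non-present PTEs; the helper names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define DEMO_PAGE_PRESENT   0x001u
#define DEMO_PAGE_DIRTY     0x040u
#define DEMO_PAGE_FILE      0x040u  /* same slot as DIRTY */
#define DEMO_PAGE_PROTNONE  0x080u

static bool demo_pte_none(uint32_t pte)
{
    return pte == 0;
}

static bool demo_pte_present(uint32_t pte)
{
    return (pte & (DEMO_PAGE_PRESENT | DEMO_PAGE_PROTNONE)) != 0;
}

static bool demo_pte_file(uint32_t pte)
{
    return (pte & DEMO_PAGE_FILE) != 0;
}

/* True when a private mapping must refuse to replace this PTE (-EEXIST). */
static bool demo_private_pte_busy(uint32_t pte)
{
    return demo_pte_present(pte) ||
           (!demo_pte_none(pte) && !demo_pte_file(pte));
}

/* The broken variant: pte_present() is not tested first. */
static bool demo_private_pte_busy_buggy(uint32_t pte)
{
    return !demo_pte_none(pte) && !demo_pte_file(pte);
}
```

A present, dirty COW page looks like a file PTE to the broken variant and
would be overwritten, which matches the failure seen on i386 only.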

Setting CONFIG_DEBUG_PRIVATE in the test-program provides a means to test this.

Signed-off-by: Paolo 'Blaisorblade' Giarrusso <[EMAIL PROTECTED]>
---

 linux-2.6.git-paolo/mm/fremap.c |   19 +++
 1 files changed, 19 insertions(+)

diff -puN mm/fremap.c~rfp-notrunc-priv-mappings mm/fremap.c
--- linux-2.6.git/mm/fremap.c~rfp-notrunc-priv-mappings 2005-08-24 20:57:34.0 +0200
+++ linux-2.6.git-paolo/mm/fremap.c 2005-08-24 20:57:34.0 +0200
@@ -94,6 +94,16 @@ int install_page(struct mm_struct *mm, s
if (!page->mapping || page->index >= size)
goto err_unlock;
 
+   /*
+* On private (and thus uniform) mapping, we don't want to truncate COW
+* page, so we can only override pte_none or pte_file PTEs, not swap or
+* present ones.
+*/
+   err = -EEXIST;
+   if (unlikely(!(vma->vm_flags & VM_SHARED)) && (pte_present(*pte) ||
+   (!pte_none(*pte) && !pte_file(*pte))))
+   goto err_unlock;
+
zap_pte(mm, vma, addr, pte);
 
inc_mm_counter(mm,rss);
@@ -155,6 +165,15 @@ int install_file_pte(struct mm_struct *m
err = 0;
if (uniform && pte_none(*pte))
goto err_unlock;
+   /*
+* On private (and thus uniform) mapping, we don't want to truncate COW
+* page, so we can only override pte_none or pte_file PTEs, not swap or
+* present ones.
+*/
+   err = -EEXIST;
+   if (unlikely(!(vma->vm_flags & VM_SHARED)) && (pte_present(*pte) ||
+   (!pte_none(*pte) && !pte_file(*pte))))
+   goto err_unlock;
 
err = 0;
zap_pte(mm, vma, addr, pte);
_


[patch 10/18] remap_file_pages protection support: adapt to uml peculiarities

2005-08-26 Thread blaisorblade

From: Paolo 'Blaisorblade' Giarrusso <[EMAIL PROTECTED]>

UML differs from other architectures (and possibly this needs fixing) in that
our arch fault handler treats TLB faults and page faults alike. In particular,
we may end up calling handle_mm_fault() when the PTE is already correct but
simply has not been flushed.

And rfp-fault-sigsegv-2 breaks this, because when getting a fault on a
pte_present PTE and non-uniform VMA, it assumes the fault is due to a
protection fault, and signals the caller a SIGSEGV must be sent.

*** remap_file_pages protection support: fix unflushed TLB errors detection

From: Paolo 'Blaisorblade' Giarrusso <[EMAIL PROTECTED]>

We got unflushed PTE's marked up-to-date, because they were protected to get
dirtying / accessing faults. So, don't test the PTE for being up-to-date, but
check directly the permission (since the PTE is not protected for that).

Signed-off-by: Paolo 'Blaisorblade' Giarrusso <[EMAIL PROTECTED]>
---

 linux-2.6.git-paolo/arch/um/kernel/trap_kern.c |   37 +
 1 files changed, 32 insertions(+), 5 deletions(-)

diff -puN arch/um/kernel/trap_kern.c~rfp-sigsegv-uml-handle-tlb-faults arch/um/kernel/trap_kern.c
--- linux-2.6.git/arch/um/kernel/trap_kern.c~rfp-sigsegv-uml-handle-tlb-faults  2005-08-21 21:32:13.0 +0200
+++ linux-2.6.git-paolo/arch/um/kernel/trap_kern.c  2005-08-21 21:32:13.0 +0200
@@ -35,7 +35,7 @@ int handle_page_fault(unsigned long addr
pgd_t *pgd;
pud_t *pud;
pmd_t *pmd;
-   pte_t *pte;
+   pte_t *pte, entry;
int err = -EFAULT;
int access_mask = 0;
 
@@ -84,8 +84,37 @@ handle_fault:
err = -EACCES;
goto out;
case VM_FAULT_SIGSEGV:
-   err = -EFAULT;
-   goto out;
+   WARN_ON(!(vma->vm_flags & VM_NONUNIFORM));
+   /* Duplicate this code here. */
+   pgd = pgd_offset(mm, address);
+   pud = pud_offset(pgd, address);
+   pmd = pmd_offset(pud, address);
+   pte = pte_offset_kernel(pmd, address);
+   if (likely (pte_newpage(*pte) || pte_newprot(*pte)) ||
+   (is_write ? pte_write(*pte) : pte_read(*pte)) ) {
+   /* The page hadn't been flushed, or it had been
+* flushed but without access to get a dirtying
+* / accessing fault. */
+
+   /* __handle_mm_fault() didn't dirty / young this
+* PTE, probably we won't get another fault for
+* this page, so fix things now. */
+   entry = *pte;
+   entry = pte_mkyoung(*pte);
+   if(pte_write(entry))
+   entry = pte_mkdirty(entry);
+   /* Yes, this will set the page as NEWPAGE. We
+* want this, otherwise things won't work.
+* Indeed, the
+* *pte = pte_mkyoung(*pte);
+* we used to have (uselessly) didn't work at
+* all! */
+   set_pte(pte, entry);
+   break;
+   } else {
+   err = -EFAULT;
+   goto out;
+   }
case VM_FAULT_OOM:
err = -ENOMEM;
goto out_of_memory;
@@ -98,8 +127,6 @@ handle_fault:
pte = pte_offset_kernel(pmd, address);
} while(!pte_present(*pte));
err = 0;
-   *pte = pte_mkyoung(*pte);
-   if(pte_write(*pte)) *pte = pte_mkdirty(*pte);
flush_tlb_page(vma, address);
 out:
up_read(&mm->mmap_sem);
_


[patch 08/18] remap file pages protection support: use FAULT_SIGSEGV for protection checking

2005-08-26 Thread blaisorblade

From: Paolo 'Blaisorblade' Giarrusso <[EMAIL PROTECTED]>, Ingo Molnar <[EMAIL PROTECTED]>

This is the most intrusive patch, but it can't be reduced much, even if I
limit the protection support to the bare minimum for UML.

The arch handler used to check protections itself; now we must possibly move
that to the generic VM if the VMA is non-uniform, since vma protections are
totally unreliable in that case (except in the ->nopage case).

So, we change the prototype of __handle_mm_fault() to inform it of the access
kind, so it does protection checking. handle_mm_fault() keeps its API, but has
the new VM_FAULT_SIGSEGV return value.

This value must be handled in every arch-specific fault handler, otherwise we
might get spurious BUG()/oom killing. However, I've alleviated this need via
the previous "safety net" patch.

* do_file_page installs the PTE and doesn't check the fault type; if it
  was wrong, it'll take another fault and die only then. I've left this for
  now to exercise the code more, and it works anyway; besides, this way the
  fast path is potentially more efficient.

* I've also changed do_no_pages to fault in pages with their *exact* permissions
  for non-uniform VMAs. There we mark pages as dirty even on read faults - I
  don't know if this can be skipped.

* For checking, we simply reuse the standard protection_map, by creating a
  pte_t value with the vma->vm_page_prot protection and testing directly
  pte_{read,write,exec} on it.

* I use the physical frame number "0" to create the PTE; this probably isn't
  realistic, but I assume that pfn_pte() and the access macros will work
  anyway.

* Also, there is a (potential) problem: on VM_NONUNIFORM vmas, in
  handle_pte_fault(), if the PTE is present we unconditionally return
  VM_FAULT_SIGSEGV, because the PTE was already up-to-date. This has proven to
  be a bit strict, at least for UML - so this may break other arches too (only
  for new functionality). At least, peculiar ones - this problem was due to
  handle_mm_fault() called for TLB faults rather than PTE faults.

* Another problem I've just discovered is that PTRACE_POKETEXT access_process_vm
  on VM_NONUNIFORM write-protected vma's won't work. This is handled in a
  specific patch.
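
As a simplified illustration of the protection check moved into the generic
VM, the sketch below compares the requested access against a set of allowed
bits. Note that this is not how the patch does it: the patch builds a PTE from
the VMA protection and tests pte_{read,write,exec} on it, while this model
compares flag bits directly. The helper name and its signature are
hypothetical.

```c
#define DEMO_VM_READ   0x1UL
#define DEMO_VM_WRITE  0x2UL
#define DEMO_VM_EXEC   0x4UL

/*
 * Returns 1 if the access must be refused (the fault handler would then
 * report VM_FAULT_SIGSEGV), 0 if every requested access type is allowed.
 */
static int demo_check_perms(unsigned long allowed, unsigned long access_mask)
{
    unsigned long wanted =
        access_mask & (DEMO_VM_READ | DEMO_VM_WRITE | DEMO_VM_EXEC);

    return (wanted & ~allowed) != 0;
}
```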

Changes are included for the i386 and UML handler. It isn't enough to make UML
work, however, because UML has some peculiarities. Subsequent patches fix
this. x86_64 only contains the "silly" part (just handles VM_FAULT_SIGSEGV).

Signed-off-by: Paolo 'Blaisorblade' Giarrusso <[EMAIL PROTECTED]>
---

 linux-2.6.git-paolo/arch/i386/mm/fault.c   |   19 -
 linux-2.6.git-paolo/arch/um/kernel/trap_kern.c |   19 -
 linux-2.6.git-paolo/arch/x86_64/mm/fault.c |6 +
 linux-2.6.git-paolo/include/linux/mm.h |   15 ++--
 linux-2.6.git-paolo/mm/memory.c|   85 -
 5 files changed, 116 insertions(+), 28 deletions(-)

diff -puN arch/i386/mm/fault.c~rfp-add-vm_fault_sigsegv arch/i386/mm/fault.c
--- linux-2.6.git/arch/i386/mm/fault.c~rfp-add-vm_fault_sigsegv 2005-08-24 12:01:07.0 +0200
+++ linux-2.6.git-paolo/arch/i386/mm/fault.c    2005-08-24 12:01:07.0 +0200
@@ -219,6 +219,7 @@ fastcall void do_page_fault(struct pt_re
unsigned long address;
unsigned long page;
int write;
+   int access_mask = 0;
siginfo_t info;
 
/* get the address */
@@ -315,6 +316,14 @@ fastcall void do_page_fault(struct pt_re
 good_area:
info.si_code = SEGV_ACCERR;
write = 0;
+
+   /* If the PTE is not present, the vma protection are not accurate if
+* VM_NONUNIFORM; present PTE's are correct for VM_NONUNIFORM. */
+   if (unlikely(vma->vm_flags & VM_NONUNIFORM)) {
+   access_mask = write ? VM_WRITE : VM_READ;
+   goto handle_fault;
+   }
+
switch (error_code & 3) {
default:/* 3: write, present */
 #ifdef TEST_VERIFY_AREA
@@ -334,13 +343,15 @@ good_area:
goto bad_area;
}
 
- survive:
+   access_mask = write ? VM_WRITE : 0;
+handle_fault:
/*
 * If for any reason at all we couldn't handle the fault,
 * make sure we exit gracefully rather than endlessly redo
 * the fault.
 */
-   switch (handle_mm_fault(mm, vma, address, write)) {
+   switch (__handle_mm_fault(mm, vma, address, access_mask) &
+   (~VM_FAULT_WRITE)) {
case VM_FAULT_MINOR:
tsk->min_flt++;
break;
@@ -351,6 +362,8 @@ good_area:
goto do_sigbus;
case VM_FAULT_OOM:
goto out_of_memory;
+   case VM_FAULT_SIGSEGV:
+   goto

[patch 12/18] remap_file_pages protection support: optimize install_file_pte for MAP_POPULATE

2005-08-26 Thread blaisorblade

From: Paolo 'Blaisorblade' Giarrusso <[EMAIL PROTECTED]>

Add an optimization to install_file_pte: if the VMA is uniform (and thus we're
likely called by MAP_POPULATE), and the PTE was null, it will be installed
correctly if needed at fault time - we thus avoid touching the page tables,
but we must still do the walk...

The PTE could have a wrong value only if we are in a private VMA, and it was a
broken COW page, either installed or swapped. So, in subsequent patches we
even optimize away the walk.

Signed-off-by: Paolo 'Blaisorblade' Giarrusso <[EMAIL PROTECTED]>
---

 linux-2.6.git-paolo/mm/fremap.c |9 +
 1 files changed, 9 insertions(+)

diff -puN mm/fremap.c~rfp-linear-optim-v2 mm/fremap.c
--- linux-2.6.git/mm/fremap.c~rfp-linear-optim-v2   2005-08-25 12:50:10.0 +0200
+++ linux-2.6.git-paolo/mm/fremap.c 2005-08-25 12:58:29.0 +0200
@@ -125,6 +125,9 @@ int install_file_pte(struct mm_struct *m
pud_t *pud;
pgd_t *pgd;
pte_t pte_val;
+   int uniform = !(vma->vm_flags & (VM_NONUNIFORM | VM_NONLINEAR));
+
+   BUG_ON(!uniform && !(vma->vm_flags & VM_SHARED));
 
pgd = pgd_offset(mm, addr);
spin_lock(&mm->page_table_lock);
@@ -140,6 +143,12 @@ int install_file_pte(struct mm_struct *m
pte = pte_alloc_map(mm, pmd, addr);
if (!pte)
goto err_unlock;
+   /*
+* Skip uniform non-existent ptes:
+*/
+   err = 0;
+   if (uniform && pte_none(*pte))
+   goto err_unlock;
 
zap_pte(mm, vma, addr, pte);
 
_


[patch 09/18] remap_file_pages protection support: fix get_user_pages() on VM_NONUNIFORM vmas

2005-08-26 Thread blaisorblade

From: Paolo 'Blaisorblade' Giarrusso <[EMAIL PROTECTED]>

get_user_pages may well call handle_mm_fault on present and valid PTEs. Signal
that by using VM_MAYREAD in the access_mask.

Also, get_user_pages() may give write faults on present readonly PTEs in
VM_NONUNIFORM areas (think of PTRACE_POKETEXT), so we must still do do_wp_page
even on VM_NONUNIFORM areas.

So, possibly use VM_MAYWRITE in the access_mask and check VM_NONUNIFORM in
maybe_mkwrite_file (new variant of maybe_mkwrite).
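
The difference between the two helpers can be sketched outside the kernel as
follows. The DEMO_* flag and PTE bit values are placeholders, in particular
the value chosen for VM_NONUNIFORM is an assumption, and the PTE is modelled
as a plain integer:

```c
#include <stdint.h>

#define DEMO_VM_WRITE       0x00000002UL
#define DEMO_VM_NONUNIFORM  0x40000000UL  /* placeholder value */
#define DEMO_PTE_RW         0x002u

/* Plain variant: make the PTE writable whenever the VMA is writable. */
static uint32_t demo_maybe_mkwrite(uint32_t pte, unsigned long vm_flags)
{
    if (vm_flags & DEMO_VM_WRITE)
        pte |= DEMO_PTE_RW;
    return pte;
}

/*
 * File variant: never change protections on a VM_NONUNIFORM VMA, because
 * there the per-PTE protections are authoritative, not the VMA flags.
 */
static uint32_t demo_maybe_mkwrite_file(uint32_t pte, unsigned long vm_flags)
{
    if ((vm_flags & (DEMO_VM_WRITE | DEMO_VM_NONUNIFORM)) == DEMO_VM_WRITE)
        pte |= DEMO_PTE_RW;
    return pte;
}
```

On a writable but nonuniform VMA the plain variant would silently grant write
access, while the file variant leaves the PTE untouched.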

Signed-off-by: Paolo 'Blaisorblade' Giarrusso <[EMAIL PROTECTED]>
---

 linux-2.6.git-paolo/include/linux/mm.h |   13 +++-
 linux-2.6.git-paolo/mm/memory.c|   52 +++--
 2 files changed, 49 insertions(+), 16 deletions(-)

diff -puN mm/memory.c~rfp-fix-get_user_pages mm/memory.c
--- linux-2.6.git/mm/memory.c~rfp-fix-get_user_pages    2005-08-24 13:34:45.0 +0200
+++ linux-2.6.git-paolo/mm/memory.c 2005-08-24 15:53:50.0 +0200
@@ -945,7 +945,7 @@ int get_user_pages(struct task_struct *t
}
spin_lock(&mm->page_table_lock);
do {
-   int write_access = write ? VM_WRITE : 0;
+   int write_access = flags & (VM_MAYWRITE | VM_WRITE);
struct page *page;
 
cond_resched_lock(&mm->page_table_lock);
@@ -964,7 +964,8 @@ int get_user_pages(struct task_struct *t
break;
}
spin_unlock(&mm->page_table_lock);
-   ret = __handle_mm_fault(mm, vma, start, write_access);
+   ret = __handle_mm_fault(mm, vma, start,
+   write_access | VM_MAYREAD);
 
/*
 * The VM_FAULT_WRITE bit tells us that do_wp_page has
@@ -1190,7 +1191,20 @@ EXPORT_SYMBOL(remap_pfn_range);
  * servicing faults for write access.  In the normal case, do always want
  * pte_mkwrite.  But get_user_pages can cause write faults for mappings
  * that do not have writing enabled, when used by access_process_vm.
+ *
+ * Also, we must never change protections on VM_NONUNIFORM pages; that's only
+ * allowed in do_no_page(), so test only VMA protections there. For other cases
+ * we *know* that VM_NONUNIFORM is clear, such as anonymous/swap pages, and in
+ * that case using plain maybe_mkwrite() is an optimization.
+ * Instead, when we may be mapping a file, we must use maybe_mkwrite_file.
  */
+static inline pte_t maybe_mkwrite_file(pte_t pte, struct vm_area_struct *vma)
+{
+   if (likely((vma->vm_flags & (VM_WRITE | VM_NONUNIFORM)) == VM_WRITE))
+   pte = pte_mkwrite(pte);
+   return pte;
+}
+
 static inline pte_t maybe_mkwrite(pte_t pte, struct vm_area_struct *vma)
 {
if (likely(vma->vm_flags & VM_WRITE))
@@ -1206,8 +1220,8 @@ static inline void break_cow(struct vm_a
 {
pte_t entry;
 
-   entry = maybe_mkwrite(pte_mkdirty(mk_pte(new_page, vma->vm_page_prot)),
- vma);
+   entry = maybe_mkwrite_file(pte_mkdirty(mk_pte(new_page,
+   vma->vm_page_prot)), vma);
ptep_establish(vma, address, page_table, entry);
update_mmu_cache(vma, address, entry);
lazy_mmu_prot_update(entry);
@@ -1260,8 +1274,8 @@ static int do_wp_page(struct mm_struct *
unlock_page(old_page);
if (reuse) {
flush_cache_page(vma, address, pfn);
-   entry = maybe_mkwrite(pte_mkyoung(pte_mkdirty(pte)),
- vma);
+   entry = maybe_mkwrite_file(pte_mkyoung(pte_mkdirty(pte)),
+   vma);
ptep_set_access_flags(vma, address, page_table, entry, 1);
update_mmu_cache(vma, address, entry);
lazy_mmu_prot_update(entry);
@@ -1971,14 +1985,15 @@ static int do_file_page(struct mm_struct
 * ->populate; in this case do the protection checks.
 */
if (!vma->vm_ops->populate ||
-   ((access_mask & VM_WRITE) && !(vma->vm_flags & VM_SHARED))) {
+   ((access_mask & (VM_WRITE|VM_MAYWRITE)) &&
+!(vma->vm_flags & VM_SHARED))) {
/* We're behaving as if pte_file was cleared, so check
 * protections like in handle_pte_fault. */
if (check_perms(vma, access_mask))
goto out_segv;
 
pte_clear(mm, address, pte);
-   return do_no_page(mm, vma, address, access_mask & VM_WRITE, pte, pmd);
+   return 

[patch 06/18] remap_file_pages protection support: support private vma for MAP_POPULATE

2005-08-26 Thread blaisorblade

From: Ingo Molnar <[EMAIL PROTECTED]>

Fix MAP_POPULATE | MAP_PRIVATE. We don't need the VMA to be shared if we don't
rearrange pages around. And it's trivial to do.
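
Seen from userspace, the case being fixed is a private file mapping that asks
for prefaulting. A minimal sketch, assuming a Linux host; the helper names
(make_temp_file, first_byte_private_populate) are made up for this
illustration and are not part of the patch:

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE		/* for MAP_POPULATE and mkstemp() */
#endif
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Illustration helper: create an unlinked temp file holding len copies of fill. */
static int make_temp_file(size_t len, unsigned char fill)
{
	char path[] = "/tmp/rfp-demo-XXXXXX";
	unsigned char *buf;
	int fd = mkstemp(path);

	if (fd < 0)
		return -1;
	unlink(path);		/* the file lives on until fd is closed */
	buf = malloc(len);
	if (!buf) {
		close(fd);
		return -1;
	}
	memset(buf, fill, len);
	if (write(fd, buf, len) != (ssize_t)len) {
		free(buf);
		close(fd);
		return -1;
	}
	free(buf);
	return fd;
}

/*
 * Map the first len bytes of fd privately, asking the kernel to prefault
 * the pages, and return the first byte (or -1 if mmap() fails).
 */
static int first_byte_private_populate(int fd, size_t len)
{
	unsigned char *p;
	int byte;

	assert(len > 0);
	p = mmap(NULL, len, PROT_READ, MAP_PRIVATE | MAP_POPULATE, fd, 0);
	if (p == MAP_FAILED)
		return -1;
	byte = p[0];
	munmap(p, len);
	return byte;
}
```

The mapping is linear (offset 0, no page rearranging), which is the case the
description says does not need a shared VMA.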

Signed-off-by: Paolo 'Blaisorblade' Giarrusso <[EMAIL PROTECTED]>
---

 linux-2.6.git-paolo/mm/fremap.c |7 ---
 linux-2.6.git-paolo/mm/mmap.c   |4 
 2 files changed, 8 insertions(+), 3 deletions(-)

diff -puN mm/fremap.c~rfp-private-vma-2 mm/fremap.c
--- linux-2.6.git/mm/fremap.c~rfp-private-vma-2 2005-08-24 20:57:13.0 +0200
+++ linux-2.6.git-paolo/mm/fremap.c 2005-08-24 20:57:13.0 +0200
@@ -221,9 +221,6 @@ retry:
if (!vma)
goto out_unlock;
 
-   if (!(vma->vm_flags & VM_SHARED))
-   goto out_unlock;
-
if (!vma->vm_ops || !vma->vm_ops->populate || end <= start || start <
vma->vm_start || end > vma->vm_end)
goto out_unlock;
@@ -246,6 +243,8 @@ retry:
/* Must set VM_NONLINEAR before any pages are populated. */
if (pgoff != linear_page_index(vma, start) &&
!(vma->vm_flags & VM_NONLINEAR)) {
+   if (!(vma->vm_flags & VM_SHARED))
+   goto out_unlock;
if (!has_write_lock) {
up_read(&mm->mmap_sem);
down_write(&mm->mmap_sem);
@@ -264,6 +263,8 @@ retry:
 
if (pgprot_val(pgprot) != pgprot_val(vma->vm_page_prot) &&
!(vma->vm_flags & VM_NONUNIFORM)) {
+   if (!(vma->vm_flags & VM_SHARED))
+   goto out_unlock;
if (!has_write_lock) {
up_read(&mm->mmap_sem);
down_write(&mm->mmap_sem);
diff -puN mm/mmap.c~rfp-private-vma-2 mm/mmap.c
--- linux-2.6.git/mm/mmap.c~rfp-private-vma-2   2005-08-24 20:57:13.0 +0200
+++ linux-2.6.git-paolo/mm/mmap.c   2005-08-24 20:57:13.0 +0200
@@ -1124,6 +1124,10 @@ out: 
}
if (flags & MAP_POPULATE) {
up_write(&mm->mmap_sem);
+   /*
+* remap_file_pages() works even if the mapping is private,
+* in the linearly-mapped case:
+*/
sys_remap_file_pages(addr, len, 0,
pgoff, flags & MAP_NONBLOCK);
down_write(&mm->mmap_sem);
_
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[patch 03/18] remap_file_pages protection support: make mprotect skip pagetables on nonuniform

2005-08-26 Thread blaisorblade

From: Paolo 'Blaisorblade' Giarrusso <[EMAIL PROTECTED]>

There is IMHO no reason to support using mprotect on non-uniform VMAs. The
only exception is to change the VMA's default protection (which is used for
non-individually remapped pages), but it must still ignore the page tables, as
done in this patch.

The only unsatisfied need is changing protections without changing the
indexes: with remap_file_pages you must do that one page at a time,
re-specifying the indexes.

It is more reasonable to allow remap_file_pages to change protections on a PTE
range without changing the offsets. I've not implemented this, but I can if it
is wanted. For sure, UML doesn't currently need this interface.

However, for now I've implemented only this change to mprotect(); I'd like to
get some feedback about this choice.
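
The intended behaviour can be modelled in a few lines. This is only a toy
sketch: struct toy_vma, TOY_VM_NONUNIFORM and toy_mprotect() are hypothetical
names for the illustration, not kernel interfaces.

```c
#include <assert.h>

#define TOY_VM_NONUNIFORM 0x1u	/* hypothetical flag value, not the kernel's */
#define TOY_NPAGES        4

struct toy_vma {
	unsigned int flags;			/* VMA-wide flags */
	unsigned int default_prot;		/* used for pages not remapped individually */
	unsigned int pte_prot[TOY_NPAGES];	/* per-page protections */
};

/*
 * Model of mprotect(): the VMA default always changes, but the per-page
 * protections are only rewritten when the VMA is uniform.
 */
static void toy_mprotect(struct toy_vma *vma, unsigned int new_prot)
{
	int i;

	assert(vma);
	vma->default_prot = new_prot;
	if (vma->flags & TOY_VM_NONUNIFORM)
		return;	/* per-page protections belong to remap_file_pages */
	for (i = 0; i < TOY_NPAGES; i++)
		vma->pte_prot[i] = new_prot;
}
```

On a nonuniform VMA only default_prot changes, which mirrors the early return
added to change_protection().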

Signed-off-by: Paolo 'Blaisorblade' Giarrusso <[EMAIL PROTECTED]>
---

 linux-2.6.git-paolo/mm/mprotect.c |3 +++
 1 files changed, 3 insertions(+)

diff -puN mm/mprotect.c~rfp-mprotect-skip-pagetables-on-nonuniform mm/mprotect.c
--- linux-2.6.git/mm/mprotect.c~rfp-mprotect-skip-pagetables-on-nonuniform 2005-08-17 13:36:43.0 +0200
+++ linux-2.6.git-paolo/mm/mprotect.c   2005-08-17 13:36:43.0 +0200
@@ -86,6 +86,9 @@ static void change_protection(struct vm_
unsigned long start = addr;
 
BUG_ON(addr >= end);
+   if (vma->vm_flags & VM_NONUNIFORM)
+   return;
+
pgd = pgd_offset(mm, addr);
flush_cache_range(vma, addr, end);
spin_lock(&mm->page_table_lock);
_


[patch 14/18] remap_file_pages protection support: avoid lookup and I/O of pages for PROT_NONE remapping

2005-08-26 Thread blaisorblade

From: Paolo 'Blaisorblade' Giarrusso <[EMAIL PROTECTED]>, Ingo Molnar <[EMAIL PROTECTED]>

This optimization avoids looking up pages for PROT_NONE mappings, and instead
simply clears the page tables. This code was taken straight from Ingo's patch.

However, this code is only correct if we disallow having VMAs with protections
!= PROT_NONE; on a VM_READ vma, for instance, the cleared PTE would be faulted
in again, which is wrong.
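
The correctness condition can be written down as a predicate. This is a toy
model only: the TOY_* constants and both function names are hypothetical, and
the real fault path is of course more involved.

```c
#include <assert.h>

/* Hypothetical bit values for this sketch; not the kernel's definitions. */
#define TOY_VM_READ   0x1u
#define TOY_VM_WRITE  0x2u
#define TOY_VM_EXEC   0x4u
#define TOY_PROT_NONE 0x0u

/* Would an access to a cleared PTE simply be faulted back in? */
static int toy_cleared_pte_refaults(unsigned int vm_flags)
{
	return (vm_flags & (TOY_VM_READ | TOY_VM_WRITE | TOY_VM_EXEC)) != 0;
}

/*
 * Zapping the range is a valid way to install PROT_NONE only if the
 * cleared PTEs stay inaccessible afterwards.
 */
static int toy_fastpath_is_safe(unsigned int vm_flags, unsigned int prot)
{
	/* prot is a mask of the three toy access bits */
	assert((prot & ~(TOY_VM_READ | TOY_VM_WRITE | TOY_VM_EXEC)) == 0);
	return prot == TOY_PROT_NONE && !toy_cleared_pte_refaults(vm_flags);
}
```

With vm_flags containing TOY_VM_READ the predicate is false, which is the
VM_READ case described above.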

We could (in order of preference):
*) use the subsequent patch; you may feel it's a bit intrusive, but it's not
too much, and may further be simplified.
*) drop this optimization
*) additionally check that the VMA is PROT_NONE in this optimization (but we
would have to disallow mprotect() on the VMA or specify that the mappings end
up in an "unspecified" state)

Signed-off-by: Paolo 'Blaisorblade' Giarrusso <[EMAIL PROTECTED]>
---

 linux-2.6.git-paolo/mm/filemap.c |   14 ++
 linux-2.6.git-paolo/mm/shmem.c   |9 +
 2 files changed, 23 insertions(+)

diff -puN mm/filemap.c~rfp-avoid-lookup-miss-mapping-base mm/filemap.c
--- linux-2.6.git/mm/filemap.c~rfp-avoid-lookup-miss-mapping-base 2005-08-25 13:01:32.0 +0200
+++ linux-2.6.git-paolo/mm/filemap.c 2005-08-25 13:01:32.0 +0200
@@ -1495,6 +1495,20 @@ int filemap_populate(struct vm_area_stru
struct page *page;
int err;
 
+   /*
+* mapping-removal fastpath:
+*/
+   if ((vma->vm_flags & VM_SHARED) &&
+   (pgprot_val(prot) == pgprot_val(__S000))) {
+   /* Still do error-checking! */
+   size = (i_size_read(inode) + PAGE_CACHE_SIZE - 1) >> PAGE_CACHE_SHIFT;
+   if (pgoff + (len >> PAGE_CACHE_SHIFT) > size)
+   return -EINVAL;
+
+   zap_page_range(vma, addr, len, NULL);
+   return 0;
+   }
+
if (!nonblock)
force_page_cache_readahead(mapping, vma->vm_file,
pgoff, len >> PAGE_CACHE_SHIFT);
diff -puN mm/shmem.c~rfp-avoid-lookup-miss-mapping-base mm/shmem.c
--- linux-2.6.git/mm/shmem.c~rfp-avoid-lookup-miss-mapping-base 2005-08-25 13:01:32.0 +0200
+++ linux-2.6.git-paolo/mm/shmem.c  2005-08-25 13:01:32.0 +0200
@@ -1186,6 +1186,15 @@ static int shmem_populate(struct vm_area
if (pgoff >= size || pgoff + (len >> PAGE_SHIFT) > size)
return -EINVAL;
 
+   /*
+* mapping-removal fastpath:
+*/
+   if ((vma->vm_flags & VM_SHARED) &&
+   (pgprot_val(prot) == pgprot_val(__S000))) {
+   zap_page_range(vma, addr, len, NULL);
+   return 0;
+   }
+
while ((long) len > 0) {
struct page *page = NULL;
int err;
_


[patch 02/18] remap_file_pages protection support: handle nonuniform VMAs

2005-08-26 Thread blaisorblade

From: Paolo 'Blaisorblade' Giarrusso <[EMAIL PROTECTED]>

Handle the possible existence of VM_NONUNIFORM vmas, without actually creating
them.

* Replace old uses of pgoff_to_pte with pgoff_prot_to_pte.
* Introduce the flag, use it to read permissions from the PTE rather than from
  the VMA flags.
* Replace the linear_page_index() check with save_nonlinear_pte(), which
  encapsulates the check.

Below there is a long explanation of why I've added VM_NONUNIFORM, rather
than simply overloading VM_NONLINEAR.

However, this patch assumes that VM_NONUNIFORM vmas are also marked as
nonlinear. Otherwise other changes are needed too.

*** remap_file_pages protection support: add VM_NONUNIFORM to fix existing usage of mprotect()

Distinguish between "normal" VMA and VMA with non-uniform protection, by
adding the VM_NONUNIFORM flag. This is needed for various reasons:

* notify the arch fault handlers that they must not check VMA protection when
  deciding whether to send SIGSEGV
* fixing regression of mprotect() on !VM_NONUNIFORM mappings (see below)
* (in next patches) giving a sensible behaviour to mprotect on VM_NONUNIFORM
  mappings
* (TODO?) avoid regression in max file offset with r_f_p() for older mappings;
  we could use either the old offset encoding or the new offset-prot encoding
  depending on this flag.
  It's trivial to do, I just don't know whether existing apps will overflow
  the new limits. They go down from 2TB to 1TB on i386 and 512G on PPC, and
  from 256G to 128G on S390/31 bits. Let me know if that is a problem.
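
The halving in the last item is what you get when one offset bit of the file
PTE is given up. A small worked example, assuming 4 KiB pages; the bit counts
(29 and 28) are inferred from the i386 limits quoted above, not taken from the
patch, and toy_max_file_bytes() is a made-up helper:

```c
#include <assert.h>
#include <stdint.h>

#define TOY_PAGE_SHIFT 12u	/* assumed 4 KiB pages */

/* Largest file size reachable when the PTE stores offset_bits of page offset. */
static uint64_t toy_max_file_bytes(unsigned int offset_bits)
{
	assert(offset_bits + TOY_PAGE_SHIFT < 64);
	return (uint64_t)1 << (offset_bits + TOY_PAGE_SHIFT);
}
```

Under these assumptions 29 offset bits give 2^41 bytes and 28 bits give 2^40
bytes, so each bit taken for protections halves the limit.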

In fact, without this flag, we would indeed have a regression with
remap_file_pages vs. mprotect.

mprotect alters the VMA prots and walks each present PTE, ignoring installed
ones; their saved prots will be restored on faults, ignoring VMA ones and
losing the mprotect() on them. So, we must restore VMA prots when the VMA is
uniform, as we used to do.
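
Put as code, the rule is a single branch at fault time. This is a toy sketch
with hypothetical names (TOY_VM_NONUNIFORM, toy_fault_prot); it only models
which protection wins, not the real fault handler:

```c
#include <assert.h>

#define TOY_VM_NONUNIFORM 0x1u	/* hypothetical flag value, not the kernel's */

/*
 * Pick the protection to install when a saved (non-present) PTE is
 * faulted in: the per-PTE value only counts on nonuniform VMAs.
 */
static unsigned int toy_fault_prot(unsigned int vm_flags,
				   unsigned int vma_prot,
				   unsigned int pte_saved_prot)
{
	if (vm_flags & TOY_VM_NONUNIFORM)
		return pte_saved_prot;
	return vma_prot;	/* uniform VMA: a later mprotect() must win */
}
```

On a uniform VMA the saved value is ignored, so an mprotect() done after the
PTE was saved is not lost.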

Signed-off-by: Paolo 'Blaisorblade' Giarrusso <[EMAIL PROTECTED]>
---

 linux-2.6.git-paolo/include/linux/mm.h  |7 +++
 linux-2.6.git-paolo/include/linux/pagemap.h |   21 +
 linux-2.6.git-paolo/mm/fremap.c |8 
 linux-2.6.git-paolo/mm/memory.c |   14 --
 linux-2.6.git-paolo/mm/rmap.c   |3 +--
 5 files changed, 41 insertions(+), 12 deletions(-)

diff -puN include/linux/mm.h~rfp-add-VM_NONUNIF include/linux/mm.h
--- linux-2.6.git/include/linux/mm.h~rfp-add-VM_NONUNIF 2005-08-24 13:27:38.0 +0200
+++ linux-2.6.git-paolo/include/linux/mm.h  2005-08-24 13:27:39.0 +0200
@@ -160,7 +160,14 @@ extern unsigned int kobjsize(const void 
 #define VM_ACCOUNT 0x0010  /* Is a VM accounted object */
 #define VM_HUGETLB 0x0040  /* Huge TLB Page VM */
 #define VM_NONLINEAR   0x0080  /* Is non-linear (remap_file_pages) */
+
+#ifndef CONFIG_MMU
 #define VM_MAPPED_COPY 0x0100  /* T if mapped copy of data (nommu mmap) */
+#else
+#define VM_NONUNIFORM  0x0100  /* The VM individual pages have
+  different protections
+  (remap_file_pages)*/
+#endif
 
 #ifndef VM_STACK_DEFAULT_FLAGS /* arch can override this */
 #define VM_STACK_DEFAULT_FLAGS VM_DATA_DEFAULT_FLAGS
diff -puN include/linux/pagemap.h~rfp-add-VM_NONUNIF include/linux/pagemap.h
--- linux-2.6.git/include/linux/pagemap.h~rfp-add-VM_NONUNIF 2005-08-24 13:27:38.0 +0200
+++ linux-2.6.git-paolo/include/linux/pagemap.h 2005-08-24 13:27:39.0 +0200
@@ -159,6 +159,27 @@ static inline pgoff_t linear_page_index(
return pgoff >> (PAGE_CACHE_SHIFT - PAGE_SHIFT);
 }
 
+/***
+ * Checks if the PTE is nonlinear, and if yes sets it.
+ * @vma: the VMA in which @addr is; we don't check if it's VM_NONLINEAR, just
+ * if this PTE is nonlinear.
+ * @addr: the addr which @pte refers to.
+ * @pte: the old PTE value (to read its protections).
+ * @ptep: the PTE pointer (for setting it).
+ * @mm: passed to set_pte_at.
+ * @page: the page which was installed (to read its ->index, i.e. the old
+ * offset inside the file).
+ */
+static inline void save_nonlinear_pte(pte_t pte, pte_t * ptep, struct
+   vm_area_struct *vma, struct mm_struct *mm, struct page* page,
+   unsigned long addr)
+{
+   pgprot_t pgprot = pte_to_pgprot(pte);
+   if (linear_page_index(vma, addr) != page->index ||
+   pgprot_val(pgprot) != pgprot_val(vma->vm_page_prot))
+   set_pte_at(mm, addr, ptep, pgoff_prot_to_pte(page->index, pgprot));
+}
+
 extern void FASTCALL(__lock_page(struct page *page));
 extern void FASTCALL(unlock_page(struct page *page));
 
diff -puN mm/fremap.c~rfp-add-VM_NONUNIF mm/fremap.c
--- linux-2.6.git/mm/fremap.c~rfp-add-VM_NONUNIF 2005-08-24 13:27:38.0 +0200
+++ linux-2.6.git-paolo/mm/fremap.c 2005-08-24 13:27:39.

[patch 04/18] remap_file_pages protection support: cleanup syscall checks

2005-08-26 Thread blaisorblade

From: Paolo 'Blaisorblade' Giarrusso <[EMAIL PROTECTED]>

This patch reorganizes the code only, without differences in behaviour. It
makes the code more readable on its own, and is needed for the next patches.
I've split this out to avoid cluttering real patches.
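
The shape the function is moved to is one early-exit check per condition, with
labels for the unlock and return paths. A self-contained toy version of that
shape follows; struct toy_lock, struct toy_vma and toy_remap() are made up for
the illustration and only mimic the control flow, not the real locking:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct toy_lock { int readers; };	/* stand-in for mmap_sem */
struct toy_vma  { unsigned long start, end; int shared; };

static void toy_down_read(struct toy_lock *l) { l->readers++; }
static void toy_up_read(struct toy_lock *l)   { l->readers--; }

static int toy_remap(struct toy_lock *lock, const struct toy_vma *vma,
		     unsigned long start, unsigned long size)
{
	unsigned long end = start + size;
	int err = -EINVAL;

	assert(lock);

	/* Checks that need no lock leave through "out". */
	if (end <= start)	/* range wraps or is empty */
		goto out;

	toy_down_read(lock);
	if (!vma)
		goto out_unlock;
	if (!vma->shared)
		goto out_unlock;
	if (start < vma->start || end > vma->end)
		goto out_unlock;

	err = 0;		/* the real work would happen here */

out_unlock:
	toy_up_read(lock);
out:
	return err;
}
```

Every failure after the lock is taken goes through out_unlock, so the lock
count stays balanced whichever check fails.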

Signed-off-by: Paolo 'Blaisorblade' Giarrusso <[EMAIL PROTECTED]>
---

 linux-2.6.git-paolo/mm/fremap.c |   38 --
 1 files changed, 24 insertions(+), 14 deletions(-)

diff -puN mm/fremap.c~rfp-cleanup-sc-check mm/fremap.c
--- linux-2.6.git/mm/fremap.c~rfp-cleanup-sc-check  2005-08-24 10:45:48.0 +0200
+++ linux-2.6.git-paolo/mm/fremap.c 2005-08-24 11:21:13.0 +0200
@@ -178,7 +178,7 @@ err_unlock:
  * future.
  */
 asmlinkage long sys_remap_file_pages(unsigned long start, unsigned long size,
-   unsigned long __prot, unsigned long pgoff, unsigned long flags)
+   unsigned long prot, unsigned long pgoff, unsigned long flags)
 {
struct mm_struct *mm = current->mm;
struct address_space *mapping;
@@ -186,9 +186,10 @@ asmlinkage long sys_remap_file_pages(uns
struct vm_area_struct *vma;
int err = -EINVAL;
int has_write_lock = 0;
+   pgprot_t pgprot;
 
-   if (__prot)
-   return err;
+   if (prot)
+   goto out;
/*
 * Sanitize the syscall parameters:
 */
@@ -197,7 +198,7 @@ asmlinkage long sys_remap_file_pages(uns
 
/* Does the address range wrap, or is the span zero-sized? */
if (start + size <= start)
-   return err;
+   goto out;
 
/* Can we represent this offset inside this architecture's pte's? */
 #if PTE_FILE_MAX_BITS < BITS_PER_LONG
@@ -207,7 +208,7 @@ asmlinkage long sys_remap_file_pages(uns
 
/* We need down_write() to change vma->vm_flags. */
down_read(&mm->mmap_sem);
- retry:
+retry:
vma = find_vma(mm, start);
 
/*
@@ -217,13 +218,20 @@ asmlinkage long sys_remap_file_pages(uns
 * swapout cursor in a VM_NONLINEAR vma (unless VM_RESERVED
 * or VM_LOCKED, but VM_LOCKED could be revoked later on).
 */
-   if (vma && (vma->vm_flags & VM_SHARED) &&
-   (!vma->vm_private_data ||
-   (vma->vm_flags & (VM_NONLINEAR|VM_RESERVED))) &&
-   vma->vm_ops && vma->vm_ops->populate &&
-   end > start && start >= vma->vm_start &&
-   end <= vma->vm_end) {
+   if (!vma)
+   goto out_unlock;
+
+   if (!(vma->vm_flags & VM_SHARED))
+   goto out_unlock;
 
+   if (!vma->vm_ops || !vma->vm_ops->populate || end <= start || start <
+   vma->vm_start || end > vma->vm_end)
+   goto out_unlock;
+
+   pgprot = vma->vm_page_prot;
+
+   if (!vma->vm_private_data ||
+   (vma->vm_flags & (VM_NONLINEAR|VM_RESERVED))) {
/* Must set VM_NONLINEAR before any pages are populated. */
if (pgoff != linear_page_index(vma, start) &&
!(vma->vm_flags & VM_NONLINEAR)) {
@@ -249,16 +257,18 @@ asmlinkage long sys_remap_file_pages(uns
downgrade_write(&mm->mmap_sem);
has_write_lock = 0;
}
-   err = vma->vm_ops->populate(vma, start, size,
-   vma->vm_page_prot,
-   pgoff, flags & MAP_NONBLOCK);
+   err = vma->vm_ops->populate(vma, start, size, pgprot, pgoff,
+   flags & MAP_NONBLOCK);
 
}
+
+out_unlock:
if (likely(!has_write_lock))
up_read(&mm->mmap_sem);
else
up_write(&mm->mmap_sem);
 
+out:
return err;
 }
 
_

