date:20070314

Re: [PATCH/RFC] PCI prepare/activate instead of enable to avoid IRQ storm and rogue DMA access

2007-03-14 Thread Grant Grundler

On Thu, Mar 15, 2007 at 11:37:20AM +0900, Tejun Heo wrote:
...
> Also, the current implementation doesn't have any arch independent part. 

I think you meant "arch dependent" here.

>  It's wholly contained in arch independent PCI layer, but it might be 
> beneficial to have arch dependent hooks (IRQ line enable/disable?) in 
> the future.
> 
> >What if the device with the IRQ problem is never loaded? Sometimes
> >devices aren't loaded until after boot.
> 
> What do you mean by loading a device?  Do you mean loading driver for 
> the device?

Yes, I think that's what he meant.

> >Any change like this has to be done without changing device drivers.
> >Changing the skge/sky2 drivers as special case is not acceptable.

I don't like the idea of changing the driver API for PCI device setup.
But if it's necessary to solve this class of problem, I think it's ok.

> I dunno about that.  What I'm proposing is alternative two-step PCI 
> initialization step - the first step enables the device just enough for 
> initialization/reset and the second one enables full access.  We're 
> doing part of it already for bus master.  I'm proposing to expand that 
> approach and make them handled by generic PCI layer.  As you can see, it 
> doesn't add noticeable complexity to drivers.  I think it's even clearer 
> than doing pci_set_master() explicitly.

Please update Documentation/pci.txt to reflect the API changes too.

> If this way of solving the problem is chosen, eventually most drivers 
> should be converted to new initialization steps.  And there is no way to 
> do this without modifying low level driver.  Only low level driver knows 
> when full blown access can be enabled and such thing must happen before 
> registering the device to upper layer (e.g. ATA/SCSI, netif).

Agreed. ISTR this has been discussed before but don't recall
the exact context. I'll try to find the previous thread.

When I started the parisc port on 2.4 kernels, the policy was to
leave all interrupts enabled even if no interrupt handler was registered.
This is useful for debugging misconfigured IRQ routing.
Did the policy already change or is this a proposal to change the policy?

thanks,
grant

> sky2/skge aren't exceptions.  If this way of solving the problem is 
> chosen, eventually most if not all drivers should be converted to new 
> model.  It may take two years, maybe five, but as a start just 
> converting ATA and network drivers shouldn't take too long and that 
> would help a lot of cases.
> 
> Thanks.
> 
> -- 
> tejun
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH take3 16/20] acpi files switched

2007-03-14 Thread Len Brown

On Thursday 15 March 2007 01:13, Steven Rostedt wrote:
> Moved the shared files that were in arch/i386/kernel/acpi to the common
> area.

When I do a "make cscope" on an i386 or an x86_64 box,
will it find these files in the common area?

thanks
-Len

Re: [OT] Re: New thread RDSL, post-2.6.20 kernels and amanda (tar) miss-fires

2007-03-14 Thread Gene Heskett

On Thursday 15 March 2007, Willy Tarreau wrote:

[...]

>with "/bin/tar -f - >/tmp/test/", you ask bash to open the file
> "/tmp/test/" for write, then start tar and pass this file as its
> stdout. Obviously this is wrong. I think that what you're trying to do
> is send extracted files to /tmp/test, which is what '-C' is for. Also,
> you need to specify a command for tar. You didn't. I bet if you do the
> following, it will work :
>
>[EMAIL PROTECTED] data]# dd if=00010.coyote._lib.1 bs=32k skip=1 |
>/bin/gzip -dc |  /bin/tar -C /tmp/test/ -xf -
>
>Now, Gene, this is becoming totally off-topic right here.

My apologies, I've been corrected, thanks for your patience.  And I'll see 
if I can get that text in the amanda file headers amended too.

>Regards,
>Willy



-- 
Cheers, Gene
"There are four boxes to be used in defense of liberty:
 soap, ballot, jury, and ammo. Please use in that order."
-Ed Howdershelt (Author)
It takes less time to do a thing right than it does to explain why you
did it wrong.
-- H.W. Longfellow

Re: New thread RDSL, post-2.6.20 kernels and amanda (tar) miss-fires

2007-03-14 Thread Gene Heskett

On Thursday 15 March 2007, Ray Lee wrote:
>Gene Heskett wrote:
>> Here is an example
>> [EMAIL PROTECTED] data]# dd if=00010.coyote._lib.1 bs=32k count=1
>> AMANDA: FILE 20070314104344 coyote /lib  lev 1 comp .gz program
>> /bin/tar To restore, position tape at start of file and run:
>>  dd if= bs=32k skip=1 |  /bin/gzip -dc |  /bin/tar -f - ...
>>
>> And the ellipsis is an error if not removed.  Then one is supposed to
>> be able to redirect tars output with the usual >/tmp/test/ syntax
>>
>> So:
>> [EMAIL PROTECTED] data]# dd if=00010.coyote._lib.1 bs=32k
>> skip=1 |  /bin/gzip -dc |  /bin/tar -f - >/tmp/test/
>> -bash: /tmp/test/: Is a directory
>>
>> which is the return from any variation in how the redirect is done.
>>
>> So what am I doing wrong in the above command line? I'd like to
>> add the fix to my helper scripts, to be published eventually on
>> zmanda.org.
>
>One of us is confused, and it may very well be me, but...
>
>the /bin/tar -f - >/tmp/test/ looks to me like it should fail exactly as
>bash says it does. the output redirect (>) will only write out to a
>file, not a directory. (So, /tmp/file should work, /tmp/file/ won't.)
>
>Are you trying to redirect where the files get restored? That should be
>done with a cd before doing the uncompress.
>
>Or am I misunderstanding what you're telling me?
>
>Ray

No, apparently it's me who has been running with a fubar'd understanding.
I was certain that tar (or bash) should be able to put the 
recovered files IN the directory /tmp/test, but that turns out to need 
more options after the '-f -' section of the sample line I posted.

Thanks.  A bunch..

-- 
Cheers, Gene
"There are four boxes to be used in defense of liberty:
 soap, ballot, jury, and ammo. Please use in that order."
-Ed Howdershelt (Author)
Mal: "You are very much lacking in imagination."

Zoe: "I imagine that's so, sir."
--Episode #8, "Out of Gas"

[PATCH] BLK_DEV_IDE_CELLEB dependency fix

2007-03-14 Thread Akira Iguchi

It's bool and it depends on BLK_DEV_IDE
 => should depend on BLK_DEV_IDE=y

And move it to "if BLK_DEV_IDEDMA_PCI" block because it depends on 
BLK_DEV_IDEDMA_PCI.

Signed-off-by: Al Viro <[EMAIL PROTECTED]>
Signed-off-by: Kou Ishizaki <[EMAIL PROTECTED]>
Signed-off-by: Akira Iguchi <[EMAIL PROTECTED]>
---

diff -Nrpu -X linux-2.6.21-rc3/Documentation/dontdiff linux-2.6.21-rc3/drivers/ide/Kconfig linux-2.6.21-rc3.mod/drivers/ide/Kconfig
--- linux-2.6.21-rc3/drivers/ide/Kconfig	2007-03-07 13:41:20.0 +0900
+++ linux-2.6.21-rc3.mod/drivers/ide/Kconfig	2007-03-15 23:49:33.0 +0900
@@ -769,6 +769,14 @@ config BLK_DEV_TC86C001
help
This driver adds support for Toshiba TC86C001 GOKU-S chip.
 
+config BLK_DEV_IDE_CELLEB
+   bool "Toshiba's Cell Reference Set IDE support"
+   depends on PPC_CELLEB && BLK_DEV_IDE=y
+   help
+ This driver provides support for the built-in IDE controller on
+ Toshiba Cell Reference Board.
+ If unsure, say Y.
+
 endif
 
 config BLK_DEV_IDE_PMAC
@@ -800,14 +808,6 @@ config BLK_DEV_IDEDMA_PMAC
  to transfer data to and from memory.  Saying Y is safe and improves
  performance.
 
-config BLK_DEV_IDE_CELLEB
-   bool "Toshiba's Cell Reference Set IDE support"
-   depends on PPC_CELLEB
-   help
- This driver provides support for the built-in IDE controller on
- Toshiba Cell Reference Board.
- If unsure, say Y.
-
 config BLK_DEV_IDE_SWARM
tristate "IDE for Sibyte evaluation boards"
depends on SIBYTE_SB1xxx_SOC


Re: [patch 2/5] fs: introduce new aops and infrastructure

2007-03-14 Thread Joel Becker

On Thu, Mar 15, 2007 at 05:36:42AM +0100, Nick Piggin wrote:
> On Wed, Mar 14, 2007 at 09:13:29PM -0700, Mark Fasheh wrote:
> > Are we going to get rid of the file and intr arguments btw? I'm not sure
> > intr is useful, and mapping is probably enough to get whatever we need inside
> > ->write_begin / ->write_end.
> 
> Yeah, I was going to, but I had this version ready to go so decided
> to leave them in at the last minute. We can definitely take them out
> if people agree.

You're really going to need the file argument around.  Some
folks care about file->private_data, etc.  A good example is
nfs_updatepage() from nfs_commit_write().  There's a context on the
filp.  Mapping can get back to the inode via ->host, but not to the
struct file.

Joel

-- 

Life's Little Instruction Book #157 

"Take time to smell the roses."

Joel Becker
Principal Software Developer
Oracle
E-mail: [EMAIL PROTECTED]
Phone: (650) 506-8127

Re: [patch 2/5] fs: introduce new aops and infrastructure

2007-03-14 Thread Mark Fasheh

On Thu, Mar 15, 2007 at 05:36:42AM +0100, Nick Piggin wrote:
> > Are we going to get rid of the file and intr arguments btw? I'm not sure
> > intr is useful, and mapping is probably enough to get whatever we need inside
> > ->write_begin / ->write_end.
> 
> Yeah, I was going to, but I had this version ready to go so decided
> to leave them in at the last minute. We can definitely take them out
> if people agree.
> 
> However a side note about intr -- I wonder if it might be wise to
> include a flags argument, in case we might want to add something like
> that later? (definitely if we do keep intr, then it should be done as
> a flag rather than its own int).

I don't see a problem with having a flags argument. It could give us some
flexibility in the future which would otherwise require a much bigger
update. If we found out that we needed intr, it could just be a flag.


> > One interesting side effect is that we no longer pass AOP_TRUNCATE_PAGE up a
> > level. This gives callers less to deal with. And it means that ocfs2 doesn't
> > have to use the ocfs2_*_lock_with_page() cluster lock variants in
> > ocfs2_block_write_begin() because it can order cluster locks outside of the
> > page lock there.
> 
> OK that's very cool. I was hoping that would be the case. If GFS2 can
> avoid that too, then we might be able to get rid of AOP_TRUNCATE_PAGE
> handling from the legacy prepare/commit_write paths, which will make
> them simpler.

Yeah - so long as we're not taking a page fault between write_begin /
write_end, there's no reason for the cluster locks to be taken and dropped
within the individual callbacks, which means we can just take them in
write_begin (where page lock ordering is possible) and hold them until
write_end is called.


> OK, well I'll add this to my queue for now, and post the full patchset
> after incorporating feedback I've had so far, and doing more testing,
> so people can actually apply them and boot kernels.

Great, thanks - it just occurred to me that I should be holding the cluster
locks across the entire copy (as I point out above), so I'll have a slightly
updated version of this patch for you soon :)
--Mark

--
Mark Fasheh
Senior Software Developer, Oracle
[EMAIL PROTECTED]

Re: [PATCH 10/13] BLK_DEV_IDE_CELLEB dependency fix

2007-03-14 Thread Akira Iguchi

Al wrote:
>
>Eh...  You still need dependency on IDE=y; otherwise you'll get configs
>with IDE=m, BLK_DEV_IDE_CELLEB=y and those won't link.  BLK_DEV_IDEDMA_PCI
>is selectable just fine with IDE=m.
>
>It's the same problem as with ps3 fb.
>

I'm sorry I missed this case.
Using some configurations, I found BLK_DEV_IDE=y was better.
(I failed to link when IDE=y and BLK_DEV_IDE=m.)


Best regards,
Akira Iguchi

[PATCH REPOST] No need to use -traditional for processing asm in arch/i386/

2007-03-14 Thread Jeremy Fitzhardinge

No need to use -traditional for processing asm in arch/i386/

Signed-off-by: Jeremy Fitzhardinge <[EMAIL PROTECTED]>

---
 arch/i386/boot/Makefile|4 ++--
 arch/i386/boot/compressed/Makefile |1 -
 arch/i386/kernel/Makefile  |2 --
 arch/i386/kernel/entry.S   |2 +-
 include/asm-i386/percpu.h  |4 ++--
 5 files changed, 5 insertions(+), 8 deletions(-)

===
--- a/arch/i386/boot/Makefile
+++ b/arch/i386/boot/Makefile
@@ -36,9 +36,9 @@ HOSTCFLAGS_build.o := $(LINUXINCLUDE)
 # ---
 
 $(obj)/zImage:  IMAGE_OFFSET := 0x1000
-$(obj)/zImage:  EXTRA_AFLAGS := -traditional $(SVGA_MODE) $(RAMDISK)
+$(obj)/zImage:  EXTRA_AFLAGS := $(SVGA_MODE) $(RAMDISK)
 $(obj)/bzImage: IMAGE_OFFSET := 0x10
-$(obj)/bzImage: EXTRA_AFLAGS := -traditional $(SVGA_MODE) $(RAMDISK) -D__BIG_KERNEL__
+$(obj)/bzImage: EXTRA_AFLAGS := $(SVGA_MODE) $(RAMDISK) -D__BIG_KERNEL__
 $(obj)/bzImage: BUILDFLAGS   := -b
 
 quiet_cmd_image = BUILD   $@
===
--- a/arch/i386/boot/compressed/Makefile
+++ b/arch/i386/boot/compressed/Makefile
@@ -6,7 +6,6 @@
 
 targets:= vmlinux vmlinux.bin vmlinux.bin.gz head.o misc.o piggy.o \
vmlinux.bin.all vmlinux.relocs
-EXTRA_AFLAGS   := -traditional
 
 LDFLAGS_vmlinux := -T
 CFLAGS_misc.o += -fPIC
===
--- a/arch/i386/kernel/Makefile
+++ b/arch/i386/kernel/Makefile
@@ -44,8 +44,6 @@ obj-$(CONFIG_PARAVIRT)+= paravirt.o
 obj-$(CONFIG_PARAVIRT) += paravirt.o
 obj-y  += pcspeaker.o
 
-EXTRA_AFLAGS   := -traditional
-
 obj-$(CONFIG_SCx200)   += scx200.o
 
 # vsyscall.o contains the vsyscall DSO images as __initdata.
===
--- a/arch/i386/kernel/entry.S
+++ b/arch/i386/kernel/entry.S
@@ -635,7 +635,7 @@ ENTRY(name) \
SAVE_ALL;   \
TRACE_IRQS_OFF  \
movl %esp,%eax; \
-   call smp_/**/name;  \
+   call smp_##name;\
jmp ret_from_intr;  \
CFI_ENDPROC;\
 ENDPROC(name)
===
--- a/include/asm-i386/percpu.h
+++ b/include/asm-i386/percpu.h
@@ -20,10 +20,10 @@
 #ifdef CONFIG_SMP
 #define PER_CPU(var, cpu) \
movl __per_cpu_offset(,cpu,4), cpu; \
-   addl $per_cpu__/**/var, cpu;
+   addl $per_cpu__##var, cpu;
 #else /* ! SMP */
 #define PER_CPU(var, cpu) \
-   movl $per_cpu__/**/var, cpu;
+   movl $per_cpu__##var, cpu;
 #endif /* SMP */
 
 #endif /* !__ASSEMBLY__ */



Re: 2.6.21-rc3-mm1

2007-03-14 Thread Mariusz Kozlowski

Hello, 

> > Today after +- 24h of uptime I found some more page allocation
> > failures ('eth1: Can't allocate skb for Rx'). You'll find more here:
> > 
> > http://tuxland.pl/misc/2.6.21-rc3-mm1-page-allocation-failure.txt
> > 
> > System wasn't doing anything unusual, as usual ;-) X, some p2p 
> > software, firefox+flash playing music.
> > 
> 
> Do other kernels do this, or is 2.6.21-rc3-mm1 worse?

I've never seen page allocation failures before 2.6.21-rc3-mm1 (first
khubd with the mouse thing, now this).
 
> It is of course a non-fatal problem and will inevitably happen sometimes,
> but we would like the VM to be able to minimise the occurrence of this
> problem.

True. The system runs as if nothing happened. It just pops up from time to time.

> I think we were rather hoping that Mel's anti-fragmentation work would
> improve things.

Thanks,

Mariusz Kozlowski

[PATCH] Clean up ELF note generation

2007-03-14 Thread Jeremy Fitzhardinge

Three cleanups:

1: ELF notes are never mapped, so there's no need to have any access
flags in their phdr.

2: When generating them from asm, tell the assembler to use a SHT_NOTE
section type.  There doesn't seem to be a way to do this from C.

3: Use ANSI rather than traditional cpp behaviour to stringify the
macro argument.

Signed-off-by: Jeremy Fitzhardinge <[EMAIL PROTECTED]>
Cc: Eric W. Biederman <[EMAIL PROTECTED]>

---
 arch/i386/kernel/vmlinux.lds.S|2 +-
 include/asm-generic/vmlinux.lds.h |2 +-
 include/linux/elfnote.h   |4 ++--
 3 files changed, 4 insertions(+), 4 deletions(-)

===
--- a/arch/i386/kernel/vmlinux.lds.S
+++ b/arch/i386/kernel/vmlinux.lds.S
@@ -34,7 +34,7 @@ PHDRS {
 PHDRS {
text PT_LOAD FLAGS(5);  /* R_E */
data PT_LOAD FLAGS(7);  /* RWE */
-   note PT_NOTE FLAGS(4);  /* R__ */
+   note PT_NOTE FLAGS(0);  /* ___ */
 }
 SECTIONS
 {
===
--- a/include/asm-generic/vmlinux.lds.h
+++ b/include/asm-generic/vmlinux.lds.h
@@ -208,7 +208,7 @@
}
 
 #define NOTES  \
-   .notes : { *(.note.*) } :note
+   .notes : { *(.note.*) } :note
 
 #define INITCALLS  \
*(.initcall0.init)  \
===
--- a/include/linux/elfnote.h
+++ b/include/linux/elfnote.h
@@ -39,12 +39,12 @@
  *  ELFNOTE(XYZCo, 12, .long, 0xdeadbeef)
  */
 #define ELFNOTE(name, type, desctype, descdata)\
-.pushsection .note.name;   \
+.pushsection .note.name, "",@note  ;   \
   .align 4 ;   \
   .long 2f - 1f/* namesz */;   \
   .long 4f - 3f/* descsz */;   \
   .long type   ;   \
-1:.asciz "name";   \
+1:.asciz #name ;   \
 2:.align 4 ;   \
 3:desctype descdata;   \
 4:.align 4 ;   \



Re: [PATCH 10/13] BLK_DEV_IDE_CELLEB dependency fix

2007-03-14 Thread Akira Iguchi

Al wrote:
>
>It's bool and it depends on IDE => should depend on IDE=y
>
>Signed-off-by: Al Viro <[EMAIL PROTECTED]>

Move to "if BLK_DEV_IDEDMA_PCI" block because it depends on 
BLK_DEV_IDEDMA_PCI.

Signed-off-by: Kou Ishizaki <[EMAIL PROTECTED]>
Signed-off-by: Akira Iguchi <[EMAIL PROTECTED]>
---

diff -Nrpu -X linux-2.6.21-rc3/Documentation/dontdiff linux-2.6.21-rc3/drivers/ide/Kconfig linux-2.6.21-rc3.mod/drivers/ide/Kconfig
--- linux-2.6.21-rc3/drivers/ide/Kconfig	2007-03-07 13:41:20.0 +0900
+++ linux-2.6.21-rc3.mod/drivers/ide/Kconfig	2007-03-15 22:47:14.0 +0900
@@ -769,6 +769,14 @@ config BLK_DEV_TC86C001
help
This driver adds support for Toshiba TC86C001 GOKU-S chip.
 
+config BLK_DEV_IDE_CELLEB
+   bool "Toshiba's Cell Reference Set IDE support"
+   depends on PPC_CELLEB
+   help
+ This driver provides support for the built-in IDE controller on
+ Toshiba Cell Reference Board.
+ If unsure, say Y.
+
 endif
 
 config BLK_DEV_IDE_PMAC
@@ -800,14 +808,6 @@ config BLK_DEV_IDEDMA_PMAC
  to transfer data to and from memory.  Saying Y is safe and improves
  performance.
 
-config BLK_DEV_IDE_CELLEB
-   bool "Toshiba's Cell Reference Set IDE support"
-   depends on PPC_CELLEB
-   help
- This driver provides support for the built-in IDE controller on
- Toshiba Cell Reference Board.
- If unsure, say Y.
-
 config BLK_DEV_IDE_SWARM
tristate "IDE for Sibyte evaluation boards"
depends on SIBYTE_SB1xxx_SOC


Re: [PATCH/RFC] PCI prepare/activate instead of enable to avoid IRQ storm and rogue DMA access

2007-03-14 Thread Tejun Heo

[cc'ing Andi, Hi!]

Hello,

Russell King wrote:
> On Wed, Mar 14, 2007 at 06:34:11PM -0400, Jeff Garzik wrote:
>> Russell King wrote:
>>> pci_enable_device() doesn't deal with this; in most PCI setups I've
>>> seen, there is no control at PCI level over whether a device generates
>>> an interrupt on the bus.  Certainly the memory and io command enables
>> PCI grew an interrupt enable while you weren't looking: 
>> PCI_COMMAND_INTX_DISABLE
> 
> That's fine for devices which conform to the later PCI specs, but not
> all do.
> 
>> It was added in PCI 2.3 I think.
> 
> Correct.
> 
>> Older PCI devices certainly do not have this standardized bit.
> 
> No PCI device that I have has that bit - including the raid card I
> bought last year...

Many recent ATA and network controllers do and most new ones will
probably do.

> In any case, relying on such a new control bit to implement this kind
> of functionality would result in a very hit and miss result; Linux
> tends to get used on things other than the bleeding edge of hardware
> technology.

I don't think INTX_DISABLE is on the bleeding edge of hardware
technology and many common cases will benefit from using it (just think
about the number of newish notebook users).  The problem with
INTX_DISABLE is that there doesn't seem to be any way to tell whether
writing to that bit is safe or not.

You are right in that turning off IRQ mechanisms in pci_enable_device()
doesn't fix all the problems as PCI-wise it only enables IO and memory
address space access, but to some extent it does because in the arch
code, it enables the IRQ line and the physical IRQ line might not be
shared even if the final IRQ number is shared (Andi, am I correct?).

Anyway, I think the proper solution is to make sure all generic IRQ
controls, including INTX, are turned off early in the boot during PCI
subsystem initialization (ie. do the disable part of
pcim_prepare_device() early in the boot before any IRQ line is
requested) and let each driver enable after initialization as necessary
and do similar things during resume.  Note that drivers still need to be
modified to signify when the device is initialized enough to enable IRQ,
and bus mastering.

We can also defer arch-dependent IRQ enabling to activation time.  That will
give us more protection even when INTX_DISABLE is not available.

Thanks.

-- 
tejun

Re: [PATCH 1/1] Allow i386 crash kernels to handle x86_64 dumps

2007-03-14 Thread Vivek Goyal

On Thu, Mar 15, 2007 at 02:07:56PM +0900, Horms wrote:
> On Thu, Mar 15, 2007 at 10:25:36AM +0530, Vivek Goyal wrote:
> > On Thu, Mar 15, 2007 at 10:46:38AM +0900, Horms wrote:
> > > On Wed, Mar 14, 2007 at 05:00:09PM +, Ian Campbell wrote:
> > > > The specific case I am encountering is kdump under Xen with a 64 bit
> > > > hypervisor and 32 bit kernel/userspace. The dump created is a 64 bit due
> > > > to the hypervisor but the dump kernel is 32 bit to match the domain 0
> > > > kernel.
> > > > 
> > > > It's possibly less likely to be useful in a purely native scenario but I
> > > > see no reason to disallow it.
> > > 
> > > For native Linux, would this cover the case where the pre-crash kernel
> > > is 64bit and the crashdump (post-crash) kernel is 32bit?
> > > 
> > 
> > I think so. Though I have never tried this.
> > 
> > > > Signed-off-by: Ian Campbell <[EMAIL PROTECTED]>
> > > > 
> > > > --- pristine-linux-2.6.18/include/asm-i386/elf.h	2006-09-20 04:42:06.0 +0100
> > > > +++ linux-2.6.18-xen/include/asm-i386/elf.h	2007-03-14 16:42:30.0 +
> > > > @@ -36,7 +36,7 @@
> > > >   * This is used to ensure we don't load something for the wrong architecture.
> > > >   */
> > > >  #define elf_check_arch(x) \
> > > > -   (((x)->e_machine == EM_386) || ((x)->e_machine == EM_486))
> > > > +   (((x)->e_machine == EM_386) || ((x)->e_machine == EM_486) || ((x)->e_machine == EM_X86_64))
> > 
> > But I think changing this macro might run into issues. It is used in a
> > few places in the kernel, for example while loading modules. Would this
> > mean that we allow loading 64bit x86_64 modules on 32bit i386 systems?
> > 
> > Similarly, load_elf_interp() uses it; would we then allow loading an
> > interp written for X86_64 on a 32bit i386 machine?
> > 
> > Should we create a separate macro something like elf_check_allowed_arch(),
> > to take care of such corner cases?
> 
> That sounds reasonable to me. Though perhaps it could just be
> kexec_elf_check_arch() for now, as I don't think there are any
> other consumers of it.

Kexec will also not allow loading an x86_64 kernel on a 32bit machine.
So how about something like vmcore_elf_allowed_cross_arch()? Vmcore code
can continue to check elf_check_arch() and if that fails it can invoke
vmcore_elf_allowed_cross_arch() to find out what cross arch are allowed
for vmcore.

Thanks
Vivek

Re: [RFC][PATCH 4/7] RSS accounting hooks over the code

2007-03-14 Thread Balbir Singh


Nick Piggin wrote:

Kirill Korotaev wrote:


The approaches I have seen that don't have a struct page pointer, do
intrusive things like try to put hooks everywhere throughout the kernel
where a userspace task can cause an allocation (and of course end up
missing many, so they aren't secure anyway)... and basically just
nasty stuff that will never get merged.



User beancounters patch has got through all these...
The approach where each charged object has a pointer to the owner
container who has charged it is the easiest/cleanest way to handle
all the problems with dynamic context change, races, etc.,
and 1 pointer in struct page is just 0.1% overhead.


The pointer in struct page approach is a decent one, which I have
liked since this whole container effort came up. IIRC Linus and Alan
also thought that was a reasonable way to go.

I haven't reviewed the rest of the beancounters patch since looking
at it quite a few months ago... I probably don't have time for a
good review at the moment, but I should eventually.



This patch is not really beancounters.

1. It uses the containers framework
2. It is similar to my RSS controller (http://lkml.org/lkml/2007/2/26/8)

I would say that beancounters are changing and evolving.


Struct page overhead really isn't bad. Sure, nobody who doesn't use
containers will want to turn it on, but unless you're using a big PAE
system you're actually unlikely to notice.



big PAE doesn't make any difference IMHO
(until struct pages are not created for non-present physical memory 
areas)


The issue is just that struct pages use low memory, which is a really
scarce commodity on PAE. One more pointer in the struct page means
64MB less lowmem.

But PAE is crap anyway. We've already made enough concessions in the
kernel to support it. I agree: struct page overhead is not really
significant. The benefits of simplicity seem to outweigh the downside.


But again, I'll say the node-container approach of course does avoid
this nicely (because we already can get the node from the page). So
definitely that approach needs to be discredited before going with this
one.



But it lacks some other features:
1. page can't be shared easily with another container


I think they could be shared. You allocate _new_ pages from your own
node, but you can definitely use existing pages allocated to other
nodes.


2. shared page can't be accounted honestly to containers
   as fraction=PAGE_SIZE/containers-using-it


Yes there would be some accounting differences. I think it is hard
to say exactly what containers are "using" what page anyway, though.
What do you say about unmapped pages? Kernel allocations? etc.


3. It doesn't help accounting of kernel memory structures.
   e.g. in OpenVZ we use exactly the same pointer on the page
   to track which container owns it, e.g. pages used for page
   tables are accounted this way.


?
page_to_nid(page) ~= container that owns it.


4. I guess container destroy requires destroy of memory zone,
   which means write out of dirty data. Which doesn't sound
   good for me as well.


I haven't looked at any implementation, but I think it is fine for
the zone to stay around.


5. memory reclamation in case of global memory shortage
   becomes a tricky/unfair task.


I don't understand why? You can much more easily target a specific
container for reclaim with this approach than with others (because
you have an lru per container).



Yes, but we break the global LRU. With these RSS patches, reclaim not
triggered by containers still uses the global LRU, by using nodes,
we would have lost the global LRU.


6. You cannot overcommit. AFAIU, the memory should be granted
   to a node for exclusive usage and cannot be used by other containers,
   even if it is unused. This is not an option for us.


I'm not sure about that. If you have a larger number of nodes, then
you could assign more free nodes to a container on demand. But I
think there would definitely be less flexibility with nodes...

I don't know... and seeing as I don't really know where the google
guys are going with it, I won't misrepresent their work any further ;)



Everyone seems to have a plan ;) I don't read the containers list...
does everyone still have *different* plans, or is any sort of consensus
being reached?



hope we'll have it soon :)


Good luck ;)



I think we have made some forward progress on the consensus.

--
Warm Regards,
Balbir Singh
Linux Technology Center
IBM, ISTL

Re: MediaGX/GeodeGX1 requires X86_OOSTORE.

2007-03-14 Thread takada

From: [EMAIL PROTECTED] (Lennart Sorensen)
Subject: Re: MediaGX/GeodeGX1 requires X86_OOSTORE.
Date: Tue, 20 Feb 2007 09:48:23 -0500

Hiroshi Miura posted a `Geode out-of-order store enables' patch in June 2003;
see http://lkml.org/lkml/2003/6/5/57 .
OOSTORE was enabled at that point in time. It seems to have disappeared 
somewhere since.

BTW, I use a MediaGX with kernel 2.6.20 (and 2.6.20.3) and suspend2. When I
resume the PC and use the PC Card modem, the PC hangs. However, it does not
hang when I apply a WBINVD patch.
I can't tell whether the problem is in suspend2's resume, in the MediaGX,
or both. Many drivers on my PC lack resume support.

> On Tue, Feb 20, 2007 at 08:34:13PM +0900, takada wrote:
> > I posted with 2.6.20 + enabled X86_OOSTORE.
> > The clflush size line is in /proc/cpuinfo, but clflush is not in the flags line.
> > 
> > BTW, can we use the WBINVD instruction? I have only compile-tested this.
> > Do you know a way to switch this dynamically, without #ifdef, when
> > running on MediaGX/GeodeGX?
> > 
> > diff -Narup a/include/asm-i386/io.h b/include/asm-i386/io.h
> > --- a/include/asm-i386/io.h 2007-02-20 16:23:25.0 +0900
> > +++ b/include/asm-i386/io.h 2007-02-20 17:07:14.0 +0900
> > @@ -232,7 +232,19 @@ static inline void memcpy_toio(volatile 
> >   * 2. Accidentally out of order processors (PPro errata #51)
> >   */
> >   
> > -#if defined(CONFIG_X86_OOSTORE) || defined(CONFIG_X86_PPRO_FENCE)
> > +#ifdef CONFIG_MGEODEGX1
> > +
> > +static inline void dma_flush_cache(void)
> > +{
> > +   __asm__ __volatile__ ("wbinvd": : :"memory");
> > +}
> > +
> > +#define dma_cache_inv(_start,_size)		dma_flush_cache()
> > +#define dma_cache_wback(_start,_size)		dma_flush_cache()
> > +#define dma_cache_wback_inv(_start,_size)	dma_flush_cache()
> > +#define flush_write_buffers()
> > +
> > +#elif defined(CONFIG_X86_OOSTORE) || defined(CONFIG_X86_PPRO_FENCE)
> >  
> >  static inline void flush_write_buffers(void)
> >  {
> > -
> 
> Well it is starting to look like it isn't a caching issue, but more
> likely an issue of which order writes are performed in.  I think the MAC
> might be seeing the ownership bit change before the rest of the
> descriptor, which shouldn't happen.  With X86_OOSTORE, wmb() is called
> between setting the fields in the descriptor and setting the ownership
> bit to the MAC.  I still have to investigate a bit more to find out for
> sure, but that could certainly explain why X86_OOSTORE makes the problem
> become much less frequent.  It doesn't completely eliminate it, though.
> Of course maybe there are two different problems with the same symptoms.
> 
> --
> Len Sorensen
> 
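The ordering Len suspects is violated can be sketched as follows: fill in the descriptor first, then hand ownership to the MAC last. This is a userspace analogy only. The descriptor layout and the `DESC_OWN_MAC` bit are hypothetical, and the C11 release store stands in for the `wmb()` a driver would use. C11 atomics order stores as seen by other CPU threads, not by a DMA engine, so this is not driver code.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical "descriptor owned by the MAC" bit, for illustration only. */
#define DESC_OWN_MAC 0x80000000u

/* Hypothetical TX descriptor; real NIC layouts differ. */
struct tx_desc {
	uint32_t buf_addr;
	uint32_t len;
	_Atomic uint32_t status;	/* ownership bit lives here */
};

/*
 * Producer: fill every field first, then publish ownership last.
 * The release store plays the role of wmb(): an observer that sees
 * DESC_OWN_MAC via an acquire load also sees buf_addr and len.
 */
static void tx_desc_publish(struct tx_desc *d, uint32_t addr, uint32_t len)
{
	d->buf_addr = addr;
	d->len = len;
	atomic_store_explicit(&d->status, DESC_OWN_MAC, memory_order_release);
}

/* Consumer: only trust the fields once ownership has been observed. */
static int tx_desc_ready(struct tx_desc *d, uint32_t *len_out)
{
	uint32_t st = atomic_load_explicit(&d->status, memory_order_acquire);

	if (!(st & DESC_OWN_MAC))
		return 0;
	*len_out = d->len;
	return 1;
}
```

If the ownership bit were written with a plain store and no barrier, neither the compiler nor a weakly ordered CPU would be obliged to keep that write last. That matches the symptom described above, where the MAC sees the ownership change before the rest of the descriptor.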

Re: RSDL v0.30 cpu scheduler for mainline kernels

2007-03-14 Thread Con Kolivas

On Thursday 15 March 2007 13:31, Siddha, Suresh B wrote:
> Con,
>
> On Mon, Mar 12, 2007 at 10:58:11AM +1100, Con Kolivas wrote:
> > There are updated patches for 2.6.20, 2.6.20.2, 2.6.21-rc3 and
> > 2.6.21-rc3-mm2 to bring RSDL up to version 0.30 for download here:
>
> I tried this on a Core 2 Quad system (the system has 4 cores on a single
> package). When I run SPECjbb2000 with the number of threads varying from 1-8,
> I see a ~4.5% perf regression with RSDL (compared to native 2.6.21-rc3) in
> the 8-thread case. I think this is coming from the increased number of
> context switches when we have more than one thread (at the same user priority)
> on the same logical cpu.
>
> Just to see the % increase in the number of context switches, I ran 8 infinite
> loops (simple while(1);'s), and with 2.6.21-rc3 I see ~70 context switches
> every second, whereas with RSDL I see ~530 context switches.

Thanks. If that's all it is, then scaling the rr interval somewhat with the number
of cpus would help. If you could, try the following patch just as a test; it might confirm that.

---
 kernel/sched.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Index: linux-2.6.21-rc3-mm2/kernel/sched.c
===================================================================
--- linux-2.6.21-rc3-mm2.orig/kernel/sched.c	2007-03-15 17:03:17.0 +1100
+++ linux-2.6.21-rc3-mm2/kernel/sched.c	2007-03-15 17:03:30.0 +1100
@@ -104,7 +104,7 @@ unsigned long long __attribute__((weak))
  * This is the time all tasks within the same priority round robin.
  * Set to a minimum of 6ms.
  */
-#define RR_INTERVAL		((6 * HZ / 1001) + 1)
+#define RR_INTERVAL		((12 * HZ / 1001) + 1)
 #define DEF_TIMESLICE		(RR_INTERVAL * 20)
 
 #ifdef CONFIG_SMP

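To see what the patch changes in ticks, the RR_INTERVAL arithmetic can be evaluated for a few HZ values. This is a small userspace sketch and the helper names are made up. The switch-rate helper assumes that two equal-priority CPU hogs on one CPU simply alternate every RR_INTERVAL, which ignores everything else the scheduler does.

```c
#include <assert.h>

/* Same integer arithmetic as RR_INTERVAL, with the base (6 or 12) and HZ as parameters. */
static int rr_interval_ticks(int base, int hz)
{
	return (base * hz / 1001) + 1;
}

/* DEF_TIMESLICE is defined as RR_INTERVAL * 20. */
static int def_timeslice_ticks(int base, int hz)
{
	return rr_interval_ticks(base, hz) * 20;
}

/* Convert a tick count to milliseconds at the given HZ. */
static int ticks_to_ms(int ticks, int hz)
{
	return ticks * 1000 / hz;
}

/* Rough estimate: one switch per RR_INTERVAL on a CPU running two equal-priority hogs. */
static int switches_per_sec_per_cpu(int base, int hz)
{
	return hz / rr_interval_ticks(base, hz);
}
```

At HZ=1000 the patch doubles the interval from 6 to 12 ticks. Under the simple model above, that roughly halves the per-CPU switch rate, from about 166/s to 83/s. At HZ=250 the interval goes from 8 ms to 12 ms. The posting does not say which HZ Suresh's kernel used, so these figures are only indicative.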
-- 
-ck

Re: [PATCH 10/13] BLK_DEV_IDE_CELLEB dependency fix

2007-03-14 Thread Al Viro

On Thu, Mar 15, 2007 at 02:25:40PM +0900, Akira Iguchi wrote:
> Al wrote:
> >
> >It's bool and it depends on IDE => should depend on IDE=y
> >
> >Signed-off-by: Al Viro <[EMAIL PROTECTED]>
> 
> Move to "if BLK_DEV_IDEDMA_PCI" block because it depends on 
> BLK_DEV_IDEDMA_PCI.

> +config BLK_DEV_IDE_CELLEB
> + bool "Toshiba's Cell Reference Set IDE support"
> + depends on PPC_CELLEB
> + help
> +   This driver provides support for the built-in IDE controller on
> +   Toshiba Cell Reference Board.
> +   If unsure, say Y.
> +

Eh...  You still need a dependency on IDE=y; otherwise you'll get configs
with IDE=m, BLK_DEV_IDE_CELLEB=y, and those won't link.  BLK_DEV_IDEDMA_PCI
is selectable just fine with IDE=m.

It's the same problem as with ps3 fb.
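Al's point rests on how Kconfig treats a `bool` symbol whose dependencies evaluate to m. The toy model below is a sketch under stated assumptions, not the real scripts/kconfig code:

- it orders n < m < y;
- it evaluates `&&` as the minimum of its operands;
- it treats `IDE=y` as a comparison that yields only y or n;
- it lets a `bool` symbol take y when its dependency evaluates to m;
- it assumes the IDE dependency inherited from the enclosing menus is modelled as IDE's own value.

```c
#include <assert.h>

/* Tristate values ordered n < m < y, as Kconfig orders them. */
enum tristate { TRI_N = 0, TRI_M = 1, TRI_Y = 2 };

/* "A && B" evaluates to the smaller of the two tristate values. */
static enum tristate tri_and(enum tristate a, enum tristate b)
{
	return a < b ? a : b;
}

/* "SYM=y" is a comparison: it yields y or n, never m. */
static enum tristate tri_eq_y(enum tristate sym)
{
	return sym == TRI_Y ? TRI_Y : TRI_N;
}

/*
 * Highest value a bool symbol may take for a given dependency value.
 * For bool symbols a dependency of m is treated as y, which is how
 * IDE=m could still leave BLK_DEV_IDE_CELLEB=y selectable.
 */
static enum tristate bool_limit(enum tristate dep)
{
	return dep == TRI_M ? TRI_Y : dep;
}
```

With PPC_CELLEB=y and IDE=m, the plain dependency still allows y, and that is the unlinkable config. Adding the `IDE=y` comparison to the dependency forces the bool option to n until IDE itself is built in.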

Re: kref refcounting breakage in mainline

2007-03-14 Thread Greg KH

On Sat, Mar 10, 2007 at 04:44:06PM +0100, Mike Galbraith wrote:
> On Wed, 2007-03-07 at 06:39 +0100, Mike Galbraith wrote:
> > On Tue, 2007-03-06 at 13:04 -0800, Greg KH wrote:
> > > On Tue, Mar 06, 2007 at 06:43:22AM +0100, Mike Galbraith wrote:
> > > > On Mon, 2007-03-05 at 16:25 -0800, Greg KH wrote:
> > > > 
> > > > > Mike, I've reverted this patch, and I don't see any references leaking.
> > > > > And, as your patch released the reference on the driver, and the
> > > > > module_add_driver() call would not grab a reference to the driver, only
> > > > > the module kobject, I don't see what you were trying to fix with this
> > > > > patch.
> > > > > 
> > > > > Do you have a test case that this fixes?
> > > > 
> > > > What it fixed for me was the hard hang reported below.
> > > > 
> > > > http://lkml.org/lkml/2007/2/16/96
> > > 
> > > What specific module are you trying to unload that causes the hang?  I
> > > think it might just be a problem with that module, and not with all
> > > others.
> > 
> > It's ipmi_si that's hanging, waits for completion that never comes.
> > 
> > > So, I'm going to revert your patch and work to try to find the real
> > > cause of this problem.
> > 
> > Yeah, my stab at it seems busted.  I'll take another poke at it to see
> > if I can find out why (post 725522b5453dd680412f2b6463a988e4fd148757)
> > I'm left with a reference.
> 
> Ok, stab #2.
> 
> My reference count woes stem from module_remove_driver() not removing
> the link created in module_add_driver().  With the below, my box boots
> fine.  Since I obviously know spit about driver layer glue, I'll just
> call this one a diagnostic, and head for the hills :)

Does ipmi_si not have an "owner"?  Ah, that makes sense, not all modules
do...

> --- linux-2.6.20-rc3/kernel/module.c.org	2007-03-10 15:16:47.0 +0100
> +++ linux-2.6.20-rc3/kernel/module.c	2007-03-10 15:43:09.0 +0100
> @@ -2411,14 +2411,28 @@ void module_remove_driver(struct device_
>  		return;
>  
>  	sysfs_remove_link(&drv->kobj, "module");
> -	if (drv->owner && drv->owner->mkobj.drivers_dir) {
> -		driver_name = make_driver_name(drv);
> -		if (driver_name) {
> -			sysfs_remove_link(drv->owner->mkobj.drivers_dir,
> +	driver_name = make_driver_name(drv);
> +	if (!driver_name)
> +		return;
> +	if (drv->owner && drv->owner->mkobj.drivers_dir)
> +		sysfs_remove_link(drv->owner->mkobj.drivers_dir,
>  					  driver_name);
> -			kfree(driver_name);
> -		}
> +	else if (drv->mod_name) {
> +		struct module_kobject *mk;
> +		struct kobject *mkobj;
> +
> +		/* Lookup built-in module entry in /sys/modules */
> +		mkobj = kset_find_obj(&module_subsys.kset, drv->mod_name);
> +		if (!mkobj)
> +			goto out_free;
> +		mk = container_of(mkobj, struct module_kobject, kobj);
> +		module_create_drivers_dir(mk);
> +		sysfs_remove_link(mk->drivers_dir, driver_name);
> +		/* Release reference taken via lookup */
> +		kobject_put(mkobj);
>  	}
> +out_free:
> +	kfree(driver_name);
>  }
>  EXPORT_SYMBOL(module_remove_driver);
>  #endif
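Mike's patch above pairs the `kset_find_obj()` lookup, which returns the object with an extra reference held, with a `kobject_put()` once the link is gone. Below is a minimal userspace sketch of that get/put discipline. The `struct obj`, the registry and the helper names are hypothetical stand-ins for the kobject calls, not the driver-core API.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a kobject: a name and a reference count. */
struct obj {
	const char *name;
	int refcount;
};

/* Hypothetical registry; each entry starts with the one reference it owns. */
static struct obj registry[] = { { "ipmi_si", 1 }, { "other", 1 } };

/* Like kset_find_obj(): on success, returns the object with an extra reference. */
static struct obj *find_obj(const char *name)
{
	for (size_t i = 0; i < sizeof(registry) / sizeof(registry[0]); i++) {
		if (strcmp(registry[i].name, name) == 0) {
			registry[i].refcount++;
			return &registry[i];
		}
	}
	return NULL;
}

/* Like kobject_put(): drops one reference. */
static void put_obj(struct obj *o)
{
	assert(o->refcount > 0);
	o->refcount--;
}

/* Every successful lookup is paired with exactly one put. */
static int remove_driver_link(const char *mod_name)
{
	struct obj *o = find_obj(mod_name);

	if (!o)
		return -1;	/* nothing found, so nothing to release */
	/* ... the sysfs link under o would be removed here ... */
	put_obj(o);
	return 0;
}
```

After a successful removal the count is back where it started. Forgetting the put leaves exactly the kind of stray reference that makes a later unload wait forever.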

That's pretty good for not knowing much about the subject matter here.
But can you try this version instead?  It should work a bit better than
yours.

thanks for your patience,

greg k-h

Subject: modules: fix reference counting logic for drivers without module pointers.

We weren't dropping the sysfs link for the module driver name if we
didn't happen to have the "owner" pointer in the driver.

Based on a patch from Mike Galbraith <[EMAIL PROTECTED]>

Signed-off-by: Greg Kroah-Hartman <[EMAIL PROTECTED]>

---
 kernel/module.c |   24 +---
 1 file changed, 17 insertions(+), 7 deletions(-)

--- a/kernel/module.c
+++ b/kernel/module.c
@@ -2405,20 +2405,30 @@ EXPORT_SYMBOL(module_add_driver);
 
 void module_remove_driver(struct device_driver *drv)
 {
+	struct module_kobject *mk = NULL;
+	struct kobject *mkobj = NULL;
 	char *driver_name;
 
 	if (!drv)
 		return;
 
 	sysfs_remove_link(&drv->kobj, "module");
-	if (drv->owner && drv->owner->mkobj.drivers_dir) {
-		driver_name = make_driver_name(drv);
-		if (driver_name) {
-			sysfs_remove_link(drv->owner->mkobj.drivers_dir,
-					  driver_name);
-			kfree(driver_name);
-		}
+	driver_name = make_driver_name(drv);
+	if (!driver_name)
+		return;
+
+	if (drv->owner && drv->owner->mkobj.drivers_dir)
+		mk = &drv->owner->mkobj;
+	else {
+		/* Lookup built-in module entry in /sys/modules */
+		mkobj = kset_find_obj(&module_subsys.kset, drv->mod_name);
+

[OT] Re: New thread RDSL, post-2.6.20 kernels and amanda (tar) miss-fires

2007-03-14 Thread Willy Tarreau

On Wed, Mar 14, 2007 at 11:12:48PM -0400, Gene Heskett wrote:
> On Wednesday 14 March 2007, Ray Lee wrote:
> >On 3/13/07, Gene Heskett <[EMAIL PROTECTED]> wrote:
> >> On Tuesday 13 March 2007, Gene Heskett wrote:
> >> >On Tuesday 13 March 2007, Gene Heskett wrote:
> >> >>Greetings;
> >> >>Someone suggested a fresh thread for this.
> >> >>
> >> >>I now have my scripts more or less under control, and I can report
> >> >> that kernel-2.6.20.1 with no other patches does not exhibit the
> >> >> undesirable behaviour where tar thinks it's all new, even when told
> >> >> to do a level 2 on a directory tree in which nothing has been
> >> >> updated in months.
> >> >>
> >> >>Next up, 2.6.20.2, plain and with the latest RDSL-0.30 patch.
> >> >
> >> >And amanda/tar worked normally for 2.6.20.2 plain.
> >> >
> >> >Next up, 2.6.21-rc1 if it will build here.
> >>
> >> It built, it booted, and it's busted big time.  First, with an amdump
> >> running in the background, the machine is so close to unusable that I
> >> considered rebooting, but I needed the data to show the problem.  I am
> >> losing the keyboard and mouse for a minute or more at a time but the
> >> keystrokes seem to be being registered so it eventually catches up.
> >>
> >> Disk i/o seems to be the killer according to gkrellm.
> >>
> >> But to give one an idea of the fits this is giving tar, I'll snip a
> >> line or 2 from an amstatus report here:
> >> coyote:/GenesAmandaHelper-0.6 1 planner: [dumps way too big, 138200 KB, must skip incremental dumps]
> >>
> >> Huh?  138.2GB?  A 'du -h .' in that dir says 766megs.
> >>
> >> coyote:/root  1 4426m wait for dumping
> >> du -h says 5.0GB so that's ballpark, but it's also a level 1, so maybe
> >> 20 megs is actually new since 15:57 this afternoon local.  kmail's
> >> final maildir is in that dir.
> >>
> >> This goes on for much of the amstatus report, very few of the reported
> >> sizes are close to sane.
> >>
> >> Now, can someone suggest a patch I can revert that might fix this? 
> >> The total number of patches between 2.6.20 and 2.6.21-rc1 will have me
> >> building kernels to bisect this till the middle of June at this rate.
> >
> >In a previous email, you said you were using ext3. If that's the case,
> >there doesn't appear to be much going on in terms of patches between
> >2.6.20 and 2.6.21-rc1. The only one that even comes close to looking
> >like it might have an effect would only come in to play if you have a
> >filesystem that has ACL information, but is mounted by a kernel that
> >doesn't have ACL support.
> >
> >I have to echo wli here, I'm afraid, and recommend at least a *few*
> >bisections to help narrow down the list of suspect patches.
> >
> >There are tutorials out there for git users. I use the mercurial
> >repository, as I find the mercurial interface and workflow a lot more
> >intuitive, but it has the same capability.
> >
> >Even 2-5 bisections will greatly help others hunt the bug down.
> >
> >Ray
> 
> Probably.  But I've now put a week into this, and from some other clues 
> I've collected, I'm beginning to think tar has a tummy ache. After all, 
> an ls -lc reports totally sane mtimes.  So why is tar going bonkers 
> under kernels 2.6.21-rc*, with or without Con's patches?
> 
> I've also spent a day now looking for a valid place to file a bugzilla 
> entry against tar, but Google's search results keep sending me to 
> gcc.gnu.org, and that is NOT the correct bugzilla for a tar problem.
> 
> It's no secret that with all the churn in tar over the last 5 years (worse 
> churn than the kernel, IMO, in going from 2.0 to 2.6), I'm not a fan of 
> yet another _new_ version of tar, when what we really need is _one_ that 
> works.  Right now it is not capable of executing the recovery command listed
> in the first block of every amdump file it (amdump) ever built, and 
> I've played the equivalent of the 10,000 monkeys writing Shakespeare for
> several hours trying.  Damned frustrating is what it is.
> 
> The error it reports seems to indicate that it cannot write through the 
> pipes involved.  But with tar's error reporting, who the hell knows for 
> sure.
> 
> Here is an example
> [EMAIL PROTECTED] data]# dd if=00010.coyote._lib.1 bs=32k count=1
> AMANDA: FILE 20070314104344 coyote /lib  lev 1 comp .gz program /bin/tar
> To restore, position tape at start of file and run:
>  dd if= bs=32k skip=1 |  /bin/gzip -dc |  /bin/tar -f - ...
> 
> And the ellipsis is an error if not removed.  Then one is supposed to be 
> able to redirect tar's output with the usual >/tmp/test/ syntax.
> 
> So:
> [EMAIL PROTECTED] data]# dd if=00010.coyote._lib.1 bs=32k skip=1 |  /bin/gzip -dc |  /bin/tar -f - >/tmp/test/
> -bash: /tmp/test/: Is a directory
> 
> which is the return from any variation in how the redirect is done.
> 
> So what am I doing wrong in the above command line?  I'd like to get it
> right so I can add it to my helper scripts, to be published eventually on zmanda.org.

with "/bin/tar -f

[PATCH take3 00/20] Make common x86 arch area for i386 and x86_64 - Take 3

2007-03-14 Thread Steven Rostedt

Once again here's an attempt to put the shared files of x86_64 and i386
into a separate directory.

This time, I took the pains to make sure that each patch in this
series compiles after it is applied.  I did this on both x86_64 and
i386, with the affected files' config options turned on.

I still stayed away from the pci shared code.

This time I moved speedstep-lib.h into include/asm-x86. As a result, all
references to this file now need to explicitly state
 #include 
But this will also create a doorway for other shared headers to go
into.

And yes, the long-term goal is perhaps to make a single arch that can
handle both modern i386 CPUs and the x86_64 code, and then
phase out the x86_64 arch, keeping the current i386 for legacy hardware.

I used git-diff -M for the diffs, so the renames are explicitly stated
as such, but no delete/create diff is made (so patch and quilt will
not apply these).

Comments and flames welcome.

-- Steve


[PATCH take3 12/20] mtrr directory switch

2007-03-14 Thread Steven Rostedt

Move the mtrr directory over to the common area.

Signed-off-by: Steven Rostedt <[EMAIL PROTECTED]>

diff --git a/arch/i386/kernel/cpu/Makefile b/arch/i386/kernel/cpu/Makefile
index 010aecf..f8eaef8 100644
--- a/arch/i386/kernel/cpu/Makefile
+++ b/arch/i386/kernel/cpu/Makefile
@@ -15,5 +15,4 @@ obj-y +=  umc.o
 
 obj-$(CONFIG_X86_MCE)  +=  mcheck/
 
-obj-$(CONFIG_MTRR) +=  mtrr/
 obj-$(CONFIG_CPU_FREQ) +=  cpufreq/
diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile
index 3e15c9e..c1a2b58 100644
--- a/arch/x86/kernel/Makefile
+++ b/arch/x86/kernel/Makefile
@@ -1,5 +1,7 @@
 obj-y  += bootflag.o quirks.o i8237.o topology.o alternative.o
 
+obj-y  += cpu/
+
 obj-$(CONFIG_X86_MSR)  += msr.o
 obj-$(CONFIG_X86_CPUID)	+= cpuid.o
 obj-$(CONFIG_MICROCODE)	+= microcode.o
diff --git a/arch/x86/kernel/cpu/Makefile b/arch/x86/kernel/cpu/Makefile
new file mode 100644
index 000..3e59ae7
--- /dev/null
+++ b/arch/x86/kernel/cpu/Makefile
@@ -0,0 +1,2 @@
+
+obj-$(CONFIG_MTRR) +=  mtrr/
diff --git a/arch/i386/kernel/cpu/mtrr/Makefile b/arch/x86/kernel/cpu/mtrr/Makefile
similarity index 100%
rename from arch/i386/kernel/cpu/mtrr/Makefile
rename to arch/x86/kernel/cpu/mtrr/Makefile
diff --git a/arch/i386/kernel/cpu/mtrr/amd.c b/arch/x86/kernel/cpu/mtrr/amd.c
similarity index 100%
rename from arch/i386/kernel/cpu/mtrr/amd.c
rename to arch/x86/kernel/cpu/mtrr/amd.c
diff --git a/arch/i386/kernel/cpu/mtrr/centaur.c b/arch/x86/kernel/cpu/mtrr/centaur.c
similarity index 100%
rename from arch/i386/kernel/cpu/mtrr/centaur.c
rename to arch/x86/kernel/cpu/mtrr/centaur.c
diff --git a/arch/i386/kernel/cpu/mtrr/cyrix.c b/arch/x86/kernel/cpu/mtrr/cyrix.c
similarity index 100%
rename from arch/i386/kernel/cpu/mtrr/cyrix.c
rename to arch/x86/kernel/cpu/mtrr/cyrix.c
diff --git a/arch/i386/kernel/cpu/mtrr/generic.c b/arch/x86/kernel/cpu/mtrr/generic.c
similarity index 100%
rename from arch/i386/kernel/cpu/mtrr/generic.c
rename to arch/x86/kernel/cpu/mtrr/generic.c
diff --git a/arch/i386/kernel/cpu/mtrr/if.c b/arch/x86/kernel/cpu/mtrr/if.c
similarity index 100%
rename from arch/i386/kernel/cpu/mtrr/if.c
rename to arch/x86/kernel/cpu/mtrr/if.c
diff --git a/arch/i386/kernel/cpu/mtrr/main.c b/arch/x86/kernel/cpu/mtrr/main.c
similarity index 100%
rename from arch/i386/kernel/cpu/mtrr/main.c
rename to arch/x86/kernel/cpu/mtrr/main.c
diff --git a/arch/i386/kernel/cpu/mtrr/mtrr.h b/arch/x86/kernel/cpu/mtrr/mtrr.h
similarity index 100%
rename from arch/i386/kernel/cpu/mtrr/mtrr.h
rename to arch/x86/kernel/cpu/mtrr/mtrr.h
diff --git a/arch/i386/kernel/cpu/mtrr/state.c b/arch/x86/kernel/cpu/mtrr/state.c
similarity index 100%
rename from arch/i386/kernel/cpu/mtrr/state.c
rename to arch/x86/kernel/cpu/mtrr/state.c
diff --git a/arch/x86_64/kernel/Makefile b/arch/x86_64/kernel/Makefile
index 3fae694..60918ad 100644
--- a/arch/x86_64/kernel/Makefile
+++ b/arch/x86_64/kernel/Makefile
@@ -14,7 +14,6 @@ obj-$(CONFIG_STACKTRACE)  += stacktrace.o
 obj-$(CONFIG_X86_MCE)  += mce.o therm_throt.o
 obj-$(CONFIG_X86_MCE_INTEL)	+= mce_intel.o
 obj-$(CONFIG_X86_MCE_AMD)  += mce_amd.o
-obj-$(CONFIG_MTRR) += ../../i386/kernel/cpu/mtrr/
 obj-$(CONFIG_ACPI) += acpi/
 obj-$(CONFIG_SMP)  += smp.o smpboot.o trampoline.o
 obj-y  += apic.o  nmi.o

--

Re: [PATCH] mm/filemap.c: unconditionally call mark_page_accessed

2007-03-14 Thread Ashif Harji




On Wed, 14 Mar 2007, Xiaoning Ding wrote:


Dave Kleikamp wrote:

On Wed, 2007-03-14 at 22:33 +0100, Andreas Mohr wrote:

Hi,

On Wed, Mar 14, 2007 at 03:55:41PM -0500, Dave Kleikamp wrote:

On Wed, 2007-03-14 at 15:58 -0400, Ashif Harji wrote:
This patch unconditionally calls mark_page_accessed to prevent pages, 
especially for small files, from being evicted from the page cache 
despite frequent access.

I guess the downside to this is if a reader is reading a large file, or
several files, sequentially with a small read size (smaller than
PAGE_SIZE), the pages will be marked active after just one read pass.
My gut says the benefits of this patch outweigh the cost.  I would
expect real-world backup apps, etc. to read at least PAGE_SIZE.

I also think that the patch is somewhat problematic, since the original
intention seems to have been a reduction of the number of (expensive?)
mark_page_accessed() calls,


mark_page_accessed() isn't expensive.  If called repeatedly, starting
with the third call, it will check two page flags and return.  The only
real expense is that the page appears busier than it may be and will be
retained in memory longer than it should.


If we allow mark_page_accessed() to be called multiple times for a single page,
a scan of a large file with small-size reads would flush the buffer cache.
mark_page_accessed() also takes lru_lock when moving a page from the
inactive_list to the active_list, so it may also increase lock contention.


The problem with the existing logic is that it is too coarse.  In trying 
to deal with one usage pattern it is negatively impacting performance for 
other reasonable access patterns.


Further, consider the extreme case of scanning a file 1 byte at a time. 
In this case, you are going to access a page over 4000 times, but that 
page is not going to be marked as active and hence that page is likely to 
be evicted from the cache.  Clearly, there are cases when scanning a file 
that you would like the pages to be kept in the cache.
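The access patterns under discussion can be modelled in a few lines. The sketch below is a simplified userspace model with these assumptions:

- the page is already on the LRU;
- `mark_page_accessed()` is reduced to its referenced/active flag transitions;
- each read touches only the page containing its start offset;
- `conditional` mimics the check that only marks a page when its index differs from the previous read's;
- `prev_index` is a hypothetical stand-in for the per-file state.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy page: only the two flags mark_page_accessed() consults. */
struct toy_page {
	bool referenced;
	bool active;
};

/* Simplified mark_page_accessed(), assuming the page is on the LRU. */
static void toy_mark_page_accessed(struct toy_page *p)
{
	if (!p->active && p->referenced) {
		p->active = true;	/* moved to the active list */
		p->referenced = false;
	} else if (!p->referenced) {
		p->referenced = true;
	}
}

/*
 * One sequential pass over file_size bytes in chunk-byte reads.  With
 * conditional set, a page is marked only when its index differs from
 * the previous read's, which is the check the patch removes.
 */
static void toy_read_pass(struct toy_page *pages, size_t file_size,
			  size_t chunk, size_t page_size, bool conditional,
			  long *prev_index)
{
	for (size_t pos = 0; pos < file_size; pos += chunk) {
		long index = (long)(pos / page_size);

		if (!conditional || *prev_index != index)
			toy_mark_page_accessed(&pages[index]);
		*prev_index = index;
	}
}
```

In this model, one pass of eight 512-byte reads over a 4096-byte page leaves the page merely referenced when the check is in place. It activates the page when every read calls `mark_page_accessed()`. If the same one-page file is re-read without the per-file state being reset, the conditional version never marks it again. These are the two sides of the trade-off debated above.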


Finally, the existing code is problematic as there is no reasonable way to 
circumvent the negative impact for small files.


Hence, I think a change is necessary.  The question is whether the 
intent of conditionally calling mark_page_accessed() is still reasonable 
and whether the amount of bookkeeping required to detect that usage 
pattern but not create a problem for other usage patterns is reasonable.


I would tend to agree with David that:  "Any application doing many 
tiny-sized reads isn't exactly asking for great performance."  As well, 
applications concerned with performance and caching problems can read in a 
file in PAGE_SIZE chunks.  I still think the simple fix of removing the 
condition is the best approach, but I'm certainly open to alternatives.


ashif.

Re: [PATCH 1/1] Allow i386 crash kernels to handle x86_64 dumps

2007-03-14 Thread Horms

On Thu, Mar 15, 2007 at 10:25:36AM +0530, Vivek Goyal wrote:
> On Thu, Mar 15, 2007 at 10:46:38AM +0900, Horms wrote:
> > On Wed, Mar 14, 2007 at 05:00:09PM +, Ian Campbell wrote:
> > > The specific case I am encountering is kdump under Xen with a 64 bit
> > > hypervisor and 32 bit kernel/userspace. The dump created is 64 bit due
> > > to the hypervisor, but the dump kernel is 32 bit to match the domain 0
> > > kernel.
> > > 
> > > It's possibly less likely to be useful in a purely native scenario but I
> > > see no reason to disallow it.
> > 
> > For native Linux, would this cover the case where the pre-crash kernel
> > is 64bit and the crashdump (post-crash) kernel is 32bit?
> > 
> 
> I think so. Though I have never tried this.
> 
> > > Signed-off-by: Ian Campbell <[EMAIL PROTECTED]>
> > > 
> > > --- pristine-linux-2.6.18/include/asm-i386/elf.h	2006-09-20 04:42:06.0 +0100
> > > +++ linux-2.6.18-xen/include/asm-i386/elf.h	2007-03-14 16:42:30.0 +
> > > @@ -36,7 +36,7 @@
> > >   * This is used to ensure we don't load something for the wrong architecture.
> > >   */
> > >  #define elf_check_arch(x) \
> > > - (((x)->e_machine == EM_386) || ((x)->e_machine == EM_486))
> > > + (((x)->e_machine == EM_386) || ((x)->e_machine == EM_486) || ((x)->e_machine == EM_X86_64))
> 
> But I think changing this macro might run into issues. It is used in a
> few places in the kernel, for example while loading modules. Wouldn't this
> essentially mean that we allow loading 64-bit x86_64 modules on 32-bit i386 systems?
> 
> Similarly, load_elf_interp() uses it; would we then allow loading an
> interpreter built for x86_64 on a 32-bit i386 machine?
> 
> Should we create a separate macro something like elf_check_allowed_arch(),
> to take care of such corner cases?

That sounds reasonable to me. Though perhaps it could just be
kexec_elf_check_arch() for now, as I don't think there are any
other consumers of it.
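Vivek's suggestion, with Horms's proposed name, might look like the sketch below. `elf_check_arch()` stays strict for modules and `load_elf_interp()`, and a separate check accepts x86_64 images on the crash-dump path only. The `e_machine` values match linux/elf-em.h. The stub header struct and the `DEMO_` constant names are hypothetical, chosen to avoid clashing with system headers.

```c
#include <assert.h>
#include <stdint.h>

/* e_machine values as in linux/elf-em.h */
#define DEMO_EM_386	3
#define DEMO_EM_486	6
#define DEMO_EM_X86_64	62

/* Hypothetical stand-in for the ELF header; only e_machine matters here. */
struct demo_elf_hdr {
	uint16_t e_machine;
};

/* Unchanged i386 check, still used for modules and ELF interpreters. */
#define elf_check_arch(x) \
	(((x)->e_machine == DEMO_EM_386) || ((x)->e_machine == DEMO_EM_486))

/* Hypothetical kexec-only check that additionally accepts x86_64 images. */
#define kexec_elf_check_arch(x) \
	(elf_check_arch(x) || ((x)->e_machine == DEMO_EM_X86_64))
```

Keeping the two checks separate means an x86_64 image passes only where the looser check is explicitly used. Module loading and ELF interpreter loading keep rejecting it.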

-- 
Horms
  H: http://www.vergenet.net/~horms/
  W: http://www.valinux.co.jp/en/


Re: Stolen and degraded time and schedulers

2007-03-14 Thread Paul Mackerras

Jeremy Fitzhardinge writes:

> Sure.  But on a given machine, the CPUs are likely to be closely enough
> matched that a cycle on one CPU is more or less equivalent to a cycle on
> another CPU.  The fact that a cycle represents a different amount of

A cycle on one thread of a machine with SMT/hyperthreading when the
other thread is idle *isn't* equivalent to a cycle when the other
thread is busy.  We run into this on POWER5, where we have hardware
that counts cycles when each of the two threads in each core gets to
dispatch instructions (on each cycle, one thread or the other gets to
dispatch).  That helps but still doesn't give a totally accurate
estimate of how much computation a given process has managed to do.

> Not at all.  You might have an unimportant but cpu-bound process which
> doesn't merit increasing the cpu speed, but should also be scheduled
> properly compared to other processes.  I often nice my kernel builds
> (which cpufreq takes as a hint to not ramp up the cpu speed) on my
> laptop to save power.

Just as a side note - that's probably actually a bad strategy; you
almost certainly consume less total energy by running the cpu at full
speed until the build is done and then going to the deepest sleep mode
you can achieve.

> That's true.  But this is a case of the left brain not talking to the
> right brain: cpufreq might decide to slow a cpu down, but the scheduler
> doesn't take that into account.  Making the timebase of sched_clock
> reflect the current cpu speed (or more specifically, the integral of the
> cpu speed over a time interval) is a good way of communicating between
> the two subsystems.
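Jeremy's idea of a timebase that integrates CPU speed over time amounts to scaled time = sum over intervals of dt_i * f_i / f_max. The sketch below assumes a hypothetical `struct scaled_clock` fed with elapsed nanoseconds and the current cpufreq frequency in kHz. It is not an existing kernel interface.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical frequency-scaled clock; not an existing kernel API. */
struct scaled_clock {
	uint64_t scaled_ns;	/* accumulated full-speed-equivalent time */
	uint32_t max_khz;	/* maximum CPU frequency */
};

/* Advance by dt_ns of wall time spent at cur_khz: adds dt_ns * cur_khz / max_khz. */
static void scaled_clock_advance(struct scaled_clock *c, uint64_t dt_ns,
				 uint32_t cur_khz)
{
	c->scaled_ns += dt_ns * cur_khz / c->max_khz;
}
```

For example, 10 ms at full speed followed by 10 ms at half speed advances the clock by 15 ms. As Paul notes for SMT, frequency alone still over-credits a thread whose sibling thread is busy.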

What was the original proposal?  I came into this discussion late...

Paul.

[PATCH take3 02/20] tsc_sync.c switch

2007-03-14 Thread Steven Rostedt

Move tsc_sync.c to the common area.

Signed-off-by: Steven Rostedt <[EMAIL PROTECTED]>

diff --git a/arch/i386/kernel/Makefile b/arch/i386/kernel/Makefile
index a57040d..c8fe439 100644
--- a/arch/i386/kernel/Makefile
+++ b/arch/i386/kernel/Makefile
@@ -18,7 +18,7 @@ obj-$(CONFIG_X86_MSR) += msr.o
 obj-$(CONFIG_X86_CPUID)	+= cpuid.o
 obj-$(CONFIG_MICROCODE)	+= microcode.o
 obj-$(CONFIG_APM)  += apm.o
-obj-$(CONFIG_X86_SMP)  += smp.o smpboot.o tsc_sync.o
+obj-$(CONFIG_X86_SMP)  += smp.o smpboot.o
 obj-$(CONFIG_X86_TRAMPOLINE)   += trampoline.o
 obj-$(CONFIG_X86_MPPARSE)  += mpparse.o
 obj-$(CONFIG_X86_LOCAL_APIC)   += apic.o nmi.o
diff --git a/arch/i386/kernel/tsc_sync.c b/arch/i386/kernel/tsc_sync.c
deleted file mode 100644
index 1242462..000
--- a/arch/i386/kernel/tsc_sync.c
+++ /dev/null
@@ -1 +0,0 @@
-#include "../../x86_64/kernel/tsc_sync.c"
diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile
index 55f268f..bd548e6 100644
--- a/arch/x86/kernel/Makefile
+++ b/arch/x86/kernel/Makefile
@@ -1,2 +1,7 @@
 
 obj-$(CONFIG_EARLY_PRINTK) += early_printk.o
+
+# i386 defines CONFIG_X86_SMP when CONFIG_SMP and !CONFIG_X86_VOYAGER
+ifeq ($(CONFIG_X86_VOYAGER), )
+obj-$(CONFIG_SMP)  += tsc_sync.o
+endif
diff --git a/arch/x86_64/kernel/tsc_sync.c b/arch/x86/kernel/tsc_sync.c
similarity index 100%
rename from arch/x86_64/kernel/tsc_sync.c
rename to arch/x86/kernel/tsc_sync.c
diff --git a/arch/x86_64/kernel/Makefile b/arch/x86_64/kernel/Makefile
index 8b2535c..54fe500 100644
--- a/arch/x86_64/kernel/Makefile
+++ b/arch/x86_64/kernel/Makefile
@@ -19,7 +19,7 @@ obj-$(CONFIG_ACPI)	+= acpi/
 obj-$(CONFIG_X86_MSR)  += msr.o
 obj-$(CONFIG_MICROCODE)	+= microcode.o
 obj-$(CONFIG_X86_CPUID)	+= cpuid.o
-obj-$(CONFIG_SMP)  += smp.o smpboot.o trampoline.o tsc_sync.o
+obj-$(CONFIG_SMP)  += smp.o smpboot.o trampoline.o
 obj-y  += apic.o  nmi.o
 obj-y  += io_apic.o mpparse.o \
 		genapic.o genapic_cluster.o genapic_flat.o

--

[PATCH take3 15/20] cpufreq files switched

2007-03-14 Thread Steven Rostedt

Moved the shared files that were in arch/i386/kernel/cpu/cpufreq to
the common area.  Since the speedstep-lib.h file was used by files that
were moved as well as by files that were not, a new directory, called
include/asm-x86, was created to hold this shared header.  Since this
directory is not full-featured yet (no x86 arch is fully defined), all references
to this file must be of #include 

But this allows for a stepping stone approach to a generic x86 arch and
a place to put more asm-x86 headers.

To keep this patch simple, the cpufreq Kconfig in the x86_64 arch directory
is not moved.

Signed-off-by: Steven Rostedt <[EMAIL PROTECTED]>
Cc: Chris Wright <[EMAIL PROTECTED]>

diff --git a/arch/i386/kernel/cpu/cpufreq/Makefile b/arch/i386/kernel/cpu/cpufreq/Makefile
index 560f776..49c4ca4 100644
--- a/arch/i386/kernel/cpu/cpufreq/Makefile
+++ b/arch/i386/kernel/cpu/cpufreq/Makefile
@@ -1,6 +1,6 @@
+# See also arch/x86/kernel/cpu/cpufreq/Makefile
 obj-$(CONFIG_X86_POWERNOW_K6)  += powernow-k6.o
 obj-$(CONFIG_X86_POWERNOW_K7)  += powernow-k7.o
-obj-$(CONFIG_X86_POWERNOW_K8)  += powernow-k8.o
 obj-$(CONFIG_X86_LONGHAUL) += longhaul.o
 obj-$(CONFIG_X86_E_POWERSAVER) += e_powersaver.o
 obj-$(CONFIG_ELAN_CPUFREQ) += elanfreq.o
@@ -8,9 +8,5 @@ obj-$(CONFIG_SC520_CPUFREQ) += sc520_freq.o
 obj-$(CONFIG_X86_LONGRUN)  += longrun.o  
 obj-$(CONFIG_X86_GX_SUSPMOD)   += gx-suspmod.o
 obj-$(CONFIG_X86_SPEEDSTEP_ICH)	+= speedstep-ich.o
-obj-$(CONFIG_X86_SPEEDSTEP_LIB)	+= speedstep-lib.o
 obj-$(CONFIG_X86_SPEEDSTEP_SMI)	+= speedstep-smi.o
-obj-$(CONFIG_X86_ACPI_CPUFREQ) += acpi-cpufreq.o
-obj-$(CONFIG_X86_SPEEDSTEP_CENTRINO)   += speedstep-centrino.o
-obj-$(CONFIG_X86_P4_CLOCKMOD)  += p4-clockmod.o
 obj-$(CONFIG_X86_CPUFREQ_NFORCE2)  += cpufreq-nforce2.o
diff --git a/arch/i386/kernel/cpu/cpufreq/speedstep-ich.c b/arch/i386/kernel/cpu/cpufreq/speedstep-ich.c
index b425cd3..97c14b3 100644
--- a/arch/i386/kernel/cpu/cpufreq/speedstep-ich.c
+++ b/arch/i386/kernel/cpu/cpufreq/speedstep-ich.c
@@ -25,7 +25,7 @@
 #include 
 #include 
 
-#include "speedstep-lib.h"
+#include 
 
 
 /* speedstep_chipset:
diff --git a/arch/i386/kernel/cpu/cpufreq/speedstep-smi.c b/arch/i386/kernel/cpu/cpufreq/speedstep-smi.c
index ff0d898..093d7d0 100644
--- a/arch/i386/kernel/cpu/cpufreq/speedstep-smi.c
+++ b/arch/i386/kernel/cpu/cpufreq/speedstep-smi.c
@@ -22,7 +22,7 @@
 #include 
 #include 
 
-#include "speedstep-lib.h"
+#include 
 
 /* speedstep system management interface port/command.
  *
diff --git a/arch/x86/kernel/cpu/Makefile b/arch/x86/kernel/cpu/Makefile
index 6557e4a..4728c89 100644
--- a/arch/x86/kernel/cpu/Makefile
+++ b/arch/x86/kernel/cpu/Makefile
@@ -2,3 +2,4 @@ obj-y   +=  intel_cacheinfo.o
 
 obj-$(CONFIG_X86_MCE)  +=  mcheck/
 obj-$(CONFIG_MTRR) +=  mtrr/
+obj-$(CONFIG_CPU_FREQ) +=  cpufreq/
diff --git a/arch/x86/kernel/cpu/cpufreq/Makefile b/arch/x86/kernel/cpu/cpufreq/Makefile
new file mode 100644
index 000..883fae4
--- /dev/null
+++ b/arch/x86/kernel/cpu/cpufreq/Makefile
@@ -0,0 +1,6 @@
+
+obj-$(CONFIG_X86_POWERNOW_K8) += powernow-k8.o
+obj-$(CONFIG_X86_SPEEDSTEP_LIB) += speedstep-lib.o
+obj-$(CONFIG_X86_ACPI_CPUFREQ) += acpi-cpufreq.o
+obj-$(CONFIG_X86_SPEEDSTEP_CENTRINO) += speedstep-centrino.o
+obj-$(CONFIG_X86_P4_CLOCKMOD) += p4-clockmod.o
diff --git a/arch/i386/kernel/cpu/cpufreq/acpi-cpufreq.c b/arch/x86/kernel/cpu/cpufreq/acpi-cpufreq.c
similarity index 100%
rename from arch/i386/kernel/cpu/cpufreq/acpi-cpufreq.c
rename to arch/x86/kernel/cpu/cpufreq/acpi-cpufreq.c
diff --git a/arch/i386/kernel/cpu/cpufreq/p4-clockmod.c b/arch/x86/kernel/cpu/cpufreq/p4-clockmod.c
similarity index 100%
rename from arch/i386/kernel/cpu/cpufreq/p4-clockmod.c
rename to arch/x86/kernel/cpu/cpufreq/p4-clockmod.c
index 4786fed..5024ea8 100644
--- a/arch/i386/kernel/cpu/cpufreq/p4-clockmod.c
+++ b/arch/x86/kernel/cpu/cpufreq/p4-clockmod.c
@@ -33,7 +33,7 @@
 #include 
 #include 
 
-#include "speedstep-lib.h"
+#include 
 
 #define PFX "p4-clockmod: "
 #define dprintk(msg...) cpufreq_debug_printk(CPUFREQ_DEBUG_DRIVER, "p4-clockmod", msg)
diff --git a/arch/i386/kernel/cpu/cpufreq/powernow-k8.c b/arch/x86/kernel/cpu/cpufreq/powernow-k8.c
similarity index 100%
rename from arch/i386/kernel/cpu/cpufreq/powernow-k8.c
rename to arch/x86/kernel/cpu/cpufreq/powernow-k8.c
diff --git a/arch/i386/kernel/cpu/cpufreq/powernow-k8.h b/arch/x86/kernel/cpu/cpufreq/powernow-k8.h
similarity index 100%
rename from arch/i386/kernel/cpu/cpufreq/powernow-k8.h
rename to arch/x86/kernel/cpu/cpufreq/powernow-k8.h
diff --git a/arch/i386/kernel/cpu/cpufreq/speedstep-centrino.c b/arch/x86/kernel/cpu/cpufreq/speedstep-centrino.c
similarity index 100%
rename from arch/i386/kernel/cpu/cpufreq/speedstep-centrino.c
rename to arch/x86/kernel/cpu/cpufreq/speedstep-centrino.c
diff

Re: kswapd & 2.4.21-47.0.0.1

2007-03-14 Thread Konstantin Kalin

Well, I expected a similar answer :) But unfortunately it's not my 
decision to use CentOS. Also, I couldn't get RH customer support for 
various reasons.

Anyway, thank you for the answer.

Regards,
Kostya.

Willy Tarreau wrote:

Hello,

On Wed, Mar 14, 2007 at 04:35:55PM +0300, Konstantin Kalin wrote:
  

Hello all,

I have the following configuration: CentOS 3.8, kernel 
2.4.21-41.0.01.EL, Dialogic boards.
Occasionally the kernel panics. I set up netdump and collected several crash 
dumps and logs. The backtrace shows that kswapd hit a BUG() in the 
try_to_unmap function. Unfortunately, I can't upgrade the kernel because 
the proprietary Dialogic drivers are precompiled.


Could somebody help me? I searched the mailing list for similar issues 
without success; the few messages I found describe a different case.



Well, I think you're trying to get both the cake and the money for it.
You use a vendor-specific stable kernel in order to get high reliability
and good hardware support, but without paying for the customer support
associated with it, and when you have a problem you ask for free help
here where people don't know much about it (except for those who worked
on it).

By trying to get all the advantages, you end up in the worst situation: you
have a bug in a kernel that nobody knows except the vendor, and you can't
press the vendor about it. I don't know if CentOS offers community-based
support through mailing lists or such, but maybe you'd lose less time
and money by buying the smallest support contract from RH and asking them
to help you with this problem.

  
As I understand it, rmap.c is under active development and changes 
substantially with each kernel version. Also, if I understand correctly, 
rmap.c first appeared in kernel 2.6.x, and my kernel is a backport by 
RedHat from 2.6 to 2.4.



Nope, it was initially written for 2.4 by Rik van Riel, and maintained
for a long time as a patch against these kernels. Later it was merged into
2.4-ac, which became the base for RHEL3. It was also merged into 2.6, where
I believe it underwent important changes, though I'm not sure.

  
Information about the crash is below. One peculiarity of my system is the 
large number of Java threads (up to 1500).



I'm not sure that many people here will be able to provide you with much
help, unfortunately.

Regards,
Willy




  


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH take3 20/20] oprofile files switched

2007-03-14 Thread Steven Rostedt

Move the oprofile files from arch/i386/oprofile to the common area (arch/x86/oprofile).

Signed-off-by: Steven Rostedt <[EMAIL PROTECTED]>

diff --git a/arch/i386/Kconfig b/arch/i386/Kconfig
index 53d6237..137c063 100644
--- a/arch/i386/Kconfig
+++ b/arch/i386/Kconfig
@@ -1226,7 +1226,7 @@ source "fs/Kconfig"
 menu "Instrumentation Support"
depends on EXPERIMENTAL
 
-source "arch/i386/oprofile/Kconfig"
+source "arch/x86/oprofile/Kconfig"
 
 config KPROBES
bool "Kprobes (EXPERIMENTAL)"
diff --git a/arch/i386/Makefile b/arch/i386/Makefile
index 06dd07e..6e537be 100644
--- a/arch/i386/Makefile
+++ b/arch/i386/Makefile
@@ -108,7 +108,7 @@ core-y  += arch/i386/kernel/ \
 drivers-$(CONFIG_MATH_EMULATION)   += arch/i386/math-emu/
 drivers-$(CONFIG_PCI)  += arch/i386/pci/
 # must be linked after kernel/
-drivers-$(CONFIG_OPROFILE) += arch/i386/oprofile/
+drivers-$(CONFIG_OPROFILE) += arch/x86/oprofile/
 drivers-$(CONFIG_PM)   += arch/i386/power/
 
 CFLAGS += $(mflags-y)
diff --git a/arch/i386/oprofile/Kconfig b/arch/x86/oprofile/Kconfig
similarity index 100%
rename from arch/i386/oprofile/Kconfig
rename to arch/x86/oprofile/Kconfig
diff --git a/arch/i386/oprofile/Makefile b/arch/x86/oprofile/Makefile
similarity index 100%
rename from arch/i386/oprofile/Makefile
rename to arch/x86/oprofile/Makefile
diff --git a/arch/i386/oprofile/backtrace.c b/arch/x86/oprofile/backtrace.c
similarity index 100%
rename from arch/i386/oprofile/backtrace.c
rename to arch/x86/oprofile/backtrace.c
diff --git a/arch/i386/oprofile/init.c b/arch/x86/oprofile/init.c
similarity index 100%
rename from arch/i386/oprofile/init.c
rename to arch/x86/oprofile/init.c
diff --git a/arch/i386/oprofile/nmi_int.c b/arch/x86/oprofile/nmi_int.c
similarity index 100%
rename from arch/i386/oprofile/nmi_int.c
rename to arch/x86/oprofile/nmi_int.c
diff --git a/arch/i386/oprofile/nmi_timer_int.c b/arch/x86/oprofile/nmi_timer_int.c
similarity index 100%
rename from arch/i386/oprofile/nmi_timer_int.c
rename to arch/x86/oprofile/nmi_timer_int.c
diff --git a/arch/i386/oprofile/op_counter.h b/arch/x86/oprofile/op_counter.h
similarity index 100%
rename from arch/i386/oprofile/op_counter.h
rename to arch/x86/oprofile/op_counter.h
diff --git a/arch/i386/oprofile/op_model_athlon.c b/arch/x86/oprofile/op_model_athlon.c
similarity index 100%
rename from arch/i386/oprofile/op_model_athlon.c
rename to arch/x86/oprofile/op_model_athlon.c
diff --git a/arch/i386/oprofile/op_model_p4.c b/arch/x86/oprofile/op_model_p4.c
similarity index 100%
rename from arch/i386/oprofile/op_model_p4.c
rename to arch/x86/oprofile/op_model_p4.c
diff --git a/arch/i386/oprofile/op_model_ppro.c b/arch/x86/oprofile/op_model_ppro.c
similarity index 100%
rename from arch/i386/oprofile/op_model_ppro.c
rename to arch/x86/oprofile/op_model_ppro.c
diff --git a/arch/i386/oprofile/op_x86_model.h b/arch/x86/oprofile/op_x86_model.h
similarity index 100%
rename from arch/i386/oprofile/op_x86_model.h
rename to arch/x86/oprofile/op_x86_model.h
diff --git a/arch/x86_64/Kconfig b/arch/x86_64/Kconfig
index 56eb14c..12e9fc4 100644
--- a/arch/x86_64/Kconfig
+++ b/arch/x86_64/Kconfig
@@ -738,7 +738,7 @@ source fs/Kconfig
 menu "Instrumentation Support"
 depends on EXPERIMENTAL
 
-source "arch/x86_64/oprofile/Kconfig"
+source "arch/x86/oprofile/Kconfig"
 
 config KPROBES
bool "Kprobes (EXPERIMENTAL)"
diff --git a/arch/x86_64/Makefile b/arch/x86_64/Makefile
index abf1829..0c7e0fa 100644
--- a/arch/x86_64/Makefile
+++ b/arch/x86_64/Makefile
@@ -85,7 +85,7 @@ core-y+= arch/x86_64/kernel/ \
   arch/x86_64/crypto/
 core-$(CONFIG_IA32_EMULATION)  += arch/x86_64/ia32/
 drivers-$(CONFIG_PCI)  += arch/x86_64/pci/
-drivers-$(CONFIG_OPROFILE) += arch/x86_64/oprofile/
+drivers-$(CONFIG_OPROFILE) += arch/x86/oprofile/
 
 boot := arch/x86_64/boot
 
diff --git a/arch/x86_64/oprofile/Kconfig b/arch/x86_64/oprofile/Kconfig
deleted file mode 100644
index d8a8408..000
--- a/arch/x86_64/oprofile/Kconfig
+++ /dev/null
@@ -1,17 +0,0 @@
-config PROFILING
-   bool "Profiling support (EXPERIMENTAL)"
-   help
- Say Y here to enable the extended profiling support mechanisms used
- by profilers such as OProfile.
- 
-
-config OPROFILE
-   tristate "OProfile system profiling (EXPERIMENTAL)"
-   depends on PROFILING
-   help
- OProfile is a profiling system capable of profiling the
- whole system, include the kernel, kernel modules, libraries,
- and applications.
-
- If unsure, say N.
-
diff --git a/arch/x86_64/oprofile/Makefile b/arch/x86_64/oprofile/Makefile
deleted file mode 100644
index 6be3268..000
--- a/arch/x86_64/oprofile/Makefile
+++ /dev/null
@@ -1,19 +0,0 @@
-#
-# oprofile for x86-64.
-# Just reuse the one from i386.
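The OPROFILE entry in the deleted x86_64 Kconfig above is a `tristate` that `depends on PROFILING`, which is a `bool`; both architectures now reach their profiling options through `source "arch/x86/oprofile/Kconfig"`. Kconfig orders the values n < m < y and caps a symbol at the value of its dependency, so OPROFILE is forced to n while PROFILING is n, and may be n, m or y once PROFILING is y. The C below is only a toy model of that capping rule, written for illustration; the enum and function names are hypothetical and do not come from scripts/kconfig.

```c
/*
 * Toy model (hypothetical names, not scripts/kconfig code) of the
 * Kconfig rule that "depends on" puts an upper bound on a symbol's
 * value, using the ordering n < m < y.
 */
enum tristate { TRI_N = 0, TRI_M = 1, TRI_Y = 2 };

/* Return the requested value, capped at the dependency's value. */
static enum tristate limit_by_dependency(enum tristate requested,
					 enum tristate dependency)
{
	return requested < dependency ? requested : dependency;
}
```

Because PROFILING is a bool it is only ever n or y, so in practice the cap on OPROFILE is either "off" or "unrestricted".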

[PATCH take3 09/20] cpuid.c switch

2007-03-14 Thread Steven Rostedt

Move cpuid.c to the common area (arch/x86/kernel).

Signed-off-by: Steven Rostedt <[EMAIL PROTECTED]>

diff --git a/arch/i386/kernel/Makefile b/arch/i386/kernel/Makefile
index 5276349..4437181 100644
--- a/arch/i386/kernel/Makefile
+++ b/arch/i386/kernel/Makefile
@@ -14,7 +14,6 @@ obj-y += cpu/
 obj-y  += acpi/
 obj-$(CONFIG_X86_BIOS_REBOOT)  += reboot.o
 obj-$(CONFIG_MCA)  += mca.o
-obj-$(CONFIG_X86_CPUID)+= cpuid.o
 obj-$(CONFIG_MICROCODE)+= microcode.o
 obj-$(CONFIG_APM)  += apm.o
 obj-$(CONFIG_X86_SMP)  += smp.o smpboot.o
diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile
index 4e5a88f..912421a 100644
--- a/arch/x86/kernel/Makefile
+++ b/arch/x86/kernel/Makefile
@@ -1,6 +1,7 @@
 obj-y  += bootflag.o quirks.o i8237.o topology.o alternative.o
 
 obj-$(CONFIG_X86_MSR)  += msr.o
+obj-$(CONFIG_X86_CPUID)+= cpuid.o
 obj-$(CONFIG_EARLY_PRINTK) += early_printk.o
 
 # i386 defines CONFIG_X86_SMP when CONFIG_SMP and !CONFIG_X86_VOYAGER
diff --git a/arch/i386/kernel/cpuid.c b/arch/x86/kernel/cpuid.c
similarity index 100%
rename from arch/i386/kernel/cpuid.c
rename to arch/x86/kernel/cpuid.c
diff --git a/arch/x86_64/kernel/Makefile b/arch/x86_64/kernel/Makefile
index 248dbe8..f5997f3 100644
--- a/arch/x86_64/kernel/Makefile
+++ b/arch/x86_64/kernel/Makefile
@@ -17,7 +17,6 @@ obj-$(CONFIG_X86_MCE_AMD) += mce_amd.o
 obj-$(CONFIG_MTRR) += ../../i386/kernel/cpu/mtrr/
 obj-$(CONFIG_ACPI) += acpi/
 obj-$(CONFIG_MICROCODE)+= microcode.o
-obj-$(CONFIG_X86_CPUID)+= cpuid.o
 obj-$(CONFIG_SMP)  += smp.o smpboot.o trampoline.o
 obj-y  += apic.o  nmi.o
 obj-y  += io_apic.o mpparse.o \
@@ -45,7 +44,6 @@ obj-y += pcspeaker.o
 CFLAGS_vsyscall.o  := $(PROFILING) -g0
 
 therm_throt-y   += ../../i386/kernel/cpu/mcheck/therm_throt.o
-cpuid-$(subst m,y,$(CONFIG_X86_CPUID))  += ../../i386/kernel/cpuid.o
 microcode-$(subst m,y,$(CONFIG_MICROCODE))  += ../../i386/kernel/microcode.o
 intel_cacheinfo-y  += ../../i386/kernel/cpu/intel_cacheinfo.o
 pcspeaker-y+= ../../i386/kernel/pcspeaker.o
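The two x86_64 lines this patch deletes, `obj-$(CONFIG_X86_CPUID) += cpuid.o` and `cpuid-$(subst m,y,$(CONFIG_X86_CPUID)) += ../../i386/kernel/cpuid.o`, combine two kbuild conventions. make expands the CONFIG value into the list name, so the object lands in `obj-y` (built in), `obj-m` (module) or the ignored `obj-`. The `$(subst m,y,...)` turns both `y` and `m` into `y`, so the composite object's part list is always named `cpuid-y`, which appears to be the name kbuild of this era consults for both built-in and modular composite objects. The C below is only a toy model of that string routing, written for illustration; the function names are hypothetical and nothing here is part of kbuild.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/*
 * Toy model (not part of kbuild) of how make routes a line such as
 *     obj-$(CONFIG_X86_CPUID) += cpuid.o
 * $(CONFIG_X86_CPUID) expands to "y", "m" or nothing, so the object is
 * appended to obj-y (built in), obj-m (module) or obj- (never used).
 */
static const char *obj_list_for(const char *config_value)
{
	if (config_value != NULL && strcmp(config_value, "y") == 0)
		return "obj-y";
	if (config_value != NULL && strcmp(config_value, "m") == 0)
		return "obj-m";
	return "obj-";	/* option unset: kbuild ignores this list */
}

/*
 * Mimics GNU make's $(subst m,y,TEXT) for this single-character case:
 * every 'm' in TEXT becomes 'y'. outsz must be at least 1.
 */
static void subst_m_to_y(const char *in, char *out, size_t outsz)
{
	size_t i;

	for (i = 0; in[i] != '\0' && i + 1 < outsz; i++)
		out[i] = (in[i] == 'm') ? 'y' : in[i];
	out[i] = '\0';
}

/*
 * Builds the part-list name used for a composite object, modelling
 *     cpuid-$(subst m,y,$(CONFIG_X86_CPUID))
 * e.g. base "cpuid" with value "m" gives "cpuid-y".
 */
static void composite_list_name(const char *base, const char *config_value,
				char *out, size_t outsz)
{
	char value[4];

	subst_m_to_y(config_value != NULL ? config_value : "", value,
		     sizeof(value));
	snprintf(out, outsz, "%s-%s", base, value);
}
```

With the option set to `m`, the object itself goes to `obj-m` while its part list is still named `cpuid-y`. Once cpuid.c lives in arch/x86/kernel, which both architectures now descend into, the relative `../../i386/...` part list is no longer needed, so both lines can simply be dropped.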


[PATCH take3 11/20] pcspeaker.c switch

2007-03-14 Thread Steven Rostedt

Move pcspeaker.c to the common area (arch/x86/kernel).

Signed-off-by: Steven Rostedt <[EMAIL PROTECTED]>

diff --git a/arch/i386/kernel/Makefile b/arch/i386/kernel/Makefile
index ac925bc..ce1f742 100644
--- a/arch/i386/kernel/Makefile
+++ b/arch/i386/kernel/Makefile
@@ -37,7 +37,6 @@ obj-$(CONFIG_K8_NB)   += k8.o
 
 obj-$(CONFIG_VMI)  += vmi.o vmitime.o
 obj-$(CONFIG_PARAVIRT) += paravirt.o
-obj-y  += pcspeaker.o
 
 EXTRA_AFLAGS   := -traditional
 
diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile
index f1c6b2e..3e15c9e 100644
--- a/arch/x86/kernel/Makefile
+++ b/arch/x86/kernel/Makefile
@@ -5,6 +5,8 @@ obj-$(CONFIG_X86_CPUID) += cpuid.o
 obj-$(CONFIG_MICROCODE)+= microcode.o
 obj-$(CONFIG_EARLY_PRINTK) += early_printk.o
 
+obj-y  += pcspeaker.o
+
 # i386 defines CONFIG_X86_SMP when CONFIG_SMP and !CONFIG_X86_VOYAGER
 ifeq ($(CONFIG_X86_VOYAGER), )
 obj-$(CONFIG_SMP)  += tsc_sync.o
diff --git a/arch/i386/kernel/pcspeaker.c b/arch/x86/kernel/pcspeaker.c
similarity index 100%
rename from arch/i386/kernel/pcspeaker.c
rename to arch/x86/kernel/pcspeaker.c
diff --git a/arch/x86_64/kernel/Makefile b/arch/x86_64/kernel/Makefile
index 08795d8..3fae694 100644
--- a/arch/x86_64/kernel/Makefile
+++ b/arch/x86_64/kernel/Makefile
@@ -38,10 +38,8 @@ obj-$(CONFIG_MODULES)+= module.o
 obj-$(CONFIG_PCI)  += early-quirks.o
 
 obj-y  += intel_cacheinfo.o
-obj-y  += pcspeaker.o
 
 CFLAGS_vsyscall.o  := $(PROFILING) -g0
 
 therm_throt-y   += ../../i386/kernel/cpu/mcheck/therm_throt.o
 intel_cacheinfo-y  += ../../i386/kernel/cpu/intel_cacheinfo.o
-pcspeaker-y+= ../../i386/kernel/pcspeaker.o


[PATCH take3 13/20] therm_throt.c switch

2007-03-14 Thread Steven Rostedt

Move therm_throt.c to the common area (arch/x86/kernel/cpu/mcheck).

Signed-off-by: Steven Rostedt <[EMAIL PROTECTED]>

diff --git a/arch/i386/kernel/cpu/mcheck/Makefile b/arch/i386/kernel/cpu/mcheck/Makefile
index f1ebe1c..30808f3 100644
--- a/arch/i386/kernel/cpu/mcheck/Makefile
+++ b/arch/i386/kernel/cpu/mcheck/Makefile
@@ -1,2 +1,2 @@
-obj-y  =   mce.o k7.o p4.o p5.o p6.o winchip.o therm_throt.o
+obj-y  =   mce.o k7.o p4.o p5.o p6.o winchip.o
 obj-$(CONFIG_X86_MCE_NONFATAL) +=  non-fatal.o
diff --git a/arch/x86/kernel/cpu/Makefile b/arch/x86/kernel/cpu/Makefile
index 3e59ae7..e439cc1 100644
--- a/arch/x86/kernel/cpu/Makefile
+++ b/arch/x86/kernel/cpu/Makefile
@@ -1,2 +1,3 @@
 
+obj-$(CONFIG_X86_MCE)  +=  mcheck/
 obj-$(CONFIG_MTRR) +=  mtrr/
diff --git a/arch/x86/kernel/cpu/mcheck/Makefile b/arch/x86/kernel/cpu/mcheck/Makefile
new file mode 100644
index 000..4018cde
--- /dev/null
+++ b/arch/x86/kernel/cpu/mcheck/Makefile
@@ -0,0 +1 @@
+obj-y  = therm_throt.o
diff --git a/arch/i386/kernel/cpu/mcheck/therm_throt.c b/arch/x86/kernel/cpu/mcheck/therm_throt.c
similarity index 100%
rename from arch/i386/kernel/cpu/mcheck/therm_throt.c
rename to arch/x86/kernel/cpu/mcheck/therm_throt.c
diff --git a/arch/x86_64/kernel/Makefile b/arch/x86_64/kernel/Makefile
index 60918ad..ef1585d 100644
--- a/arch/x86_64/kernel/Makefile
+++ b/arch/x86_64/kernel/Makefile
@@ -11,7 +11,7 @@ obj-y := process.o signal.o entry.o traps.o irq.o \
pci-dma.o pci-nommu.o hpet.o tsc.o
 
 obj-$(CONFIG_STACKTRACE)   += stacktrace.o
-obj-$(CONFIG_X86_MCE)  += mce.o therm_throt.o
+obj-$(CONFIG_X86_MCE)  += mce.o
 obj-$(CONFIG_X86_MCE_INTEL)+= mce_intel.o
 obj-$(CONFIG_X86_MCE_AMD)  += mce_amd.o
 obj-$(CONFIG_ACPI) += acpi/
@@ -40,5 +40,4 @@ obj-y += intel_cacheinfo.o
 
 CFLAGS_vsyscall.o  := $(PROFILING) -g0
 
-therm_throt-y   += ../../i386/kernel/cpu/mcheck/therm_throt.o
 intel_cacheinfo-y  += ../../i386/kernel/cpu/intel_cacheinfo.o


[PATCH take3 14/20] intel_cacheinfo.c switch

2007-03-14 Thread Steven Rostedt

Move intel_cacheinfo.c to the common area (arch/x86/kernel/cpu).

Signed-off-by: Steven Rostedt <[EMAIL PROTECTED]>

diff --git a/arch/i386/kernel/cpu/Makefile b/arch/i386/kernel/cpu/Makefile
index f8eaef8..e484d74 100644
--- a/arch/i386/kernel/cpu/Makefile
+++ b/arch/i386/kernel/cpu/Makefile
@@ -8,7 +8,7 @@ obj-y   +=  amd.o
 obj-y  +=  cyrix.o
 obj-y  +=  centaur.o
 obj-y  +=  transmeta.o
-obj-y  +=  intel.o intel_cacheinfo.o
+obj-y  +=  intel.o
 obj-y  +=  rise.o
 obj-y  +=  nexgen.o
 obj-y  +=  umc.o
diff --git a/arch/x86/kernel/cpu/Makefile b/arch/x86/kernel/cpu/Makefile
index e439cc1..6557e4a 100644
--- a/arch/x86/kernel/cpu/Makefile
+++ b/arch/x86/kernel/cpu/Makefile
@@ -1,3 +1,4 @@
+obj-y  +=  intel_cacheinfo.o
 
 obj-$(CONFIG_X86_MCE)  +=  mcheck/
 obj-$(CONFIG_MTRR) +=  mtrr/
diff --git a/arch/i386/kernel/cpu/intel_cacheinfo.c b/arch/x86/kernel/cpu/intel_cacheinfo.c
similarity index 100%
rename from arch/i386/kernel/cpu/intel_cacheinfo.c
rename to arch/x86/kernel/cpu/intel_cacheinfo.c
diff --git a/arch/x86_64/kernel/Makefile b/arch/x86_64/kernel/Makefile
index ef1585d..0a33b03 100644
--- a/arch/x86_64/kernel/Makefile
+++ b/arch/x86_64/kernel/Makefile
@@ -36,8 +36,5 @@ obj-$(CONFIG_AUDIT)   += audit.o
 obj-$(CONFIG_MODULES)  += module.o
 obj-$(CONFIG_PCI)  += early-quirks.o
 
-obj-y  += intel_cacheinfo.o
-
 CFLAGS_vsyscall.o  := $(PROFILING) -g0
 
-intel_cacheinfo-y  += ../../i386/kernel/cpu/intel_cacheinfo.o


[PATCH take3 19/20] hugetlbpage.c switch

2007-03-14 Thread Steven Rostedt

Move hugetlbpage.c to the common area (arch/x86/mm).

Signed-off-by: Steven Rostedt <[EMAIL PROTECTED]>

diff --git a/arch/i386/Makefile b/arch/i386/Makefile
index d73a830..06dd07e 100644
--- a/arch/i386/Makefile
+++ b/arch/i386/Makefile
@@ -102,6 +102,7 @@ libs-y  += arch/i386/lib/
 core-y += arch/i386/kernel/ \
   arch/x86/kernel/ \
   arch/i386/mm/ \
+  arch/x86/mm/ \
   arch/i386/$(mcore-y)/ \
   arch/i386/crypto/
 drivers-$(CONFIG_MATH_EMULATION)   += arch/i386/math-emu/
diff --git a/arch/i386/mm/Makefile b/arch/i386/mm/Makefile
index 80908b5..0cb01e6 100644
--- a/arch/i386/mm/Makefile
+++ b/arch/i386/mm/Makefile
@@ -5,6 +5,5 @@
 obj-y  := init.o pgtable.o fault.o ioremap.o extable.o pageattr.o mmap.o
 
 obj-$(CONFIG_NUMA) += discontig.o
-obj-$(CONFIG_HUGETLB_PAGE) += hugetlbpage.o
 obj-$(CONFIG_HIGHMEM) += highmem.o
 obj-$(CONFIG_BOOT_IOREMAP) += boot_ioremap.o
diff --git a/arch/x86/mm/Makefile b/arch/x86/mm/Makefile
new file mode 100644
index 000..1b6e922
--- /dev/null
+++ b/arch/x86/mm/Makefile
@@ -0,0 +1 @@
+obj-$(CONFIG_HUGETLB_PAGE) += hugetlbpage.o
diff --git a/arch/i386/mm/hugetlbpage.c b/arch/x86/mm/hugetlbpage.c
similarity index 100%
rename from arch/i386/mm/hugetlbpage.c
rename to arch/x86/mm/hugetlbpage.c
diff --git a/arch/x86_64/Makefile b/arch/x86_64/Makefile
index 3cf9198..abf1829 100644
--- a/arch/x86_64/Makefile
+++ b/arch/x86_64/Makefile
@@ -81,6 +81,7 @@ libs-y+= arch/x86_64/lib/
 core-y += arch/x86_64/kernel/ \
   arch/x86/kernel/ \
   arch/x86_64/mm/ \
+  arch/x86/mm/ \
   arch/x86_64/crypto/
 core-$(CONFIG_IA32_EMULATION)  += arch/x86_64/ia32/
 drivers-$(CONFIG_PCI)  += arch/x86_64/pci/
diff --git a/arch/x86_64/mm/Makefile b/arch/x86_64/mm/Makefile
index d25ac86..b6f1f43 100644
--- a/arch/x86_64/mm/Makefile
+++ b/arch/x86_64/mm/Makefile
@@ -3,9 +3,7 @@
 #
 
 obj-y   := init.o fault.o ioremap.o extable.o pageattr.o mmap.o
-obj-$(CONFIG_HUGETLB_PAGE) += hugetlbpage.o
 obj-$(CONFIG_NUMA) += numa.o
 obj-$(CONFIG_K8_NUMA) += k8topology.o
 obj-$(CONFIG_ACPI_NUMA) += srat.o
 
-hugetlbpage-y = ../../i386/mm/hugetlbpage.o


[PATCH take3 05/20] i8237.c switch

2007-03-14 Thread Steven Rostedt

Move i8237.c to the common area (arch/x86/kernel).

Signed-off-by: Steven Rostedt <[EMAIL PROTECTED]>

diff --git a/arch/i386/kernel/Makefile b/arch/i386/kernel/Makefile
index c5c62af..1052659 100644
--- a/arch/i386/kernel/Makefile
+++ b/arch/i386/kernel/Makefile
@@ -7,7 +7,7 @@ extra-y := head.o init_task.o vmlinux.lds
 obj-y  := process.o signal.o entry.o traps.o irq.o \
ptrace.o time.o ioport.o ldt.o setup.o i8259.o sys_i386.o \
pci-dma.o i386_ksyms.o i387.o e820.o\
-   i8237.o topology.o alternative.o i8253.o tsc.o
+   topology.o alternative.o i8253.o tsc.o
 
 obj-$(CONFIG_STACKTRACE)   += stacktrace.o
 obj-y  += cpu/
diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile
index 26feab4..19921b9 100644
--- a/arch/x86/kernel/Makefile
+++ b/arch/x86/kernel/Makefile
@@ -1,4 +1,4 @@
-obj-y  += bootflag.o quirks.o
+obj-y  += bootflag.o quirks.o i8237.o
 
 obj-$(CONFIG_EARLY_PRINTK) += early_printk.o
 
diff --git a/arch/i386/kernel/i8237.c b/arch/x86/kernel/i8237.c
similarity index 100%
rename from arch/i386/kernel/i8237.c
rename to arch/x86/kernel/i8237.c
diff --git a/arch/x86_64/kernel/Makefile b/arch/x86_64/kernel/Makefile
index 533d4bb..c04f7a6 100644
--- a/arch/x86_64/kernel/Makefile
+++ b/arch/x86_64/kernel/Makefile
@@ -7,7 +7,7 @@ EXTRA_AFLAGS:= -traditional
 obj-y  := process.o signal.o entry.o traps.o irq.o \
ptrace.o time.o ioport.o ldt.o setup.o i8259.o sys_x86_64.o \
x8664_ksyms.o i387.o syscall.o vsyscall.o \
-   setup64.o e820.o reboot.o i8237.o \
+   setup64.o e820.o reboot.o \
pci-dma.o pci-nommu.o alternative.o hpet.o tsc.o
 
 obj-$(CONFIG_STACKTRACE)   += stacktrace.o
@@ -51,7 +51,6 @@ cpuid-$(subst m,y,$(CONFIG_X86_CPUID))  += ../../i386/kernel/cpuid.o
 topology-y += ../../i386/kernel/topology.o
 microcode-$(subst m,y,$(CONFIG_MICROCODE))  += ../../i386/kernel/microcode.o
 intel_cacheinfo-y  += ../../i386/kernel/cpu/intel_cacheinfo.o
-i8237-y+= ../../i386/kernel/i8237.o
 msr-$(subst m,y,$(CONFIG_X86_MSR))  += ../../i386/kernel/msr.o
 alternative-y  += ../../i386/kernel/alternative.o
 pcspeaker-y+= ../../i386/kernel/pcspeaker.o


[PATCH take3 06/20] topology.c switch

2007-03-14 Thread Steven Rostedt

Move topology.c to the common area (arch/x86/kernel).

Signed-off-by: Steven Rostedt <[EMAIL PROTECTED]>

diff --git a/arch/i386/kernel/Makefile b/arch/i386/kernel/Makefile
index 1052659..556da60 100644
--- a/arch/i386/kernel/Makefile
+++ b/arch/i386/kernel/Makefile
@@ -7,7 +7,7 @@ extra-y := head.o init_task.o vmlinux.lds
 obj-y  := process.o signal.o entry.o traps.o irq.o \
ptrace.o time.o ioport.o ldt.o setup.o i8259.o sys_i386.o \
pci-dma.o i386_ksyms.o i387.o e820.o\
-   topology.o alternative.o i8253.o tsc.o
+   alternative.o i8253.o tsc.o
 
 obj-$(CONFIG_STACKTRACE)   += stacktrace.o
 obj-y  += cpu/
diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile
index 19921b9..d70dbf3 100644
--- a/arch/x86/kernel/Makefile
+++ b/arch/x86/kernel/Makefile
@@ -1,4 +1,4 @@
-obj-y  += bootflag.o quirks.o i8237.o
+obj-y  += bootflag.o quirks.o i8237.o topology.o
 
 obj-$(CONFIG_EARLY_PRINTK) += early_printk.o
 
diff --git a/arch/i386/kernel/topology.c b/arch/x86/kernel/topology.c
similarity index 100%
rename from arch/i386/kernel/topology.c
rename to arch/x86/kernel/topology.c
diff --git a/arch/x86_64/kernel/Makefile b/arch/x86_64/kernel/Makefile
index c04f7a6..3dc4c18 100644
--- a/arch/x86_64/kernel/Makefile
+++ b/arch/x86_64/kernel/Makefile
@@ -40,7 +40,6 @@ obj-$(CONFIG_AUDIT)   += audit.o
 obj-$(CONFIG_MODULES)  += module.o
 obj-$(CONFIG_PCI)  += early-quirks.o
 
-obj-y  += topology.o
 obj-y  += intel_cacheinfo.o
 obj-y  += pcspeaker.o
 
@@ -48,7 +47,6 @@ CFLAGS_vsyscall.o := $(PROFILING) -g0
 
 therm_throt-y   += ../../i386/kernel/cpu/mcheck/therm_throt.o
 cpuid-$(subst m,y,$(CONFIG_X86_CPUID))  += ../../i386/kernel/cpuid.o
-topology-y += ../../i386/kernel/topology.o
 microcode-$(subst m,y,$(CONFIG_MICROCODE))  += ../../i386/kernel/microcode.o
 intel_cacheinfo-y  += ../../i386/kernel/cpu/intel_cacheinfo.o
 msr-$(subst m,y,$(CONFIG_X86_MSR))  += ../../i386/kernel/msr.o


[PATCH take3 17/20] k8.c switch

2007-03-14 Thread Steven Rostedt

Move k8.c to the common area (arch/x86/kernel).

Signed-off-by: Steven Rostedt <[EMAIL PROTECTED]>

diff --git a/arch/i386/kernel/Makefile b/arch/i386/kernel/Makefile
index ce1f742..72e11f7 100644
--- a/arch/i386/kernel/Makefile
+++ b/arch/i386/kernel/Makefile
@@ -33,7 +33,6 @@ obj-$(CONFIG_EFI) += efi.o efi_stub.o
 obj-$(CONFIG_DOUBLEFAULT)  += doublefault.o
 obj-$(CONFIG_VM86) += vm86.o
 obj-$(CONFIG_HPET_TIMER)   += hpet.o
-obj-$(CONFIG_K8_NB)+= k8.o
 
 obj-$(CONFIG_VMI)  += vmi.o vmitime.o
 obj-$(CONFIG_PARAVIRT) += paravirt.o
@@ -78,6 +77,5 @@ $(obj)/vsyscall-syms.o: $(src)/vsyscall.lds \
$(obj)/vsyscall-sysenter.o $(obj)/vsyscall-note.o FORCE
$(call if_changed,syscall)
 
-k8-y  += ../../x86_64/kernel/k8.o
 stacktrace-y += ../../x86_64/kernel/stacktrace.o
 
diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile
index 1167962..06c335d 100644
--- a/arch/x86/kernel/Makefile
+++ b/arch/x86/kernel/Makefile
@@ -7,6 +7,7 @@ obj-$(CONFIG_X86_MSR)   += msr.o
 obj-$(CONFIG_X86_CPUID)+= cpuid.o
 obj-$(CONFIG_MICROCODE)+= microcode.o
 obj-$(CONFIG_EARLY_PRINTK) += early_printk.o
+obj-$(CONFIG_K8_NB)+= k8.o
 
 obj-y  += pcspeaker.o
 
diff --git a/arch/x86_64/kernel/k8.c b/arch/x86/kernel/k8.c
similarity index 100%
rename from arch/x86_64/kernel/k8.c
rename to arch/x86/kernel/k8.c
diff --git a/arch/x86_64/kernel/Makefile b/arch/x86_64/kernel/Makefile
index 3d90462..0510887 100644
--- a/arch/x86_64/kernel/Makefile
+++ b/arch/x86_64/kernel/Makefile
@@ -29,7 +29,6 @@ obj-$(CONFIG_SWIOTLB) += pci-swiotlb.o
 obj-$(CONFIG_KPROBES)  += kprobes.o
 obj-$(CONFIG_X86_PM_TIMER) += pmtimer.o
 obj-$(CONFIG_X86_VSMP) += vsmp.o
-obj-$(CONFIG_K8_NB)+= k8.o
 obj-$(CONFIG_AUDIT)+= audit.o
 
 obj-$(CONFIG_MODULES)  += module.o


[PATCH take3 01/20] early_printk.c switch

2007-03-14 Thread Steven Rostedt

Move early_printk.c to the common area (arch/x86/kernel).

Signed-off-by: Steven Rostedt <[EMAIL PROTECTED]>

diff --git a/arch/i386/kernel/Makefile b/arch/i386/kernel/Makefile
index 4ae3dcf..a57040d 100644
--- a/arch/i386/kernel/Makefile
+++ b/arch/i386/kernel/Makefile
@@ -35,7 +35,6 @@ obj-$(CONFIG_ACPI_SRAT)   += srat.o
 obj-$(CONFIG_EFI)  += efi.o efi_stub.o
 obj-$(CONFIG_DOUBLEFAULT)  += doublefault.o
 obj-$(CONFIG_VM86) += vm86.o
-obj-$(CONFIG_EARLY_PRINTK) += early_printk.o
 obj-$(CONFIG_HPET_TIMER)   += hpet.o
 obj-$(CONFIG_K8_NB)+= k8.o
 
diff --git a/arch/i386/kernel/early_printk.c b/arch/i386/kernel/early_printk.c
deleted file mode 100644
index 92f812b..000
--- a/arch/i386/kernel/early_printk.c
+++ /dev/null
@@ -1,2 +0,0 @@
-
-#include "../../x86_64/kernel/early_printk.c"
diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile
new file mode 100644
index 000..55f268f
--- /dev/null
+++ b/arch/x86/kernel/Makefile
@@ -0,0 +1,2 @@
+
+obj-$(CONFIG_EARLY_PRINTK) += early_printk.o
diff --git a/arch/x86_64/kernel/early_printk.c b/arch/x86/kernel/early_printk.c
similarity index 100%
rename from arch/x86_64/kernel/early_printk.c
rename to arch/x86/kernel/early_printk.c
diff --git a/arch/x86_64/Makefile b/arch/x86_64/Makefile
index 2941a91..3cf9198 100644
--- a/arch/x86_64/Makefile
+++ b/arch/x86_64/Makefile
@@ -79,6 +79,7 @@ head-y := arch/x86_64/kernel/head.o arch/x86_64/kernel/head64.o arch/x86_64/kern
 
 libs-y += arch/x86_64/lib/
 core-y += arch/x86_64/kernel/ \
+  arch/x86/kernel/ \
   arch/x86_64/mm/ \
   arch/x86_64/crypto/
 core-$(CONFIG_IA32_EMULATION)  += arch/x86_64/ia32/
diff --git a/arch/x86_64/kernel/Makefile b/arch/x86_64/kernel/Makefile
index bb47e86..8b2535c 100644
--- a/arch/x86_64/kernel/Makefile
+++ b/arch/x86_64/kernel/Makefile
@@ -28,7 +28,6 @@ obj-$(CONFIG_CRASH_DUMP)  += crash_dump.o
 obj-$(CONFIG_PM)   += suspend.o
 obj-$(CONFIG_SOFTWARE_SUSPEND) += suspend_asm.o
 obj-$(CONFIG_CPU_FREQ) += cpufreq/
-obj-$(CONFIG_EARLY_PRINTK) += early_printk.o
 obj-$(CONFIG_IOMMU)+= pci-gart.o aperture.o
 obj-$(CONFIG_CALGARY_IOMMU)+= pci-calgary.o tce.o
 obj-$(CONFIG_SWIOTLB)  += pci-swiotlb.o
diff --git a/arch/i386/Makefile b/arch/i386/Makefile
index bd28f9f..d73a830 100644
--- a/arch/i386/Makefile
+++ b/arch/i386/Makefile
@@ -100,6 +100,7 @@ head-y := arch/i386/kernel/head.o arch/i386/kernel/init_task.o
 
 libs-y += arch/i386/lib/
 core-y += arch/i386/kernel/ \
+  arch/x86/kernel/ \
   arch/i386/mm/ \
   arch/i386/$(mcore-y)/ \
   arch/i386/crypto/


[PATCH take3 18/20] stacktrace.c switch

2007-03-14 Thread Steven Rostedt

Move stacktrace.c to the common area (arch/x86/kernel).

Signed-off-by: Steven Rostedt <[EMAIL PROTECTED]>

diff --git a/arch/i386/kernel/Makefile b/arch/i386/kernel/Makefile
index 72e11f7..a5cf2e7 100644
--- a/arch/i386/kernel/Makefile
+++ b/arch/i386/kernel/Makefile
@@ -9,7 +9,6 @@ obj-y   := process.o signal.o entry.o traps.o irq.o \
pci-dma.o i386_ksyms.o i387.o e820.o\
i8253.o tsc.o
 
-obj-$(CONFIG_STACKTRACE)   += stacktrace.o
 obj-y  += cpu/
 obj-y  += acpi/
 obj-$(CONFIG_X86_BIOS_REBOOT)  += reboot.o
@@ -77,5 +76,3 @@ $(obj)/vsyscall-syms.o: $(src)/vsyscall.lds \
$(obj)/vsyscall-sysenter.o $(obj)/vsyscall-note.o FORCE
$(call if_changed,syscall)
 
-stacktrace-y += ../../x86_64/kernel/stacktrace.o
-
diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile
index 06c335d..297cde9 100644
--- a/arch/x86/kernel/Makefile
+++ b/arch/x86/kernel/Makefile
@@ -1,5 +1,7 @@
 obj-y  += bootflag.o quirks.o i8237.o topology.o alternative.o
 
+obj-$(CONFIG_STACKTRACE)   += stacktrace.o
+
 obj-y  += cpu/
 obj-$(CONFIG_ACPI) += acpi/
 
diff --git a/arch/x86_64/kernel/stacktrace.c b/arch/x86/kernel/stacktrace.c
similarity index 100%
rename from arch/x86_64/kernel/stacktrace.c
rename to arch/x86/kernel/stacktrace.c
diff --git a/arch/x86_64/kernel/Makefile b/arch/x86_64/kernel/Makefile
index 0510887..7477cb1 100644
--- a/arch/x86_64/kernel/Makefile
+++ b/arch/x86_64/kernel/Makefile
@@ -10,7 +10,6 @@ obj-y := process.o signal.o entry.o traps.o irq.o \
setup64.o e820.o reboot.o \
pci-dma.o pci-nommu.o hpet.o tsc.o
 
-obj-$(CONFIG_STACKTRACE)   += stacktrace.o
 obj-$(CONFIG_X86_MCE)  += mce.o
 obj-$(CONFIG_X86_MCE_INTEL)+= mce_intel.o
 obj-$(CONFIG_X86_MCE_AMD)  += mce_amd.o


[PATCH take3 04/20] quirks.c switch

2007-03-14 Thread Steven Rostedt

Move quirks.c to the common area (arch/x86/kernel).

Signed-off-by: Steven Rostedt <[EMAIL PROTECTED]>

diff --git a/arch/i386/kernel/Makefile b/arch/i386/kernel/Makefile
index 4622355..c5c62af 100644
--- a/arch/i386/kernel/Makefile
+++ b/arch/i386/kernel/Makefile
@@ -7,7 +7,7 @@ extra-y := head.o init_task.o vmlinux.lds
 obj-y  := process.o signal.o entry.o traps.o irq.o \
ptrace.o time.o ioport.o ldt.o setup.o i8259.o sys_i386.o \
pci-dma.o i386_ksyms.o i387.o e820.o\
-   quirks.o i8237.o topology.o alternative.o i8253.o tsc.o
+   i8237.o topology.o alternative.o i8253.o tsc.o
 
 obj-$(CONFIG_STACKTRACE)   += stacktrace.o
 obj-y  += cpu/
diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile
index fe2e4ea..26feab4 100644
--- a/arch/x86/kernel/Makefile
+++ b/arch/x86/kernel/Makefile
@@ -1,4 +1,4 @@
-obj-y  += bootflag.o
+obj-y  += bootflag.o quirks.o
 
 obj-$(CONFIG_EARLY_PRINTK) += early_printk.o
 
diff --git a/arch/i386/kernel/quirks.c b/arch/x86/kernel/quirks.c
similarity index 100%
rename from arch/i386/kernel/quirks.c
rename to arch/x86/kernel/quirks.c
diff --git a/arch/x86_64/kernel/Makefile b/arch/x86_64/kernel/Makefile
index 1ffc4ea..533d4bb 100644
--- a/arch/x86_64/kernel/Makefile
+++ b/arch/x86_64/kernel/Makefile
@@ -7,7 +7,7 @@ EXTRA_AFLAGS:= -traditional
 obj-y  := process.o signal.o entry.o traps.o irq.o \
ptrace.o time.o ioport.o ldt.o setup.o i8259.o sys_x86_64.o \
x8664_ksyms.o i387.o syscall.o vsyscall.o \
-   setup64.o e820.o reboot.o quirks.o i8237.o \
+   setup64.o e820.o reboot.o i8237.o \
pci-dma.o pci-nommu.o alternative.o hpet.o tsc.o
 
 obj-$(CONFIG_STACKTRACE)   += stacktrace.o
@@ -51,7 +51,6 @@ cpuid-$(subst m,y,$(CONFIG_X86_CPUID))  += ../../i386/kernel/cpuid.o
 topology-y += ../../i386/kernel/topology.o
 microcode-$(subst m,y,$(CONFIG_MICROCODE))  += ../../i386/kernel/microcode.o
 intel_cacheinfo-y  += ../../i386/kernel/cpu/intel_cacheinfo.o
-quirks-y   += ../../i386/kernel/quirks.o
 i8237-y+= ../../i386/kernel/i8237.o
 msr-$(subst m,y,$(CONFIG_X86_MSR))  += ../../i386/kernel/msr.o
 alternative-y  += ../../i386/kernel/alternative.o

--

[PATCH take3 16/20] acpi files switched

2007-03-14 Thread Steven Rostedt

Moved the shared files that were in arch/i386/kernel/acpi to the common
area.

Note that files still exist in the acpi directories of both archs, since
some of that code is unique to each arch.

Signed-off-by: Steven Rostedt <[EMAIL PROTECTED]>

diff --git a/arch/i386/kernel/acpi/Makefile b/arch/i386/kernel/acpi/Makefile
index 7f7be01..3de22c2 100644
--- a/arch/i386/kernel/acpi/Makefile
+++ b/arch/i386/kernel/acpi/Makefile
@@ -1,10 +1,5 @@
-obj-$(CONFIG_ACPI) += boot.o
 ifneq ($(CONFIG_PCI),)
 obj-$(CONFIG_X86_IO_APIC)  += earlyquirk.o
 endif
 obj-$(CONFIG_ACPI_SLEEP)   += sleep.o wakeup.o
 
-ifneq ($(CONFIG_ACPI_PROCESSOR),)
-obj-y  += cstate.o processor.o
-endif
-
diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile
index c1a2b58..1167962 100644
--- a/arch/x86/kernel/Makefile
+++ b/arch/x86/kernel/Makefile
@@ -1,6 +1,7 @@
 obj-y  += bootflag.o quirks.o i8237.o topology.o alternative.o
 
 obj-y  += cpu/
+obj-$(CONFIG_ACPI) += acpi/
 
 obj-$(CONFIG_X86_MSR)  += msr.o
 obj-$(CONFIG_X86_CPUID)+= cpuid.o
diff --git a/arch/x86/kernel/acpi/Makefile b/arch/x86/kernel/acpi/Makefile
new file mode 100644
index 000..3aa3d16
--- /dev/null
+++ b/arch/x86/kernel/acpi/Makefile
@@ -0,0 +1,5 @@
+obj-y  += boot.o
+
+ifneq ($(CONFIG_ACPI_PROCESSOR),)
+obj-y  += processor.o cstate.o
+endif
diff --git a/arch/i386/kernel/acpi/boot.c b/arch/x86/kernel/acpi/boot.c
similarity index 100%
rename from arch/i386/kernel/acpi/boot.c
rename to arch/x86/kernel/acpi/boot.c
diff --git a/arch/i386/kernel/acpi/cstate.c b/arch/x86/kernel/acpi/cstate.c
similarity index 100%
rename from arch/i386/kernel/acpi/cstate.c
rename to arch/x86/kernel/acpi/cstate.c
diff --git a/arch/i386/kernel/acpi/processor.c b/arch/x86/kernel/acpi/processor.c
similarity index 100%
rename from arch/i386/kernel/acpi/processor.c
rename to arch/x86/kernel/acpi/processor.c
diff --git a/arch/x86_64/kernel/acpi/Makefile b/arch/x86_64/kernel/acpi/Makefile
index 080b996..eb4bc11 100644
--- a/arch/x86_64/kernel/acpi/Makefile
+++ b/arch/x86_64/kernel/acpi/Makefile
@@ -1,9 +1,2 @@
-obj-y  := boot.o
-boot-y := ../../../i386/kernel/acpi/boot.o
 obj-$(CONFIG_ACPI_SLEEP)   += sleep.o wakeup.o
 
-ifneq ($(CONFIG_ACPI_PROCESSOR),)
-obj-y  += processor.o
-processor-y:= ../../../i386/kernel/acpi/processor.o ../../../i386/kernel/acpi/cstate.o
-endif
-

--

[PATCH take3 08/20] msr.c switch

2007-03-14 Thread Steven Rostedt

Move the msr.c to the common area.

Signed-off-by: Steven Rostedt <[EMAIL PROTECTED]>

diff --git a/arch/i386/kernel/Makefile b/arch/i386/kernel/Makefile
index 44c7d89..5276349 100644
--- a/arch/i386/kernel/Makefile
+++ b/arch/i386/kernel/Makefile
@@ -14,7 +14,6 @@ obj-y += cpu/
 obj-y  += acpi/
 obj-$(CONFIG_X86_BIOS_REBOOT)  += reboot.o
 obj-$(CONFIG_MCA)  += mca.o
-obj-$(CONFIG_X86_MSR)  += msr.o
 obj-$(CONFIG_X86_CPUID)+= cpuid.o
 obj-$(CONFIG_MICROCODE)+= microcode.o
 obj-$(CONFIG_APM)  += apm.o
diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile
index b63f832..4e5a88f 100644
--- a/arch/x86/kernel/Makefile
+++ b/arch/x86/kernel/Makefile
@@ -1,5 +1,6 @@
 obj-y  += bootflag.o quirks.o i8237.o topology.o alternative.o
 
+obj-$(CONFIG_X86_MSR)  += msr.o
 obj-$(CONFIG_EARLY_PRINTK) += early_printk.o
 
 # i386 defines CONFIG_X86_SMP when CONFIG_SMP and !CONFIG_X86_VOYAGER
diff --git a/arch/i386/kernel/msr.c b/arch/x86/kernel/msr.c
similarity index 100%
rename from arch/i386/kernel/msr.c
rename to arch/x86/kernel/msr.c
diff --git a/arch/x86_64/kernel/Makefile b/arch/x86_64/kernel/Makefile
index b12901c..248dbe8 100644
--- a/arch/x86_64/kernel/Makefile
+++ b/arch/x86_64/kernel/Makefile
@@ -16,7 +16,6 @@ obj-$(CONFIG_X86_MCE_INTEL)   += mce_intel.o
 obj-$(CONFIG_X86_MCE_AMD)  += mce_amd.o
 obj-$(CONFIG_MTRR) += ../../i386/kernel/cpu/mtrr/
 obj-$(CONFIG_ACPI) += acpi/
-obj-$(CONFIG_X86_MSR)  += msr.o
 obj-$(CONFIG_MICROCODE)+= microcode.o
 obj-$(CONFIG_X86_CPUID)+= cpuid.o
 obj-$(CONFIG_SMP)  += smp.o smpboot.o trampoline.o
@@ -49,5 +48,4 @@ therm_throt-y   += ../../i386/kernel/cpu/mcheck/therm_throt.o
 cpuid-$(subst m,y,$(CONFIG_X86_CPUID))  += ../../i386/kernel/cpuid.o
 microcode-$(subst m,y,$(CONFIG_MICROCODE))  += ../../i386/kernel/microcode.o
 intel_cacheinfo-y  += ../../i386/kernel/cpu/intel_cacheinfo.o
-msr-$(subst m,y,$(CONFIG_X86_MSR))  += ../../i386/kernel/msr.o
 pcspeaker-y+= ../../i386/kernel/pcspeaker.o

--

[PATCH take3 10/20] microcode.c switch

2007-03-14 Thread Steven Rostedt

Move the microcode.c to the common area.

Signed-off-by: Steven Rostedt <[EMAIL PROTECTED]>

diff --git a/arch/i386/kernel/Makefile b/arch/i386/kernel/Makefile
index 4437181..ac925bc 100644
--- a/arch/i386/kernel/Makefile
+++ b/arch/i386/kernel/Makefile
@@ -14,7 +14,6 @@ obj-y += cpu/
 obj-y  += acpi/
 obj-$(CONFIG_X86_BIOS_REBOOT)  += reboot.o
 obj-$(CONFIG_MCA)  += mca.o
-obj-$(CONFIG_MICROCODE)+= microcode.o
 obj-$(CONFIG_APM)  += apm.o
 obj-$(CONFIG_X86_SMP)  += smp.o smpboot.o
 obj-$(CONFIG_X86_TRAMPOLINE)   += trampoline.o
diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile
index 912421a..f1c6b2e 100644
--- a/arch/x86/kernel/Makefile
+++ b/arch/x86/kernel/Makefile
@@ -2,6 +2,7 @@ obj-y   += bootflag.o quirks.o i8237.o topology.o alternative.o
 
 obj-$(CONFIG_X86_MSR)  += msr.o
 obj-$(CONFIG_X86_CPUID)+= cpuid.o
+obj-$(CONFIG_MICROCODE)+= microcode.o
 obj-$(CONFIG_EARLY_PRINTK) += early_printk.o
 
 # i386 defines CONFIG_X86_SMP when CONFIG_SMP and !CONFIG_X86_VOYAGER
diff --git a/arch/i386/kernel/microcode.c b/arch/x86/kernel/microcode.c
similarity index 100%
rename from arch/i386/kernel/microcode.c
rename to arch/x86/kernel/microcode.c
diff --git a/arch/x86_64/kernel/Makefile b/arch/x86_64/kernel/Makefile
index f5997f3..08795d8 100644
--- a/arch/x86_64/kernel/Makefile
+++ b/arch/x86_64/kernel/Makefile
@@ -16,7 +16,6 @@ obj-$(CONFIG_X86_MCE_INTEL)   += mce_intel.o
 obj-$(CONFIG_X86_MCE_AMD)  += mce_amd.o
 obj-$(CONFIG_MTRR) += ../../i386/kernel/cpu/mtrr/
 obj-$(CONFIG_ACPI) += acpi/
-obj-$(CONFIG_MICROCODE)+= microcode.o
 obj-$(CONFIG_SMP)  += smp.o smpboot.o trampoline.o
 obj-y  += apic.o  nmi.o
 obj-y  += io_apic.o mpparse.o \
@@ -44,6 +43,5 @@ obj-y += pcspeaker.o
 CFLAGS_vsyscall.o  := $(PROFILING) -g0
 
 therm_throt-y   += ../../i386/kernel/cpu/mcheck/therm_throt.o
-microcode-$(subst m,y,$(CONFIG_MICROCODE))  += ../../i386/kernel/microcode.o
 intel_cacheinfo-y  += ../../i386/kernel/cpu/intel_cacheinfo.o
 pcspeaker-y+= ../../i386/kernel/pcspeaker.o

--

[PATCH take3 03/20] bootflag.c switch

2007-03-14 Thread Steven Rostedt

Move the bootflag.c to the common area.

Signed-off-by: Steven Rostedt <[EMAIL PROTECTED]>

diff --git a/arch/i386/kernel/Makefile b/arch/i386/kernel/Makefile
index c8fe439..4622355 100644
--- a/arch/i386/kernel/Makefile
+++ b/arch/i386/kernel/Makefile
@@ -6,7 +6,7 @@ extra-y := head.o init_task.o vmlinux.lds
 
 obj-y  := process.o signal.o entry.o traps.o irq.o \
ptrace.o time.o ioport.o ldt.o setup.o i8259.o sys_i386.o \
-   pci-dma.o i386_ksyms.o i387.o bootflag.o e820.o\
+   pci-dma.o i386_ksyms.o i387.o e820.o\
quirks.o i8237.o topology.o alternative.o i8253.o tsc.o
 
 obj-$(CONFIG_STACKTRACE)   += stacktrace.o
diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile
index bd548e6..fe2e4ea 100644
--- a/arch/x86/kernel/Makefile
+++ b/arch/x86/kernel/Makefile
@@ -1,3 +1,4 @@
+obj-y  += bootflag.o
 
 obj-$(CONFIG_EARLY_PRINTK) += early_printk.o
 
diff --git a/arch/i386/kernel/bootflag.c b/arch/x86/kernel/bootflag.c
similarity index 100%
rename from arch/i386/kernel/bootflag.c
rename to arch/x86/kernel/bootflag.c
diff --git a/arch/x86_64/kernel/Makefile b/arch/x86_64/kernel/Makefile
index 54fe500..1ffc4ea 100644
--- a/arch/x86_64/kernel/Makefile
+++ b/arch/x86_64/kernel/Makefile
@@ -7,7 +7,7 @@ EXTRA_AFLAGS:= -traditional
 obj-y  := process.o signal.o entry.o traps.o irq.o \
ptrace.o time.o ioport.o ldt.o setup.o i8259.o sys_x86_64.o \
x8664_ksyms.o i387.o syscall.o vsyscall.o \
-   setup64.o bootflag.o e820.o reboot.o quirks.o i8237.o \
+   setup64.o e820.o reboot.o quirks.o i8237.o \
pci-dma.o pci-nommu.o alternative.o hpet.o tsc.o
 
 obj-$(CONFIG_STACKTRACE)   += stacktrace.o
@@ -47,7 +47,6 @@ obj-y += pcspeaker.o
 CFLAGS_vsyscall.o  := $(PROFILING) -g0
 
 therm_throt-y   += ../../i386/kernel/cpu/mcheck/therm_throt.o
-bootflag-y += ../../i386/kernel/bootflag.o
 cpuid-$(subst m,y,$(CONFIG_X86_CPUID))  += ../../i386/kernel/cpuid.o
 topology-y += ../../i386/kernel/topology.o
 microcode-$(subst m,y,$(CONFIG_MICROCODE))  += ../../i386/kernel/microcode.o

--

[PATCH take3 07/20] alternative.c switch

2007-03-14 Thread Steven Rostedt

Move the alternative.c to the common area.

Signed-off-by: Steven Rostedt <[EMAIL PROTECTED]>

diff --git a/arch/i386/kernel/Makefile b/arch/i386/kernel/Makefile
index 556da60..44c7d89 100644
--- a/arch/i386/kernel/Makefile
+++ b/arch/i386/kernel/Makefile
@@ -7,7 +7,7 @@ extra-y := head.o init_task.o vmlinux.lds
 obj-y  := process.o signal.o entry.o traps.o irq.o \
ptrace.o time.o ioport.o ldt.o setup.o i8259.o sys_i386.o \
pci-dma.o i386_ksyms.o i387.o e820.o\
-   alternative.o i8253.o tsc.o
+   i8253.o tsc.o
 
 obj-$(CONFIG_STACKTRACE)   += stacktrace.o
 obj-y  += cpu/
diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile
index d70dbf3..b63f832 100644
--- a/arch/x86/kernel/Makefile
+++ b/arch/x86/kernel/Makefile
@@ -1,4 +1,4 @@
-obj-y  += bootflag.o quirks.o i8237.o topology.o
+obj-y  += bootflag.o quirks.o i8237.o topology.o alternative.o
 
 obj-$(CONFIG_EARLY_PRINTK) += early_printk.o
 
diff --git a/arch/i386/kernel/alternative.c b/arch/x86/kernel/alternative.c
similarity index 100%
rename from arch/i386/kernel/alternative.c
rename to arch/x86/kernel/alternative.c
diff --git a/arch/x86_64/kernel/Makefile b/arch/x86_64/kernel/Makefile
index 3dc4c18..b12901c 100644
--- a/arch/x86_64/kernel/Makefile
+++ b/arch/x86_64/kernel/Makefile
@@ -8,7 +8,7 @@ obj-y   := process.o signal.o entry.o traps.o irq.o \
ptrace.o time.o ioport.o ldt.o setup.o i8259.o sys_x86_64.o \
x8664_ksyms.o i387.o syscall.o vsyscall.o \
setup64.o e820.o reboot.o \
-   pci-dma.o pci-nommu.o alternative.o hpet.o tsc.o
+   pci-dma.o pci-nommu.o hpet.o tsc.o
 
 obj-$(CONFIG_STACKTRACE)   += stacktrace.o
 obj-$(CONFIG_X86_MCE)  += mce.o therm_throt.o
@@ -50,5 +50,4 @@ cpuid-$(subst m,y,$(CONFIG_X86_CPUID))  += ../../i386/kernel/cpuid.o
 microcode-$(subst m,y,$(CONFIG_MICROCODE))  += ../../i386/kernel/microcode.o
 intel_cacheinfo-y  += ../../i386/kernel/cpu/intel_cacheinfo.o
 msr-$(subst m,y,$(CONFIG_X86_MSR))  += ../../i386/kernel/msr.o
-alternative-y  += ../../i386/kernel/alternative.o
 pcspeaker-y+= ../../i386/kernel/pcspeaker.o

--

Re: New thread RDSL, post-2.6.20 kernels and amanda (tar) miss-fires

2007-03-14 Thread Ray Lee

Gene Heskett wrote:
> Here is an example
> [EMAIL PROTECTED] data]# dd if=00010.coyote._lib.1 bs=32k count=1
> AMANDA: FILE 20070314104344 coyote /lib  lev 1 comp .gz program /bin/tar
> To restore, position tape at start of file and run:
>  dd if= bs=32k skip=1 |  /bin/gzip -dc |  /bin/tar -f - ...
> 
> And the ellipsis is an error if not removed.  Then one is supposed to be 
> able to redirect tar's output with the usual >/tmp/test/ syntax
> 
> So:
> [EMAIL PROTECTED] data]# dd if=00010.coyote._lib.1 bs=32k skip=1 |  /bin/gzip -dc |  /bin/tar -f - >/tmp/test/
> -bash: /tmp/test/: Is a directory
> 
> which is the return from any variation in how the redirect is done.
> 
> So what am I doing wrong in the above command line? I'd like to
> add it to my helper scripts, to be published eventually on zmanda.org.

One of us is confused, and it may very well be me, but...

The /bin/tar -f - >/tmp/test/ part looks to me like it should fail exactly
as bash says it does: the output redirect (>) can only write to a file,
not a directory. (So /tmp/file should work; /tmp/file/ won't.)

Are you trying to control where the files get restored? That should be
done with a cd into the target directory before running the pipeline.

Or am I misunderstanding what you're telling me?

Ray

Re: [RFC][PATCH 4/7] RSS accounting hooks over the code

2007-03-14 Thread Nick Piggin


Kirill Korotaev wrote:


The approaches I have seen that don't have a struct page pointer, do
intrusive things like try to put hooks everywhere throughout the kernel
where a userspace task can cause an allocation (and of course end up
missing many, so they aren't secure anyway)... and basically just
nasty stuff that will never get merged.



The user beancounters patch has been through all of these...
The approach where each charged object has a pointer to the owner container
that charged it is the easiest and cleanest way to handle
all the problems with dynamic context changes, races, etc.,
and 1 pointer in struct page is just 0.1% overhead.


The pointer in struct page approach is a decent one, which I have
liked since this whole container effort came up. IIRC Linus and Alan
also thought that was a reasonable way to go.

I haven't reviewed the rest of the beancounters patch since looking
at it quite a few months ago... I probably don't have time for a
good review at the moment, but I should eventually.


Struct page overhead really isn't bad. Sure, nobody who doesn't use
containers will want to turn it on, but unless you're using a big PAE
system you're actually unlikely to notice.



big PAE doesn't make any difference IMHO
(until struct pages are not created for non-present physical memory areas)


The issue is just that struct pages use low memory, which is a really
scarce commodity on PAE. One more pointer in the struct page means
64MB less lowmem.

But PAE is crap anyway. We've already made enough concessions in the
kernel to support it. I agree: struct page overhead is not really
significant. The benefits of simplicity seem to outweigh the downside.


But again, I'll say the node-container approach of course does avoid
this nicely (because we already can get the node from the page). So
definitely that approach needs to be discredited before going with this
one.



But it lacks some other features:
1. page can't be shared easily with another container


I think they could be shared. You allocate _new_ pages from your own
node, but you can definitely use existing pages allocated to other
nodes.


2. shared page can't be accounted honestly to containers
   as fraction=PAGE_SIZE/containers-using-it


Yes there would be some accounting differences. I think it is hard
to say exactly what containers are "using" what page anyway, though.
What do you say about unmapped pages? Kernel allocations? etc.


3. It doesn't help accounting of kernel memory structures.
   e.g. in OpenVZ we use exactly the same pointer on the page
   to track which container owns it, e.g. pages used for page
   tables are accounted this way.


?
page_to_nid(page) ~= container that owns it.


4. I guess destroying a container requires destroying its memory zone,
   which means writing out dirty data, and that doesn't sound
   good to me either.


I haven't looked at any implementation, but I think it is fine for
the zone to stay around.


5. memory reclamation in case of global memory shortage
   becomes a tricky/unfair task.


I don't understand why? You can much more easily target a specific
container for reclaim with this approach than with others (because
you have an lru per container).


6. You cannot overcommit. AFAIU, the memory should be granted
   to a node for exclusive use and cannot be used by other containers,
   even if it is unused. This is not an option for us.


I'm not sure about that. If you have a larger number of nodes, then
you could assign more free nodes to a container on demand. But I
think there would definitely be less flexibility with nodes...

I don't know... and seeing as I don't really know where the google
guys are going with it, I won't misrepresent their work any further ;)



Everyone seems to have a plan ;) I don't read the containers list...
does everyone still have *different* plans, or is any sort of consensus
being reached?



hope we'll have it soon :)


Good luck ;)

--
SUSE Labs, Novell Inc.

Re: [PATCH 1/1] Allow i386 crash kernels to handle x86_64 dumps

2007-03-14 Thread Vivek Goyal

On Thu, Mar 15, 2007 at 10:46:38AM +0900, Horms wrote:
> On Wed, Mar 14, 2007 at 05:00:09PM +, Ian Campbell wrote:
> > The specific case I am encountering is kdump under Xen with a 64 bit
> > hypervisor and 32 bit kernel/userspace. The dump created is a 64 bit due
> > to the hypervisor but the dump kernel is 32 bit to match the domain 0
> > kernel.
> > 
> > It's possibly less likely to be useful in a purely native scenario but I
> > see no reason to disallow it.
> 
> For native Linux, would this cover the case where the pre-crash kernel
> is 64bit and the crashdump (post-crash) kernel is 32bit?
> 

I think so, though I have never tried it.

> > Signed-off-by: Ian Campbell <[EMAIL PROTECTED]>
> > 
> > --- pristine-linux-2.6.18/include/asm-i386/elf.h 2006-09-20 04:42:06.0 +0100
> > +++ linux-2.6.18-xen/include/asm-i386/elf.h 2007-03-14 16:42:30.0 +
> > @@ -36,7 +36,7 @@
> >   * This is used to ensure we don't load something for the wrong architecture.
> >   */
> >  #define elf_check_arch(x) \
> > -   (((x)->e_machine == EM_386) || ((x)->e_machine == EM_486))
> > +   (((x)->e_machine == EM_386) || ((x)->e_machine == EM_486) || ((x)->e_machine == EM_X86_64))

But I think changing this macro might run into issues. It is used in a
few places in the kernel, for example while loading modules. Wouldn't this
essentially mean that we allow loading 64bit x86_64 modules on 32bit i386
systems?

Similarly, load_elf_interp() uses it; would we then also allow loading an
interpreter built for x86_64 on a 32bit i386 machine?

Should we create a separate macro something like elf_check_allowed_arch(),
to take care of such corner cases?

Thanks
Vivek

[PATCH] Kprobes: Make kprobe.symbol_name const

2007-03-14 Thread Ananth N Mavinakayanahalli

From: Ananth N Mavinakayanahalli <[EMAIL PROTECTED]>

Kprobes never writes to the kprobe.symbol_name field; it is only set by
the module when registering the probe. Modules that practice good
hygiene by using the "const" qualifier currently see this warning:

warning: assignment discards qualifiers from pointer target type

Make struct kprobe.symbol_name a const char *.

Signed-off-by: Ananth N Mavinakayanahalli <[EMAIL PROTECTED]>
Signed-off-by: Jim Keniston <[EMAIL PROTECTED]>

---
 include/linux/kprobes.h |2 +-
 1 files changed, 1 insertion(+), 1 deletion(-)

Index: linux-2.6.21-rc3/include/linux/kprobes.h
===
--- linux-2.6.21-rc3.orig/include/linux/kprobes.h
+++ linux-2.6.21-rc3/include/linux/kprobes.h
@@ -78,7 +78,7 @@ struct kprobe {
kprobe_opcode_t *addr;
 
/* Allow user to indicate symbol name of the probe point */
-   char *symbol_name;
+   const char *symbol_name;
 
/* Offset into the symbol */
unsigned int offset;

Re: [patch 2/5] fs: introduce new aops and infrastructure

2007-03-14 Thread Nick Piggin

On Wed, Mar 14, 2007 at 09:13:29PM -0700, Mark Fasheh wrote:
> Hi Nick,
> 
> On Wed, Mar 14, 2007 at 02:38:22PM +0100, Nick Piggin wrote:
> > Introduce write_begin, write_end, and perform_write aops.
> > 
> > These are intended to replace prepare_write and commit_write with more
> > flexible alternatives that are also able to avoid the buffered write
> > deadlock problems efficiently (which prepare_write is unable to do).
> 
> > Index: linux-2.6/include/linux/fs.h
> > ===
> > --- linux-2.6.orig/include/linux/fs.h
> > +++ linux-2.6/include/linux/fs.h
> > @@ -449,6 +449,17 @@ struct address_space_operations {
> >  */
> > int (*prepare_write)(struct file *, struct page *, unsigned, unsigned);
> > int (*commit_write)(struct file *, struct page *, unsigned, unsigned);
> > +
> > +   int (*write_begin)(struct file *, struct address_space *mapping,
> > +   loff_t pos, unsigned len, int intr,
> > +   struct page **pagep, void **fsdata);
> > +   int (*write_end)(struct file *, struct address_space *mapping,
> > +   loff_t pos, unsigned len, unsigned copied,
> > +   struct page *page, void *fsdata);
> 
> Are we going to get rid of the file and intr arguments btw? I'm not sure
> intr is useful, and mapping is probably enough to get whatever we need inside
> ->write_begin / ->write_end.

Yeah, I was going to, but I had this version ready to go so decided
to leave them in at the last minute. We can definitely take them out
if people agree.

However a side note about intr -- I wonder if it might be wise to
include a flags argument, in case we might want to add something like
that later? (definitely if we do keep intr, then it should be done as
a flag rather than its own int).


> Also, I noticed that you didn't export block_write_begin(),
> simple_write_begin(), block_write_end() and simple_write_end() - I think we
> want those for client modules.

Yep, simple oversight on my part.


> Attached is a quick patch to hook up the existing ocfs2 write code. This has
> been compile tested only for now - one of my test machines isn't
> cooperating, so a runtime test will have to wait until tommorrow.
> 
> One interesting side effect is that we no longer pass AOP_TRUNCATE_PAGE up a
> level. This gives callers less to deal with. And it means that ocfs2 doesn't
> have to use the ocfs2_*_lock_with_page() cluster lock variants in
> ocfs2_block_write_begin() because it can order cluster locks outside of the
> page lock there.

OK that's very cool. I was hoping that would be the case. If GFS2 can
avoid that too, then we might be able to get rid of AOP_TRUNCATE_PAGE
handling from the legacy prepare/commit_write paths, which will make
them simpler.

> My ocfs2 write rework will be a more serious user of this stuff, including
> the fsdata backpointer. That code will also eliminate the entire
> ocfs2_*_lock_with_page() cluster locking workarounds for write (they'll have
> to remain for ->readpage()). I'm beginning work on cleaning those ocfs2
> patches up and getting them plugged into this stuff. I hope to post them in
> the next day or two.

OK, well I'll add this to my queue for now, and post the full patchset
after incorporating feedback I've had so far, and doing more testing,
so people can actually apply them and boot kernels.

Thanks,
Nick

[patch 5/5] eventfd+KAIO - KAIO eventfd support (example/maybe-broken) ...

2007-03-14 Thread Davide Libenzi

This is another example of how to add eventfd support to the current
KAIO code. The KAIO code simply signals the eventfd when events are
ready, which makes the fd report POLLIN.
I wrote a quick test program to verify the patch, and it runs fine here:

http://www.xmailserver.org/eventfd-aio-test.c

The test program uses poll(2), but it would of course work with epoll too.
This allows submitting both block I/O and requests on other pollable
devices, and waiting for all of the results with select/poll/epoll.



Signed-off-by: Davide Libenzi 



- Davide



Index: linux-2.6.20.ep2/fs/aio.c
===
--- linux-2.6.20.ep2.orig/fs/aio.c  2007-03-14 20:51:32.0 -0700
+++ linux-2.6.20.ep2/fs/aio.c   2007-03-14 20:54:37.0 -0700
@@ -30,6 +30,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -422,6 +423,7 @@
req->private = NULL;
req->ki_iovec = NULL;
INIT_LIST_HEAD(&req->ki_run_list);
+   req->ki_eventfd = ERR_PTR(-EINVAL);
 
/* Check if the completion queue has enough free space to
 * accept an event from this io.
@@ -463,6 +465,8 @@
 {
assert_spin_locked(&ctx->ctx_lock);
 
+   if (!IS_ERR(req->ki_eventfd))
+   fput(req->ki_eventfd);
if (req->ki_dtor)
req->ki_dtor(req);
if (req->ki_iovec != &req->ki_inline_vec)
@@ -947,6 +951,13 @@
return 1;
}
 
+   /*
+* Check if the user asked us to deliver the result through an
+* eventfd.
+*/
+   if (unlikely(!IS_ERR(iocb->ki_eventfd)))
+   eventfd_signal(iocb->ki_eventfd, 1);
+
info = &ctx->ring_info;
 
/* add a completion event to the ring buffer.
@@ -1556,6 +1567,18 @@
fput(file);
return -EAGAIN;
}
+   if (iocb->aio_resfd != 0) {
+   /*
+* If the aio_resfd field of the iocb is not zero, get an
+* instance of the file* now. This will be the place to deliver
+* AIO results to.
+*/
+   req->ki_eventfd = eventfd_fget((int) iocb->aio_resfd);
+   if (IS_ERR(req->ki_eventfd)) {
+   ret = PTR_ERR(req->ki_eventfd);
+   goto out_put_req;
+   }
+   }
 
req->ki_filp = file;
ret = put_user(req->ki_key, &user_iocb->aio_key);
Index: linux-2.6.20.ep2/include/linux/aio.h
===
--- linux-2.6.20.ep2.orig/include/linux/aio.h   2007-03-14 20:51:32.0 -0700
+++ linux-2.6.20.ep2/include/linux/aio.h 2007-03-14 20:54:37.0 -0700
@@ -119,6 +119,12 @@
 
struct list_headki_list;/* the aio core uses this
 * for cancellation */
+
+   /*
+* If the aio_resfd field of the userspace iocb is not zero,
+* this is the underlying file* to deliver event to.
+*/
+   struct file *ki_eventfd;
 };
 
 #define is_sync_kiocb(iocb)((iocb)->ki_key == KIOCB_SYNC_KEY)
Index: linux-2.6.20.ep2/include/linux/aio_abi.h
===
--- linux-2.6.20.ep2.orig/include/linux/aio_abi.h   2007-03-14 20:51:32.0 -0700
+++ linux-2.6.20.ep2/include/linux/aio_abi.h 2007-03-14 20:56:00.0 -0700
@@ -84,7 +84,11 @@
 
/* extra parameters */
__u64   aio_reserved2;  /* TODO: use this for a (struct sigevent *) */
-   __u64   aio_reserved3;
+   __u32   aio_reserved3;
+   /*
+* If different from 0, this is an eventfd to deliver AIO results to
+*/
+   __u32   aio_resfd;
 }; /* 64 bytes */
 
 #undef IFBIG


[patch 1/5] eventfd+KAIO - anonymous inode source ...

2007-03-14 Thread Davide Libenzi

This patch adds an anonymous inode source, to be used for files that need
an inode only in order to create a file*. We do not need a separate inode
for each file, and we do not even need different names in the associated
dentries (dentry names will be the same for each class of file*).
This allows code reuse, and will be used by epoll, signalfd and timerfd
(and whatever else comes along).



Signed-off-by: Davide Libenzi 



- Davide



Index: linux-2.6.20.ep2/fs/anon_inodes.c
===
--- /dev/null   1970-01-01 00:00:00.0 +
+++ linux-2.6.20.ep2/fs/anon_inodes.c   2007-03-10 15:57:47.0 -0800
@@ -0,0 +1,203 @@
+/*
+ *  fs/anon_inodes.c
+ *
+ *  Copyright (C) 2007  Davide Libenzi 
+ *
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include 
+
+
+
+static int ainofs_delete_dentry(struct dentry *dentry);
+static struct inode *aino_getinode(void);
+static struct inode *aino_mkinode(void);
+static int ainofs_get_sb(struct file_system_type *fs_type, int flags,
+const char *dev_name, void *data, struct vfsmount *mnt);
+
+
+
+static struct vfsmount *aino_mnt __read_mostly;
+static struct inode *aino_inode;
+static struct file_operations aino_fops = { };
+static struct file_system_type aino_fs_type = {
+   .name   = "ainofs",
+   .get_sb = ainofs_get_sb,
+   .kill_sb= kill_anon_super,
+};
+static struct dentry_operations ainofs_dentry_operations = {
+   .d_delete   = ainofs_delete_dentry,
+};
+
+
+
+int aino_getfd(int *pfd, struct inode **pinode, struct file **pfile,
+  char const *name, const struct file_operations *fops, void *priv)
+{
+   struct qstr this;
+   struct dentry *dentry;
+   struct inode *inode;
+   struct file *file;
+   int error, fd;
+
+   error = -ENFILE;
+   file = get_empty_filp();
+   if (!file)
+   goto eexit_1;
+
+   inode = aino_getinode();
+   if (IS_ERR(inode)) {
+   error = PTR_ERR(inode);
+   goto eexit_2;
+   }
+
+   error = get_unused_fd();
+   if (error < 0)
+   goto eexit_3;
+   fd = error;
+
+   /*
+* Link the inode to a directory entry by creating a unique name
+* using the inode sequence number.
+*/
+   error = -ENOMEM;
+   this.name = name;
+   this.len = strlen(name);
+   this.hash = 0;
+   dentry = d_alloc(aino_mnt->mnt_sb->s_root, &this);
+   if (!dentry)
+   goto eexit_4;
+   dentry->d_op = &ainofs_dentry_operations;
+   /* Do not publish this dentry inside the global dentry hash table */
+   dentry->d_flags &= ~DCACHE_UNHASHED;
+   d_instantiate(dentry, inode);
+
+   file->f_path.mnt = mntget(aino_mnt);
+   file->f_path.dentry = dentry;
+   file->f_mapping = inode->i_mapping;
+
+   file->f_pos = 0;
+   file->f_flags = O_RDONLY;
+   file->f_op = fops;
+   file->f_mode = FMODE_READ;
+   file->f_version = 0;
+   file->private_data = priv;
+
+   fd_install(fd, file);
+
+   *pfd = fd;
+   *pinode = inode;
+   *pfile = file;
+   return 0;
+
+eexit_4:
+   put_unused_fd(fd);
+eexit_3:
+   iput(inode);
+eexit_2:
+   put_filp(file);
+eexit_1:
+   return error;
+}
+
+
+static int ainofs_delete_dentry(struct dentry *dentry)
+{
+   /*
+* We faked vfs to believe the dentry was hashed when we created it.
+* Now we restore the flag so that dput() will work correctly.
+*/
+   dentry->d_flags |= DCACHE_UNHASHED;
+   return 1;
+}
+
+
+static struct inode *aino_getinode(void)
+{
+   return igrab(aino_inode);
+}
+
+
+/*
+ * A single inode exists for all aino files. Unlike pipes,
+ * aino inodes have no per-instance data associated, so we can avoid
+ * allocating multiple of them.
+ */
+static struct inode *aino_mkinode(void)
+{
+   int error = -ENOMEM;
+   struct inode *inode = new_inode(aino_mnt->mnt_sb);
+
+   if (!inode)
+   goto eexit_1;
+
+   inode->i_fop = &aino_fops;
+
+   /*
+* Mark the inode dirty from the very beginning,
+* that way it will never be moved to the dirty
+* list because mark_inode_dirty() will think
+* that it already _is_ on the dirty list.
+*/
+   inode->i_state = I_DIRTY;
+   inode->i_mode = S_IRUSR | S_IWUSR;
+   inode->i_uid = current->fsuid;
+   inode->i_gid = current->fsgid;
+   inode->i_atime = inode->i_mtime = inode->i_ctime = CURRENT_TIME;
+   return inode;
+
+eexit_1:
+   return ERR_PTR(error);
+}
+
+
+static int ainofs_get_sb(struct file_system_type *fs_type, int flags,
+const char *dev_name, void *data, struct vfsmount *mnt)
+{
+   return get_sb_pseudo(fs_type, "aino:", NULL, AINOFS_MAGIC, mnt);
+}
+
+

[patch 3/5] eventfd+KAIO - eventfd wire up i386 arch ...

2007-03-14 Thread Davide Libenzi

This patch wires the eventfd system call to the i386 architecture.
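The numbers used in this series (322 on i386 here, 282 on x86_64 in patch 4/5) are the RFC's proposals; mainline ended up assigning eventfd different numbers, so userspace should take them from <sys/syscall.h> rather than hard-code them. A sketch of a raw invocation under that assumption; raw_eventfd() and read_counter() are hypothetical helper names. Architectures added later expose only eventfd2, which takes an extra flags argument not present in this RFC.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Invoke eventfd directly, using whatever number the installed headers
   define instead of the values proposed in this patch series. */
static int raw_eventfd(unsigned int count)
{
#if defined(SYS_eventfd)
    return (int)syscall(SYS_eventfd, count);
#elif defined(SYS_eventfd2)
    /* e.g. arm64: only the flags-taking variant exists; pass no flags. */
    return (int)syscall(SYS_eventfd2, count, 0);
#else
    errno = ENOSYS;
    return -1;
#endif
}

/* Read the 8-byte counter (blocks while it is zero); returns 0 on error. */
static uint64_t read_counter(int fd)
{
    uint64_t value = 0;

    if (read(fd, &value, sizeof(value)) != (ssize_t)sizeof(value))
        return 0;
    return value;
}
```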



Signed-off-by: Davide Libenzi 


- Davide


Index: linux-2.6.20.ep2/arch/i386/kernel/syscall_table.S
===
--- linux-2.6.20.ep2.orig/arch/i386/kernel/syscall_table.S  2007-03-14 20:51:36.0 -0700
+++ linux-2.6.20.ep2/arch/i386/kernel/syscall_table.S   2007-03-14 20:54:34.0 -0700
@@ -321,3 +321,4 @@
.long sys_epoll_pwait
.long sys_signalfd  /* 320 */
.long sys_timerfd
+   .long sys_eventfd
Index: linux-2.6.20.ep2/include/asm-i386/unistd.h
===
--- linux-2.6.20.ep2.orig/include/asm-i386/unistd.h 2007-03-14 20:51:36.0 -0700
+++ linux-2.6.20.ep2/include/asm-i386/unistd.h  2007-03-14 20:54:34.0 -0700
@@ -327,10 +327,11 @@
 #define __NR_epoll_pwait   319
 #define __NR_signalfd  320
 #define __NR_timerfd   321
+#define __NR_eventfd   322
 
 #ifdef __KERNEL__
 
-#define NR_syscalls 322
+#define NR_syscalls 323
 
 #define __ARCH_WANT_IPC_PARSE_VERSION
 #define __ARCH_WANT_OLD_READDIR

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[patch 2/5] eventfd+KAIO - eventfd core ...

2007-03-14 Thread Davide Libenzi

This is a very simple, lightweight file descriptor that can be used for
event wait/dispatch by userspace (both wait and dispatch) and by the
kernel (dispatch only). When used in the kernel, it can offer an fd bridge
that lets facilities like KAIO or syslets/threadlets signal the completion
of certain operations to an fd.
The API is:

int eventfd(unsigned int count);

The eventfd API accepts an initial "count" parameter, and returns an
eventfd fd. It supports poll(2) (POLLIN), read(2) and write(2).
The read(2) function reads the __u64 counter value and resets the internal
value to zero. The write(2) call writes a __u64 count value and adds it
to the current counter. The eventfd fd also supports O_NONBLOCK.
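For userspace callers today, glibc wraps the system call as int eventfd(unsigned int initval, int flags) in <sys/eventfd.h>. The flags argument is not part of this RFC, so the sketch passes 0 and sets O_NONBLOCK with fcntl(2), matching the description above. The efd_* helper names are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <stdint.h>
#include <sys/eventfd.h>
#include <unistd.h>

/* Create an eventfd with an initial count and switch it to non-blocking. */
static int efd_open_nonblock(unsigned int count)
{
    int fd = eventfd(count, 0);

    if (fd < 0)
        return -1;
    if (fcntl(fd, F_SETFL, O_NONBLOCK) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Dispatch side: add n to the counter. Returns 0 on success. */
static int efd_post(int fd, uint64_t n)
{
    return write(fd, &n, sizeof(n)) == (ssize_t)sizeof(n) ? 0 : -1;
}

/* Wait side: fetch-and-reset the counter. On a zero counter with
   O_NONBLOCK set, read() fails with EAGAIN and this returns -1. */
static int efd_take(int fd, uint64_t *out)
{
    return read(fd, out, sizeof(*out)) == (ssize_t)sizeof(*out) ? 0 : -1;
}

/* Return 1 if poll() reports POLLIN (counter > 0), 0 otherwise. */
static int efd_ready(int fd)
{
    struct pollfd p = { .fd = fd, .events = POLLIN };

    return poll(&p, 1, 0) == 1 && (p.revents & POLLIN);
}
```

Because a read resets the counter, two posts of 2 and 3 are collected by a single read that returns 5.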
On the kernel side, we have:

struct file *eventfd_fget(int fd);
int eventfd_signal(struct file *file, unsigned int n);

The eventfd_fget() should be called to get a struct file* from an eventfd
fd (this is an fget() + check of f_op being an eventfd fops pointer).
The kernel can then call eventfd_signal() every time it wants to post
an event to userspace. The eventfd_signal() function can be called from any
context.
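To make the counter arithmetic concrete, a userspace model of eventfd_signal() and the read(2) reset, with the spinlock, wait queue and blocking left out. Note that the patch's overflow test rejects any add that would bring the counter to ULLONG_MAX or beyond, so the largest reachable value is ULLONG_MAX - 1. The model_* names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for struct eventfd_ctx without the lock and wait queue. */
struct model_eventfd {
    uint64_t count;
};

/* Same check as the patch: refuse the add and leave the counter alone. */
static int model_signal(struct model_eventfd *ctx, unsigned int n)
{
    if (UINT64_MAX - ctx->count <= n)
        return -1;                 /* the patch returns -EINVAL here */
    ctx->count += n;
    return 0;
}

/* read(2) semantics: return the current value and reset it to zero. */
static uint64_t model_read(struct model_eventfd *ctx)
{
    uint64_t v = ctx->count;

    ctx->count = 0;
    return v;
}
```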



Signed-off-by: Davide Libenzi 



- Davide



Index: linux-2.6.20.ep2/fs/Makefile
===
--- linux-2.6.20.ep2.orig/fs/Makefile   2007-03-12 11:27:58.0 -0700
+++ linux-2.6.20.ep2/fs/Makefile   2007-03-14 17:31:35.0 -0700
@@ -11,7 +11,7 @@
attr.o bad_inode.o file.o filesystems.o namespace.o aio.o \
seq_file.o xattr.o libfs.o fs-writeback.o \
pnode.o drop_caches.o splice.o sync.o utimes.o \
-   stack.o anon_inodes.o signalfd.o timerfd.o
+   stack.o anon_inodes.o signalfd.o timerfd.o eventfd.o
 
 ifeq ($(CONFIG_BLOCK),y)
 obj-y +=   buffer.o bio.o block_dev.o direct-io.o mpage.o ioprio.o
Index: linux-2.6.20.ep2/include/linux/syscalls.h
===
--- linux-2.6.20.ep2.orig/include/linux/syscalls.h  2007-03-13 16:40:46.0 -0700
+++ linux-2.6.20.ep2/include/linux/syscalls.h   2007-03-14 19:31:56.0 -0700
@@ -605,6 +605,7 @@
 asmlinkage long sys_signalfd(int ufd, sigset_t __user *user_mask, size_t sizemask);
 asmlinkage long sys_timerfd(int ufd, int clockid, int flags,
const struct itimerspec __user *utmr);
+asmlinkage long sys_eventfd(unsigned int count);
 
 int kernel_execve(const char *filename, char *const argv[], char *const envp[]);
 
Index: linux-2.6.20.ep2/fs/eventfd.c
===
--- /dev/null   1970-01-01 00:00:00.0 +
+++ linux-2.6.20.ep2/fs/eventfd.c   2007-03-14 20:42:33.0 -0700
@@ -0,0 +1,259 @@
+/*
+ *  fs/eventfd.c
+ *
+ *  Copyright (C) 2007  Davide Libenzi 
+ *
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include 
+
+
+
+struct eventfd_ctx {
+   spinlock_t lock;
+   wait_queue_head_t wqh;
+   __u64 count;
+};
+
+
+static void eventfd_cleanup(struct eventfd_ctx *ctx);
+static int eventfd_close(struct inode *inode, struct file *file);
+static unsigned int eventfd_poll(struct file *file, poll_table *wait);
+static ssize_t eventfd_read(struct file *file, char __user *buf, size_t count,
+   loff_t *ppos);
+static ssize_t eventfd_write(struct file *file, const char __user *buf, size_t count,
+loff_t *ppos);
+
+
+
+static const struct file_operations eventfd_fops = {
+   .release= eventfd_close,
+   .poll   = eventfd_poll,
+   .read   = eventfd_read,
+   .write  = eventfd_write,
+};
+static struct kmem_cache *eventfd_ctx_cachep;
+
+
+
+
+struct file *eventfd_fget(int fd)
+{
+   struct file *file;
+
+   file = fget(fd);
+   if (!file)
+   return ERR_PTR(-EBADF);
+   if (file->f_op != &eventfd_fops) {
+   fput(file);
+   return ERR_PTR(-EINVAL);
+   }
+
+   return file;
+}
+
+
+int eventfd_signal(struct file *file, unsigned int n)
+{
+   struct eventfd_ctx *ctx = file->private_data;
+   int res = 0;
+   unsigned long flags;
+
+   spin_lock_irqsave(&ctx->lock, flags);
+   if (ULLONG_MAX - ctx->count <= n)
+   res = -EINVAL;
+   else
+   ctx->count += n;
+   if (waitqueue_active(&ctx->wqh))
+   wake_up_locked(&ctx->wqh);
+   spin_unlock_irqrestore(&ctx->lock, flags);
+
+   return res;
+}
+
+
+asmlinkage long sys_eventfd(unsigned int count)
+{
+   int error, fd;
+   struct eventfd_ctx *ctx;
+   struct file *file;
+   struct inode *inode;
+
+   ctx = kmem_cache_alloc(eventfd_ctx_cachep, GFP_KERNEL);
+   if (!ctx)
+   return -ENOMEM;
+
+   init_waitqueue_head(&ctx->wqh);
+

[patch 4/5] eventfd+KAIO - eventfd wire up x86_64 arch ...

2007-03-14 Thread Davide Libenzi

This patch wires the eventfd system call to the x86_64 architecture.
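The two wire-up patches bump different limits: i386's NR_syscalls is a count (last number + 1), so it goes from 322 to 323, while x86_64's __NR_syscall_max names the highest valid number and simply becomes __NR_eventfd. A sketch of the range check each convention implies, using the values from these patches; the macro and function names are hypothetical, and the real checks live in the arch entry code.

```c
#include <assert.h>

/* Values from the two wire-up patches in this series (RFC numbers). */
#define I386_NR_EVENTFD     322
#define I386_NR_SYSCALLS    323                /* a count: last number + 1 */
#define X86_64_NR_EVENTFD   282
#define X86_64_SYSCALL_MAX  X86_64_NR_EVENTFD  /* the highest valid number */

/* Count convention: valid numbers are strictly below the count. */
static int i386_nr_valid(unsigned long nr)
{
    return nr < I386_NR_SYSCALLS;
}

/* Maximum convention: the maximum itself is still valid. */
static int x86_64_nr_valid(unsigned long nr)
{
    return nr <= X86_64_SYSCALL_MAX;
}
```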



Signed-off-by: Davide Libenzi 


- Davide



Index: linux-2.6.20.ep2/arch/x86_64/ia32/ia32entry.S
===
--- linux-2.6.20.ep2.orig/arch/x86_64/ia32/ia32entry.S  2007-03-14 20:51:34.0 -0700
+++ linux-2.6.20.ep2/arch/x86_64/ia32/ia32entry.S   2007-03-14 20:54:36.0 -0700
@@ -721,4 +721,5 @@
.quad sys_epoll_pwait
.quad sys_signalfd  /* 320 */
.quad sys_timerfd
+   .quad sys_eventfd
 ia32_syscall_end:
Index: linux-2.6.20.ep2/include/asm-x86_64/unistd.h
===
--- linux-2.6.20.ep2.orig/include/asm-x86_64/unistd.h   2007-03-14 20:51:34.0 -0700
+++ linux-2.6.20.ep2/include/asm-x86_64/unistd.h    2007-03-14 20:54:36.0 -0700
@@ -623,8 +623,10 @@
 __SYSCALL(__NR_signalfd, sys_signalfd)
 #define __NR_timerfd   281
 __SYSCALL(__NR_timerfd, sys_timerfd)
+#define __NR_eventfd   282
+__SYSCALL(__NR_eventfd, sys_eventfd)
 
-#define __NR_syscall_max __NR_timerfd
+#define __NR_syscall_max __NR_eventfd
 
 #ifndef __NO_STUBS
 #define __ARCH_WANT_OLD_READDIR


Re: [patch 2/5] fs: introduce new aops and infrastructure

2007-03-14 Thread Mark Fasheh

Hi Nick,

On Wed, Mar 14, 2007 at 02:38:22PM +0100, Nick Piggin wrote:
> Introduce write_begin, write_end, and perform_write aops.
> 
> These are intended to replace prepare_write and commit_write with more
> flexible alternatives that are also able to avoid the buffered write
> deadlock problems efficiently (which prepare_write is unable to do).

> Index: linux-2.6/include/linux/fs.h
> ===
> --- linux-2.6.orig/include/linux/fs.h
> +++ linux-2.6/include/linux/fs.h
> @@ -449,6 +449,17 @@ struct address_space_operations {
>*/
>   int (*prepare_write)(struct file *, struct page *, unsigned, unsigned);
>   int (*commit_write)(struct file *, struct page *, unsigned, unsigned);
> +
> + int (*write_begin)(struct file *, struct address_space *mapping,
> + loff_t pos, unsigned len, int intr,
> + struct page **pagep, void **fsdata);
> + int (*write_end)(struct file *, struct address_space *mapping,
> + loff_t pos, unsigned len, unsigned copied,
> + struct page *page, void *fsdata);

Are we going to get rid of the file and intr arguments btw? I'm not sure
intr is useful, and mapping is probably enough to get whatever we need inside
->write_begin / ->write_end.

Also, I noticed that you didn't export block_write_begin(),
simple_write_begin(), block_write_end() and simple_write_end() - I think we
want those for client modules.


Attached is a quick patch to hook up the existing ocfs2 write code. This has
been compile tested only for now - one of my test machines isn't
cooperating, so a runtime test will have to wait until tomorrow.

One interesting side effect is that we no longer pass AOP_TRUNCATED_PAGE up a
level. This gives callers less to deal with. And it means that ocfs2 doesn't
have to use the ocfs2_*_lock_with_page() cluster lock variants in
ocfs2_block_write_begin() because it can order cluster locks outside of the
page lock there.

My ocfs2 write rework will be a more serious user of this stuff, including
the fsdata backpointer. That code will also eliminate the entire
ocfs2_*_lock_with_page() cluster locking workarounds for write (they'll have
to remain for ->readpage()). I'm beginning work on cleaning those ocfs2
patches up and getting them plugged into this stuff. I hope to post them in
the next day or two.
--Mark

--
Mark Fasheh
Senior Software Developer, Oracle
[EMAIL PROTECTED]

ocfs2: Convert to new aops

Turn ocfs2_prepare_write() and ocfs2_commit_write() into ocfs2_write_begin()
and ocfs2_write_end().
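With the new interface, callers describe a write by file position and length instead of a page plus in-page from/to offsets. A sketch, assuming 4 KiB pages, of how a buffered write splits into per-page pieces and how each piece maps back to the old prepare_write-style (index, from, to) triple; split_write() and struct write_chunk are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_PAGE_SIZE 4096UL  /* assumption: 4 KiB pages */

/* One per-page piece, in both the new (pos, len) and old (index, from, to) forms. */
struct write_chunk {
    unsigned long long pos;     /* file position of this piece */
    unsigned len;               /* bytes in this piece */
    unsigned long index;        /* page index within the file */
    unsigned from, to;          /* byte range within that page */
};

/* Split a write of `count` bytes at `pos` into at most `max` pieces that
   never cross a page boundary; returns the number of pieces produced. */
static size_t split_write(unsigned long long pos, size_t count,
                          struct write_chunk *out, size_t max)
{
    size_t n = 0;

    while (count > 0 && n < max) {
        unsigned offset = (unsigned)(pos & (MODEL_PAGE_SIZE - 1));
        size_t bytes = MODEL_PAGE_SIZE - offset;

        if (bytes > count)
            bytes = count;
        out[n].pos = pos;
        out[n].len = (unsigned)bytes;
        out[n].index = (unsigned long)(pos / MODEL_PAGE_SIZE);
        out[n].from = offset;
        out[n].to = offset + (unsigned)bytes;
        n++;
        pos += bytes;
        count -= bytes;
    }
    return n;
}
```

A 5000-byte write at position 4000 becomes three pieces: the last 96 bytes of page 0, all of page 1, and the first 808 bytes of page 2.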

Signed-off-by: Mark Fasheh <[EMAIL PROTECTED]>

diff --git a/fs/ocfs2/aops.c b/fs/ocfs2/aops.c
index 93628b0..e7bcbbd 100644
--- a/fs/ocfs2/aops.c
+++ b/fs/ocfs2/aops.c
@@ -293,29 +293,30 @@ int ocfs2_prepare_write_nolock(struct in
 }
 
 /*
- * ocfs2_prepare_write() can be an outer-most ocfs2 call when it is called
- * from loopback.  It must be able to perform its own locking around
- * ocfs2_get_block().
+ * ocfs2_write_begin() can be an outer-most ocfs2 call when it is
+ * called from elsewhere in the kernel. It must be able to perform its
+ * own locking around ocfs2_get_block().
  */
-static int ocfs2_prepare_write(struct file *file, struct page *page,
-  unsigned from, unsigned to)
+static int ocfs2_write_begin(struct file *file, struct address_space *mapping,
+loff_t pos, unsigned len, int intr,
+struct page **pagep, void **fsdata)
 {
-   struct inode *inode = page->mapping->host;
+   struct inode *inode = mapping->host;
int ret;
 
-   mlog_entry("(0x%p, 0x%p, %u, %u)\n", file, page, from, to);
-
-   ret = ocfs2_meta_lock_with_page(inode, NULL, 0, page);
+   ret = ocfs2_meta_lock(inode, NULL, 0);
if (ret != 0) {
mlog_errno(ret);
goto out;
}
 
-   ret = ocfs2_prepare_write_nolock(inode, page, from, to);
+   down_read(_I(inode)->ip_alloc_sem);
+   ret = block_write_begin(file, mapping, pos, len, intr, pagep, fsdata,
+   ocfs2_get_block);
+   up_read(_I(inode)->ip_alloc_sem);
 
ocfs2_meta_unlock(inode, 0);
 out:
-   mlog_exit(ret);
return ret;
 }
 
@@ -388,16 +389,21 @@ out:
return handle;
 }
 
-static int ocfs2_commit_write(struct file *file, struct page *page,
- unsigned from, unsigned to)
+static int ocfs2_write_end(struct file *file, struct address_space *mapping,
+  loff_t pos, unsigned len, unsigned copied,
+  struct page *page, void *fsdata)
 {
int ret;
+   unsigned from, to;
struct buffer_head *di_bh = NULL;
struct inode *inode = page->mapping->host;
handle_t *handle = NULL;
struct ocfs2_dinode *di;
 
-   mlog_entry("(0x%p, 0x%p, %u,

Re: [stable] [PATCH] Fix COMPAT_VDSO regression bug

2007-03-14 Thread Greg KH

On Thu, Mar 15, 2007 at 12:38:40AM +0100, Leroy van Logchem wrote:
> 
> Revert "[PATCH] Fix CONFIG_COMPAT_VDSO"
> This reverts commit a1f3bb9ae4497a2ed3eac773fd7798ac33a0371f.
> 
> Several systems couldn't boot using CONFIG_HIGHMEM64G=y as
> reported in bug #8040. Reverting the above patch solved the problem.

What stable version did you revert this in that solved your problem?

thanks,

greg k-h


Re: [patch 2/5] fs: introduce new aops and infrastructure

2007-03-14 Thread Nick Piggin

On Wed, Mar 14, 2007 at 10:46:25PM +0100, Mariusz Kozlowski wrote:
> Hello, 
> 
>   I guess no need to define 'ret' twice here.

[...]

Hi Mariusz,

Thanks, I'll clean that up.

Nick

Re: [patch 2/5] fs: introduce new aops and infrastructure

2007-03-14 Thread Nick Piggin

On Thu, Mar 15, 2007 at 12:28:04AM +0300, Dmitriy Monakhov wrote:
> Nick Piggin <[EMAIL PROTECTED]> writes:
> 
> > +
> > +int pagecache_write_end(struct file *file, struct address_space *mapping,
> > +   loff_t pos, unsigned len, unsigned copied,
> > +   struct page *page, void *fsdata)
> > +{
> > +   const struct address_space_operations *aops = mapping->a_ops;
> > +   int ret;
> > +
> > +   if (aops->write_begin)
> > +   ret = aops->write_end(file, mapping, pos, len, copied, page, fsdata);
> > +   else {
> > +   int ret;
> > +   unsigned offset = pos & (PAGE_CACHE_SIZE - 1);
> > +   struct inode *inode = mapping->host;
> > +
> > +   flush_dcache_page(page);
> > +   ret = aops->commit_write(file, page, offset, offset+len);
> > +   if (ret < 0) {
> > +   unlock_page(page);
> > +   page_cache_release(page);
> > +   if (pos + len > inode->i_size)
> > +   vmtruncate(inode, inode->i_size);
> > +   } else
> > +   ret = copied;
> What about AOP_TRUNCATED_PAGE?  Of course we can't just "goto retry" here :),
> but we may return it to the caller and let the caller handle it.

Yeah AOP_TRUNCATED_PAGE... I'm _hoping_ that OCFS2 and GFS2 will be able
to avoid that using write_begin/write_end, so the caller will not have to
know anything about it.

I don't know that commit_write can even return AOP_TRUNCATED_PAGE... we
should have gathered all our locks in prepare_write.


> > +   }
> > +
> > +   return copied;
> If ->commit_write returns a non-negative value, we return with a still-locked
> page; look above at [1].
> Maybe it will be unlocked by the caller? I guess not; it was just forgotten.

Yeah, thanks. I think I converted all my filesystems to use write_begin /
write_end, so I probably didn't test this path :P. I do plan to go through
and try to individually test error cases and stress test it over the next
couple of days.


> > +void page_zero_new_buffers(struct page *page, unsigned from, unsigned to)
> > +{
> > +   unsigned int block_start, block_end;
> > +   struct buffer_head *head, *bh;
> > +
> > +   BUG_ON(!PageLocked(page));
> > +   if (!page_has_buffers(page))
> > +   return;
> > +
> > +   bh = head = page_buffers(page);
> > block_start = 0;
> > do {
> > -   block_end = block_start+blocksize;
> > -   if (block_end <= from)
> > -   goto next_bh;
> > -   if (block_start >= to)
> > -   break;
> > +   block_end = block_start + bh->b_size;
> > +
> > if (buffer_new(bh)) {
> > -   void *kaddr;
> > +   if (block_end > from && block_start < to) {
> > +   if (!PageUptodate(page)) {
> > +   unsigned start, end;
> > +   void *kaddr;
> > +
> > +   start = max(from, block_start);
> > +   end = min(to, block_end);
> > +
> > +   kaddr = kmap_atomic(page, KM_USER0);
> > +   memset(kaddr+start, 0, block_end-end);
> <<< At least this results in an information leak in the case of (start == from).
> Just imagine a fs with blocksize == 1k that contains a file with i_size == 4096
> and the first two blocks not mapped (a hole); now invoke a write op from 1023
> to 2048. For example, we succeed in allocating the first block but fail while
> allocating the second; then we call page_zero_new_buffers(...from == 1023,
> to == 2048), zero only the one last byte of the first block, and set it uptodate.
> After this we just do read(from == 0, to == 1023) and steal the old block
> content.

When we first invoke the write op, it should see we're doing a partial
write into the first buffer and bring it uptodate first. I don't see the
problem, but again I do need to go through and exercise various cases
like this.
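For reference while checking the case Dmitriy describes: the part of a buffer [block_start, block_end) touched by a write [from, to) is [max(from, block_start), min(to, block_end)), so its length is min(to, block_end) - max(from, block_start). The quoted hunk passes block_end - end as the memset length instead. A minimal sketch of the overlap arithmetic, assuming the intent is to zero exactly that range; block_overlap() is a hypothetical helper.

```c
#include <assert.h>

/* Compute the part of buffer [block_start, block_end) covered by a write
   to [from, to). Returns 0 if they do not overlap; otherwise stores the
   overlap's start and length and returns 1. */
static int block_overlap(unsigned block_start, unsigned block_end,
                         unsigned from, unsigned to,
                         unsigned *start, unsigned *size)
{
    if (block_end <= from || block_start >= to)
        return 0;

    unsigned s = from > block_start ? from : block_start;
    unsigned e = to < block_end ? to : block_end;

    *start = s;
    *size = e - s;      /* end minus start, not block_end minus end */
    return 1;
}
```

With 1k blocks and a write from 1023 to 2048, the first block's overlap is the single byte at 1023, the second block is covered in full, and the third is untouched.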


> > @@ -222,67 +221,47 @@ static int do_lo_send_aops(struct loop_d
> > len = bvec->bv_len;
> > while (len > 0) {
> > sector_t IV;
> > -   unsigned size;
> > +   unsigned size, copied;
> > int transfer_result;
> > +   struct page *page;
> > +   void *fsdata;
> >  
> > IV = ((sector_t)index << (PAGE_CACHE_SHIFT - 9))+(offset >> 9);
> > size = PAGE_CACHE_SIZE - offset;
> > if (size > len)
> > size = len;
> > -   page = grab_cache_page(mapping, index);
> > -   if (unlikely(!page))
> > +
> > +   ret = pagecache_write_begin(file, mapping, pos, size, 1,
> > +   &page, &fsdata);
> > +   if (ret)
> > goto fail;
> > -   ret = aops->prepare_write(file, page, offset,
> > - offset +

Re: New thread RDSL, post-2.6.20 kernels and amanda (tar) miss-fires

2007-03-14 Thread Gene Heskett

On Wednesday 14 March 2007, Ray Lee wrote:
>On 3/13/07, Gene Heskett <[EMAIL PROTECTED]> wrote:
>> On Tuesday 13 March 2007, Gene Heskett wrote:
>> >On Tuesday 13 March 2007, Gene Heskett wrote:
>> >>Greetings;
>> >>Someone suggested a fresh thread for this.
>> >>
>> >>I now have my scripts more or less under control, and I can report
>> >> that kernel-2.6.20.1 with no other patches does not exhibit the
>> >> undesirable behaviour where tar thinks it's all new, even when told
>> >> to do a level 2 on a directory tree that hasn't been touched in
>> >> months to update anything.
>> >>
>> >>Next up, 2.6.20.2, plain and with the latest RDSL-0.30 patch.
>> >
>> >And amanda/tar worked normally for 2.6.20.2 plain.
>> >
>> >Next up, 2.6.21-rc1 if it will build here.
>>
>> It built, it booted, and it's busted big time.  First, with an amdump
>> running in the background, the machine is so close to unusable that I
>> considered rebooting, but I needed the data to show the problem.  I am
>> losing the keyboard and mouse for a minute or more at a time but the
>> keystrokes seem to be being registered so it eventually catches up.
>>
>> Disk i/o seems to be the killer according to gkrellm.
>>
>> But to give one an idea of the fits this is giving tar, I'll snip a
>> line or 2 from an amstatus report here:
>> coyote:/GenesAmandaHelper-0.6 1 planner: [dumps way too big, 138200
>> KB, must skip incremental dumps]
>>
>> Huh?  138.2GB?  A 'du -h .' in that dir says 766megs.
>>
>> coyote:/root  1 4426m wait for dumping
>> du -h says 5.0GB so that's ballpark, but it's also a level 1, so maybe
>> 20 megs is actually new since 15:57 this afternoon local.  kmail's
>> final maildir is in that dir.
>>
>> This goes on for much of the amstatus report, very few of the reported
>> sizes are close to sane.
>>
>> Now, can someone suggest a patch I can revert that might fix this? 
>> The total number of patches between 2.6.20 and 2.6.21-rc1 will have me
>> building kernels to bisect this till the middle of June at this rate.
>
>In a previous email, you said you were using ext3. If that's the case,
>there doesn't appear to be much going on in terms of patches between
>2.6.20 and 2.6.21-rc1. The only one that even comes close to looking
>like it might have an effect would only come in to play if you have a
>filesystem that has ACL information, but is mounted by a kernel that
>doesn't have ACL support.
>
>I have to echo wli here, I'm afraid, and recommend at least a *few*
>bisections to help narrow down the list of suspect patches.
>
>There are tutorials out there for git users. I use the mercurial
>repository, as I find the mercurial interface and workflow a lot more
>intuitive, but it has the same capability.
>
>Even 2-5 bisections will greatly help others hunt the bug down.
>
>Ray

Probably.  But I've now put a week into this, and from some other clues 
I've collected, I'm beginning to think tar has a tummy ache. After all, 
an ls -lc reports totally sane mtimes.  So why is tar going bonkers 
under kernels 2.6.21-rc*, with or without Con's patches?

I've also spent a day now looking for a valid place to put a bugzilla 
entry against tar, but Google's search results are sending me to 
gcc.gnu.org and this is NOT the correct bugzilla for a tar problem.

It's no secret that, with all the churn in tar over the last 5 years (worse 
churn than the kernel IMO in going from 2.0 to 2.6), I'm not a fan of 
yet another _new_ version of tar when what we need is just _one_ that 
works.  Right now it is not capable of executing the recovery command 
listed in the first block of every dump file amdump has ever built, and 
I've played the equivalent of the 10,000 monkeys writing Shakespeare for 
several hours trying.  Damned frustrating is what it is.

The error it reports seems to indicate that it cannot write through the 
pipes involved.  But with tar's error reporting, who the hell knows for 
sure.

Here is an example:
[EMAIL PROTECTED] data]# dd if=00010.coyote._lib.1 bs=32k count=1
AMANDA: FILE 20070314104344 coyote /lib  lev 1 comp .gz program /bin/tar
To restore, position tape at start of file and run:
 dd if= bs=32k skip=1 |  /bin/gzip -dc |  /bin/tar -f - ...

And the ellipsis is an error if not removed.  Then one is supposed to be 
able to redirect tar's output with the usual >/tmp/test/ syntax.

So:
[EMAIL PROTECTED] data]# dd if=00010.coyote._lib.1 bs=32k skip=1 |  /bin/gzip -dc |  /bin/tar -f - >/tmp/test/
-bash: /tmp/test/: Is a directory

which is the return from any variation in how the redirect is done.

So what am I doing wrong in the above command line? Once I know, I can 
add the fix to my helper scripts, to be published eventually on zmanda.org.

-- 
Cheers, Gene
"There are four boxes to be used in defense of liberty:
 soap, ballot, jury, and ammo. Please use in that order."
-Ed Howdershelt (Author)
Trying to establish voice contact ... please yell into keyboard.

Re: [kvm-devel] [PATCH 01/15] KVM: Use a shared page for kernel/user communication when running a vcpu

2007-03-14 Thread Hollis Blanchard

On Wed, 2007-03-14 at 21:38 -0500, Hollis Blanchard wrote:
> On Sun, 2007-03-11 at 15:53 +0200, Avi Kivity wrote:
> > Instead of passing a 'struct kvm_run' back and forth between the
> > kernel and userspace, allocate a page and allow the user to mmap() it.
> > This reduces needless copying and makes the interface expandable by
> > providing lots of free space.
> 
> Do you provide for another means of accessing guest memory from host
> userspace? For example, how do you attach a host debugger to the guest?
> 
> Xen uses an ioctl followed by mmap for this purpose, which is why I
> wonder about using mmap(/dev/kvm) for another purpose.

Never mind: I see now that you have separate device nodes for the VM vs
each vcpu.

-Hollis


Re: [PATCH/RFC] PCI prepare/activate instead of enable to avoid IRQ storm and rogue DMA access

2007-03-14 Thread Tejun Heo


Andi Kleen wrote:
> Tejun Heo <[EMAIL PROTECTED]> writes:
>
>> Let's assume there's a device which shares its INTX IRQ line with
>> another device and the other one is already initialized.  During boot,
>> due to BIOS's fault, bad hardware design or sheer bad luck, the device
>> has got a pending IRQ.
>
> This seems to be also common after kexec during kexec crashdumps
> where the device just continues doing what it did before the crash.
>
>> This patch expands the pci_set_master() approach.  Instead of enabling
>> the device in one go, it's done in two steps - prepare and activate.
>> 'prepare' enables access to PCI configuration,
>
> I hope there aren't any new erratas triggered by this. Perhaps it would
> make sense to add some paranoia sleeps at least before touching other
> state?

Do you mean between disabling IRQ mechanisms and enabling PCI device in 
pcim_prepare_device()?


Thanks.

--
tejun

Re: [kvm-devel] [PATCH 01/15] KVM: Use a shared page for kernel/user communication when running a vcpu

2007-03-14 Thread Hollis Blanchard

On Sun, 2007-03-11 at 15:53 +0200, Avi Kivity wrote:
> Instead of passing a 'struct kvm_run' back and forth between the
> kernel and userspace, allocate a page and allow the user to mmap() it.
> This reduces needless copying and makes the interface expandable by
> providing lots of free space.

Do you provide for another means of accessing guest memory from host
userspace? For example, how do you attach a host debugger to the guest?

Xen uses an ioctl followed by mmap for this purpose, which is why I
wonder about using mmap(/dev/kvm) for another purpose.

-Hollis


Re: [PATCH/RFC] PCI prepare/activate instead of enable to avoid IRQ storm and rogue DMA access

2007-03-14 Thread Tejun Heo


Stephen Hemminger wrote:
> The problem is the BIOS is busted on these machines. How much effort
> do we want to put into dealing with systems with broken BIOS?
> I would rather have the root cause fixed than creating a bandaid that
> has to be maintained for all the other architectures and platforms.

For sky2/skge, it might be caused by broken BIOS.  For some ATA devices, 
it's just the hardware which is designed that way.  Also, under non-x86 
machines and during resume, there's no BIOS to nudge chips into sane 
state.  This is an existing problem which has to be solved.  How much 
effort we are gonna put into it is certainly debatable.

Also, the current implementation doesn't have any arch independent part. 
It's wholly contained in arch independent PCI layer, but it might be 
beneficial to have arch dependent hooks (IRQ line enable/disable?) in 
the future.

> What if the device with the IRQ problem is never loaded? Sometimes
> devices aren't loaded until after boot.

What do you mean by loading a device?  Do you mean loading driver for 
the device?  The patch as posted is probably not a complete solution. 
We probably need to make sure during early boot and resume that all IRQ 
/ bus master are turned off where possible and let low level drivers 
enable them as needed and after certain amount of initialization is 
performed.

> If you use MSI interrupts, they aren't shared so there isn't a problem.
> Maybe the root cause of this is bad MSI emulation handling in BIOS.

Yes, if MSI is used things are better.

> Any change like this has to be done without changing device drivers.
> Changing the skge/sky2 drivers as special case is not acceptable.

I dunno about that.  What I'm proposing is alternative two-step PCI 
initialization step - the first step enables the device just enough for 
initialization/reset and the second one enables full access.  We're 
doing part of it already for bus master.  I'm proposing to expand that 
approach and make them handled by generic PCI layer.  As you can see, it 
doesn't add noticeable complexity to drivers.  I think it's even clearer 
than doing pci_set_master() explicitly.

If this way of solving the problem is chosen, eventually most drivers 
should be converted to new initialization steps.  And there is no way to 
do this without modifying low level driver.  Only low level driver knows 
when full blown access can be enabled and such thing must happen before 
registering the device to upper layer (e.g. ATA/SCSI, netif).

sky2/skge aren't exceptions.  If this way of solving the problem is 
chosen, eventually most if not all drivers should be converted to new 
model.  It may take two years, maybe five, but as a start just 
converting ATA and network drivers shouldn't take too long and that 
would help a lot of cases.


Thanks.

--
tejun

Re: RSDL v0.30 cpu scheduler for mainline kernels

2007-03-14 Thread Siddha, Suresh B

Con,

On Mon, Mar 12, 2007 at 10:58:11AM +1100, Con Kolivas wrote:
> There are updated patches for 2.6.20, 2.6.20.2, 2.6.21-rc3 and 2.6.21-rc3-mm2 
> to bring RSDL up to version 0.30 for download here:

I tried this on a Core 2 Quad CPU system (the system has 4 cores in a single
package). When I run SPECjbb2000 with the number of threads varying from 1-8,
I see a ~4.5% performance regression with RSDL (compared to native 2.6.21-rc3)
in the 8-thread case. I think this is coming from the increased number of
context switches when we have more than one thread (at the same user priority)
on the same logical CPU.

Just to see the % increase in the number of context switches, I ran 8 infinite
loops (simple while(1); loops), and with 2.6.21-rc3 I see ~70 context switches
every second, whereas with RSDL I see ~530 context switches.
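The two figures differ by roughly a factor of 7.5 (about 530 versus about 70 switches per second). A rough way to reproduce a per-process version of this measurement, assuming involuntary context switches (ru_nivcsw as reported by wait4(2)) are an acceptable proxy for the system-wide rate: fork N busy loops, let them spin for a fixed time, and sum their counts. The helper names are hypothetical; divide the total by the run time in seconds to get a rate.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sys/resource.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Busy-wait for roughly `ms` milliseconds of wall-clock time. */
static void spin_for_ms(long ms)
{
    struct timespec start, now;
    long elapsed;

    clock_gettime(CLOCK_MONOTONIC, &start);
    do {
        clock_gettime(CLOCK_MONOTONIC, &now);
        elapsed = (now.tv_sec - start.tv_sec) * 1000L +
                  (now.tv_nsec - start.tv_nsec) / 1000000L;
    } while (elapsed < ms);
}

/* Fork `nloops` spinning children, reap them, and return the sum of their
   involuntary context switches, or -1 on error. */
static long count_preemptions(int nloops, long ms)
{
    long total = 0;
    int started, failed = 0;

    for (started = 0; started < nloops; started++) {
        pid_t pid = fork();
        if (pid < 0) {
            failed = 1;
            break;
        }
        if (pid == 0) {          /* child: spin, then exit without cleanup */
            spin_for_ms(ms);
            _exit(0);
        }
    }

    /* Reap every child that was started, even if a later fork failed. */
    for (int i = 0; i < started; i++) {
        struct rusage ru;
        if (wait4(-1, NULL, 0, &ru) < 0)
            return -1;
        total += ru.ru_nivcsw;
    }
    return failed ? -1 : total;
}
```

For example, count_preemptions(8, 10000) spins 8 loops for 10 seconds; comparing the result across the two kernels gives a per-process analogue of the numbers above.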

thanks,
suresh

Geode cs5530a magic (Was: Re: [PATCH] clean up mach_reboot_fixups)

2007-03-14 Thread Jeremy Fitzhardinge

Andi Kleen wrote:
> On Wednesday 14 March 2007 23:24, Jeremy Fitzhardinge wrote:
>   
>> The reboot_fixups stuff seems to be a bit of a mess, specifically the
>> header is in linux/ when it's a purely i386-specific piece of code.  I'm
>> not sure why it has its own config option; it's currently only needed for
>> "geode-gx1/cs5530a", so perhaps whatever config option controls that
>> hardware should enable this?
>> 
>
> Thanks. Looks good.

It looks like the cs5530a has a PATA driver in drivers/ata/pata_cs5530.c.
Seems to me the cleanest fix is to register a reboot notifier in the
driver and have it do the magic rather than have the special
mach_reboot_fixups mechanism at all.
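The suggestion amounts to hooking the cs5530a workaround into the ordinary reboot notifier chain from the driver. Below is a self-contained userspace model of that pattern, loosely mirroring the kernel's `struct notifier_block` and `register_reboot_notifier()`; the `toy_*` names, the event values and the callback body are hypothetical, and the actual cs5530a "magic" is deliberately not reproduced.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_NOTIFY_DONE 0
#define TOY_SYS_RESTART 1  /* hypothetical event codes for this model */
#define TOY_SYS_HALT    2

struct toy_notifier_block {
    int (*notifier_call)(struct toy_notifier_block *nb,
                         unsigned long event, void *data);
    int priority;                     /* higher values run first */
    struct toy_notifier_block *next;
};

static struct toy_notifier_block *toy_reboot_chain;

/* Insert nb, keeping the chain sorted by descending priority. */
static void toy_register_reboot_notifier(struct toy_notifier_block *nb)
{
    struct toy_notifier_block **p = &toy_reboot_chain;
    while (*p && (*p)->priority >= nb->priority)
        p = &(*p)->next;
    nb->next = *p;
    *p = nb;
}

/* Called on the reboot path: run every registered callback in order. */
static void toy_call_reboot_chain(unsigned long event)
{
    for (struct toy_notifier_block *nb = toy_reboot_chain; nb; nb = nb->next)
        nb->notifier_call(nb, event, NULL);
}

/* What the PATA driver would register; the chipset "magic" is omitted. */
static int cs5530_reboot_hits;

static int cs5530_reboot_notify(struct toy_notifier_block *nb,
                                unsigned long event, void *data)
{
    (void)nb;
    (void)data;
    if (event == TOY_SYS_RESTART)
        cs5530_reboot_hits++;  /* the cs5530a reset sequence would go here */
    return TOY_NOTIFY_DONE;
}

static struct toy_notifier_block cs5530_reboot_nb = {
    .notifier_call = cs5530_reboot_notify,
};
```

In a real driver the callback would be registered at probe time and unregistered on removal, so the fixup only exists on machines that actually have the chip.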

Assuming it needs to be done at all...

Alan? Jaya?

J

Re: [PATCH 1/1] Allow i386 crash kernels to handle x86_64 dumps

2007-03-14 Thread Horms

On Wed, Mar 14, 2007 at 05:00:09PM +, Ian Campbell wrote:
> The specific case I am encountering is kdump under Xen with a 64-bit
> hypervisor and 32-bit kernel/userspace. The dump created is 64-bit due
> to the hypervisor, but the dump kernel is 32-bit to match the domain 0
> kernel.
> 
> It's possibly less likely to be useful in a purely native scenario but I
> see no reason to disallow it.

For native Linux, would this cover the case where the pre-crash kernel
is 64bit and the crashdump (post-crash) kernel is 32bit?

> Signed-off-by: Ian Campbell <[EMAIL PROTECTED]>
> 
> --- pristine-linux-2.6.18/include/asm-i386/elf.h  2006-09-20 04:42:06.0 +0100
> +++ linux-2.6.18-xen/include/asm-i386/elf.h   2007-03-14 16:42:30.0 +
> @@ -36,7 +36,7 @@
>   * This is used to ensure we don't load something for the wrong architecture.
>   */
>  #define elf_check_arch(x) \
> - (((x)->e_machine == EM_386) || ((x)->e_machine == EM_486))
> + (((x)->e_machine == EM_386) || ((x)->e_machine == EM_486) || ((x)->e_machine == EM_X86_64))

I think it would be a bit nicer if this were less than 80 columns wide,
though obviously this doesn't affect the functionality.

diff --git a/include/asm-i386/elf.h b/include/asm-i386/elf.h
index 8d33c9b..cd894dd 100644
--- a/include/asm-i386/elf.h
+++ b/include/asm-i386/elf.h
@@ -36,7 +36,8 @@ typedef struct user_fxsr_struct elf_fpxregset_t;
  * This is used to ensure we don't load something for the wrong architecture.
  */
 #define elf_check_arch(x) \
-   (((x)->e_machine == EM_386) || ((x)->e_machine == EM_486))
+   (((x)->e_machine == EM_386) || ((x)->e_machine == EM_486) || \
+((x)->e_machine == EM_X86_64))
 
 /*
  * These are used to set parameters in the core dumps.
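A quick userspace check of the reformatted macro: the same expression, compiled against glibc's `<elf.h>`, accepts headers whose `e_machine` is `EM_386` or `EM_X86_64` and rejects others. `EM_486` is not defined by every `<elf.h>`, so the fallback value 6 below is an assumption taken from the kernel's historical definition, and `toy_header()` is a hypothetical helper.

```c
#include <assert.h>
#include <elf.h>
#include <string.h>

#ifndef EM_486
#define EM_486 6  /* assumption: the kernel's historical value; not in every <elf.h> */
#endif

/* The patched i386 test, wrapped to stay under 80 columns. */
#define elf_check_arch(x) \
	(((x)->e_machine == EM_386) || ((x)->e_machine == EM_486) || \
	 ((x)->e_machine == EM_X86_64))

/* Hypothetical helper: an otherwise-empty ELF32 header for a given machine. */
static Elf32_Ehdr toy_header(Elf32_Half machine)
{
    Elf32_Ehdr h;
    memset(&h, 0, sizeof h);
    h.e_machine = machine;
    return h;
}
```

The macro only inspects `e_machine`, so by itself it says nothing about whether the rest of the 32-bit dump path copes with an ELFCLASS64 file.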

Reiser4: Transparent compression support. Further development and compatibility.

2007-03-14 Thread Edward Shishkin


   Reiser4 file system: Transparent compression support.
  Further development and compatibility.


   A. Reiser4 cryptcompress file plugin(*) and its conversion(**)


This is the second file plugin that implements regular files in reiser4.
Unlike the first one (the unix-file plugin), the cryptcompress plugin
manages files whose bodies are encrypted and/or compressed and packed
into metadata pages. Plain text is cached in data pages (pinned to the
inode's mapping), which do not participate in I/O: at background commit
their data are compressed and the old compressed bodies are updated.
This update happens in the so-called "squalloc" phase of the flush
algorithm, so eventually everything is tightly packed. And yes,
metadata pages are written back. Roughly speaking, a cryptcompress file
occupies more memory and less disk space than an ordinary file (managed
by the unix-file plugin). In contrast with the unix-file plugin, the
smallest addressable unit is a page cluster (in memory) and an item
cluster (on disk). The cryptcompress plugin also implements a
different, more economical representation of holes. However, it calls
the same low-level (node, etc.) plugins, so you can have a "mixed"
fileset on your reiser4 partition. See below about backward
compatibility.

To reduce CPU and memory usage when handling incompressible data, one
should assign a proper compression mode plugin. The default one
activates a special hook in the ->write() method of the cryptcompress
file plugin (only once in a file's life, when writing starts from a
special offset in some iteration), which tries to estimate whether the
file is compressible by testing its first logical cluster (64K by
default). If the evaluation result is negative, fragments are converted
to extents and management is passed to the unix-file plugin; there is
no back conversion. If the result is positive, the file stays under
cryptcompress plugin control, but compression is switched on and off
dynamically by the flush manager according to the policy implemented by
the compression mode plugin. This heuristic is largely an improvisation
and might be improved by modifying the compression mode plugin (***)
(some statistical analysis is needed here to make
sure we don't worsen the situation).
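The decision flow of the default compression mode can be sketched as a small state machine. This is a toy model, not reiser4 code: the `toy_*` names are hypothetical, and the compressibility estimate is a naive run-length count swapped in purely for illustration (the real plugin uses its configured compressor, e.g. lzo1 or gzip1). What it does show is that the check runs once per file, on the first 64K logical cluster, and that a negative result hands the file to unix-file for good.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_CLUSTER_SIZE (64 * 1024)  /* default logical cluster size */

enum toy_plugin { TOY_CRYPTCOMPRESS, TOY_UNIX_FILE };

struct toy_file {
    enum toy_plugin plugin;
    bool evaluated;  /* the check runs at most once in a file's life */
};

/* Stand-in estimator (NOT what reiser4 uses): size of a naive run-length
 * encoding that stores each run as (byte, count) with runs capped at 255. */
static size_t toy_rle_size(const unsigned char *buf, size_t len)
{
    size_t out = 0, i = 0;
    while (i < len) {
        size_t run = 1;
        while (i + run < len && run < 255 && buf[i + run] == buf[i])
            run++;
        out += 2;
        i += run;
    }
    return out;
}

/* Model of the default compression-mode hook: evaluate the first logical
 * cluster once; if it does not shrink, hand the file to unix-file for good. */
static void toy_first_cluster_hook(struct toy_file *f,
                                   const unsigned char *cluster, size_t len)
{
    if (f->evaluated || f->plugin != TOY_CRYPTCOMPRESS)
        return;
    f->evaluated = true;
    if (toy_rle_size(cluster, len) >= len)
        f->plugin = TOY_UNIX_FILE;  /* no back conversion */
}
```

Because only the first cluster is sampled, a file whose head and body differ in compressibility gets misclassified, which is exactly the pair of failure cases listed next.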

To summarize, here is what happens when the primary evaluation
performed by the default mode gets it wrong:

1. The file is incompressible, but its first logical cluster is
   compressible. Compression will be "turned off" at flush time, so we
   save CPU only; memory consumption stays wasteful, as the file remains
   under cryptcompress plugin control. Also, deleting a huge file built
   of fragments is not the fastest operation.
2. The file is compressible, but its first logical cluster is
   incompressible. Management is passed to the unix-file plugin forever
   (not the worst situation).

---
(*) "Plugins" means "internal reiser4 modules". Perhaps "plugin" is a
bad name, but let us use it in the context of reiser4 (at least for
now). Each plugin is labeled by a unique pair (type, id), so a plugin's
name is composed of the id name (first) and the type name. For example,
"extent item plugin" means the plugin of item type that manages extent
pointers in reiser4. Plugins of file type service VFS entry points.

(**) Plugin conversion means passing management to another plugin of
the same plugin type, (type, id1) -> (type, id2), followed (or
preceded) by conversion of the controlled objects (tail conversion is a
classic example of such an operation).

(***) when modifying an existing plugin we should be careful (see
below about backward compatibility).


B. Getting started with cryptcompress plugin


** Warning! Warning! Warning! 

This stuff is experimental.
Do not store important data in files managed by the cryptcompress
plugin. It can be lost with no chance of recovery. Also, creating even
a single such file on your production Reiser4 partition can cause an
unrecoverable crash of that partition. It is not a joke!

**

NOTE: We don't consider using pseudo interface (metas), as it is still
deprecated.

1. Build and boot the latest kernel of -mm series.
2. Build and install the latest version of reiser4progs (1.0.6 for now).
3. Have a free partition (not used for production data).
4. Format it with mkfs.reiser4. Use option -o to override the "create"
   plugin (and possibly other related plugins) that mkfs installs in the
   root directory by default.
   The list of default settings is available via option -p.
   The list of all possible settings is available via option -l.
   For example:

   "mkfs.reiser4 -o create=ccreg40 /dev/xxx"
   specifies cryptcompress file plugin with (default) lzo1 compression
   "mkfs.reiser4 -o create=ccreg40,compress=gzip1 /dev/xxx"
   specifies cryptcompress file plugin with gzip1 compression.

   Description of all cryptcompress-related settings can be found

Re: [PATCH 0/8] x86 boot, pda and gdt cleanups

2007-03-14 Thread Jeremy Fitzhardinge

Rusty Russell wrote:
> Hmm, this invalidated my assumption that write_gdt_entry is always a
> write to this cpu's active gdt.  Better fix is not to call it twice
> anyway...
>   

No, I don't think that's true.  I implemented the write_*_entry
functions with the assumption they could be called either on setup or on
an in-use entry.  I think it's good policy to use it all the time anyway,
since the pv_ops backend might want to fiddle with the values on the way
through.

I tried to avoid calling init_gdt twice, but it seemed cleaner to just
let it happen.

> Getting rid of the call in smp_prepare_boot_cpu currently works, but
> it's fragile:  __get_cpu_var(x) && per_cpu(x, smp_processor_id()) will
> differ, and changes made to __get_cpu_var(x) will vanish...
>   

Yes.  I think it's definitely a good idea to call init_gdt asap after
doing the percpu setup.

> Fortunately, UP doesn't have to call init_gdt at all, so I think it's
> better to place it in smp_prepare_boot_cpu only and then clean up the UP
> code.  I'll try now...
>   
It doesn't?  The per-cpu gdt is the same as the boot gdt?

J

Re: [PATCH] mm/filemap.c: unconditionally call mark_page_accessed

2007-03-14 Thread Xiaoning Ding


Dave Kleikamp wrote:

On Wed, 2007-03-14 at 22:33 +0100, Andreas Mohr wrote:

Hi,

On Wed, Mar 14, 2007 at 03:55:41PM -0500, Dave Kleikamp wrote:

On Wed, 2007-03-14 at 15:58 -0400, Ashif Harji wrote:
This patch unconditionally calls mark_page_accessed to prevent pages, 
especially for small files, from being evicted from the page cache despite 
frequent access.

I guess the downside to this is if a reader is reading a large file, or
several files, sequentially with a small read size (smaller than
PAGE_SIZE), the pages will be marked active after just one read pass.
My gut says the benefits of this patch outweigh the cost.  I would
expect real-world backup apps, etc. to read at least PAGE_SIZE.

I also think that the patch is somewhat problematic, since the original
intention seems to have been a reduction of the number of (expensive?)
mark_page_accessed() calls,


mark_page_accessed() isn't expensive.  If called repeatedly, starting
with the third call, it will check two page flags and return.  The only
real expense is that the page appears busier than it may be and will be
retained in memory longer than it should.


If we allow mark_page_accessed() to be called multiple times for a single
page, a scan of a large file with small reads would flush the buffer cache.
mark_page_accessed() also takes lru_lock when moving a page from
inactive_list to active_list, so it may also increase lock contention.


but this of course falls flat on its face in case
of permanent single-page accesses or accesses with progressing but very small
read size (single-byte reads or so), since the cached page content will expire
eventually due to lack of mark_page_accessed() updates; thus this patch
decided to call mark_page_accessed() unconditionally which may be a large
performance penalty for subsequent tiny-sized reads.


Any application doing many tiny-sized reads isn't exactly asking for
great performance.


I've been thinking hard about how to avoid the mark_page_accessed() starvation in
case of a fixed, (almost) non-changing access state, but this seems hard since
it'd seem we need some kind of state management here to figure out good
intervals of when to call mark_page_accessed() *again* for this page. E.g.
despite non-changing access patterns you could still call mark_page_accessed()
every 32 calls or so to avoid expiry, but this would need extra helper
variables.

A rather ugly way to do it may be to abuse ra.cache_hit or ra.mmap_hit content
with a
	if ((prev_index != index) || (ra.cache_hit % 32 == 0))
		mark_page_accessed(page);
This assumes that ra.cache_hit gets incremented for every access (haven't
checked whether this is the case).
That way (combined with an enhanced comment properly explaining the dilemma)
you would avoid most mark_page_accessed() invocations on subsequent same-page
reads, but still update the page status from time to time to avoid page
deprecation.

Does anyone think this would be acceptable? Any better idea?
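The proposed condition can be simulated outside the kernel to see how often it would fire. This sketch assumes, as noted above without having been checked, that `cache_hit` is incremented on every access; `toy_should_mark()` and `toy_count_marks_same_page()` are hypothetical helpers, not filemap.c code.

```c
#include <assert.h>
#include <stdbool.h>

/* The proposed test: mark on a page change, or on every 32nd cache hit. */
static bool toy_should_mark(unsigned long prev_index, unsigned long index,
                            unsigned long cache_hit)
{
    return (prev_index != index) || (cache_hit % 32 == 0);
}

/* Simulate `nreads` small reads that all land in page `index` and count
 * how many of them would call mark_page_accessed(). */
static unsigned toy_count_marks_same_page(unsigned long index, unsigned nreads)
{
    unsigned long prev_index = ~index;  /* differs from index: first read is a "new" page */
    unsigned long cache_hit = 0;
    unsigned marks = 0;

    for (unsigned i = 0; i < nreads; i++) {
        cache_hit++;                    /* assumption: bumped on every access */
        if (toy_should_mark(prev_index, index, cache_hit))
            marks++;
        prev_index = index;
    }
    return marks;
}
```

Under that assumption, 100 consecutive reads of one page would call mark_page_accessed() 4 times (the first read plus hits 32, 64 and 96), compared with once for the `prev_index != index` test alone and 100 times with the unconditional patch.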


I wouldn't go looking for anything more complicated than Ashif's patch,
unless testing shows it to be harmful in some realistic workload.


Andreas Mohr

P.S.: since I'm not too familiar with this area I could be rather wrong after all...


I could be missing something as well.  :-)

Shaggy



Re: mm: migrate_pages using

2007-03-14 Thread KAMEZAWA Hiroyuki

On Mon, 12 Mar 2007 19:57:58 +0100
Michal Hocko <[EMAIL PROTECTED]> wrote:

> What do you think about that. Is this way correct?
> 

If you are sure that your "original" page is never freed while you are
migrating it, maybe.

-Kame


Re: PROBLEM: 2.6.20-1 not working on ibook g4 (BUG/Oops)

2007-03-14 Thread young dave


2007/3/13, Benjamin Herrenschmidt <[EMAIL PROTECTED]>:
On Tue, 2007-03-13 at 01:49 +, young dave wrote:
> Hi,
> I have tested on my mac mini g4.
>
> The 2.6.21-rc2 will cause oops like the above post.
>
> And with the new 2.6.21-rc3-git7, the kernel loads OK and the penguin pixmap
> appears, but then it stops, with no error messages.

-rc3 should have the bug fixed... it might be something else that's wrong.
Have you used pmac32_defconfig?

Ben.


Hi,
I have tested pmac32_defconfig; make menuconfig gives some warnings:

.config:380:warning: trying to assign nonexistent symbol IP_NF_TARGET_TCPMSS
.config:808:warning: trying to assign nonexistent symbol IEEE1394_OUI_DB
.config:811:warning: trying to assign nonexistent symbol IEEE1394_EXPORT_FULL_API
.config:1308:warning: trying to assign nonexistent symbol BACKLIGHT_DEVICE
.config:1310:warning: trying to assign nonexistent symbol LCD_DEVICE
.config:1461:warning: trying to assign nonexistent symbol USB_BANDWIDTH
.config:1464:warning: trying to assign nonexistent symbol USB_MULTITHREAD_PROBE
.config:1476:warning: trying to assign nonexistent symbol USB_OHCI_BIG_ENDIAN
.config:1734:warning: trying to assign nonexistent symbol ZISOFS_FS
.config:1894:warning: trying to assign nonexistent symbol IOMAP_COPY
.config:1920:warning: trying to assign nonexistent symbol DEBUG_RWSEMS

Then I modified the config file, but it still can't boot; it just stops.
The keyboard is active and the kernel seems to be running, but
there are no init messages.

I don't know why. The distribution I use is Yellowdog 4.0, and the
original 2.6.17 kernel works fine.

Could you please help check the config?

Thanks.


config
Description: Binary data

do_acct_process bypasses vfs_write?

2007-03-14 Thread Michael K. Edwards


do_acct_process (in kernel/acct.c) bypasses vfs_write and calls
file->f_op->write directly.  It therefore bypasses various sanity
checks, some of which appear applicable (notably inode->i_flock &&
MANDATORY_LOCK(inode)) and others of which do not (oversize request,
access_ok, etc.).  It also neglects to call
fsnotify_modify(file->f_path.dentry) after a successful write, which
may or may not matter.

Perhaps someone more knowledgeable than I could go through vfs_read
and vfs_write, distinguishing between those checks which are only
applicable to requests initiated from userspace and those which should
also be performed for in-kernel uses of f_op->read/write?

Cheers,
- Michael

Re: [PATCH 13/13] fix ps3fb glue allowing a modular build

2007-03-14 Thread Antonino A. Daplas

On Wed, 2007-03-14 at 10:50 +0100, Geert Uytterhoeven wrote:
> On Wed, 14 Mar 2007, Al Viro wrote:
> > Signed-off-by: Al Viro <[EMAIL PROTECTED]>
> 

> And finally, make sure CONFIG_LOGO=n, as there's a bug in the logo code: logos
> are __initdata but the logo code still tries to draw them for a modular fbdev.
> Originally (eons ago) this case was handled by the flag initmem_freed, which
> no longer exists.
> 

True, I tried to prevent the logo from being drawn if the driver is
loaded before fbcon, but the code will still draw the logo if
the load order is reversed.  Can you try this patch?  It will only
permit the drawing of the logo if both the driver and fbcon are compiled
statically.

Tony
diff --git a/drivers/video/console/fbcon.c b/drivers/video/console/fbcon.c
index bd131d4..12e8a3b 100644
--- a/drivers/video/console/fbcon.c
+++ b/drivers/video/console/fbcon.c
@@ -107,7 +107,9 @@ static struct display fb_display[MAX_NR_
 
 static signed char con2fb_map[MAX_NR_CONSOLES];
 static signed char con2fb_map_boot[MAX_NR_CONSOLES];
+#ifndef MODULE
 static int logo_height;
+#endif
 static int logo_lines;
 /* logo_shown is an index to vc_cons when >= 0; otherwise follows FBCON_LOGO
    enums.  */
@@ -576,6 +578,13 @@ static int fbcon_takeover(int show_logo)
 	return err;
 }
 
+#ifdef MODULE
+static void fbcon_prepare_logo(struct vc_data *vc, struct fb_info *info,
+			   int cols, int rows, int new_cols, int new_rows)
+{
+	logo_shown = FBCON_LOGO_DONTSHOW;
+}
+#else
 static void fbcon_prepare_logo(struct vc_data *vc, struct fb_info *info,
 			   int cols, int rows, int new_cols, int new_rows)
 {
@@ -584,6 +593,11 @@ static void fbcon_prepare_logo(struct vc
 	int cnt, erase = vc->vc_video_erase_char, step;
 	unsigned short *save = NULL, *r, *q;
 
+	if (info->flags & FBINFO_MODULE) {
+		logo_shown = FBCON_LOGO_DONTSHOW;
+		goto done;
+	}
+
 	/*
 	 * remove underline attribute from erase character
 	 * if black and white framebuffer.
@@ -654,7 +668,10 @@ static void fbcon_prepare_logo(struct vc
 		logo_shown = FBCON_LOGO_DRAW;
 		vc->vc_top = logo_lines;
 	}
+
+done:
 }
+#endif /* MODULE */
 
 #ifdef CONFIG_FB_TILEBLITTING
 static void set_blitting_type(struct vc_data *vc, struct fb_info *info)
diff --git a/drivers/video/fbmem.c b/drivers/video/fbmem.c
index 45f3839..08c292d 100644
--- a/drivers/video/fbmem.c
+++ b/drivers/video/fbmem.c
@@ -418,7 +418,8 @@ int fb_prepare_logo(struct fb_info *info
 
 	memset(_logo, 0, sizeof(struct logo_data));
 
-	if (info->flags & FBINFO_MISC_TILEBLITTING)
+	if (info->flags & FBINFO_MISC_TILEBLITTING ||
+	info->flags & FBINFO_MODULE)
 		return 0;
 
 	if (info->fix.visual == FB_VISUAL_DIRECTCOLOR) {
@@ -483,7 +484,8 @@ int fb_show_logo(struct fb_info *info, i
 	struct fb_image image;
 
 	/* Return if the frame buffer is not mapped or suspended */
-	if (fb_logo.logo == NULL || info->state != FBINFO_STATE_RUNNING)
+	if (fb_logo.logo == NULL || info->state != FBINFO_STATE_RUNNING ||
+	info->flags & FBINFO_MODULE)
 		return 0;
 
 	image.depth = 8;

Re: [PATCH] Fix COMPAT_VDSO regression bug

2007-03-14 Thread Roland McGrath

I built a CONFIG_COMPAT_VDSO=y, CONFIG_HIGHMEM64G=y kernel and it has no
problems with FC-6 userland.  Everything looks fine with the vDSO.  
So either some more details of your kernel config are relevant, or
something about the userland usage pattern.


Thanks,
Roland

Re: 2.6.21-rc3-mm2 (BUG in pci_restore_state())

2007-03-14 Thread Eric W. Biederman

Bjorn Helgaas <[EMAIL PROTECTED]> writes:

> In 2.6.21-rc3-mm2 (plus some move_freepages() bugfixes), I hit one
> of the warnings added by Eric's msi-debug-code.patch.  This is on an
> ia64 box, an HP rx2600.  Let me know if I can collect more information.

I think we are good. How pci_save_state and pci_restore_state were
implemented and how they were used were out of sync.  tg3 was one
of the drivers where pci_save_state and pci_restore_state were used
as part of the reset routine and were not used in pairs.

Combined with a PCI-X or PCI Express capability, that resulted in a
memory leak (which is what I was warning about).  This has now been
corrected upstream.

And the condition I was warning about non paired pci_save_state and
pci_restore_state is no longer a problem.
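The leak described above can be reproduced with a toy model: if every save allocates a per-capability buffer and only a matching restore frees it, unpaired saves pile up buffers, while reusing an existing buffer for the same capability makes unpaired calls harmless. This is a hypothetical sketch (the `toy_*` names are made up) and not necessarily how the upstream fix works; only the capability IDs 0x07 (PCI-X) and 0x10 (PCI Express) are standard values.

```c
#include <assert.h>
#include <stdlib.h>

#define TOY_CAP_ID_PCIX 0x07  /* standard PCI-X capability ID */
#define TOY_CAP_ID_EXP  0x10  /* standard PCI Express capability ID */

struct toy_cap_buf {
    int cap_id;
    unsigned short regs[8];   /* saved capability registers (unused here) */
    struct toy_cap_buf *next;
};

struct toy_pci_state {
    struct toy_cap_buf *saved;  /* saved-state buffers, newest first */
    int nbuffers;               /* buffers currently allocated */
};

/* Return the link pointing at the buffer for cap_id (or the final NULL link). */
static struct toy_cap_buf **toy_find(struct toy_pci_state *st, int cap_id)
{
    struct toy_cap_buf **p = &st->saved;
    while (*p && (*p)->cap_id != cap_id)
        p = &(*p)->next;
    return p;
}

/* reuse == 0 models the leaky pattern: every save allocates a new buffer.
 * reuse != 0 refreshes an existing buffer for the same capability. */
static int toy_save_state(struct toy_pci_state *st, int cap_id, int reuse)
{
    if (reuse && *toy_find(st, cap_id) != NULL)
        return 0;  /* registers would be re-read into the existing buffer */
    struct toy_cap_buf *b = calloc(1, sizeof *b);
    if (b == NULL)
        return -1;
    b->cap_id = cap_id;
    b->next = st->saved;
    st->saved = b;
    st->nbuffers++;
    return 0;
}

/* Restore (registers would be written back here) and free that buffer. */
static void toy_restore_state(struct toy_pci_state *st, int cap_id)
{
    struct toy_cap_buf **p = toy_find(st, cap_id);
    struct toy_cap_buf *b = *p;
    if (b == NULL)
        return;
    *p = b->next;
    free(b);
    st->nbuffers--;
}
```

Calling the leaky save three times from a reset routine with no restore leaves three buffers behind, whereas the reusing variant keeps exactly one.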

Eric

Re: [PATCH 6/17] sparc: nr_free_pages() is unsigned long

2007-03-14 Thread David Miller

From: William Lee Irwin III <[EMAIL PROTECTED]>
Date: Wed, 14 Mar 2007 08:06:12 -0700

> On Wed, Mar 14, 2007 at 09:18:50AM +, Al Viro wrote:
> > Signed-off-by: Al Viro <[EMAIL PROTECTED]>
> > ---
> >  arch/sparc/mm/init.c |2 +-
> >  1 files changed, 1 insertions(+), 1 deletions(-)
> 
> Dave, I trust you'll pick it up until I get a git tree going.
> 
> Acked-by: William Irwin <[EMAIL PROTECTED]>

What usually happens when Al sends a set like this is that
Linus picks it up directly, and I've just verified that this
is in fact what has happened this time too :-)

Re: oops in __nodemgr_remove_host_dev (was Re: Ooops with suspend to RAM)

2007-03-14 Thread Ismail Dönmez

On Thursday 15 March 2007 02:08:43 Stefan Richter wrote:
[...]
>
> Ismail, if you have the opportunity, the next thing you could test would
> be to unload eth1394 explicitly before ohci1394 on 2.6.21-rc3. This
> would _not_ oops according to my observation.

On a clean reboot it works as expected:

southpark cartman # rmmod eth1394
southpark cartman # rmmod ohci1394
southpark cartman #

No oops.

Thanks.

-- 
Happiness in intelligent people is the rarest thing I know. (Ernest Hemingway)

Ismail Donmez ismail (at) pardus.org.tr
GPG Fingerprint: 7ACD 5836 7827 5598 D721 DF0D 1A9D 257A 5B88 F54C
Pardus Linux / KDE developer

Re: oops in __nodemgr_remove_host_dev (was Re: Ooops with suspend to RAM)

2007-03-14 Thread Ismail Dönmez

On Thursday 15 March 2007 02:08:43 Stefan Richter wrote:
[...]
> Ismail, if you have the opportunity, the next thing you could test would
> be to unload eth1394 explicitly before ohci1394 on 2.6.21-rc3. This
> would _not_ oops according to my observation.

Both rmmod eth1394 and modprobe -r eth1394 hang here; no oops, nothing.

Regards.


Re: [PATCH 1/1] hotplug cpu: migrate a task within its cpuset

2007-03-14 Thread Robin Holt

On Fri, Mar 09, 2007 at 05:58:59PM -0600, Nathan Lynch wrote:
> Hello-
> 
> Cliff Wickman wrote:
> > This patch would insert a preference to migrate such a task to a cpu within
> > its cpuset (and set its cpus_allowed to its cpuset).
> > 
> > With this patch, migrate the task to:
> >  1) any cpu on the same node as the disabled cpu, which is both online
> > and among that task's cpus_allowed
> >  2) any online cpu within the task's cpuset
> >  3) any cpu which is both online and among that task's cpus_allowed
> 
> I think I disagree with this change.
> 
> The kernel shouldn't have to be any smarter than it already is about
> moving tasks off an offlined cpu.  The only way case 2) can be reached
> is if the user has changed a task's cpu affinity.  If the user is
> sophisticated enough to manipulate tasks' cpu affinity then they can
> arrange to migrate tasks as they see fit before offlining a cpu.

You are assuming some sort of interlock between the admin and the user.
While this may be true on your own personal desktop, I don't think you
can expect this to be true on a development machine shared by hundreds
of users and admin'd by a group of people.

Additionally, ia64 is gaining support for offlining a cpu which is giving
cache errors.
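To make the disagreement concrete, the preference order quoted from Cliff's patch description can be modelled with plain bitmasks. The sketch is hypothetical (`toy_pick_target()` and the one-bit-per-CPU mask are not the kernel's cpumask API) and follows the three steps as quoted; in step 2 it also widens the task's allowed mask to its cpuset, as the description says.

```c
#include <assert.h>

/* Toy CPU mask: bit n set means CPU n (so at most 32 or 64 CPUs). */
static int toy_first_cpu(unsigned long mask)
{
    for (int cpu = 0; cpu < (int)(8 * sizeof mask); cpu++)
        if (mask & (1UL << cpu))
            return cpu;
    return -1;
}

/* Pick a destination for a task leaving a dying CPU, in the quoted order:
 *  1) online, allowed, and on the dying CPU's node (node_cpus)
 *  2) online and in the task's cpuset; allowed mask becomes the cpuset
 *  3) online and allowed
 * Returns the CPU, or -1; *new_allowed receives the task's new mask. */
static int toy_pick_target(unsigned long online, unsigned long allowed,
                           unsigned long node_cpus, unsigned long cpuset,
                           unsigned long *new_allowed)
{
    int cpu = toy_first_cpu(online & allowed & node_cpus);
    if (cpu >= 0) {
        *new_allowed = allowed;
        return cpu;
    }
    cpu = toy_first_cpu(online & cpuset);
    if (cpu >= 0) {
        *new_allowed = cpuset;
        return cpu;
    }
    *new_allowed = allowed;
    return toy_first_cpu(online & allowed);
}
```

Take CPUs 0-1 on node 0 and 2-3 on node 1, a task whose affinity was narrowed to CPU 1 only, and CPU 1 going offline: step 1 finds nothing, so step 2 moves the task to CPU 0 and widens its mask to the whole cpuset. That is the case only reachable after someone changed the task's affinity.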

Thanks,
Robin

Re: [patch 13/13] signalfd/timerfd/asyncfd v5 - KAIO asyncfd support (example/maybe-broken) ...

2007-03-14 Thread Davide Libenzi

On Wed, 14 Mar 2007, Davide Libenzi wrote:

> On Wed, 14 Mar 2007, Benjamin LaHaise wrote:
> 
> > On Wed, Mar 14, 2007 at 04:41:58PM -0700, Davide Libenzi wrote:
> > > Yeah, of course. I do not plan revolutions. Just asking if it's a 
> > > possible 
> > > thing to do. I can mlock the userspace ring, if imposing that burden over 
> > > aio_complete() is seen as too heavy.
> > 
> > I'm not sure I follow what you're doing -- why isn't asyncfd merely calling 
> > io_getevents() instead of reinventing everything the ringbuffer does?  The 
> > aio ringbuffer is already locked in memory.  Fwiw, the aio ringbuffer was 
> > originally wired up to a file descriptor, but that gave way to the actual 
> > syscall in order to enforce proper typechecking and typical usage scenarios 
> > with timeouts.
> 
> The purpose of asyncfd is to provide a pollable (by means of 
> f_op->poll) device that can be hosted inside a standard select/poll/epoll 
> wait subsystem, and that, at the same time, provides a zero-copy way for 
> kernel code (KAIO and syslets/threadlets were my thought) to deliver 
> results to userspace.

But, yeah. It can end up calling io_getevents() instead of doing its own 
thing. That'd make it even slimmer ;)



- Davide



Re: [patch 13/13] signalfd/timerfd/asyncfd v5 - KAIO asyncfd support (example/maybe-broken) ...

2007-03-14 Thread Davide Libenzi

On Wed, 14 Mar 2007, Linus Torvalds wrote:

> On Wed, 14 Mar 2007, Davide Libenzi wrote:
> > >
> > > That won't work.  aio_complete() is supposed to be irq safe.
> > 
> > Can you point me to a kernel path that ends up calling aio_complete() in a 
> > do-not-sleep mode?
> 
> All of them.
> 
> It's called from dio_bio_end_aio(), which is the bi_end_io function for an 
> AIO action. Which in turn is called at IO completion time. 
> 
> Which is basically _always_ interrupt context.
> 
> So you cannot sleep. It's not about holding spinlocks (which it might well 
> do as well). It's about a much more fundamental issue: you can only sleep 
> in process context, not from interrupts.

Ack! Gotcha. Sigh! :)



- Davide



Re: [patch 13/13] signalfd/timerfd/asyncfd v5 - KAIO asyncfd support (example/maybe-broken) ...

2007-03-14 Thread Linus Torvalds

On Wed, 14 Mar 2007, Davide Libenzi wrote:
> >
> > That won't work.  aio_complete() is supposed to be irq safe.
> 
> Can you point me to a kernel path that ends up calling aio_complete() in a 
> do-not-sleep mode?

All of them.

It's called from dio_bio_end_aio(), which is the bi_end_io function for an 
AIO action. Which in turn is called at IO completion time. 

Which is basically _always_ interrupt context.

So you cannot sleep. It's not about holding spinlocks (which it might well 
do as well). It's about a much more fundamental issue: you can only sleep 
in process context, not from interrupts.

Linus

Re: [patch 13/13] signalfd/timerfd/asyncfd v5 - KAIO asyncfd support (example/maybe-broken) ...

2007-03-14 Thread Davide Libenzi

On Wed, 14 Mar 2007, Benjamin LaHaise wrote:

> On Wed, Mar 14, 2007 at 04:41:58PM -0700, Davide Libenzi wrote:
> > Yeah, of course. I do not plan revolutions. Just asking if it's a possible 
> > thing to do. I can mlock the userspace ring, if imposing that burden over 
> > aio_complete() is seen as too heavy.
> 
> I'm not sure I follow what you're doing -- why isn't asyncfd merely calling 
> io_getevents() instead of reinventing everything the ringbuffer does?  The 
> aio ringbuffer is already locked in memory.  Fwiw, the aio ringbuffer was 
> originally wired up to a file descriptor, but that gave way to the actual 
> syscall in order to enforce proper typechecking and typical usage scenarios 
> with timeouts.

The purpose of asyncfd is to provide a pollable (by means of 
f_op->poll) device that can be hosted inside a standard select/poll/epoll 
wait subsystem, and that, at the same time, provides a zero-copy way for 
kernel code (KAIO and syslets/threadlets were my thought) to deliver 
results to userspace.

> Also, there have been patches floating around for aio_poll and a way to get 
> epoll wakeups into the aio event queue.  They deserve serious consideration 
> if this asyncfd seems necessary.

I don't want to talk about the AIO poll code, because the last time I saw
it, it did not look shiny.
But I think we can agree that people need a way to wait for both
block I/O (covered by either KAIO or syslets/threadlets) and the rest of
the world (covered by epoll). This has been pretty clear to me, given
the continuous requests I get to provide block I/O completions through
epoll, and the hacks people currently have to do in userspace to
achieve that.
Now that I'm seeing I can wait for both block and net I/O, I got excited ;)
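For context, the userspace workaround referred to above usually looks like the self-pipe pattern: whatever performs the block I/O (a helper thread or process) reports completions by writing to a pipe, and the main loop polls that pipe next to its sockets. The sketch below is that workaround, not asyncfd; a forked child stands in for the I/O helper, and `struct toy_completion`, `wait_both()` and `start_fake_block_io()` are hypothetical names.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <poll.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical completion record sent by the I/O helper. */
struct toy_completion {
    long id;      /* which request finished */
    long result;  /* e.g. bytes transferred */
};

/* Wait on the completion pipe and a "network" fd at once.
 * Returns a bitmask (1 = completion readable, 2 = net fd readable),
 * 0 on timeout, or -1 on error. */
static int wait_both(int net_fd, int completion_fd, int timeout_ms)
{
    struct pollfd pfd[2] = {
        { .fd = completion_fd, .events = POLLIN },
        { .fd = net_fd,        .events = POLLIN },
    };
    int n = poll(pfd, 2, timeout_ms);
    if (n < 0)
        return -1;
    int mask = 0;
    if (pfd[0].revents & POLLIN)
        mask |= 1;
    if (pfd[1].revents & POLLIN)
        mask |= 2;
    return mask;
}

/* Stand-in for a block I/O helper: a child "completes" request `id`
 * and reports it through the pipe. Returns the child's pid or -1. */
static pid_t start_fake_block_io(int write_fd, long id)
{
    pid_t pid = fork();
    if (pid == 0) {
        struct toy_completion c = { id, 4096 };
        ssize_t w = write(write_fd, &c, sizeof c);
        _exit(w == (ssize_t)sizeof c ? 0 : 1);
    }
    return pid;
}
```

Each completion costs an extra copy through the pipe plus a helper to drive the I/O, which is the kind of overhead a zero-copy, pollable completion fd is meant to remove.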

- Davide


Re: [Bug 8040] Hang before INIT when CONFIG_HIGHMEM4G=y [Fix CONFIG_COMPAT_VDSO] <- Bad

2007-03-14 Thread Andi Kleen

On Thursday 15 March 2007 02:01, Andrew Morton wrote:
> > On Wed, 14 Mar 2007 17:52:01 + (UTC) Leroy van Logchem <[EMAIL PROTECTED]> wrote:
> > Leroy van Logchem  wldelft.nl> writes:
> > 

Where does it hang exactly? Do you have a boot log?

> > > > > None whatsoever.  Three people are reporting this and it's a drop-dead
> > > > > showstopper for a 2.6.21 release so we just have to wait until someone
> > > > > wakes up and thinks about it.
> > > 
> > > The topic should be "when CONFIG_HIGHMEM64G=y" imo.
> > > 
> > > I'll try to do my first bi-sect today.
> 
> Thanks.   Please always do reply-to-all.  Cc's restored (and added..)
> 
> > Bisecting went well, after 13 compiles this commit was found:
> > 
> > a1f3bb9ae4497a2ed3eac773fd7798ac33a0371f is first bad commit
> > commit a1f3bb9ae4497a2ed3eac773fd7798ac33a0371f
> > Author: Roland McGrath <[EMAIL PROTECTED]>
> > Date:   Fri Jan 26 00:56:46 2007 -0800

Can you please double check this by trying with/without again -- sometimes 
bisects go bad.

-Andi
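Andi's caution follows from how bisection works: it is a binary search in which each build-and-boot test discards half of the remaining range, and no later step re-checks an earlier verdict. The minimal sketch below models a linear history (real git bisect walks the commit graph); the oracle, commit numbers and range are hypothetical. 13 compiles is consistent with a starting range of about 2^13 commits, though the thread does not say what the range actually was.

```c
#include <assert.h>
#include <stdbool.h>

/* One build-and-boot test: returns true if commit c is bad. */
typedef bool (*is_bad_fn)(long c, void *arg);

/* Binary search for the first bad commit on a linear history.
 * Invariant: commit `good` is known good, commit `bad` is known bad.
 * Each iteration tests the midpoint and keeps the half that still
 * contains the good->bad transition.  *tests counts the tests run. */
static long first_bad(long good, long bad, is_bad_fn is_bad, void *arg,
                      int *tests)
{
    *tests = 0;
    while (bad - good > 1) {
        long mid = good + (bad - good) / 2;
        ++*tests;
        if (is_bad(mid, arg))
            bad = mid;
        else
            good = mid;
    }
    return bad;
}

/* Hypothetical test oracle: every commit from `culprit` on is bad, except
 * that the test at commit `flaky_at` reports the wrong answer
 * (set flaky_at to -1 for a perfectly reliable test). */
struct oracle {
    long culprit;
    long flaky_at;
};

static bool oracle_is_bad(long c, void *arg)
{
    const struct oracle *o = arg;
    bool bad = c >= o->culprit;
    return c == o->flaky_at ? !bad : bad;
}
```

With a reliable test, 8192 commits take exactly 13 tests and the culprit is found. If only the very first verdict is wrong, the search confidently reports commit 4096 instead, which is why re-testing the suspect with and without it is worthwhile.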

Re: oops in __nodemgr_remove_host_dev (was Re: Ooops with suspend to RAM)

2007-03-14 Thread Stefan Richter

I wrote:
> according to a quick test I made right now it is a regression post 2.6.20.
> # modprobe ohci1394   # wait a bit, eth1394 is auto-loaded
> # modprobe -r eth1394
> # modprobe -r ohci1394
> works.
> # modprobe ohci1394   # wait a bit, eth1394 is auto-loaded
> # modprobe -r ohci1394
> oopses with the same trace as Ismail posted. And indeed, looking at his
> trace once more I now also spot eth1394 among his linked-in modules.

To avoid any misunderstandings: Both the former and the latter sequence
work under 2.6.20 and earlier.
-- 
Stefan Richter
-=-=-=== --== -
http://arcgraph.de/sr/

[PATCH] Fix COMPAT_VDSO regression bug

2007-03-14 Thread Leroy van Logchem



   Revert "[PATCH] Fix CONFIG_COMPAT_VDSO"
   This reverts commit a1f3bb9ae4497a2ed3eac773fd7798ac33a0371f.

   Several systems couldn't boot with CONFIG_HIGHMEM64G=y, as
   reported in bug #8040. Reverting the above patch solved the problem.


   Cc: Randy Dunlap <[EMAIL PROTECTED]>
   Cc: Ingo Molnar <[EMAIL PROTECTED]>
   Cc: Roland McGrath <[EMAIL PROTECTED]>
   Bisected-by: Leroy Raymond van Logchem <[EMAIL PROTECTED]>


arch/i386/kernel/entry.S|4 
arch/i386/kernel/sysenter.c |2 --
include/asm-i386/elf.h  |7 ---
include/asm-i386/fixmap.h   |2 --
include/asm-i386/page.h |2 --
5 files changed, 4 insertions(+), 13 deletions(-)

diff --git a/arch/i386/kernel/entry.S b/arch/i386/kernel/entry.S
index 5e47683..06461b8 100644
--- a/arch/i386/kernel/entry.S
+++ b/arch/i386/kernel/entry.S
@@ -302,16 +302,12 @@ sysenter_past_esp:
   pushl $(__USER_CS)
   CFI_ADJUST_CFA_OFFSET 4
   /*CFI_REL_OFFSET cs, 0*/
-#ifndef CONFIG_COMPAT_VDSO
   /*
* Push current_thread_info()->sysenter_return to the stack.
* A tiny bit of offset fixup is necessary - 4*4 means the 4 words
* pushed above; +8 corresponds to copy_thread's esp0 setting.
*/
   pushl (TI_sysenter_return-THREAD_SIZE+8+4*4)(%esp)
-#else
-   pushl $SYSENTER_RETURN
-#endif
   CFI_ADJUST_CFA_OFFSET 4
   CFI_REL_OFFSET eip, 0

diff --git a/arch/i386/kernel/sysenter.c b/arch/i386/kernel/sysenter.c
index 666f70d..a1090e1 100644
--- a/arch/i386/kernel/sysenter.c
+++ b/arch/i386/kernel/sysenter.c
@@ -95,7 +95,6 @@ int __init sysenter_setup(void)
   return 0;
}

-#ifndef CONFIG_COMPAT_VDSO
static struct page *syscall_nopage(struct vm_area_struct *vma,
   unsigned long adr, int *type)
{
@@ -190,4 +189,3 @@ int in_gate_area_no_task(unsigned long addr)
{
   return 0;
}
-#endif
diff --git a/include/asm-i386/elf.h b/include/asm-i386/elf.h
index 369035d..157bb7a 100644
--- a/include/asm-i386/elf.h
+++ b/include/asm-i386/elf.h
@@ -143,9 +143,12 @@ extern int dump_task_extended_fpu (struct task_struct *, struct user_fxsr_struct
# define VDSO_PRELINK  0
#endif

-#define VDSO_SYM(x) \
+#define VDSO_COMPAT_SYM(x) \
   (VDSO_COMPAT_BASE + (unsigned long)(x) - VDSO_PRELINK)

+#define VDSO_SYM(x) \
+   (VDSO_BASE + (unsigned long)(x) - VDSO_PRELINK)
+
#define VDSO_HIGH_EHDR ((const struct elfhdr *) VDSO_HIGH_BASE)
#define VDSO_EHDR  ((const struct elfhdr *) VDSO_COMPAT_BASE)

@@ -153,12 +156,10 @@ extern void __kernel_vsyscall;

#define VDSO_ENTRY VDSO_SYM(&__kernel_vsyscall)

-#ifndef CONFIG_COMPAT_VDSO
#define ARCH_HAS_SETUP_ADDITIONAL_PAGES
struct linux_binprm;
extern int arch_setup_additional_pages(struct linux_binprm *bprm,
   int executable_stack);
-#endif

extern unsigned int vdso_enabled;

diff --git a/include/asm-i386/fixmap.h b/include/asm-i386/fixmap.h
index 3e9f610..02428cb 100644
--- a/include/asm-i386/fixmap.h
+++ b/include/asm-i386/fixmap.h
@@ -23,8 +23,6 @@
extern unsigned long __FIXADDR_TOP;
#else
#define __FIXADDR_TOP  0xf000
-#define FIXADDR_USER_START __fix_to_virt(FIX_VDSO)
-#define FIXADDR_USER_END   __fix_to_virt(FIX_VDSO - 1)
#endif

#ifndef __ASSEMBLY__
diff --git a/include/asm-i386/page.h b/include/asm-i386/page.h
index 7b19f45..fd3f64a 100644
--- a/include/asm-i386/page.h
+++ b/include/asm-i386/page.h
@@ -143,9 +143,7 @@ extern int page_is_ram(unsigned long pagenr);
#include 
#include 

-#ifndef CONFIG_COMPAT_VDSO
#define __HAVE_ARCH_GATE_AREA 1
-#endif
#endif /* __KERNEL__ */

#endif /* _I386_PAGE_H */
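Both VDSO_SYM() and VDSO_COMPAT_SYM() in the elf.h hunk perform the same relocation, base + (x - VDSO_PRELINK), and differ only in which base they add. A small C restatement of that arithmetic; the function name and the addresses in the usage are illustrative, not taken from a real system.

```c
#include <assert.h>

/* Same arithmetic as VDSO_SYM()/VDSO_COMPAT_SYM(): subtracting the address
 * the vDSO image was (pre)linked at turns a symbol's link-time address into
 * an offset inside the image; adding the base where the image is actually
 * mapped turns that offset into the runtime address. */
static unsigned long vdso_sym(unsigned long map_base,
                              unsigned long prelink,
                              unsigned long link_addr)
{
    return map_base + (link_addr - prelink);
}
```

When the image is linked at the same address it is mapped at, symbol addresses pass through unchanged. When it is linked at 0 (as in the `# define VDSO_PRELINK 0` branch visible in the hunk context), every symbol is shifted by the mapping base.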


[PATCH] cosmetic adaption of drivers/ide/Kconfig concerning SATA

2007-03-14 Thread Patrick Ringl


Hello,
since Serial ATA has its own menu entry now, I guess we can change the
description of the deprecated SATA driver as well, since the new S-ATA
subsystem is no longer configured through a SCSI low-level driver.

The following patch is against 2.6.21-rc3:

--- linux-2.6.20.orig/drivers/ide/Kconfig	2007-03-12 01:34:38.0 +0100
+++ linux-2.6.20/drivers/ide/Kconfig	2007-03-12 01:47:10.0 +0100
@@ -103,7 +103,7 @@
---help---
  There are two drivers for Serial ATA controllers.

-  The main driver, "libata", exists inside the SCSI subsystem
+  The main driver, "libata", exists in the "Serial ATA subsystem"
  and supports most modern SATA controllers.

  The IDE driver (which you are currently configuring) supports


Since I am not subscribed to the list, I'd find it great if I were
personally CC'ed. :-)


Best regards
Patrick





Re: oops in __nodemgr_remove_host_dev (was Re: Ooops with suspend to RAM)

2007-03-14 Thread Stefan Richter

Ismail Dönmez wrote:
> On Wednesday 14 March 2007 20:25:24 Stefan Richter wrote:
>> Ismail Dönmez wrote:
>> > Are you able to rmmod it?
>>
>> Yes, but on 2.6.20 and earlier kernels, most of the time with
>> development versions of the 1394 drivers. I still haven't tried
>> 2.6.21-rc, will hopefully get to it tonight.
> 
> OK, then that explains a bit. Without suspend, if I rmmod the ohci1394 module
> I get the exact same oops.

Elsewhere, Adrian Bunk wrote:
| Is this an old problem, or what was the last kernel that worked
| for you?

Adrian,

according to a quick test I made right now it is a regression post 2.6.20.
# modprobe ohci1394   # wait a bit, eth1394 is auto-loaded
# modprobe -r eth1394
# modprobe -r ohci1394
works.
# modprobe ohci1394   # wait a bit, eth1394 is auto-loaded
# modprobe -r ohci1394
oopses with the same trace as Ismail posted. And indeed, looking at his
trace once more I now also spot eth1394 among his linked-in modules.

Ismail, if you have the opportunity, the next thing you could test would
be to unload eth1394 explicitly before ohci1394 on 2.6.21-rc3. This
would _not_ oops according to my observation.

Thanks to Ismail's link to the similar report on 2.6.19-rc5-mm2 we
already have a hot candidate to be the trigger (not necessarily to be
the actual bug):
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=43cb76d91ee85f579a69d42bc8efc08bac560278
"Network: convert network devices to use struct device instead of
class_device"

Alas, I didn't remember that older 2.6.19-rc5-mm2 discussion when I saw
Greg's pull request with this conversion patch (February 7), so I didn't
react and didn't test Linus' latest tree.

Advice would be appreciated...
-- 
Stefan Richter
-=-=-=== --== -
http://arcgraph.de/sr/

Re: 2.6.21-rc3-mm1

2007-03-14 Thread Andrew Morton

> On Wed, 14 Mar 2007 20:06:02 +0100 Mariusz Kozlowski <[EMAIL PROTECTED]> wrote:
> Hello,
> 
>   Today after +- 24h of uptime I found some more page allocation
> failures ('eth1: Can't allocate skb for Rx'). You'll find more here:
> 
> http://tuxland.pl/misc/2.6.21-rc3-mm1-page-allocation-failure.txt
> 
> System wasn't doing anything unusual, as usual ;-) X, some p2p 
> software, firefox+flash playing music.
> 

Do other kernels do this, or is 2.6.21-rc3-mm1 worse?

It is of course a non-fatal problem and will inevitably happen sometimes,
but we would like the VM to be able to minimise the occurrence of this
problem.

I think we were rather hoping that Mel's anti-fragmentation work would
improve things.
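A page allocation can fail even when plenty of memory is free, if the request needs several physically contiguous pages and the free memory is scattered in single pages; that is the situation anti-fragmentation work aims to make rarer. Whether the eth1 Rx failures were higher-order requests is not visible here (the log is only linked), so treat this as background. A simplified model, not the kernel's buddy allocator, in which an order-n request needs 2^n free, contiguous pages aligned to 2^n:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* page_free[i] says whether page i is free.  Returns true if some
 * naturally aligned block of 2^order pages is entirely free. */
static bool can_alloc_order(const bool *page_free, size_t npages,
                            unsigned order)
{
    size_t block = (size_t)1 << order;
    for (size_t start = 0; start + block <= npages; start += block) {
        size_t i = 0;
        while (i < block && page_free[start + i])
            i++;
        if (i == block)
            return true;    /* found an aligned, fully free block */
    }
    return false;
}

/* Total number of free pages, regardless of where they are. */
static size_t count_free(const bool *page_free, size_t npages)
{
    size_t n = 0;
    for (size_t i = 0; i < npages; i++)
        n += page_free[i];
    return n;
}
```

With every other page free, half of memory is available yet no two-page (order-1) block exists; the same amount of free memory in one run satisfies an eight-page (order-3) request.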

Re: [patch] change futex_wait() to hrtimers

2007-03-14 Thread linux

> BTW. my futex man page says timeout's contents "describe the maximum duration
> of the wait". Surely that should be *minimum*? Michael cc'ed.

Er, the intent of the wording is to say "futex will wait until uaddr
no longer contains val, or the timeout expires, whichever happens first".


One option for selecting different clock resolutions is to use the
clockid_t from the POSIX clock_gettime() family.  That is, specify the
clock that a wait uses, and then have a separate mechanism for turning
a resolution requirement into a clockid_t.
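One way to picture "turning a resolution requirement into a clockid_t" at user level: clock_getres() reports each clock's resolution, and a helper can pick the coarsest candidate that still meets the requirement. The helper name, the candidate list and the coarse-first ordering are assumptions for illustration; CLOCK_MONOTONIC_COARSE is Linux-specific and is only used if the headers define it. This only selects a clock for reading time; it is not the per-wait clock selection being proposed for the kernel.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Return 0 and set *out to the first (coarsest) candidate clock whose
 * reported resolution is at least as fine as max_res_ns; -1 if none is. */
static int clock_for_resolution(long max_res_ns, clockid_t *out)
{
    static const clockid_t candidates[] = {
#ifdef CLOCK_MONOTONIC_COARSE
        CLOCK_MONOTONIC_COARSE,   /* Linux: typically tick resolution */
#endif
        CLOCK_MONOTONIC,
    };
    for (size_t i = 0; i < sizeof candidates / sizeof candidates[0]; i++) {
        struct timespec res;
        if (clock_getres(candidates[i], &res) != 0)
            continue;             /* clock not supported here */
        long long res_ns = (long long)res.tv_sec * 1000000000LL + res.tv_nsec;
        if (res_ns <= max_res_ns) {
            *out = candidates[i];
            return 0;
        }
    }
    return -1;
}
```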

(And there can be default clocks for interfaces that don't specify one
explicitly.)

Although clockid_t is pretty generic, it's biased toward an enumerated
list of clocks rather than a continuous resolution.  Fortunately,
that seems to match the implementation ideas.  The question is how
much the timeout gets rounded, and the choices are currently jiffies
or microseconds.

A related option may be whether rounding down is acceptable.  For some
applications (periodic polling for events), it's fine.  For others,
it's not.  Thus, while it's okay to specify such clocks explicitly,
it'd probably be a good idea to forbid selecting them as the default
for interfaces that don't specify a clock explicitly.

I had some code that suffered 1 ms buzz-loops on Solaris because poll(2)
would round the timeout interval down, but the loop calling it would
explicitly check whether the timeout had expired using gettimeofday()
and would keep re-invoking poll(pollfds, npollfds, 1) until the timeout
really did expire.
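The same hazard exists one level up, in the application's own conversion from "time left" to a poll() timeout: truncating a sub-millisecond remainder to 0 turns the wait into an immediate return and the loop spins. A minimal sketch of such a loop that rounds up instead, assuming CLOCK_MONOTONIC and poll(NULL, 0, ms) as a sleep; the helper names are hypothetical. It does not model Solaris's internal rounding, only the rule that a timeout used to wait for a deadline should be rounded up, not down.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <poll.h>
#include <time.h>

static long long now_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (long long)ts.tv_sec * 1000000000LL + ts.tv_nsec;
}

/* Time left until deadline as a poll() timeout in ms, rounded UP so that
 * a sub-millisecond remainder becomes 1 ms rather than 0. */
static int remaining_ms_ceil(long long now, long long deadline)
{
    if (now >= deadline)
        return 0;
    long long left = deadline - now;
    return (int)((left + 999999LL) / 1000000LL);
}

/* Sleep until deadline_ns (CLOCK_MONOTONIC).  Returns the number of poll()
 * calls made, which stays small because no call is ever made with 0 ms. */
static int sleep_until(long long deadline_ns)
{
    int calls = 0;
    for (;;) {
        int ms = remaining_ms_ceil(now_ns(), deadline_ns);
        if (ms == 0)
            return calls;
        poll(NULL, 0, ms);
        calls++;
    }
}
```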

Re: [Bug 8040] Hang before INIT when CONFIG_HIGHMEM4G=y [Fix CONFIG_COMPAT_VDSO] <- Bad

2007-03-14 Thread Andrew Morton

> On Wed, 14 Mar 2007 17:52:01 + (UTC) Leroy van Logchem <[EMAIL PROTECTED]> wrote:
> Leroy van Logchem  wldelft.nl> writes:
> 
> > 
> > > > None whatsoever.  Three people are reporting this and it's a drop-dead
> > > > showstopper for a 2.6.21 release so we just have to wait until someone
> > > > wakes up and thinks about it.
> > 
> > The topic should be "when CONFIG_HIGHMEM64G=y" imo.
> > 
> > I'll try to do my first bi-sect today.

Thanks.   Please always do reply-to-all.  Cc's restored (and added..)

> Bisecting went well, after 13 compiles this commit was found:
> 
> a1f3bb9ae4497a2ed3eac773fd7798ac33a0371f is first bad commit
> commit a1f3bb9ae4497a2ed3eac773fd7798ac33a0371f
> Author: Roland McGrath <[EMAIL PROTECTED]>
> Date:   Fri Jan 26 00:56:46 2007 -0800
> 
> [PATCH] Fix CONFIG_COMPAT_VDSO
> 
> I wouldn't mind if CONFIG_COMPAT_VDSO went away entirely.  But if it's there,
> it should work properly.  Currently it's quite haphazard: both real vma and
> fixmap are mapped, both are put in the two different AT_* slots, sysenter
> returns to the vma address rather than the fixmap address, and core dumps yet
> are another story.
> 
> This patch makes CONFIG_COMPAT_VDSO disable the real vma and use the fixmap
> area consistently.  This makes it actually compatible with what the old vdso
> implementation did.
> 
> Signed-off-by: Roland McGrath <[EMAIL PROTECTED]>
> Cc: Ingo Molnar <[EMAIL PROTECTED]>
> Cc: Paul Mackerras <[EMAIL PROTECTED]>
> Cc: Benjamin Herrenschmidt <[EMAIL PROTECTED]>
> Cc: Andi Kleen <[EMAIL PROTECTED]>
> Signed-off-by: Andrew Morton <[EMAIL PROTECTED]>
> Signed-off-by: Linus Torvalds <[EMAIL PROTECTED]>
> 
> :04 04 802ab3366a651ecba28c8677fa84a9f7c506392b f44adc4dcdab733e5965b68ccd0d643f0a550a80 M  arch
> :04 04 be1e217152d8b3fcd05f09aa2b3f4f9dcb8208aa 46cc86427e861350dd3fef9469474c55119f27ce M  include
> 
> I had both CONFIG_COMPAT_VDSO=y and CONFIG_HIGHMEM64G=y configured.
> Using a 4GB Supermicro 7044 SMP dual Xeon. Details upon request.
> 
> --
> Leroy
> 

Re: [patch 13/13] signalfd/timerfd/asyncfd v5 - KAIO asyncfd support (example/maybe-broken) ...

2007-03-14 Thread Benjamin LaHaise

On Wed, Mar 14, 2007 at 04:41:58PM -0700, Davide Libenzi wrote:
> Yeah, of course. I do not plan revolutions. Just asking if it's a possible 
> thing to do. I can mlock the userspace ring, if imposing that burden on 
> aio_complete() is seen as too heavy.

I'm not sure I follow what you're doing -- why isn't asyncfd merely calling 
io_getevents() instead of reinventing everything the ringbuffer does?  The 
aio ringbuffer is already locked in memory.  Fwiw, the aio ringbuffer was 
originally wired up to a file descriptor, but that gave way to the actual 
syscall in order to enforce proper typechecking and typical usage scenarios 
with timeouts.
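For reference, the existing interface Ben describes (submit, then reap completions with io_getevents() and a timeout) looks roughly like this when driven directly through the Linux syscalls. A minimal sketch: the sys_* wrappers and aio_read_with_timeout are hypothetical helpers, and error handling is simplified. On a regular buffered file the read may well complete inside io_submit() itself, but its completion is still reaped through io_getevents().

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <linux/aio_abi.h>   /* aio_context_t, struct iocb, struct io_event */
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <sys/syscall.h>
#include <time.h>
#include <unistd.h>

/* glibc has no wrappers for the native AIO syscalls (libaio normally
 * provides them), so call them directly. */
static long sys_io_setup(unsigned nr, aio_context_t *ctx)
{
    return syscall(SYS_io_setup, nr, ctx);
}

static long sys_io_destroy(aio_context_t ctx)
{
    return syscall(SYS_io_destroy, ctx);
}

static long sys_io_submit(aio_context_t ctx, long n, struct iocb **iocbs)
{
    return syscall(SYS_io_submit, ctx, n, iocbs);
}

static long sys_io_getevents(aio_context_t ctx, long min_nr, long nr,
                             struct io_event *events, struct timespec *timeout)
{
    return syscall(SYS_io_getevents, ctx, min_nr, nr, events, timeout);
}

/* Submit one read of len bytes at offset 0 of fd, then reap its completion,
 * waiting at most timeout_ms.  Returns the result from the completion event
 * (bytes read), -ETIMEDOUT if nothing completed in time, or -errno. */
static long aio_read_with_timeout(int fd, void *buf, size_t len, long timeout_ms)
{
    aio_context_t ctx = 0;              /* must be zero before io_setup */
    if (sys_io_setup(1, &ctx) < 0)
        return -errno;

    struct iocb cb;
    memset(&cb, 0, sizeof cb);
    cb.aio_fildes = (uint32_t)fd;
    cb.aio_lio_opcode = IOCB_CMD_PREAD;
    cb.aio_buf = (uint64_t)(uintptr_t)buf;
    cb.aio_nbytes = len;
    cb.aio_offset = 0;
    struct iocb *list[1] = { &cb };

    long ret;
    if (sys_io_submit(ctx, 1, list) != 1) {
        ret = -errno;
    } else {
        struct io_event ev;
        struct timespec ts = { .tv_sec = timeout_ms / 1000,
                               .tv_nsec = (timeout_ms % 1000) * 1000000L };
        long n = sys_io_getevents(ctx, 1, 1, &ev, &ts);
        if (n == 1)
            ret = (long)ev.res;         /* bytes read, or -errno for the I/O */
        else
            ret = (n == 0) ? -ETIMEDOUT : -errno;
    }
    sys_io_destroy(ctx);
    return ret;
}
```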

Also, there have been patches floating around for aio_poll and a way to get 
epoll wakeups into the aio event queue.  They deserve serious consideration 
if this asyncfd seems necessary.

-ben
-- 
"Time is of no importance, Mr. President, only life is important."
Don't Email: <[EMAIL PROTECTED]>.

Re: [PATCH 0/8] x86 boot, pda and gdt cleanups

2007-03-14 Thread Rusty Russell

On Tue, 2007-03-13 at 13:48 -0700, Jeremy Fitzhardinge wrote:
> * init_gdt should always use write_gdt_entry when touching the gdt;
>   if it doesn't and it ends up touching an already-installed gdt
>   under Xen, it will get a write fault.  This happens because
>   init_gdt ends up getting called twice in SMP (see below).

Hmm, this invalidated my assumption that write_gdt_entry is always a
write to this cpu's active gdt.  Better fix is not to call it twice
anyway...

> * init_gdt should always be called before bringing up the cpu,
>   rather than by the cpu itself (and therefore, cpu_init() shouldn't
>   call it).  Obviously the boot cpu is an exception.

Makes sense.

> * secondary_cpu_init stops being necessary.

Indeed.

> * On SMP, init_gdt can get called twice: first time in
>   smp_prepare_boot_cpu, and a second time in trap_init.  On UP,
>   trap_init is the only caller.

Getting rid of the call in smp_prepare_boot_cpu currently works, but
it's fragile:  __get_cpu_var(x) && per_cpu(x, smp_processor_id()) will
differ, and changes made to __get_cpu_var(x) will vanish...

Fortunately, UP doesn't have to call init_gdt at all, so I think it's
better to place it in smp_prepare_boot_cpu only and then clean up the UP
code.  I'll try now...

Rusty.


Re: [patch 13/13] signalfd/timerfd/asyncfd v5 - KAIO asyncfd support (example/maybe-broken) ...

2007-03-14 Thread Davide Libenzi

On Wed, 14 Mar 2007, Davide Libenzi wrote:

> On Wed, 14 Mar 2007, Benjamin LaHaise wrote:
> 
> > On Wed, Mar 14, 2007 at 04:24:54PM -0700, Davide Libenzi wrote:
> > > Can you point me to a kernel path that ends up calling aio_complete() in a
> > > do-not-sleep mode?
> > 
> > If you remove that invariant, then it is very difficult for device drivers 
> > and other code to make use of aio_complete().
> > 
> > > The offender I see is drivers/usb/gadget/inode.c that calls it with a 
> > > spinlock held.
> > 
> > Which was from irq context last time I checked.

The drivers/usb/gadget/inode.c case seems to be easily fixable AFAICS, in 
the ep_aio_complete() function.
I was under the impression that aio_complete() belonged more to a tasklet 
kind of context.



- Davide



Re: [patch 13/13] signalfd/timerfd/asyncfd v5 - KAIO asyncfd support (example/maybe-broken) ...

2007-03-14 Thread Davide Libenzi

On Wed, 14 Mar 2007, Benjamin LaHaise wrote:

> On Wed, Mar 14, 2007 at 04:24:54PM -0700, Davide Libenzi wrote:
> > Can you point me to a kernel path that ends up calling aio_complete() in a 
> > do-not-sleep mode?
> 
> If you remove that invariant, then it is very difficult for device drivers 
> and other code to make use of aio_complete().
> 
> > The offender I see is drivers/usb/gadget/inode.c that calls it with a 
> > spinlock held.
> 
> Which was from irq context last time I checked.
> 
> > The aio_run_iocb function seems to release/reacquire the lock before 
> > calling aio_complete().
> 
> That implies nothing -- aio_complete() has to acquire ctx_lock and cannot 
> be called holding the lock.  Sure, it could probably be split into 
> __aio_complete() and have aio_complete() wrap it acquiring the lock.

Yeah, of course. I do not plan revolutions. Just asking if it's a possible 
thing to do. I can mlock the userspace ring, if imposing that burden on 
aio_complete() is seen as too heavy.



- Davide



Re: [patch 13/13] signalfd/timerfd/asyncfd v5 - KAIO asyncfd support (example/maybe-broken) ...

2007-03-14 Thread Benjamin LaHaise

On Wed, Mar 14, 2007 at 04:24:54PM -0700, Davide Libenzi wrote:
> Can you point me to a kernel path that ends up calling aio_complete() in a 
> do-not-sleep mode?

If you remove that invariant, then it is very difficult for device drivers 
and other code to make use of aio_complete().

> The offender I see is drivers/usb/gadget/inode.c that calls it with a 
> spinlock held.

Which was from irq context last time I checked.

> The aio_run_iocb function seems to release/reacquire the lock before 
> calling aio_complete().

That implies nothing -- aio_complete() has to acquire ctx_lock and cannot 
be called holding the lock.  Sure, it could probably be split into 
__aio_complete() and have aio_complete() wrap it acquiring the lock.
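The split Ben mentions is the usual pattern of a lock-taking wrapper around a variant that expects the caller to already hold the lock. A userspace sketch with hypothetical names (ring_complete_locked plays the role of the __aio_complete() he describes). The toy spinlock is built on C11 atomic_flag; kernel code would use a real spinlock, and for aio_complete() an irq-safe form such as spin_lock_irqsave(), which userspace cannot express.

```c
#include <assert.h>
#include <stdatomic.h>

/* Toy spinlock standing in for a kernel spinlock. */
struct toy_lock { atomic_flag f; };
#define TOY_LOCK_INIT { ATOMIC_FLAG_INIT }

static void toy_lock(struct toy_lock *l)
{
    while (atomic_flag_test_and_set(&l->f))
        ;   /* spin until the holder clears the flag */
}

static void toy_unlock(struct toy_lock *l)
{
    atomic_flag_clear(&l->f);
}

/* Hypothetical context, loosely modelled on a completion ring. */
struct ring_ctx {
    struct toy_lock lock;
    unsigned tail;      /* number of completion events posted */
};

/* Caller must hold ctx->lock.  Lets code that already holds the lock post
 * a completion without deadlocking on a second acquisition. */
static void ring_complete_locked(struct ring_ctx *ctx)
{
    ctx->tail++;
}

/* Convenience wrapper for callers that do not hold the lock. */
static void ring_complete(struct ring_ctx *ctx)
{
    toy_lock(&ctx->lock);
    ring_complete_locked(ctx);
    toy_unlock(&ctx->lock);
}
```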

-ben
-- 
"Time is of no importance, Mr. President, only life is important."
Don't Email: <[EMAIL PROTECTED]>.

Re: /sys/devices/system/cpu/cpuX/online are missing

2007-03-14 Thread Giuliano Pochini

On Tue, 13 Mar 2007 09:56:52 +
Russell King <[EMAIL PROTECTED]> wrote:

> Right, here's the ARM fix which is now in the ARM tree:
> [...]


The following patch seems to fix the issue (+ minor style fix). I'm not sure
it's ok due to my poor knowledge of this code.


Signed-off-by: Giuliano Pochini <[EMAIL PROTECTED]>

--- linux-2.6.21rc3/arch/powerpc/kernel/setup_32.c__orig	2007-03-15 00:05:02.0 +0100
+++ linux-2.6.21rc3/arch/powerpc/kernel/setup_32.c	2007-03-15 00:07:02.0 +0100
@@ -195,18 +195,22 @@ EXPORT_SYMBOL(nvram_sync);
 
 #endif /* CONFIG_NVRAM */
 
-static struct cpu cpu_devices[NR_CPUS];
+static DEFINE_PER_CPU(struct cpu, cpu_devices);
 
 int __init ppc_init(void)
 {
-   int i;
+   int cpu;
 
/* clear the progress line */
-   if ( ppc_md.progress ) ppc_md.progress(" ", 0x);
+   if (ppc_md.progress)
+   ppc_md.progress(" ", 0x);
 
/* register CPU devices */
-   for_each_possible_cpu(i)
-   register_cpu(&cpu_devices[i], i);
+   for_each_possible_cpu(cpu) {
+   struct cpu *c = &per_cpu(cpu_devices, cpu);
+   c->hotpluggable = 1;
+   register_cpu(c, cpu);
+   }
 
/* call platform init */
if (ppc_md.init != NULL) {


--
Giuliano.

Re: SMP performance degradation with sysbench

2007-03-14 Thread Siddha, Suresh B

On Tue, Mar 13, 2007 at 05:08:59AM -0700, Nick Piggin wrote:
> I would agree that it points to MySQL scalability issues, however the
> fact that such large gains come from tcmalloc is still interesting.

What glibc version are you, Anton and the others using?

Does that version include this fix?

Dynamically size mmap treshold if the program frees mmaped blocks.

http://sources.redhat.com/cgi-bin/cvsweb.cgi/libc/malloc/malloc.c.diff?r1=1.158=1.159=glibc
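If the dynamic threshold is suspected of skewing the MySQL/sysbench numbers, one way to check is to rerun with it pinned. A minimal glibc-only sketch; per the mallopt(3) documentation, explicitly setting M_MMAP_THRESHOLD also disables the dynamic adjustment. 128 KiB was the traditional default threshold, and the value here is only an example. The same effect should be reachable without recompiling through the MALLOC_MMAP_THRESHOLD_ environment variable.

```c
#include <assert.h>
#include <stddef.h>
#ifdef __GLIBC__
#include <malloc.h>
#endif

/* Pin glibc's mmap threshold to a fixed size, which also turns off the
 * dynamic adjustment.  Returns mallopt()'s result (1 on success), or 0 when
 * not built against glibc. */
static int pin_mmap_threshold(size_t bytes)
{
#if defined(__GLIBC__) && defined(M_MMAP_THRESHOLD)
    return mallopt(M_MMAP_THRESHOLD, (int)bytes);
#else
    (void)bytes;
    return 0;
#endif
}
```

Call it early in main(), before the allocation pattern being measured starts.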

thanks,
suresh

Re: [patch 13/13] signalfd/timerfd/asyncfd v5 - KAIO asyncfd support (example/maybe-broken) ...

2007-03-14 Thread Davide Libenzi

On Wed, 14 Mar 2007, Benjamin LaHaise wrote:

> On Wed, Mar 14, 2007 at 03:19:21PM -0700, Davide Libenzi wrote:
> > +   /*
> > +* Check if the user asked us to deliver the result through an
> > +* asyncfd. Note that asyncfd_add_results() may sleep. It seems
> > +* OK looking at the code, but I'm not sure since inside a USB driver,
> > +* aio_complete() is called with a spinlock held. !!CHECK
> > +*/
> 
> That won't work.  aio_complete() is supposed to be irq safe.

Can you point me to a kernel path that ends up calling aio_complete() in a 
do-not-sleep mode?
The offender I see is drivers/usb/gadget/inode.c that calls it with a 
spinlock held.
The aio_run_iocb function seems to release/reacquire the lock before 
calling aio_complete().

- Davide

