date:20070307

Re: 2.6.19: ACPI reports AC not present after resume from STD

2007-03-07 Thread Andrey Borzenkov

On Tuesday 06 March 2007, Rafael J. Wysocki wrote:
> [changed Cc list]
>
> On Sunday, 25 February 2007 18:14, Andrey Borzenkov wrote:
> > On Sunday 25 February 2007, Rafael J. Wysocki wrote:
> > > On Sunday, 25 February 2007 11:37, Andrey Borzenkov wrote:
> > > > On Sunday 25 February 2007, Rafael J. Wysocki wrote:
> > > > > On Sunday, 25 February 2007 00:26, Andrey Borzenkov wrote:
> > > > > > On Saturday 24 February 2007, Rafael J. Wysocki wrote:
> > > > > > > Hi,
> > > > > > >
> > > > > > > On Saturday, 24 February 2007 10:55, Andrey Borzenkov wrote:
> > > > > > > > On Tuesday 13 February 2007, Andrey Borzenkov wrote:
> > > > > > > > > On Thursday 07 December 2006, Lebedev, Vladimir P wrote:
> > > > > > > > > > Please register new bug, attach acpidump and dmesg.
> > > > > > > > >
> > > > > > > > > http://bugzilla.kernel.org/show_bug.cgi?id=7995
> > > > > > > > >
> > > > > > > > > regards
> > > > > > > >
> > > > > > > > Well, this starts looking like ACPI is not at fault.
> > > > > > > >
> > > > > > > > When reporting the AC state, ACPI just reads the contents of
> > > > > > > > system memory (I presume it gets updated by the BIOS/ACPI when
> > > > > > > > the AC state changes). It looks like this memory area is
> > > > > > > > restored during resume from STD. I updated the bug report
> > > > > > > > mentioned above with a more detailed description. Now, if
> > > > > > > > someone could suggest a way to catch whether a specific
> > > > > > > > physical address gets saved/restored, this would finally
> > > > > > > > explain it.
> > > > > > >
> > > > > > > First, if you want the reserved memory areas to be left alone
> > > > > > > by swsusp, you need to mark them as 'nosave'.  On x86_64 this
> > > > > > > is done by the function e820_mark_nosave_range() in
> > > > > > > arch/x86_64/kernel/e820.c that can be ported to i386 with no
> > > > > > > problems.  However, we haven't found that very useful, so far,
> > > > > > > since no one has ever reported any problems with the current
> > > > > > > approach, which is to save and restore them.
> > > > > >
> > > > > > Well, the following proof-of-concept patch fixes this issue for
> > > > > > me. Please note that the original version of
> > > > > > e820_mark_nosave_range() could fail to exclude some areas due to
> > > > > > alignment issues (exactly what happened to me on the first try),
> > > > > > so it may still explain your problem too.
> > > > >
> > > > > Great job, thanks for the patch!  It looks good, so I'm going to
> > > > > forward it for merging.
> > > >
> > > > Please don't; I'm currently testing a slightly more polished version
> > > > and will send it later.
> > >
> > > OK
> > >
> > > > Could anybody explain (or give a pointer to) what happens with a region
> > > > that is not page-aligned? In particular, the very first one:
> > > >
> > > >  BIOS-e820:  - 0009fc00 (usable)
> > > >  BIOS-e820: 0009fc00 - 000a (reserved)
> > > >
> > > > Will the kernel allocate a partial page (how?), or will it ignore the
> > > > last (first) incomplete page? In the former case, how can those
> > > > incomplete pages be detected?
> > >
> > > Well, on x86_64, if I understand e820_register_active_regions()
> > > correctly, the partial pages won't be registered.
> >
> > It appears that for low memory the kernel will certainly ignore
> > incomplete pages. I hope it does the same for high memory, but for now I
> > just throw this in and pray :) This also significantly simplifies the patch.
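A minimal sketch of the rounding question above, assuming 4 KiB pages. Rounding a usable region inward drops a partial page, which matches Rafael's reading of e820_register_active_regions(); rounding a region that must be kept out of the image the same way can leave nothing to mark, which is one way the alignment problem Andrey hit could arise. The helpers usable_pfns() and nosave_pfns() are hypothetical; PFN_UP()/PFN_DOWN() mirror the kernel macros of the same name, and the reserved range [0x9fc00, 0xa0000) used below is an assumed example, not taken from the truncated log line.

```c
#include <stdint.h>

/* Assumption: 4 KiB pages, as on i386 and x86_64. */
#define PAGE_SHIFT 12
#define PAGE_SIZE  ((uint64_t)1 << PAGE_SHIFT)

/* Same arithmetic as the kernel's PFN_UP() and PFN_DOWN() helpers. */
#define PFN_UP(x)   (((x) + PAGE_SIZE - 1) >> PAGE_SHIFT)
#define PFN_DOWN(x) ((x) >> PAGE_SHIFT)

/* Half-open page-frame range [start, end). */
struct pfn_range {
    uint64_t start;
    uint64_t end;
};

/* Usable RAM (hypothetical helper): keep only whole pages inside [start, end). */
struct pfn_range usable_pfns(uint64_t start, uint64_t end)
{
    struct pfn_range r = { PFN_UP(start), PFN_DOWN(end) };

    if (r.start > r.end)
        r.start = r.end;    /* no complete page at all: empty range */
    return r;
}

/* Area that must not be saved (hypothetical helper): cover every page it touches. */
struct pfn_range nosave_pfns(uint64_t start, uint64_t end)
{
    struct pfn_range r = { PFN_DOWN(start), PFN_UP(end) };
    return r;
}
```

For example, usable_pfns(0x0, 0x9fc00) yields [0x0, 0x9f), nosave_pfns(0x9fc00, 0xa0000) yields [0x9f, 0xa0), and usable_pfns(0x9fc00, 0xa0000) is empty, so an inward-rounded nosave range would mark nothing.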
>
> Well, can you please check if the appended modification of your patch still
> works?
>

It works for me, with one caveat:

/home/bor/src/linux-git/arch/i386/kernel/e820.c: In function ‘e820_mark_nosave_range’:
/home/bor/src/linux-git/arch/i386/kernel/e820.c:328: warning: format ‘%016Lx’ expects type ‘long long unsigned int’, but argument 2 has type ‘long unsigned int’
/home/bor/src/linux-git/arch/i386/kernel/e820.c:328: warning: format ‘%016Lx’ expects type ‘long long unsigned int’, but argument 3 has type ‘long unsigned int’
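One way to silence these warnings, sketched as userspace C (snprintf standing in for printk, and %llx, the standard C spelling of the kernel's %Lx): as the warning says, PFN_PHYS() yields an unsigned long on i386, so each argument is cast to unsigned long long to match the format on both 32- and 64-bit builds. format_nosave_range() is a hypothetical helper, not part of the patch.

```c
#include <stdio.h>

/*
 * Hypothetical userspace stand-in for the printk() in
 * e820_mark_nosave_range().  The explicit casts make the argument
 * types match %016llx regardless of the width of unsigned long.
 */
int format_nosave_range(char *buf, size_t len,
                        unsigned long start_phys, unsigned long end_phys)
{
    return snprintf(buf, len, "Nosave address range: %016llx - %016llx\n",
                    (unsigned long long)start_phys,
                    (unsigned long long)end_phys);
}
```

The same cast pattern applies inside the kernel printk() call; changing the format to %08lx would also work on i386, but would print truncated-looking values if the code were shared with 64-bit builds.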

regards 

-andrey

> Thanks,
> Rafael
>
>
> ---
>  arch/i386/kernel/e820.c  |   47 +++
>  arch/i386/kernel/setup.c |    1 +
>  include/asm-i386/e820.h  |    1 +
>  3 files changed, 49 insertions(+)
>
> Index: linux-2.6.21-rc2/arch/i386/kernel/e820.c
> ===
> --- linux-2.6.21-rc2.orig/arch/i386/kernel/e820.c
> +++ linux-2.6.21-rc2/arch/i386/kernel/e820.c
> @@ -313,6 +313,53 @@ static int __init request_standard_resou
>
>  subsys_initcall(request_standard_resources);
>
> +/*
> + * Mark pages corresponding to given pfn range as 'nosave'.
> + */
> +static void __init
> +e820_mark_nosave_range(unsigned long start_pfn, unsigned long end_pfn)
> +{
> +	unsigned long pfn;
> +
> +	if (start_pfn >= end_pfn)
> +		return;
> +
> +	printk("Nosave address range: %016Lx - %016Lx\n",
> +		PFN_PHYS(start_pfn), PFN_PHYS(end_pfn));
> +	for (pfn =

Re: [PATCH 0/20] x86_64 Relocatable bzImage support (V4)

2007-03-07 Thread Vivek Goyal

On Thu, Mar 08, 2007 at 10:15:02AM +1100, Nigel Cunningham wrote:
> Hi.
> 
> On Thu, 2007-03-08 at 07:49 +1100, Nigel Cunningham wrote:
> > Hi.
> > 
> > On Wed, 2007-03-07 at 07:07 -0800, Arjan van de Ven wrote:
> > > On Wed, 2007-03-07 at 12:27 +0530, Vivek Goyal wrote:
> > > > Hi,
> > > > 
> > > > Here is another attempt at the x86_64 relocatable bzImage patches (V4).
> > > > This patchset makes a bzImage relocatable, so the same kernel binary
> > > > can be loaded and run from different physical addresses.
> > > 
> > > 
> > > Have these patches been extensively tested with various suspend
> > > scenarios? (S1, S3, S4 in ACPI speak, or s2ram and s2disk in Linux speak)
> > 
> > We did work on this for RHEL5, getting relocatable kernel support
> > working fine with S4. While doing it and since, I've been running
> > Suspend2 with the same patch.
> > 
> > Since that work, Vivek has done more modifications, but I can confirm
> > that the basic design is reliable with S4. Haven't tried S3, but can do.
> > Will report back shortly.
> 
> S3 works okay here with a relocatable x86_64 kernel (2.6.20).
> 

Ok. Got hold of a system which supports Standby mode (S1) and it works fine
with 2.6.21-rc2 + relocatable patchset.

Thanks
Vivek
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: Wanted: simple, safe x86 stack overflow detection

2007-03-07 Thread Avi Kivity


Bill Irwin wrote:
> On Tue, 2007-03-06 at 22:44 -0800, Bill Irwin wrote:
>>> What do you see as the obstacle to eliminating nested IRQ's?
>
> On Wed, Mar 07, 2007 at 04:34:52AM -0800, Arjan van de Ven wrote:
>> political will, or maybe just the lack of convincing people so far
>
> Political issues are significantly more difficult to resolve than
> technical ones.
>
> On Tue, 2007-03-06 at 22:44 -0800, Bill Irwin wrote:
>>> It doesn't
>>> seem so far out to test for being on the interrupt stack and defer the
>>> call to do_IRQ() until after the currently-running instance of do_IRQ()
>>> has returned, or to move to per-irq stacks modulo special arrangements
>>> for the per-cpu IRQ's. Or did you have other methods in mind?
>
> On Wed, Mar 07, 2007 at 04:34:52AM -0800, Arjan van de Ven wrote:
>> it's simpler...
>> irqreturn_t handle_IRQ_event(unsigned int irq, struct irqaction *action)
>> {
>> 	irqreturn_t ret, retval = IRQ_NONE;
>> 	unsigned int status = 0;
>>
>> 	handle_dynamic_tick(action);
>>
>> 	if (!(action->flags & IRQF_DISABLED))
>> 		local_irq_enable_in_hardirq();
>>
>> just removing the if() and the explicit IRQ enabling already makes irqs no
>> longer nest...
>
> I can see why that would raise eyebrows. I can see getting bashed
> mercilessly with interrupt latency concerns as a result here. Can you
> suggest any defenses?


I don't understand why interrupt latency suffers.  Sure, the interrupt 
that's being masked is delayed, but on the other hand the interrupt 
that's doing the masking is not.  We're moving the latency from the 
first interrupt to the second, probably with a slight gain in overall 
throughput.


It *does* matter if the interrupts have meaningful priorities.  Is that 
the case here?
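A toy single-CPU model of the non-nesting behaviour discussed above, assuming nothing about the real IRQ code: an interrupt that arrives while a handler is running is recorded as pending, and its handler runs only after the first one returns. It models Bill's "defer the call to do_IRQ()" idea in software rather than Arjan's approach of simply leaving interrupts disabled; do_irq(), handler() and simulate() are hypothetical names.

```c
#include <stdbool.h>

#define MAX_EVENTS 16

/* Trace: +irq when a handler is entered, -irq when it returns. */
int events[MAX_EVENTS];
int nevents;

static bool in_handler;     /* a handler is currently running */
static unsigned pending;    /* bitmask of deferred IRQ numbers (1..31) */
static int raise_during;    /* IRQ the "device" raises mid-handler, 0 = none */

void do_irq(int irq);

static void handler(int irq)
{
    events[nevents++] = irq;
    if (raise_during && irq != raise_during) {
        int other = raise_during;
        raise_during = 0;
        do_irq(other);              /* second interrupt arrives right now */
    }
    events[nevents++] = -irq;
}

void do_irq(int irq)
{
    if (in_handler) {               /* would nest: defer it instead */
        pending |= 1u << irq;
        return;
    }
    in_handler = true;
    handler(irq);
    while (pending) {               /* replay deferred IRQs, lowest first */
        for (int i = 1; i < 32; i++) {
            if (pending & (1u << i)) {
                pending &= ~(1u << i);
                handler(i);
                break;
            }
        }
    }
    in_handler = false;
}

/* Run IRQ `first`; its handler raises IRQ `second` partway through. */
void simulate(int first, int second)
{
    nevents = 0;
    pending = 0;
    raise_during = second;
    do_irq(first);
}
```

simulate(3, 5) records 3, -3, 5, -5: IRQ 3's handler runs to completion undisturbed and IRQ 5 absorbs the delay, which is the latency shift Avi describes. With nesting allowed, the trace would instead be 3, 5, -5, -3.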


--
Do not meddle in the internals of kernels, for they are subtle and quick to 
panic.


[PATCH][SCTP] Re: lockdep: inconsistent lock state ipv6_add_addr/sctp_v6_copy_addrlist (2.6.21-rc1)

2007-03-07 Thread Jarek Poplawski

On 25-02-2007 10:08, Simon Arlott wrote:
> This happens on every boot if more information is needed:
> 
> [   37.393715] =
> [   37.393830] [ INFO: inconsistent lock state ]
> [   37.393881] 2.6.21-rc1-git #146
> [   37.393929] -
> [   37.393979] inconsistent {softirq-on-R} -> {in-softirq-W} usage.
> [   37.394040] hotplug/1072 [HC0[0]:SC1[2]:HE1:SE0] takes:
> [   37.394092]  (>lock){-+-?}, at: [] ipv6_add_addr+0x164/0x1e0
> [   37.394308] {softirq-on-R} state was registered at:
> [   37.394359]   [] __lock_acquire+0x622/0xbb0
> [   37.394515]   [] lock_acquire+0x62/0x80
> [   37.394678]   [] _read_lock+0x35/0x50
> [   37.394834]   [] sctp_v6_copy_addrlist+0x30/0xc0
...

[SCTP] ipv6: inconsistent lock state ipv6_add_addr/sctp_v6_copy_addrlist

lockdep found that dev->lock, which is taken from softirq context in
ipv6_add_addr, is also taken in sctp_v6_copy_addrlist with softirqs
enabled, so a lockup is possible.
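A single-CPU toy model of why the _bh variants fix this; it is not kernel code. A softirq that wants dev->lock for writing can interrupt a process-context reader on the same CPU: the reader cannot resume until the softirq returns, and the softirq cannot get the write lock until the reader unlocks. With bottom halves disabled around the read side, the softirq is deferred until read_unlock_bh(). All names are hypothetical stand-ins (model_read_lock() and friends), and the model ignores other CPUs.

```c
#include <stdbool.h>

struct cpu_model {
    int readers;          /* readers holding the lock on this CPU */
    int bh_disabled;      /* bottom-half disable nesting count */
    bool softirq_pending; /* softirq raised while BHs were disabled */
    bool deadlocked;
};

/* Softirq body: wants the lock for writing, like ipv6_add_addr(). */
static void softirq_writer(struct cpu_model *c)
{
    if (c->readers > 0)
        c->deadlocked = true;  /* write lock would spin forever here */
}

static void raise_softirq(struct cpu_model *c)
{
    if (c->bh_disabled)
        c->softirq_pending = true;   /* deferred until BHs are re-enabled */
    else
        softirq_writer(c);           /* runs immediately on this CPU */
}

static void model_read_lock(struct cpu_model *c)   { c->readers++; }
static void model_read_unlock(struct cpu_model *c) { c->readers--; }

static void model_read_lock_bh(struct cpu_model *c)
{
    c->bh_disabled++;
    model_read_lock(c);
}

static void model_read_unlock_bh(struct cpu_model *c)
{
    model_read_unlock(c);
    if (--c->bh_disabled == 0 && c->softirq_pending) {
        c->softirq_pending = false;
        softirq_writer(c);           /* lock is free now: no deadlock */
    }
}

/* A reader walks the address list and a softirq fires mid-walk. */
bool reader_deadlocks(bool use_bh)
{
    struct cpu_model c = {0};

    if (use_bh)
        model_read_lock_bh(&c);
    else
        model_read_lock(&c);

    raise_softirq(&c);
    if (c.deadlocked)
        return true;

    if (use_bh)
        model_read_unlock_bh(&c);
    else
        model_read_unlock(&c);
    return c.deadlocked;
}
```

reader_deadlocks(false) reports the lockup lockdep warns about, while reader_deadlocks(true) completes, which is the behaviour the read_lock_bh()/read_unlock_bh() change relies on.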

Noticed-by: Simon Arlott <[EMAIL PROTECTED]>
Signed-off-by: Jarek Poplawski <[EMAIL PROTECTED]>

---

diff -Nurp linux-2.6.21-rc2-mm2-/net/sctp/ipv6.c linux-2.6.21-rc2-mm2/net/sctp/ipv6.c
--- linux-2.6.21-rc2-mm2-/net/sctp/ipv6.c	2007-02-21 19:46:49.0 +0100
+++ linux-2.6.21-rc2-mm2/net/sctp/ipv6.c	2007-03-07 21:57:37.0 +0100
@@ -360,7 +360,7 @@ static void sctp_v6_copy_addrlist(struct
 		return;
 	}
 
-	read_lock(_dev->lock);
+	read_lock_bh(_dev->lock);
 	for (ifp = in6_dev->addr_list; ifp; ifp = ifp->if_next) {
 		/* Add the address to the local list.  */
 		addr = t_new(struct sctp_sockaddr_entry, GFP_ATOMIC);
@@ -374,7 +374,7 @@ static void sctp_v6_copy_addrlist(struct
 		}
 	}
 
-	read_unlock(_dev->lock);
+	read_unlock_bh(_dev->lock);
 	rcu_read_unlock();
 }
 

[PATCH] fix BUG_ON check at move_freepages() (Re: 2.6.21-rc3-mm2)

2007-03-07 Thread Yasunori Goto


Hello.

The BUG_ON() check in move_freepages() is wrong.
Its end_page is start_page + MAX_ORDER_NR_PAGES, i.e. one page past the
end of the block, so it can be in the next zone. BUG_ON() should check
"end_page - 1" instead.

This is a fix for that in 2.6.21-rc3-mm2.

Signed-off-by: Yasunori Goto <[EMAIL PROTECTED]>

---
 mm/page_alloc.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Index: current_test/mm/page_alloc.c
===
--- current_test.orig/mm/page_alloc.c	2007-03-08 15:44:10.0 +0900
+++ current_test/mm/page_alloc.c	2007-03-08 16:17:29.0 +0900
@@ -707,7 +707,7 @@ int move_freepages(struct zone *zone,
 	unsigned long order;
 	int blocks_moved = 0;
 
-	BUG_ON(page_zone(start_page) != page_zone(end_page));
+	BUG_ON(page_zone(start_page) != page_zone(end_page - 1));
 
 	for (page = start_page; page < end_page;) {
 		if (!PageBuddy(page)) {

-- 
Yasunori Goto 
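A toy model of the off-by-one fixed above, assuming a flat page array and a block of 1024 pages (MAX_ORDER_NR_PAGES with MAX_ORDER of 11): a block [start, end) ends at page end - 1, so end itself can already belong to the next zone even when every page of the block is in one zone. zone_of() and the zone boundaries are hypothetical.

```c
#include <stddef.h>

/* Hypothetical zone layout: each zone is a half-open pfn range. */
struct zone_model { size_t start_pfn, end_pfn; };

static const struct zone_model zones[] = {
    { 0,    4096 },   /* e.g. "DMA" */
    { 4096, 8192 },   /* e.g. "Normal" */
};

/* Index of the zone containing pfn, or -1 if none. */
int zone_of(size_t pfn)
{
    for (size_t i = 0; i < sizeof zones / sizeof zones[0]; i++)
        if (pfn >= zones[i].start_pfn && pfn < zones[i].end_pfn)
            return (int)i;
    return -1;
}

/* Old check: compares against the page *after* the block. */
int old_check_fires(size_t start, size_t end)
{
    return zone_of(start) != zone_of(end);
}

/* Fixed check: compares against the last page actually in the block. */
int new_check_fires(size_t start, size_t end)
{
    return zone_of(start) != zone_of(end - 1);
}
```

For a block [3072, 4096) that ends exactly at the zone boundary, old_check_fires() reports a spurious mismatch while new_check_fires() does not; a block that really straddles the boundary still trips the fixed check.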



Re: [PATCH -mm] Blackfin: blackfin i2c driver

2007-03-07 Thread Jean Delvare

On Tue, 6 Mar 2007 23:45:29 -0800, Andrew Morton wrote:
> On Wed, 07 Mar 2007 15:39:27 +0800 "Wu, Bryan" <[EMAIL PROTECTED]> wrote:
> 
> > Thanks a lot, could you please give me a script just to kill this
> > whitespace? So I can do it before sending you patches.
> 
> 
> It's pretty simple:
> 
> #!/bin/sh
> #
> # Strip any trailing whitespace which a unified diff adds.
> #
> 
> strip1()
> {
> 	TMP=$(mktemp /tmp/XXXXXX) || exit 1
> 	cp "$1" "$TMP"
> 	sed -e '/^+/s/[[:space:]]*$//' < "$TMP" > "$1"
> 	rm "$TMP"
> }
> 
> for i in "$@"
> do
> 	strip1 "$i"
> done
> 
> 
> that'll be in
> http://www.zip.com.au/~akpm/linux/patches/patch-scripts-0.20/patch-scripts-0.20.tar.gz
> too

Alternatively, you can use quilt [1] to manage your patches and enable
the --strip-trailing-whitespace option by default.

[1] http://savannah.nongnu.org/projects/quilt/

-- 
Jean Delvare

Re: 2.6.21-rc1 and 2.6.21-rc2 kwin dies silently

2007-03-07 Thread Sid Boyce


Andrew Morton wrote:
> (cc restored.  Please always do reply-to-all)
>
> On Wed, 28 Feb 2007 18:05:13 +0200 [EMAIL PROTECTED] wrote:
>> On Wednesday 28 February 2007 17:19, Sid Boyce wrote:
>>> openSUSE 10.3 Alpha and KDE-3.5.6, xorg-x11-7.2. KDE is set up not to
>>> require a password to unlock, but it asks for a password. When the
>>> screen unlocks, kwin is gone with no errors logged in /var/log/kdm or
>>> /var/log/messages. No problems with 2.6.20.
>>>
>>> Same problem on openSUSE 10.2 x86_64, KDE-3.5.5 and 2.6.21-rc2.
>>> Regards
>>> Sid.
>>
>> This is the linux kernel mailing list. Perhaps you should post your
>> problem to the opensuse mailing list.
>
> 2.6.20 worked.
>
> 2.6.21-rc2 did not.
>
> Working theory: the kernel broke.
>
> Sid, the chances that anyone can work out what caused this are pretty low.
> It would be great if you could perform a git bisection search sometime in
> the next few weeks and work out which commit caused this.
>
> Thanks.

I shall go back to 2.6.20-git3 and work forward. Up to 2.6.20-git2 was OK.
Regards
Sid.

--
Sid Boyce ... Hamradio License G3VBV, Licensed Private Pilot
Emeritus IBM/Amdahl Mainframes and Sun/Fujitsu Servers Tech Support Specialist, Cricket Coach
Microsoft Windows Free Zone - Linux used for all Computing Tasks
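The search Andrew is asking for, reduced to its core: git bisect performs a binary search between a known-good point (2.6.20-git2 here) and a known-bad one, so N candidate commits need only about log2(N) test builds. A minimal sketch over a linear list of commits, with a hypothetical is_bad() callback standing in for "build, boot, lock the screen, check kwin"; real git bisect also handles merge history, which this ignores. In git the loop itself is driven by git bisect start, git bisect bad and git bisect good.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * commits[0] is known good and commits[n - 1] is known bad (n >= 2).
 * Assumes a single good-to-bad transition.  Returns the index of the
 * first bad commit.
 */
size_t first_bad(size_t n, bool (*is_bad)(size_t idx))
{
    size_t good = 0, bad = n - 1;   /* invariant: good is good, bad is bad */

    while (bad - good > 1) {
        size_t mid = good + (bad - good) / 2;
        if (is_bad(mid))
            bad = mid;
        else
            good = mid;
    }
    return bad;
}

/* Demo predicate (hypothetical): commit `culprit` introduced the bug. */
size_t culprit;
size_t tests_run;

bool demo_is_bad(size_t idx)
{
    tests_run++;
    return idx >= culprit;
}
```

With 1024 candidates and culprit set to 700, first_bad(1024, demo_is_bad) returns 700 after at most 10 calls to demo_is_bad().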



Re: [patch] epoll use a single inode ...

2007-03-07 Thread Eric Dumazet


Kyle Moffett wrote:
> Prefetching is also fairly critical on a Power4 or G5 PowerPC system as
> they have a long memory latency; an L2-cache miss can cost 200+ cycles.
> On such systems the "dcbt" prefetch instruction brings in a single
> 128-byte cacheline and has no serializing effects whatsoever, making it
> ideal for use in a linked-list-traversal inner loop.

OK, 200 cycles...

But what is the cost of the conditional branch you added in prefetch(x)?

	if (!x) return;

(correctly predicted or not; but does PowerPC have a BTB?)

About the NULL 'potential problem', maybe we could use a dummy nil (but
mapped) object, and use its address in lists, i.e. compare for  instead of
NULL. This would avoid:

- The conditional test in some prefetch() implementations
- The potential TLB problem with the NULL value.
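A minimal sketch of the dummy-nil idea, assuming GCC or Clang for the __builtin_prefetch builtin: lists end at the address of a real, mapped sentinel object instead of NULL, so the traversal loop can prefetch the next node without first testing it. The names nil, struct node and sum_list() are hypothetical.

```c
/* Hypothetical list node. */
struct node {
    struct node *next;
    int value;
};

/* Mapped sentinel: lists end here instead of at NULL.  Its own next
 * points back to itself, so even prefetching past the end is safe. */
struct node nil = { &nil, 0 };

int sum_list(const struct node *head)
{
    int sum = 0;

    for (const struct node *n = head; n != &nil; n = n->next) {
        __builtin_prefetch(n->next);  /* never NULL: no test needed */
        sum += n->value;
    }
    return sum;
}
```

The cost moves from a branch in prefetch() to one compare against a constant address in the loop condition, which the loop needed anyway.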





Re: + stupid-hack-to-make-mainline-build.patch added to -mm tree

2007-03-07 Thread Thomas Gleixner

On Wed, 2007-03-07 at 17:01 -0800, Daniel Arai wrote:
> Thomas Gleixner wrote:
> 
> > You managed to avoid the usage of other code (i.e. PIT / HPET) already,
> > so why is it sooo desireable to emulate apics instead of substituting it
> > by a small and sane replacement ? Just because you happen to have an
> > LAPIC emulator ? That's no reason to wire yourself into the kernel code
> > and make it harder to change and maintain.
> 
> There are several reasons why it's desirable to emulate the APIC.  As you
> mentioned, we already have APIC emulation, and APIC emulation isn't a huge
> bottleneck on most workloads.  Our code works, the Linux code works, and
> replacing both pieces of code with something "small and sane" isn't going to
> improve performance very much, so why bother?  Any hypervisor implementation
> is going to be a tradeoff between what's easy to implement in the hypervisor,
> what's easy to implement in the guest operating system, and what's
> performance critical.

It is not about performance. It is about maintainability. 

> Secondly, not all (para-)virtualized operating systems will want to use 
> abstracted devices.  Some virtual operating systems will be given direct 
> access 
> to hardware devices, and will need to run the actual driver for that device 
> and 
> not some abstracted device driver.  So I don't buy your argument that every 
> piece of the kernel that interacts with a paravirtualized driver should have 
> a 
> "small and sane replacement."

Err. We are talking about paravirtualized Linux, not about what you have
to emulate to get Windows running. I don't care about that at all. Do you
really expect us to accept your design decisions just because they make
your life easy? This is exactly what you are using paravirt ops for: a
backdoor to throw your hackery at the kernel and leave us with the mess
of hardwired crap.

> But more importantly, we want a kernel that can run both on native hardware
> and in a paravirtualized environment.  Linux doesn't really provide
> abstractions for replacing the appropriate code.  We tried to hook into the
> source code at a level that seemed possible.

Again. You just refuse to change your implementation and you want to
keep it by arguing how hard it is because there are no abstractions.

I went through the business of creating abstractions into hardwired
hairballs twice. I know exactly what I'm talking about. It _IS_ hard
work, but at the end it makes the code better and more maintainable. You
do nothing for that, but expect that we live with your addons to the
hairball.

> There's no good way to override __send_IPI_shortcut.  I suppose we could add
> paravirt ops for __send_IPI_shortcut and every other op that touches the
> APIC.  But there are dozens of functions in apic.c that would need to be
> included in paravirt ops.  And for our implementation, we really just want
> to override apic_read and apic_write, since we can make these faster when
> done through hypercalls than through memory accesses.  If we were to make
> these paravirt ops, their implementations would be the same, except with a
> different apic_read and apic_write.  This is a whole lot of useless code
> duplication.

No, it is not. #include  is an abstraction and
__send_IPI ... is the i386 low-level implementation.

You insist on hooking yourself into the low-level code instead of hooking
into the high-level code, because it is _YOUR_ implementation and we
have to accept it as is.

This is the completely wrong way. We get the same crap and discussion
for every other architecture we are going to support with paravirt ops.
And probably for every other hypervisor implementation, which has a
different way of doing things.

> Most of the interrupt system is not written in such a way that multiple APIC
> implementations can be selected from at boot time.  This is an absolute
> requirement so that the same kernel can boot on native and in a
> paravirtualized environment.  While this could be implemented, it seems like
> a waste of time, since we can just emulate something similar to a real
> interrupt system and not change things very much.

Waste of your precious time. I'm working on low level code and
abstractions and from now on I have also to take care not to break
_YOUR_ implementation. You are going to waste _MY_ time and I'm going to
fight that forever.

Your prayer-wheel argument about missing abstractions and the ease of
emulating things is annoying. If you think it is better to emulate the
APIC, please emulate it without paravirt ops. If you want the speed
improvement, work with us to create the interfaces and abstractions that
are necessary for a sane, maintainable implementation that is useful for
all hypervisors.

tglx
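A minimal sketch of the kind of high-level hook being argued for, with all names hypothetical: generic code calls one operation selected once at boot (here, "send an IPI to all other CPUs") instead of the paravirt layer overriding apic_read()/apic_write() underneath the i386 IPI code. The back ends only record what they would do; this illustrates the shape of the interface, not an actual paravirt_ops design.

```c
#include <string.h>

struct ipi_ops {
    const char *name;
    void (*send_ipi_allbutself)(int vector);
};

char last_action[64];   /* what the selected back end "did" */

static void native_send_ipi_allbutself(int vector)
{
    (void)vector;       /* real code would program the local APIC ICR */
    strcpy(last_action, "apic-icr-write");
}

static void hv_send_ipi_allbutself(int vector)
{
    (void)vector;       /* real code would issue a single hypercall */
    strcpy(last_action, "hypercall");
}

static const struct ipi_ops native_ipi_ops = { "native", native_send_ipi_allbutself };
static const struct ipi_ops hv_ipi_ops = { "hypervisor", hv_send_ipi_allbutself };

/* Chosen once at boot; generic code only ever calls through it. */
const struct ipi_ops *cur_ipi_ops = &native_ipi_ops;

void select_ipi_ops(int running_on_hypervisor)
{
    cur_ipi_ops = running_on_hypervisor ? &hv_ipi_ops : &native_ipi_ops;
}

/* Generic caller: knows nothing about APIC registers. */
void send_reschedule_all(void)
{
    cur_ipi_ops->send_ipi_allbutself(0xfd);  /* vector value is arbitrary here */
}
```

Calling select_ipi_ops(1) once at boot switches every later send_reschedule_all() to the hypervisor path while the native low-level code stays untouched, which is the maintainability point made above.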


Re: 2.6.21-rc2-mm2

2007-03-07 Thread Mike Galbraith

On Wed, 2007-03-07 at 11:52 -0800, Andrew Morton wrote:
> On Wed, 7 Mar 2007 16:46:20 -0300 "Luiz Fernando N. Capitulino" <[EMAIL PROTECTED]> wrote:
> 
> > Em Tue, 6 Mar 2007 00:44:08 -0800
> > Andrew Morton <[EMAIL PROTECTED]> escreveu:
> > 
> > | 
> > | Temporarily at
> > | 
> > |   http://userweb.kernel.org/~akpm/2.6.21-rc2-mm2/
> > | 
> > | Will appear later at
> > | 
> > |   ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.20-rc2/2.6.20-rc2-mm2/
> > 
> >  Getting this while rebooting:
> > 
> > [  166.588469] BUG: atomic counter underflow at:
> > [  166.588527]  [] show_trace_log_lvl+0x1a/0x30
> > [  166.588632]  [] show_trace+0x12/0x20
> > [  166.588730]  [] dump_stack+0x16/0x20
> > [  166.588828]  [] kref_put+0xa1/0x100
> > [  166.588927]  [] kobject_put+0x14/0x20
> > [  166.589027]  [] kobject_unregister+0x22/0x30
> > [  166.589127]  [] bus_remove_driver+0x79/0x90
> > [  166.589227]  [] driver_unregister+0xb/0x20
> > [  166.589327]  [] pci_unregister_driver+0x13/0x70
> > [  166.589428]  [] alsa_card_via82xx_exit+0xd/0xf [snd_via82xx]
> > [  166.589534]  [] sys_delete_module+0x140/0x1b0
> > [  166.589635]  [] sysenter_past_esp+0x5f/0x99
> > [  166.589734]  ===
> > 
> 
> Me too.  Greg has reverted the offending commit, so now rmmod of the IPMI
> driver locks the machine again.

Hi,

The hang (which /me screwed up fixing) isn't on rmmod; it happens when IPMI
is built in and ipmi_si finds nobody home.  The driver tries to back out
and waits forever for a completion (about 0.7 seconds into boot).

-Mike


Locking function (interrupt handler) in the L1/L2 cache

2007-03-07 Thread Parav K Pandit

Hi,

I have an MPC8548-based Linux firewall that spends about 80% of its time
on packet processing, so most of the time it will be receiving and
transmitting packets through the gianfar Ethernet driver.

I want to lock this driver's interrupt handler in the L1 cache.

1. Are there any kernel APIs to lock a function and its data in the L1/L2
cache?

2. How can I use "icbtls" (Instruction Cache Block Touch and Lock Set) to
lock my interrupt handler?

3. Is "icbtls" the correct instruction to be looking at?

4. How do I find the end address of the interrupt handler (or any other
function), and how do I pass it to the cache locking instructions? (The
handler may be larger than a cache line, may not be aligned, etc.)

5. Could request_irq() be enhanced to take an additional parameter to
lock the interrupt handler in the cache?
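For question 4, a sketch of the address arithmetic only, assuming a 32-byte L1 line (the e500 core's line size) and a hypothetical lock_line() callback that would issue icbtls for one line: a handler that is unaligned or longer than one line spans every line from the one holding its first byte to the one holding its last byte. The function's length has to come from elsewhere, for example the symbol size printed by nm -S.

```c
#include <stddef.h>
#include <stdint.h>

#define CACHE_LINE 32u   /* assumption: e500 L1 line size in bytes */

/*
 * Call lock_line() once per cache line covering [start, start + len)
 * and return the number of lines.  lock_line may be NULL to just count.
 */
size_t lock_range(uintptr_t start, size_t len, void (*lock_line)(uintptr_t))
{
    if (len == 0)
        return 0;

    uintptr_t first = start & ~(uintptr_t)(CACHE_LINE - 1);
    uintptr_t last = (start + len - 1) & ~(uintptr_t)(CACHE_LINE - 1);
    size_t n = 0;

    for (uintptr_t line = first; line <= last; line += CACHE_LINE) {
        if (lock_line)
            lock_line(line);
        n++;
    }
    return n;
}
```

For instance, a 32-byte handler starting at 0x1010 straddles two lines (0x1000 and 0x1020), while the same handler aligned at 0x1000 needs only one.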

I understand that if my interrupt handler is called most of the time, it
is very likely to stay in the cache anyway, but there is no guarantee of
that.

Regards,
Parav Pandit



Re: SATA resume slowness, e1000 MSI warning

2007-03-07 Thread Eric W. Biederman

Andrew Morton <[EMAIL PROTECTED]> writes:

>
> That's:
>
> pci_restore_pcix_state(dev);
> pci_restore_msi_state(dev);
> WARN_ON(!hlist_empty(>saved_cap_space));
>
> return 0;

Hmm.  Either I am confused or I just found an unanticipated leak.

pci_restore_msi_state should be out of the picture as we don't yet
have ppc msi support and I don't think the g5 generation hardware
supported it either.

The only case I can see which might trigger this is if we saved
pci-X state and then didn't restore it because we could not find
the capability on restore.

Any chance you could walk that list and find the cap_nr of the remaining
element?  

Something like:
{
	struct pci_cap_saved_state *tmp;
	struct hlist_node *pos;

	hlist_for_each_entry(tmp, pos, _dev->saved_cap_space, next)
		printk(KERN_INFO "saved_cap: 0x%02x\n", tmp->cap_nr);
}

Until I get that, the best scenario I can come up with is a tg3 hardware
bug that doesn't re-enable the PCI-X capability after a restore of power state.

Getting that cap_nr will at least allow me to be certain if I am dealing
with msi, pci-X or pci-e.

Unanticipated bugs aren't supposed to be this easy to find!

Eric
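A toy model of the leak being hunted here, not the kernel's implementation: capability state is pushed onto a list at suspend, and each entry is removed only if the matching capability is found again at resume. A capability that disappears (the tg3 scenario above) leaves its entry behind, which is what the WARN_ON(!hlist_empty(...)) fires on. struct saved_cap, save_cap() and restore_cap() are hypothetical; the IDs mentioned are the PCI capability IDs for MSI (0x05), PCI-X (0x07) and PCI Express (0x10).

```c
#include <stdlib.h>

struct saved_cap {
    unsigned char cap_nr;        /* PCI capability ID */
    struct saved_cap *next;
};

/* Suspend path: remember that cap_nr's state was saved. */
void save_cap(struct saved_cap **list, unsigned char cap_nr)
{
    struct saved_cap *s = malloc(sizeof *s);

    if (!s)
        abort();
    s->cap_nr = cap_nr;
    s->next = *list;
    *list = s;
}

/* Resume path: drop cap_nr's entry only if the device still reports it. */
void restore_cap(struct saved_cap **list, unsigned char cap_nr, int device_has_cap)
{
    if (!device_has_cap)
        return;                  /* capability not found: entry leaks */

    for (struct saved_cap **pp = list; *pp; pp = &(*pp)->next) {
        if ((*pp)->cap_nr == cap_nr) {
            struct saved_cap *victim = *pp;
            *pp = victim->next;
            free(victim);
            return;
        }
    }
}
```

If PCI-X (0x07) is saved and then not found on resume, the list is left holding exactly one entry with cap_nr 0x07, and printing it as in the snippet above identifies which capability went missing.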


Linux 2.6.16.43

2007-03-07 Thread Adrian Bunk

New hwmon drivers since 2.6.16.42 for the following hardware:
- National Semiconductor pc87427
- SMSC lpc47m192 and lpc47m997
- Winbond w83791d


Location:
ftp://ftp.kernel.org/pub/linux/kernel/v2.6/

git tree:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-2.6.16.y.git


Changes since 2.6.16.42:

Adrian Bunk (3):
  Linux 2.6.16.43-rc1
  fs/bad_inode.c 64bit fix
  Linux 2.6.16.43

Alexey Dobriyan (1):
  [IPV4/IPV6] multicast: Check add_grhead() return value

Charles Spirakis (2):
  HWMON: w83791d: New hardware monitoring driver for the Winbond W83791D
  w83791d: Documentation update

Francois Romieu (1):
  sis190: failure to set the MAC address from EEPROM

Hartmut Rick (1):
  smsc47m192: New hwmon driver for SMSC LPC47M192/997

Ilpo Järvinen (1):
  [TCP]: Prevent pseudo garbage in SYN's advertized window

Jean Delvare (3):
  hwmon: New PC87427 hardware monitoring driver
  hwmon: Add support for the Winbond W83687THF
  i2c-isa: Restore driver owner

Jim Cromie (2):
  hwmon: Allow sensor attributes arrays
  hwmon: Refactor SENSOR_DEVICE_ATTR_2

Jordan Crouse (1):
  hwmon lm83: Add LM82 support

Kirill Korotaev (1):
  fix ext3 block bitmap leakage

Marcel Siegert (1):
  V4L/DVB: Dvbdev: fix illegal re-usage of fileoperations struct

Martin Devera (1):
  I2C: i2c-piix4: Add Broadcom HT-1000 support

Patrick McHardy (1):
  [DECNET]: Fix sfuzz hanging on 2.6.18

Rudolf Marek (1):
  i2c-piix4: Add ATI IXP200/300/400 support

Stephen Hemminger (6):
  sky2: fix ram buffer allocation settings
  sky2: allow multicast pause frames
  sky2: fix for use on big endian
  sky2: more stats
  sky2: add more pci ids
  sky2: email and version change.


 Documentation/hwmon/lm83            |   16
 Documentation/hwmon/pc87427         |   38
 Documentation/hwmon/smsc47m192      |  102 ++
 Documentation/hwmon/sysfs-interface |    6
 Documentation/hwmon/w83627hf        |    4
 Documentation/hwmon/w83791d         |  120 ++
 Documentation/i2c/busses/i2c-piix4  |    4
 Makefile                            |    2
 drivers/hwmon/Kconfig               |   57 +
 drivers/hwmon/Makefile              |    3
 drivers/hwmon/it87.c                |    1
 drivers/hwmon/lm78.c                |    1
 drivers/hwmon/lm83.c                |   50 -
 drivers/hwmon/pc87360.c             |    1
 drivers/hwmon/pc87427.c             |  627 +
 drivers/hwmon/sis5595.c             |    1
 drivers/hwmon/smsc47b397.c          |    1
 drivers/hwmon/smsc47m1.c            |    1
 drivers/hwmon/smsc47m192.c          |  648 ++
 drivers/hwmon/via686a.c             |    1
 drivers/hwmon/vt8231.c              |    1
 drivers/hwmon/w83627ehf.c           |    1
 drivers/hwmon/w83627hf.c            |   73 +
 drivers/hwmon/w83781d.c             |    1
 drivers/hwmon/w83791d.c             | 1256
 drivers/i2c/busses/Kconfig          |    9
 drivers/i2c/busses/i2c-piix4.c      |   10
 drivers/media/dvb/dvb-core/dvbdev.c |   13
 drivers/net/sis190.c                |    2
 drivers/net/sky2.c                  |  146 ++-
 fs/bad_inode.c                      |    8
 fs/ext3/inode.c                     |    1
 include/linux/hwmon-sysfs.h         |   24
 include/linux/pci_ids.h             |    4
 net/decnet/af_decnet.c              |    4
 net/ipv4/igmp.c                     |    2
 net/ipv4/tcp_output.c               |    4
 net/ipv6/mcast.c                    |    2
 38 files changed, 3130 insertions(+), 115 deletions(-)


Re: + stupid-hack-to-make-mainline-build.patch added to -mm tree

2007-03-07 Thread Thomas Gleixner

On Wed, 2007-03-07 at 17:23 -0800, Jeremy Fitzhardinge wrote:
> Daniel Arai wrote:
> > But more importantly, we want a kernel that can run both on native hardware
> > and in a paravirtualized environment.  Linux doesn't really provide
> > abstractions for replacing the appropriate code.  We tried to hook into the
> > source code at a level that seemed possible.
> 
> Xen doesn't support any kind of apic emulation, so we'll need to hook
> anything which relies on an apic.  The ipi code you quote below will
> probably be one of those.
> 
> My opinion is that pv_ops shouldn't have raw apic operations, but
> instead have appropriate high-level interfaces to achieve the same
> ends.  Zach's counter-argument was basically yours: that the VMI code
> will use a lot of the native code except for the actual apic operations.
> 
> I can live with VMI emulating apics if it wants, so long as it does it
> in private and doesn't make a big scene about it.  We'll need the
> high-level interfaces regardless.

I can't, because it reaches out into non-private parts of the low-level
implementation and does not help to disentangle things or make the
overall code better. No, it forces its own view of the world on us
without giving us anything back.

tglx



Re: [ckrm-tech] [PATCH 0/2] resource control file system - aka containers on top of nsproxy!

2007-03-07 Thread Eric W. Biederman

Matt Helsley <[EMAIL PROTECTED]> writes:

> On Thu, 2007-03-08 at 16:32 +1300, Sam Vilain wrote:
>
> <snip>
>
> > Kirill, 06032418:36+03:
> > > I propose to use "namespace" naming.
> > > 1. This is already used in fs.
> > > 2. This is what IMHO suits at least OpenVZ/Eric
> > > 3. it has good acronym "ns".
> >
> > Right.  So, now I'll also throw into the mix:
> >
> >   - resource groups (I get a strange feeling of déjà vu there)
>
> <offtopic>
> Re: déjà vu: yes!
>
> It's like that Star Trek episode ... except we can't agree on the name
> of the impossible particle we will invent which solves all our problems.
> </offtopic>
>
> At the risk of prolonging the agony I hate to ask: are all of these
> groupings really concerned with "resources"?
>
> >   - supply chains (think supply and demand)
> >   - accounting classes
>
> CKRM's use of the term "class" drew negative comments from Paul Jackson
> and Andrew Morton about this time last year. That led to my suggestion
> of "Resource Groups". Unless they've changed their minds...
>
> > Do any of those sound remotely close?  If not, your turn :)
>
> I'll butt in here: task groups? task sets? confuselets? ;)

Generically we can use subsystem now for the individual pieces without
confusing anyone.

I really don't much care as long as we don't start redefining
container as something else.  I think the IBM guys took it from
Solaris originally, which seems to define a zone as a set of
isolated processes (for us, all separate namespaces), and a container
as a zone that uses resource control.  Not exactly how
we have been using the term, but close enough not to confuse someone.

As long as we don't go calling the individual subsystems, or the
process groups they need to function, a container, I really don't care.

I just know that if we use container for just the subsystem level,
it makes effective communication impossible and code reviews
essentially impossible: the description says one thing, the
reviewer reads it as another, and then the patch does not match
the description, leading to NAKs.

Resource groups, at least for the subset of subsystems that aren't
namespaces, sounds reasonable.  Heck, resource group, resource
controller, resource subsystem, resource just about anything seems
sane to me.

The important part is that we find a vocabulary without doubly
defined words, so we can communicate, and a small common set we can
agree on, so people can work on and implement the individual
resource controllers/groups and get the individual pieces merged
as they are ready.

Eric

[PATCH] chaostables

2007-03-07 Thread Jan Engelhardt

Hello netfilter-devel,


I would like to submit chaostables (v0.5_svn23) for inclusion. Its primary 
use is to detect, spoof and slow down various sorts of port scans.
Implementation details can be found at http://jengelh.hopto.org/p/chaostables/

If you have any comments or suggestions, do not hesitate to 
let me know.


Signed-off-by: Jan Engelhardt <[EMAIL PROTECTED]>

---

 include/linux/netfilter/x_tables.h    |    2 
 include/linux/netfilter/xt_CHAOS.h    |   14 +
 include/linux/netfilter/xt_portscan.h |    8 +
 net/netfilter/Kconfig                 |   12 +
 net/netfilter/Makefile                |    3 
 net/netfilter/x_tables.c              |   12 +
 net/netfilter/xt_CHAOS.c              |  184 +++
 net/netfilter/xt_DELUDE.c             |  259 
 net/netfilter/xt_portscan.c           |  271 ++
 9 files changed, 765 insertions(+)

Index: linux-2.6.21-rc3/include/linux/netfilter/x_tables.h
===
--- linux-2.6.21-rc3.orig/include/linux/netfilter/x_tables.h
+++ linux-2.6.21-rc3/include/linux/netfilter/x_tables.h
@@ -292,6 +292,8 @@ extern struct xt_table_info *xt_replace_
  int *error);
 
 extern struct xt_match *xt_find_match(int af, const char *name, u8 revision);
+extern struct xt_match *xt_request_find_match(int af, const char *name,
+   u8 revision);
 extern struct xt_target *xt_find_target(int af, const char *name, u8 revision);
 extern struct xt_target *xt_request_find_target(int af, const char *name, 
u8 revision);
Index: linux-2.6.21-rc3/include/linux/netfilter/xt_CHAOS.h
===
--- /dev/null
+++ linux-2.6.21-rc3/include/linux/netfilter/xt_CHAOS.h
@@ -0,0 +1,14 @@
+#ifndef _LINUX_XT_CHAOS_H
+#define _LINUX_XT_CHAOS_H 1
+
+enum xt_chaos_variant {
+   XTCHAOS_NORMAL,
+   XTCHAOS_TARPIT,
+   XTCHAOS_DELUDE,
+};
+
+struct xt_chaos_info {
+   enum xt_chaos_variant variant;
+};
+
+#endif /* _LINUX_XT_CHAOS_H */
Index: linux-2.6.21-rc3/include/linux/netfilter/xt_portscan.h
===
--- /dev/null
+++ linux-2.6.21-rc3/include/linux/netfilter/xt_portscan.h
@@ -0,0 +1,8 @@
+#ifndef _LINUX_XT_PORTSCAN_H
+#define _LINUX_XT_PORTSCAN_H 1
+
+struct xt_portscan_info {
+   unsigned int match_stealth, match_syn, match_cn, match_gr;
+};
+
+#endif /* _LINUX_XT_PORTSCAN_H */
Index: linux-2.6.21-rc3/net/netfilter/Kconfig
===
--- linux-2.6.21-rc3.orig/net/netfilter/Kconfig
+++ linux-2.6.21-rc3/net/netfilter/Kconfig
@@ -286,6 +286,14 @@ config NETFILTER_XTABLES
 
 # alphabetically ordered list of targets
 
+config NETFILTER_XT_TARGET_CHAOS
+   tristate '"CHAOS" target support'
+   depends on NETFILTER_XTABLES
+
+config NETFILTER_XT_TARGET_DELUDE
+   tristate '"DELUDE" target support'
+   depends on NETFILTER_XTABLES
+
 config NETFILTER_XT_TARGET_CLASSIFY
tristate '"CLASSIFY" target support'
depends on NETFILTER_XTABLES
@@ -562,6 +570,10 @@ config NETFILTER_XT_MATCH_POLICY
 
  To compile it as a module, choose M here.  If unsure, say N.
 
+config NETFILTER_XT_MATCH_PORTSCAN
+   tristate '"portscan" match support'
+   depends on NETFILTER_XTABLES && NF_CONNTRACK
+
 config NETFILTER_XT_MATCH_MULTIPORT
tristate "Multiple port match support"
depends on NETFILTER_XTABLES
Index: linux-2.6.21-rc3/net/netfilter/Makefile
===
--- linux-2.6.21-rc3.orig/net/netfilter/Makefile
+++ linux-2.6.21-rc3/net/netfilter/Makefile
@@ -37,8 +37,10 @@ obj-$(CONFIG_NF_CONNTRACK_TFTP) += nf_co
 obj-$(CONFIG_NETFILTER_XTABLES) += x_tables.o xt_tcpudp.o
 
 # targets
+obj-$(CONFIG_NETFILTER_XT_TARGET_CHAOS) += xt_CHAOS.o
 obj-$(CONFIG_NETFILTER_XT_TARGET_CLASSIFY) += xt_CLASSIFY.o
 obj-$(CONFIG_NETFILTER_XT_TARGET_CONNMARK) += xt_CONNMARK.o
+obj-$(CONFIG_NETFILTER_XT_TARGET_DELUDE) += xt_DELUDE.o
 obj-$(CONFIG_NETFILTER_XT_TARGET_DSCP) += xt_DSCP.o
 obj-$(CONFIG_NETFILTER_XT_TARGET_MARK) += xt_MARK.o
 obj-$(CONFIG_NETFILTER_XT_TARGET_NFQUEUE) += xt_NFQUEUE.o
@@ -63,6 +65,7 @@ obj-$(CONFIG_NETFILTER_XT_MATCH_MAC) += 
 obj-$(CONFIG_NETFILTER_XT_MATCH_MARK) += xt_mark.o
 obj-$(CONFIG_NETFILTER_XT_MATCH_MULTIPORT) += xt_multiport.o
 obj-$(CONFIG_NETFILTER_XT_MATCH_POLICY) += xt_policy.o
+obj-$(CONFIG_NETFILTER_XT_MATCH_PORTSCAN) += xt_portscan.o
 obj-$(CONFIG_NETFILTER_XT_MATCH_PKTTYPE) += xt_pkttype.o
 obj-$(CONFIG_NETFILTER_XT_MATCH_QUOTA) += xt_quota.o
 obj-$(CONFIG_NETFILTER_XT_MATCH_REALM) += xt_realm.o
Index: linux-2.6.21-rc3/net/netfilter/x_tables.c
===
---

Re: [ckrm-tech] [PATCH 0/2] resource control file system - aka containers on top of nsproxy!

2007-03-07 Thread Eric W. Biederman

Sam Vilain <[EMAIL PROTECTED]> writes:

> And do we bother changing IPC namespaces or let that one slide?

IPC namespaces work (if you worry about tiny details: we put
the resource limits for the SysV IPC objects inside the namespace).

Probably the most instructive example of this is that you can
map a SysV IPC shared memory segment with shmat() and then switch to
another SysV IPC namespace: you still have access by reads and writes
to that shared memory segment, but you cannot manipulate it because it
doesn't have a name there.

Either that, or look at the output of ipcs before and after an unshare.

SysV IPC really does have its own (very weird) set of global names,
and that is essentially all the IPC namespace deals with.

I think you have the SysV IPC namespace confused with something else,
though (like signal sending).

Eric
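
The shmat() example above can be sketched as a small userspace C program. This is a hypothetical demo, not code from the thread: ipcns_demo() and its return codes are made-up names, and it assumes a kernel with IPC namespace support plus enough privilege (normally CAP_SYS_ADMIN) for unshare(CLONE_NEWIPC). Without that, or without SysV IPC, it reports itself as skipped rather than failing.

```c
#define _GNU_SOURCE            /* for unshare() and CLONE_NEWIPC */
#include <errno.h>
#include <sched.h>
#include <string.h>
#include <sys/ipc.h>
#include <sys/shm.h>

enum ipcns_demo_result {
    IPCNS_OK = 0,          /* mapping still usable, id no longer resolves */
    IPCNS_SKIPPED = 1,     /* SysV IPC or unshare(CLONE_NEWIPC) unavailable */
    IPCNS_UNEXPECTED = -1  /* behaviour differed from the description */
};

/* Hypothetical demo. On success the calling process is left in the new
 * IPC namespace. */
int ipcns_demo(void)
{
    int id = shmget(IPC_PRIVATE, 4096, IPC_CREAT | 0600);
    if (id < 0)
        return IPCNS_SKIPPED;

    char *mem = shmat(id, NULL, 0);
    if (mem == (void *)-1) {
        shmctl(id, IPC_RMID, NULL);
        return IPCNS_SKIPPED;
    }
    /* Mark for removal now: the segment survives until the last detach,
     * and after unshare() we could no longer name it to remove it. */
    shmctl(id, IPC_RMID, NULL);
    strcpy(mem, "before");

    if (unshare(CLONE_NEWIPC) < 0) {   /* usually needs CAP_SYS_ADMIN */
        shmdt(mem);
        return IPCNS_SKIPPED;
    }

    /* The existing mapping is untouched: reads and writes still work. */
    int read_ok = strcmp(mem, "before") == 0;
    strcpy(mem, "after");

    /* ...but the old id names nothing in the new, empty namespace. */
    struct shmid_ds ds;
    int lookup_failed = shmctl(id, IPC_STAT, &ds) < 0 && errno == EINVAL;

    shmdt(mem);
    return (read_ok && lookup_failed) ? IPCNS_OK : IPCNS_UNEXPECTED;
}
```

Marking the segment with IPC_RMID before switching namespaces is a design choice of the sketch: once the id no longer resolves, nothing else could clean the segment up.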

MMC: Fix typo in mmc highspeed

2007-03-07 Thread Kyungmin Park

MMC: Fix typo in mmc highspeed

Signed-off-by: Kyungmin Park <[EMAIL PROTECTED]>
--
diff --git a/drivers/mmc/mmc.c b/drivers/mmc/mmc.c
index 4a73e8b..3b8f7af 100644
--- a/drivers/mmc/mmc.c
+++ b/drivers/mmc/mmc.c
@@ -1134,7 +1134,7 @@ static void mmc_process_ext_csds(struct mmc_host *host)
 
mmc_card_set_highspeed(card);
 
-   host->ios.timing = MMC_TIMING_SD_HS;
+   host->ios.timing = MMC_TIMING_MMC_HS;
mmc_set_ios(host);
}


Re: [PATCH 1/20] x86_64: Assembly safe page.h and pgtable.h

2007-03-07 Thread Eric W. Biederman

Vivek Goyal <[EMAIL PROTECTED]> writes:

> Hi Sam,
>
> Thanks for the review. This makes sense to me. Move const.h into
> asm-generic and let everybody use it.
>
> This is more of a small cleanup issue; it involves changing a few header files
> in asm-sparc64 and making sure nothing breaks on sparc64. This patchset
> is already becoming big and complex. Is it OK if we leave the patch
> unmodified for now? Once this gets in and settles down, I can
> post another patch to do the above modification.

Actually, unless there is a reason not to, we can probably move this
into include/linux instead of include/asm-generic.  I don't see anything
in that header file that is architecture-specific in any way, except
that it happens to be used only in architecture-specific code.

Eric
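
For reference, a const.h-style header of the kind being discussed might look like the sketch below. It is modelled on the _AC() definition in asm-sparc64/const.h rather than copied from the patch, and EXAMPLE_PAGE_SHIFT / EXAMPLE_PAGE_SIZE are hypothetical names. Nothing in it depends on the architecture.

```c
/* Sketch of a const.h-style header (modelled on asm-sparc64/const.h). */
#ifdef __ASSEMBLY__
/* Assembler: drop the C type suffix entirely. */
#define _AC(X, Y)	X
#else
/* C: paste the suffix on, e.g. _AC(1, UL) -> (1UL). The extra level of
 * indirection lets a macro argument expand before the ## paste. */
#define __AC(X, Y)	(X##Y)
#define _AC(X, Y)	__AC(X, Y)
#endif

/* Hypothetical example constants usable from both C and head.S. */
#define EXAMPLE_PAGE_SHIFT	12
#define EXAMPLE_PAGE_SIZE	(_AC(1, UL) << EXAMPLE_PAGE_SHIFT)
```

Compiled as C, EXAMPLE_PAGE_SIZE is an unsigned long; preprocessed with __ASSEMBLY__ defined, it becomes the plain expression 1 << 12 that the assembler accepts.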

IP Defragmentation

2007-03-07 Thread Kanhu Rauta


Hi list,

I am using kernel 2.6.20.1. I have written a module which
registers a function at the local_in hook, and I have found strange behavior
with the packets arriving in my callback function, i.e.:

[let's say I am sending 1500 bytes to this machine from the network]
ping -s 1500 

1> In case of fragmentation I get only one packet at the
hook. Analyzing the IP header, it says this is the reassembled
packet (skb->len=1528, offset=0, MF=0).

While dumping the data (printing skb->data[i] for i from 0 to 1528), it
shows that only 1472 bytes are valid data and the remaining 28 bytes are
garbage.
I verified this with Ethereal.

2> I have dumped these packets in the ip_local_deliver function after
ip_defrag and before NF_HOOK, but the result is the same.

Can anybody tell me why I am not getting the complete data?

Regards,
kanhu
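
A quick worked check of the numbers above, assuming IPv4 with 20-byte headers (no options), a 1500-byte MTU and the 8-byte ICMP echo header used by "ping -s 1500". The helper name is hypothetical. It shows that the 1472/28 split in the report matches the fragment boundary: the first fragment carries 1472 bytes of ping data and the second carries the last 28.

```c
/* Assumptions: IPv4, 20-byte header without options, 1500-byte MTU,
 * 8-byte ICMP echo header, as for "ping -s 1500". */
#define IP_HDR_LEN    20u
#define ICMP_HDR_LEN   8u
#define LINK_MTU    1500u

struct frag_layout {
    unsigned int total_len;       /* reassembled IP datagram length */
    unsigned int first_payload;   /* IP payload bytes in fragment 1 */
    unsigned int first_ping_data; /* ping data bytes in fragment 1 */
    unsigned int rest_ping_data;  /* ping data bytes in later fragments */
};

struct frag_layout ping_frag_layout(unsigned int ping_data)
{
    struct frag_layout l;
    unsigned int ip_payload = ICMP_HDR_LEN + ping_data;
    /* Every fragment except the last carries a multiple of 8 bytes. */
    unsigned int max_frag_payload = (LINK_MTU - IP_HDR_LEN) & ~7u;

    l.total_len = IP_HDR_LEN + ip_payload;
    l.first_payload = ip_payload < max_frag_payload ? ip_payload
                                                    : max_frag_payload;
    l.first_ping_data = l.first_payload - ICMP_HDR_LEN;
    l.rest_ping_data = ping_data - l.first_ping_data;
    return l;
}
```

For 1500 bytes of ping data this gives a 1528-byte datagram (matching skb->len), 1480 bytes of IP payload in the first fragment (8 + 1472) and 28 bytes of data in the second.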

Re: [ckrm-tech] [PATCH 0/2] resource control file system - aka containers on top of nsproxy!

2007-03-07 Thread Matt Helsley

On Thu, 2007-03-08 at 16:32 +1300, Sam Vilain wrote:

> Kirill, 06032418:36+03:
> > I propose to use "namespace" naming.
> > 1. This is already used in fs.
> > 2. This is what IMHO suits at least OpenVZ/Eric
> > 3. it has good acronym "ns".
> 
> Right.  So, now I'll also throw into the mix:
> 
>   - resource groups (I get a strange feeling of déjà vú there)

Re: déjà vú: yes!

It's like that Star Trek episode ... except we can't agree on the name
of the impossible particle we will invent which solves all our problems.

At the risk of prolonging the agony I hate to ask: are all of these
groupings really concerned with "resources"?

>   - supply chains (think supply and demand)
>   - accounting classes

CKRM's use of the term "class" drew negative comments from Paul Jackson
and Andrew Morton about this time last year. That led to my suggestion
of "Resource Groups". Unless they've changed their minds...

> Do any of those sound remotely close?  If not, your turn :)

I'll butt in here: task groups? task sets? confuselets? ;)

Cheers,
-Matt Helsley


Re: [PATCH] [RSDL-mm 0/6] RSDL cpu scheduler for -mm

2007-03-07 Thread Andrew Morton

> On Thu, 8 Mar 2007 16:25:05 +1100 Con Kolivas <[EMAIL PROTECTED]> wrote:
> > It also boots OK on a very similar but somewhat older Nocona machine. 
> > Perhaps due to config changes: 
> > http://userweb.kernel.org/~akpm/ck/config-ok.txt
> 
> Ok I just remembered that not only did I expect the cpu task to never be 
> scheduled and it _might_ be scheduled on sched_init, it is actually 
> _consciously_ scheduled on hotplug cpu which I have no way of handling at the 
> moment. On both your configs I noticed you had hotplug cpu enabled, but 
> perhaps it isn't really being used on the more conservative config. So this 
> is something I already know I need to handle. Did your ppc that had 
> the "bitmap error" have hotplug cpu enabled? It might be an unrelated 
> bug^Wphenomenon.

The powerpc config has CONFIG_HOTPLUG_CPU=n

Re: [PATCH 1/20] x86_64: Assembly safe page.h and pgtable.h

2007-03-07 Thread Vivek Goyal

On Wed, Mar 07, 2007 at 08:24:04PM +0100, Sam Ravnborg wrote:
> On Wed, Mar 07, 2007 at 12:29:20PM +0530, Vivek Goyal wrote:
> > 
> > 
> > This patch makes pgtable.h and page.h safe to include
> > in assembly files like head.S, allowing us to use
> > symbolic constants instead of hard-coded numbers when
> > referring to the page tables.
> > 
> > This patch copies asm-sparc64/const.h to asm-x86_64 to
> > get a definition of _AC(), a very convenient macro that
> > allows us to force the type when we are compiling the
> > code in C and to drop all of the type information when
> > we are using the constant in assembly.
> Should this file not live in asm-generic and be useable
> for all architectures?
> 

Hi Sam,

Thanks for the review. This makes sense to me. Move const.h into
asm-generic and let everybody use it.

This is more of a small cleanup issue; it involves changing a few header files
in asm-sparc64 and making sure nothing breaks on sparc64. This patchset
is already becoming big and complex. Is it OK if we leave the patch
unmodified for now? Once this gets in and settles down, I can
post another patch to do the above modification.

Thanks
Vivek

Re: [patch 8/6] mm: fix cpdfio vs fault race

2007-03-07 Thread Nick Piggin

On Wed, Mar 07, 2007 at 01:02:14PM -0800, Andrew Morton wrote:
> On Wed, 7 Mar 2007 12:31:21 +0100
> Nick Piggin <[EMAIL PROTECTED]> wrote:
> 
> > Index: linux-2.6/mm/memory.c
> > ===
> > --- linux-2.6.orig/mm/memory.c
> > +++ linux-2.6/mm/memory.c
> > @@ -1664,6 +1664,15 @@ gotten:
> >  unlock:
> > pte_unmap_unlock(page_table, ptl);
> > if (dirty_page) {
> > +   /*
> > +* Yes, Virginia, this is actually required to prevent a race
> > +* with clear_page_dirty_for_io() from clearing the page dirty
> > +* bit after it clear all dirty ptes, but before a racing
> > +* do_wp_page installs a dirty pte.
> > +*
> > +* do_no_page is protected similarly.
> > +*/
> > +   wait_on_page_locked(dirty_page);
> > set_page_dirty_balance(dirty_page);
> > put_page(dirty_page);
> > }
> > @@ -2316,6 +2325,7 @@ retry:
> >  unlock:
> > pte_unmap_unlock(page_table, ptl);
> > if (dirty_page) {
> > +   wait_on_page_locked(dirty_page);
> > set_page_dirty_balance(dirty_page);
> > put_page(dirty_page);
> > }
> > Index: linux-2.6/mm/page-writeback.c
> 
> now that's scary - applying this on top of your
> lock-the-page-in-the-fault-handler patches gives:
> 
>   if (dirty_page) {
>   /*
>* Yes, Virginia, this is actually required to prevent a race
>* with clear_page_dirty_for_io() from clearing the page dirty
>* bit after it clear all dirty ptes, but before a racing
>* do_wp_page installs a dirty pte.
>*
>* do_no_page is protected similarly.
>*/
>   wait_on_page_locked(dirty_page);
>   wait_on_page_locked(dirty_page);
>   set_page_dirty_balance(dirty_page);
>   put_page(dirty_page);
>   }
> 
> One wonders how on earth patch(1) managed to do that.  If it has inserted
> the comment twice as well then it might be explicable..

Ouch ;) Yeah that patch I sent was supposed to apply underneath
the previous ones, sorry I wasn't clear.

> Oh well, let's try this:

Yeah that looks like the correct one for applying on top. Thanks.

> 
> From: Nick Piggin <[EMAIL PROTECTED]>
> 
> Fix msync data loss and (less importantly) dirty page accounting
> inaccuracies due to the race remaining in clear_page_dirty_for_io().
> 
> The deleted comment explains what the race was, and the added comments
> explain how it is fixed.
> 
> Signed-off-by: Nick Piggin <[EMAIL PROTECTED]>
> Cc: Linus Torvalds <[EMAIL PROTECTED]>
> Cc: Miklos Szeredi <[EMAIL PROTECTED]>
> Signed-off-by: Andrew Morton <[EMAIL PROTECTED]>
> ---
> 
>  mm/memory.c |9 +
>  mm/page-writeback.c |   17 -
>  2 files changed, 21 insertions(+), 5 deletions(-)
> 
> diff -puN mm/memory.c~mm-fix-cpdfio-vs-fault-race mm/memory.c
> --- a/mm/memory.c~mm-fix-cpdfio-vs-fault-race
> +++ a/mm/memory.c
> @@ -1669,6 +1669,15 @@ gotten:
>  unlock:
>   pte_unmap_unlock(page_table, ptl);
>   if (dirty_page) {
> + /*
> +  * Yes, Virginia, this is actually required to prevent a race
> +  * with clear_page_dirty_for_io() from clearing the page dirty
> +  * bit after it clear all dirty ptes, but before a racing
> +  * do_wp_page installs a dirty pte.
> +  *
> +  * do_no_page is protected similarly.
> +  */
> + wait_on_page_locked(dirty_page);
>   set_page_dirty_balance(dirty_page);
>   put_page(dirty_page);
>   }
> diff -puN mm/page-writeback.c~mm-fix-cpdfio-vs-fault-race mm/page-writeback.c
> --- a/mm/page-writeback.c~mm-fix-cpdfio-vs-fault-race
> +++ a/mm/page-writeback.c
> @@ -903,6 +903,8 @@ int clear_page_dirty_for_io(struct page 
>  {
>   struct address_space *mapping = page_mapping(page);
>  
> + BUG_ON(!PageLocked(page));
> +
>   if (mapping && mapping_cap_account_dirty(mapping)) {
>   /*
>* Yes, Virginia, this is indeed insane.
> @@ -928,14 +930,19 @@ int clear_page_dirty_for_io(struct page 
>* We basically use the page "master dirty bit"
>* as a serialization point for all the different
>* threads doing their things.
> -  *
> -  * FIXME! We still have a race here: if somebody
> -  * adds the page back to the page tables in
> -  * between the "page_mkclean()" and the "TestClearPageDirty()",
> -  * we might have it mapped without the dirty bit set.
>*/
>   if (page_mkclean(page))
>   set_page_dirty(page);
> + /*
> +  * We carefully synchronise fault handlers against
> +  * installing a dirty
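
The race described in the removed FIXME and in the new comments can be replayed as a toy, single-threaded model. This is an illustration, not kernel code: struct toy_page, replay_race() and the two step helpers are hypothetical, and the model collapses the pte into a single "writable" flag. It checks the invariant that a writable pte implies the page's dirty bit is set; breaking it is what allows later writes through the pte to go unnoticed by msync.

```c
#include <stdbool.h>

/* Toy model only: one page, one pte, no real locking. */
struct toy_page {
    bool dirty;         /* PG_dirty, the "master dirty bit" */
    bool pte_writable;  /* a writable pte currently maps the page */
};

/* clear_page_dirty_for_io() step 1: page_mkclean() write-protects the
 * pte and re-dirties the page if the pte had been writable. */
static void cpdfio_mkclean(struct toy_page *p)
{
    if (p->pte_writable) {
        p->pte_writable = false;
        p->dirty = true;
    }
}

/* clear_page_dirty_for_io() step 2: TestClearPageDirty(). */
static bool cpdfio_test_clear(struct toy_page *p)
{
    bool was_dirty = p->dirty;
    p->dirty = false;
    return was_dirty;
}

/* Replays the bad interleaving: a fault installs a writable pte between
 * page_mkclean() and TestClearPageDirty() while the writeback side holds
 * the page lock. Returns true if "writable pte => page dirty" holds. */
bool replay_race(bool fault_waits_for_page_lock)
{
    struct toy_page p = { .dirty = true, .pte_writable = false };

    cpdfio_mkclean(&p);
    p.pte_writable = true;          /* do_wp_page installs a writable pte */
    if (!fault_waits_for_page_lock)
        p.dirty = true;             /* set_page_dirty_balance() at once */
    (void)cpdfio_test_clear(&p);    /* dirty bit consumed by writeback */
    /* ...page lock released here... */
    if (fault_waits_for_page_lock)
        p.dirty = true;             /* wait_on_page_locked(), then dirty */

    return !p.pte_writable || p.dirty;
}
```

Without the wait, the page ends up mapped writable but clean; with it, the fault path's set-dirty is ordered after TestClearPageDirty() and the invariant holds.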

Re: 2.6.21-rc1 and 2.6.21-rc2 kwin dies silently

2007-03-07 Thread Andrew Morton


(cc restored.  Please always do reply-to-all)

> On Wed, 28 Feb 2007 18:05:13 +0200 [EMAIL PROTECTED] wrote:
> On Wednesday 28 February 2007 17:19, Sid Boyce wrote:
> > openSUSE 10.3 Alpha and KDE-3.5.6, xorg-x11-7.2. KDE is setup not to
> > require a password to unlock, but it asks for password. When the screen
> > unlocks, kwin is gone with no errors logged in /var/log/kdm or
> > /var/log/messages. No problems with 2.6.20.
> >
> > Same problem on openSUSE 10.2 x86_64, KDE-3.5.5 and 2.6.21-rc2.
> > Regards
> > Sid.
> 
> This is the linux kernel mailing list. Perhaps you should post your problem to
> the opensuse mailing list.

2.6.20 worked.

2.6.21-rc2 did not.

Working theory: the kernel broke.

Sid, the chances that anyone can work out what caused this are pretty low. 
It would be great if you could perform a git bisection search sometime in
the next few weeks to work out which commit caused this.

Thanks.


Re: [RFC: -mm patch] #if 0 mmc_deselect_cards()

2007-03-07 Thread Pierre Ossman

Adrian Bunk wrote:
> On Tue, Mar 06, 2007 at 12:44:08AM -0800, Andrew Morton wrote:
>> ...
>> Changes since 2.6.20-rc2-mm1:
>> ...
>>  git-mmc.patch
>> ...
>>  git trees
>> ...
> 
> mmc_deselect_cards() is no longer used.
> 
> Signed-off-by: Adrian Bunk <[EMAIL PROTECTED]>
> 

Indeed, but it's probably better to just remove it rather than have old crud
lying around.

Rgds
-- 
 -- Pierre Ossman

  Linux kernel, MMC maintainer    http://www.kernel.org
  PulseAudio, core developer  http://pulseaudio.org
  rdesktop, core developer  http://www.rdesktop.org

Re: [PATCH] [RSDL-mm 0/6] RSDL cpu scheduler for -mm

2007-03-07 Thread Con Kolivas

On Thursday 08 March 2007 15:15, Andrew Morton wrote:
> On Wed, 7 Mar 2007 18:54:30 -0800 Andrew Morton <[EMAIL PROTECTED]> wrote:
> > On Wed, 7 Mar 2007 17:43:45 -0800 Andrew Morton <[EMAIL PROTECTED]> wrote:
> > > On Wed, 7 Mar 2007 12:26:42 +1100
> > >
> > > Con Kolivas <[EMAIL PROTECTED]> wrote:
> > > > What follows is the same patch series that constitutes the RSDL
> > > > "Rotating Staircase DeadLine" cpu scheduler resynced for
> > > > 2.6.21-rc2-mm2.
> > >
> > > Big oops early in boot on x86_64 SMP, in rq_bitmap_error+0x97/0x9f.
> > >
> > > I stubbed it out with a `return MAX_RT_PRIO;' (I think) but it then
> > > oopsed differently.  Before netconsole had come up, no serial console,
> > > no digital camera.
> > >
> > > There's stuff in http://userweb.kernel.org/~akpm/ck/ - you can probably
> > > boot that kernel on your own machine.
> > >
> > > I need to do rc3-mm1 now.  I might find some time to poke at this
> > > further after that, but I have to leave for a week in .jp and it'll be
> > > squeezy, sorry.
> >
> > well it boots on dual PIII and quad powerpc.
>
> It also boots OK on a very similar but somewhat older Nocona machine. 
> Perhaps due to config changes: 
> http://userweb.kernel.org/~akpm/ck/config-ok.txt

Ok I just remembered that not only did I expect the cpu task to never be 
scheduled and it _might_ be scheduled on sched_init, it is actually 
_consciously_ scheduled on hotplug cpu which I have no way of handling at the 
moment. On both your configs I noticed you had hotplug cpu enabled, but 
perhaps it isn't really being used on the more conservative config. So this 
is something I already know I need to handle. Did your ppc that had 
the "bitmap error" have hotplug cpu enabled? It might be an unrelated 
bug^Wphenomenon.

-- 
-ck

Re: [PATCH 15/20] Move swsusp __pa() dependent code to arch portion

2007-03-07 Thread Vivek Goyal

On Wed, Mar 07, 2007 at 11:47:40PM +0100, Pavel Machek wrote:
> Hi!
> 
> > o __pa() should be used only on kernel linearly mapped virtual addresses
> >   and not on kernel text and data addresses.
> > 
> > o Hibernation code needs to determine the physical address associated
> >   with a kernel symbol to mark the boundary of a section containing pages
> >   that don't have to be saved and restored during hibernate/resume.
> > 
> > o Move this piece of code into the arch-dependent section, so that
> >   architectures which don't have kernel text/data mapped into the kernel's
> >   linearly mapped region can come up with their own ways of determining
> >   the physical addresses associated with kernel text.
> > 
> > Signed-off-by: Vivek Goyal <[EMAIL PROTECTED]>
> 
> ...hmm, but that means 3 copies of same code. Can we put the
> 

Actually it is not exactly the same code. i386 and x86_64 use __pa_symbol()
and powerpc uses __pa() for determining the physical address associated with
a kernel text symbol. That's precisely the intent here: leave it to arch
code to decide how to calculate the physical address associated with a kernel
symbol.

> > +/*
> > + * pfn_is_nosave - check if given pfn is in the 'nosave' section
> > + */
> > +
> > +int pfn_is_nosave(unsigned long pfn)
> > +{
> > +   unsigned long nosave_begin_pfn = __pa_symbol(&__nosave_begin) >> PAGE_SHIFT;
> > +   unsigned long nosave_end_pfn = PAGE_ALIGN(__pa_symbol(&__nosave_end)) >> PAGE_SHIFT;
> > +   return (pfn >= nosave_begin_pfn) && (pfn < nosave_end_pfn);
> > +}
> 
> ...in asm-generic/suspend.h (or something) and then just include it?
>   Pavel

As the code is not exactly the same, we can't put it in asm-generic/suspend.h.

Thanks
Vivek
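
The pfn arithmetic in the quoted pfn_is_nosave() is itself architecture-independent; only the conversion from the __nosave_begin/__nosave_end symbols to physical addresses differs, which is the part argued above to stay in arch code. A userspace sketch of just the range test, with the physical addresses passed in and hypothetical EX_* names standing in for PAGE_SHIFT and PAGE_ALIGN() (assuming 4 KiB pages):

```c
/* Hypothetical stand-ins for PAGE_SHIFT / PAGE_SIZE / PAGE_ALIGN(),
 * assuming 4 KiB pages. */
#define EX_PAGE_SHIFT        12
#define EX_PAGE_SIZE         (1UL << EX_PAGE_SHIFT)
#define EX_PAGE_ALIGN(addr)  (((addr) + EX_PAGE_SIZE - 1) & ~(EX_PAGE_SIZE - 1))

/* Same range test as pfn_is_nosave(), but with the physical addresses of
 * the section boundaries supplied by the caller. The end is rounded up,
 * so a partially covered last page counts as inside the section. */
int example_pfn_in_range(unsigned long pfn, unsigned long begin_phys,
                         unsigned long end_phys)
{
    unsigned long begin_pfn = begin_phys >> EX_PAGE_SHIFT;
    unsigned long end_pfn = EX_PAGE_ALIGN(end_phys) >> EX_PAGE_SHIFT;

    return (pfn >= begin_pfn) && (pfn < end_pfn);
}
```

For a section spanning 0x1234000 to 0x1235800, pfns 0x1234 and 0x1235 are inside and 0x1236 is not.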

Re: PAGE_SIZE Availability Inconsistency

2007-03-07 Thread David Brown


> While I agree, NBPG is a bit of a problem, although it's only needed for aout
> coredumps AFAICT, but still needed to compile e.g. gdb.


Well then, how does gdb deal with ia64? PAGE_SIZE and friends
aren't available for that arch, and the same goes for ppc.

Looking at the gdb code, there are places where they define a
PAGE_SIZE, but they even mention it's a bug
(gdb-6.6/libiberty/getpagesize.c:14). I also grepped through their code
looking for includes of page.h and came up with nothing.

- David Brown
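
For userspace code that only needs the page size at run time, the usual portable alternative to PAGE_SIZE from kernel headers is POSIX sysconf(_SC_PAGESIZE). A minimal sketch (the helper name is hypothetical; this does not address compile-time users such as NBPG in a.out core dump code):

```c
#include <unistd.h>

/* Query the page size at run time instead of relying on a PAGE_SIZE
 * macro from kernel headers. Returns -1 if sysconf() cannot tell us. */
long runtime_page_size(void)
{
    long ps = sysconf(_SC_PAGESIZE);
    return ps > 0 ? ps : -1;
}
```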

Re: [PATCH 16/20] swsusp: do not use virt_to_page on kernel data address

2007-03-07 Thread Vivek Goyal

On Wed, Mar 07, 2007 at 11:49:15PM +0100, Pavel Machek wrote:
> Hi!
> 
> > o virt_to_page() call should be used on kernel linear addresses and not
> >   on kernel text and data addresses. Swsusp code uses it on kernel data
> >   (statically allocated swsusp_header).
> > 
> > o Allocate swsusp_header dynamically so that virt_to_page() can be used
> >   safely.
> > 
> > o I am changing this because in the next few patches, __pa() on x86_64 will
> >   no longer support kernel text and data addresses and hibernation would break. 
> > 
> > Signed-off-by: Vivek Goyal <[EMAIL PROTECTED]>
> 
> > +static int swsusp_header_init(void)
> > +{
> > +   swsusp_header = (struct swsusp_header*) __get_free_page(GFP_KERNEL);
> > +   if (!swsusp_header)
> > +   panic("Could not allocate memory for swsusp_header\n");
> > +   return 0;
> > +}
> > +
> > +core_initcall(swsusp_header_init);
> 
> I do not like the panic, but I guess it is okay as we are running
> during boot? (Could you add a comment?) Otherwise ok.
> 

Hi Pavel,

Yes, it is an initcall and this memory page will be allocated during
boot time. I am not sure what comment to put there; to me it seems
pretty obvious from the "core_initcall".

Thanks
Vivek

Re: Raid 10 Problems?

2007-03-07 Thread Jan Engelhardt

On Mar 7 2007 10:20, dean gaudet wrote:
>>> http://gentoo-wiki.com/HOWTO_Install_on_Software_RAID#Write-intent_bitmap
>> 
>> That information has been extremely useful. Thanks a
>> lot. I found a command to add the internal bitmap after
>> the array was made so I added that. Seems like some of
>> these features should be default. Maybe it's time for
>> the raid folks to update what is default?
>
>the bitmap has performance implications... for example:
>http://www.mail-archive.com/linux-raid@vger.kernel.org/msg07229.html

I wonder if bitmapping a RAID1 volume is faster than bitmapping RAID5.

The other thing is, the bitmap is supposed to be written out at intervals,
not at every write, so the extra head movement for bitmap updates should
be really low and should not make the tar -xjf process slower by half a
minute.
Is there a way to tweak the write-bitmap-to-disk interval? Perhaps 
something in /sys or ye olde /proc. Maybe linux-raid@ knows 8)

>note that unless you tweak your init scripts you'll need to put external 
>bitmaps on your root partition, see this thread:

Huh? That statement does not make sense. But I think you meant: when using
external bitmaps, adjust the init scripts. That is exactly what internal
bitmaps are good for: you don't need to change anything.

Jan

Re: [PATCH 16/20] swsusp: do not use virt_to_page on kernel data address

2007-03-07 Thread Vivek Goyal

On Wed, Mar 07, 2007 at 11:50:06PM +0100, Pavel Machek wrote:
> Hi!
> 
> > o virt_to_page() call should be used on kernel linear addresses and not
> >   on kernel text and data addresses. Swsusp code uses it on kernel data
> >   (statically allocated swsusp_header).
> > 
> > o Allocate swsusp_header dynamically so that virt_to_page() can be used
> >   safely.
> > 
> > o I am changing this because in the next few patches, __pa() on x86_64 will
> >   no longer support kernel text and data addresses and hibernation would break. 
> > 
> > Signed-off-by: Vivek Goyal <[EMAIL PROTECTED]>
> 
> (I assume this was tested, too?)
>   Pavel

Yes. I have tested this and it works fine.

Thanks
Vivek

Re: [PATCH 0/20] x86_64 Relocatable bzImage support (V4)

2007-03-07 Thread Vivek Goyal

On Thu, Mar 08, 2007 at 10:15:02AM +1100, Nigel Cunningham wrote:
> Hi.
> 
> On Thu, 2007-03-08 at 07:49 +1100, Nigel Cunningham wrote:
> > Hi.
> > 
> > On Wed, 2007-03-07 at 07:07 -0800, Arjan van de Ven wrote:
> > > On Wed, 2007-03-07 at 12:27 +0530, Vivek Goyal wrote:
> > > > Hi,
> > > > 
> > > > Here is another attempt on x86_64 relocatable bzImage patches(V4). This
> > > > patchset makes a bzImage relocatable and same kernel binary can be 
> > > > loaded
> > > > and run from different physical addresses.
> > > 
> > > 
> > > have these patches been extensively tested with various suspend
> > > scenarios? (S1,S3,S4 in acpi speak or s2ram and s2disk in Linux speak)
> > 
> > We did work on this for RHEL5, getting relocatable kernel support
> > working fine with S4. While doing it and since, I've been running
> > Suspend2 with the same patch.
> > 
> > Since that work, Vivek has done more modifications, but I can confirm
> > that the basic design is reliable with S4. Haven't tried S3, but can do.
> > Will report back shortly.
> 
> S3 works okay here with a relocatable x86_64 kernel (2.6.20).
> 

Hi Nigel,

Would it be possible to test S3 with 2.6.21-rc2 kernels too? Right now I don't
have access to any machine supporting S3. I tested it at the time of my last
posting and it worked well. I'd appreciate your help.

Thanks
Vivek

[git pull] Input fixes for 2.6.21-rc3

2007-03-07 Thread Dmitry Torokhov

Hi Linus,

Please consider pulling from:

        git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input.git for-linus

or
        master.kernel.org:/pub/scm/linux/kernel/git/dtor/input.git for-linus

to receive a fix for the AUX IRQ delivery test that causes missing keyboards
on some boxes without PS/2 mice. The fix is confirmed to work on the
MSI K8M800 and also confirmed not to break the previous fix (re. bugzilla
7833).

Changelog:
--
Dmitry Torokhov (1):
  Input: i8042 - another attempt to fix AUX delivery checks

Diffstat:
-
 i8042.c |   10 ++++++++--
 1 files changed, 8 insertions(+), 2 deletions(-)

-- 
Dmitry

Re: [PATCH 11/20] x86_64: wakeup.S misc cleanups

2007-03-07 Thread Vivek Goyal

On Wed, Mar 07, 2007 at 11:41:57PM +0100, Pavel Machek wrote:
> Hi!
> 
> > +   movw    $0x0e00 + 'i', %ds:(0xb8012)
> > +   movb    $0xa8, %al  ;  outb %al, $0x80;
> > +
> 
> > -   movw    $0x0e00 + 'i', %ds:(0xb8012)
> > -   movb    $0xa8, %al  ;  outb %al, $0x80;
> 
> Outbs were my debugging hacks, perhaps you can simply remove them at
> this point? Not sure how useful "Linux" debug print is, it can
> probably be removed, too.
> 

Hi Pavel,

I found these debugging hacks useful while debugging a problem with
my changes in this code. They help to find out how far the code flow
has reached in this assembly code. So I think it's not a bad idea to leave
this piece of code there.

Thanks
Vivek

Re: [PATCH 11/20] x86_64: wakeup.S misc cleanups

2007-03-07 Thread Vivek Goyal

On Wed, Mar 07, 2007 at 11:40:53PM +0100, Pavel Machek wrote:
> Hi!
> 
> > o Various cleanups. One of the main purposes of the cleanups is to make
> >   wakeup.S as close as possible to trampoline.S.
> > 
> > o Following are the changes
> > - Indentations for comments.
> > - Changed the gdt table to compact form and to resemble the
> >   one in trampoline.S
> > - Take the jump to 32bit from real mode using ljmpl. Makes code
> >   more readable.
> > - After enabling long mode, directly take a long jump for 64bit
> >   mode. No need to take an extra jump to "reach_compatibility_mode"
> > - Stack is not used after real mode. So don't load stack in
> >   32 bit mode.
> > - No need to enable PGE here.
> > - No need to do extra EFER read, anyway we trash the read contents.
> > - No need to enable system call (EFER_SCE). Anyway it will be 
> >   enabled when original EFER is restored.
> > - No need to set MP, ET, NE, WP, AM bits in cr0. Very soon we will
> >   reload the original cr0 while restoring the processor state.
> > 
> > Signed-off-by: Eric W. Biederman <[EMAIL PROTECTED]>
> > Signed-off-by: Vivek Goyal <[EMAIL PROTECTED]>
> 
> ACK, provided it was tested.
>   Pavel

Hi Pavel,

Thanks. I tested all the S3 related changes during my last posting. At that
time I had access to an x86_64 box which supported ACPI state S3. Since then
this code has not changed and it has been running successfully in RHEL5
kernels. Now I don't have access to an x86_64 machine which supports S3,
so I can't test suspend to RAM. But I am confident that these patches work,
as nothing has changed since the last posting.

Just now Nigel reported successful suspend to RAM results for 2.6.20. I
have asked him to test 2.6.21-rc2 as well, if possible.

I have thoroughly tested suspend to disk (S4) and it works fine.

Thanks
Vivek

Re: [PATCH] [RSDL-mm 0/6] RSDL cpu scheduler for -mm

2007-03-07 Thread Con Kolivas

On 08/03/07, Andrew Morton <[EMAIL PROTECTED]> wrote:

> On Wed, 7 Mar 2007 17:43:45 -0800 Andrew Morton <[EMAIL PROTECTED]> wrote:
>
> > On Wed, 7 Mar 2007 12:26:42 +1100
> > Con Kolivas <[EMAIL PROTECTED]> wrote:
> >
> > > What follows is the same patch series that constitutes the RDSL "Rotating
> > > Staircase DeadLine" cpu scheduler resynced for 2.6.21-rc2-mm2.
> >
> > Big oops early in boot on x86_64 SMP, in rq_bitmap_error+0x97/0x9f.
> >
> > I stubbed it out with a `return MAX_RT_PRIO;' (I think) but it then oopsed
> > differently.  Before netconsole had come up, no serial console, no digital
> > camera.
> >
> > There's stuff in http://userweb.kernel.org/~akpm/ck/ - you can probably
> > boot that kernel on your own machine.
> >
> > I need to do rc3-mm1 now.  I might find some time to poke at this
> > further after that, but I have to leave for a week in .jp and it'll be
> > squeezy, sorry.
>
> well it boots os dual pIII and quad powerpc.
>
> The powerpc says
>
> Scheduler bitmap error - bitmap being reconstructed..
>
> during bootup.  But it didn't crash like the Nocona machine.

Ah, thanks. Sorry, I have a very busy day at work and am unable to do
anything about it until tonight. I imagine that on Nocona this would
be due to the idle task being scheduled on init, which it is not
supposed to do, but if you read the comment in sched_init it says it
*might*. I have no way of handling that at the moment because I wasn't
sure it ever happened any more. As for the powerpc... I have no idea
(and where I am at the moment it would be totally inappropriate
for me to try to debug :P), sorry.

--
-ck

Re: [PATCH] tcp_cubic: use 32 bit math

2007-03-07 Thread Willy Tarreau

On Wed, Mar 07, 2007 at 07:10:47PM -0800, Stephen Hemminger wrote:
> David Miller wrote:
> >From: Stephen Hemminger <[EMAIL PROTECTED]>
> >Date: Wed, 7 Mar 2007 17:07:31 -0800
> >
> >  
> >>The basic calculation has to be done in 32 bits to avoid
> >>doing 64 bit divide by 3. The value x is only 22bits max
> >>so only need full 64 bits only for x^2.
> >>
> >>Signed-off-by: Stephen Hemminger <[EMAIL PROTECTED]>
> >>
> >
> >Applied, thanks Stephen.
> >
> >What about Willy Tarreau's supposedly even faster variant?
> >Or does this incorporate that set of improvements?
> >  
> That's what this is:
>    x = (2 * x + (uint32_t)div64_64(a, (uint64_t)x*(uint64_t)x)) / 3;

Confirmed, it's the same. BTW, has anyone tested on a 64-bit system whether
it makes any difference?
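For anyone wanting to try the quoted iteration in userspace (for example on a 64-bit box), here is a minimal, self-contained C sketch. Assumptions, all mine: `div64_64_sketch()` is a hypothetical stand-in for the kernel's div64_64() and just does plain 64-bit division, and the power-of-two starting guess plus the "stop when the estimate stops shrinking" rule are illustrative choices, not necessarily what tcp_cubic does.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the kernel's div64_64(): plain 64/64 division. */
static uint64_t div64_64_sketch(uint64_t dividend, uint64_t divisor)
{
	assert(divisor != 0);
	return dividend / divisor;
}

/* floor(cbrt(a)) via integer Newton steps x = (2x + a/x^2) / 3. */
static uint32_t cubic_root_sketch(uint64_t a)
{
	uint32_t x, next;
	unsigned int bits = 0;
	uint64_t t;

	if (a < 2)
		return (uint32_t)a;

	for (t = a; t; t >>= 1)
		bits++;				/* bit length of a */

	/* 2^ceil(bits/3) is above cbrt(a) and at most 2^22. */
	x = (uint32_t)1 << ((bits + 2) / 3);

	for (;;) {
		/* Only x*x needs 64 bits; the sum and the divide by 3 stay 32-bit. */
		next = (2 * x + (uint32_t)div64_64_sketch(a, (uint64_t)x * x)) / 3;
		if (next >= x)
			return x;		/* estimate stopped shrinking */
		x = next;
	}
}
```

Starting above the root matters: integer Newton steps then decrease monotonically and stop exactly at floor(cbrt(a)), so `cubic_root_sketch(1000)` returns 10.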

Willy


2.6.21-rc3-mm2

2007-03-07 Thread Andrew Morton


ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.21-rc3/2.6.21-rc3-mm2/

- This is the same as 2.6.21-rc3-mm1, except Con's CPU scheduler changes
  were dropped.

  This is for A/B comparison purposes, and because those changes crashed on
  one test setup.



Changes since 2.6.21-rc3-mm1:

-lists-add-list-splice-tail.patch
-sched-remove-sleepavg-from-proc.patch
-sched-remove-noninteractive-flag.patch
-sched-implement-180-bit-sched-bitmap.patch
-sched-implement-rsdl-cpu-scheduler.patch
-sched-document-rsdl-cpu-scheduler.patch

 Removed.



2.6.21-rc3-mm1

2007-03-07 Thread Andrew Morton


Temporarily at

  http://userweb.kernel.org/~akpm/2.6.21-rc3-mm1/

Will appear later at

  ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.21-rc3/2.6.21-rc3-mm1/



- The wireless changes in here need a lot of testers, please.  It is major
  rework.

  Of course the config files got all changed around so `make oldconfig'
  breaks everything.  I was able to get ipw2200 working after some fumbling,
  but perhaps John can tell people what has been changed in there?  What has
  happened, from a big picture perspective?

- This patchset contains Con's rip-up-and-rewrite of the CPU scheduling
  algorithm.  It oopsed for me on one machine so I'll do an rc3-mm2 without
  those changes shortly.  If 2.6.21-rc3-mm1 crashes and 2.6.21-rc3-mm2 does not,
  don't forget to Cc: Con Kolivas <[EMAIL PROTECTED]> on the report ;)

  Feedback on this change is sought.  Especially from the
  enterprise-database and volanomark loonies: this stuff might be headed your
  way so don't tell us afterwards that it hurt.

- Added Nick's lock-the-page-in-the-pagefault-handler patches.  These reduce
  the incidence of one bug and increase the incidence of another.  VM is fun. 

- Re-added the ext4 development tree to the -mm lineup.  It has stuff in
  it.  



Boilerplate:

- See the `hot-fixes' directory for any important updates to this patchset.

- To fetch an -mm tree using git, use (for example)

  git-fetch git://git.kernel.org/pub/scm/linux/kernel/git/smurf/linux-trees.git tag v2.6.16-rc2-mm1
  git-checkout -b local-v2.6.16-rc2-mm1 v2.6.16-rc2-mm1

- -mm kernel commit activity can be reviewed by subscribing to the
  mm-commits mailing list.

echo "subscribe mm-commits" | mail [EMAIL PROTECTED]

- If you hit a bug in -mm and it is not obvious which patch caused it, it is
  most valuable if you can perform a bisection search to identify which patch
  introduced the bug.  Instructions for this process are at

http://www.zip.com.au/~akpm/linux/patches/stuff/bisecting-mm-trees.txt

  But beware that this process takes some time (around ten rebuilds and
  reboots), so consider reporting the bug first and if we cannot immediately
  identify the faulty patch, then perform the bisection search.

- When reporting bugs, please try to Cc: the relevant maintainer and mailing
  list on any email.

- When reporting bugs in this kernel via email, please also rewrite the
  email Subject: in some manner to reflect the nature of the bug.  Some
  developers filter by Subject: when looking for messages to read.

- Occasional snapshots of the -mm lineup are uploaded to
  ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/mm/ and are announced on
  the mm-commits list.
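The "around ten rebuilds and reboots" estimate in the bisection note is binary-search arithmetic: each build-and-boot halves the set of suspect patches, so isolating one bad patch among n takes ceil(log2 n) rounds. A small C sketch, assuming (hypothetically) a series of roughly a thousand patches; the function name is illustrative only.

```c
#include <assert.h>

/* Rounds of build-and-boot needed to isolate one bad patch among n. */
static unsigned int bisect_rounds(unsigned long n)
{
	unsigned int rounds = 0;

	assert(n > 0);
	/* Each round keeps the (rounded-up) half that still shows the bug. */
	while (n > 1) {
		n = (n + 1) / 2;
		rounds++;
	}
	return rounds;		/* equals ceil(log2(original n)) */
}
```

With n = 1000 it returns 10; doubling the series to 2000 patches costs only one extra round.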




Changes since 2.6.21-rc2-mm2:


 origin.patch
 git-acpi.patch
 git-alsa.patch
 git-arm.patch
 git-avr32.patch
 git-cpufreq.patch
 git-drm.patch
 git-dvb.patch
 git-gfs2-nmw.patch
 git-hid.patch
 git-ia64.patch
 git-ieee1394.patch
 git-input.patch
 git-kvm.patch
 git-leds.patch
 git-libata-all.patch
 git-md-accel.patch
 git-md-accel-fixup.patch
 git-mmc.patch
 git-ubi.patch
 git-netdev-all.patch
 git-ioat.patch
 git-ocfs2.patch
 git-parisc.patch
 git-r8169.patch
 git-selinux.patch
 git-pciseg.patch
 git-s390.patch
 git-unionfs.patch
 git-watchdog.patch
 git-wireless.patch
 git-wireless-fixup.patch
 git-ipwireless_cs.patch
 git-gccbug.patch

 git trees

-paravirt-build-fixes.patch
-fix-suspend-resume-with-periodic-tick-devices.patch
-nvidiafb-backlight-fix-implicit-declaration-in-nv_backlight.patch
-atyfb-fix-kconfig-error-part-2.patch
-fbdev-fix-kconfig-error-if-fb_ddc=n.patch
-fix-2621-rfcomm-lockups.patch
-scheduled-removal-of-sa_xxx-interrupt-flags-fixups-3.patch
-i386-make-x86_64-tsc-header-require-i386-rather-than-vice-versa.patch
-hrtimers-fix-hrtimer_cb_irqsafe_no_softirq-description.patch
-hrtimers-hrtimer_clock_base-description-typo.patch
-highres-do-not-run-the-timer_softirq-after-switching-to-highres-mode.patch
-highres-do-not-run-the-timer_softirq-after-switching-to-highres-mode-tweak.patch
-highres-do-not-run-the-timer_softirq-after-switching-to-highres-mode-tweak-fix.patch
-kconfig-update-swsusp-description.patch
-remove-arch-i386-kernel-tscccustom_sched_clock.patch
-mqueue-nested-locking-annotation.patch
-fix-vsyscall-settimeofday.patch
-fs-nobh_truncate_page-fix.patch
-geode-aes-use-unsigned-long-for-spin_lock_irqsave.patch
-publish-rcutorture-module-parameters-via-sysfs-read-only.patch
-cciss-fix-for-2tb-support.patch
-cciss-add-struct-pci_driver-shutdown-support-replaces-reboot-notifier.patch
-initramfs-should-not-depend-on-config_block.patch
-linux-audith-needs-linux-typesh.patch
-uml-fix-formatting-violations-in-signal-delivery-code.patch
-uml-add-a-debugging-message.patch
-uml-comment-the-initialization-of-a-global.patch
-knfsd-use-recv_msg-to-get-peer-address-for-nfsd-instead-of-code-copying.patch
-knfsd-remove-config_ipv6-ifdefs-from-sunrpc-server-code.patch

Re: [PATCH] [RSDL-mm 0/6] RSDL cpu scheduler for -mm

2007-03-07 Thread Andrew Morton

On Wed, 7 Mar 2007 18:54:30 -0800 Andrew Morton <[EMAIL PROTECTED]> wrote:

> On Wed, 7 Mar 2007 17:43:45 -0800 Andrew Morton <[EMAIL PROTECTED]> wrote:
> 
> > On Wed, 7 Mar 2007 12:26:42 +1100
> > Con Kolivas <[EMAIL PROTECTED]> wrote:
> > 
> > > What follows is the same patch series that constitutes the RDSL "Rotating 
> > > Staircase DeadLine" cpu scheduler resynced for 2.6.21-rc2-mm2.
> > 
> > Big oops early in boot on x86_64 SMP, in rq_bitmap_error+0x97/0x9f.
> > 
> > I stubbed it out with a `return MAX_RT_PRIO;' (I think) but it then oopsed
> > differently.  Before netconsole had come up, no serial console, no digital
> > camera.
> > 
> > There's stuff in http://userweb.kernel.org/~akpm/ck/ - you can probably
> > boot that kernel on your own machine.
> > 
> > I need to do rc3-mm1 now.  I might find some time to poke at this
> > further after that, but I have to leave for a week in .jp and it'll be
> > squeezy, sorry.
> 
> well it boots os dual pIII and quad powerpc.
> 

It also boots OK on a very similar but somewhat older Nocona machine.  Perhaps
due to config changes:  http://userweb.kernel.org/~akpm/ck/config-ok.txt


Re: 2.6.21-rc2 regression vs. 2.6.20: AT keyboard only works with pci=noacpi

2007-03-07 Thread Dmitry Torokhov

On Wednesday 07 March 2007 16:50, Dmitry Torokhov wrote:
> On 3/7/07, Ash Milsted <[EMAIL PROTECTED]> wrote:
> >
> > So, I tracked this down to 2.6.21-git7, the first snapshot that gives me
> > this problem. Tellingly it does contain an input tree merge. I would git
> > bisect but I don't have a local copy of the tree - I tried to get one, but
> > it stopped halfway through the clone, probably because I had to use
> > http... So, I hope that helps.
> >
> 
> Hm, that is strange... 2.6.20-rc7 has the i8042 AUX IRQ delivery test fix
> and the fix for panic blink; neither should really affect your keyboard.
> Can I please get a full dmesg of a boot with
> "i8042.debug log_buf_len=131072"?
> 

Argh, I can't believe I forgot to get this into my tree. Could you please
tell me if the patch below fixes your issue?

-- 
Dmitry

Input: i8042 - another attempt to fix AUX delivery checks

Do not assume that the AUX_LOOP command is broken unless it
completes successfully but returns wrong (unexpected) data.

Signed-off-by: Dmitry Torokhov <[EMAIL PROTECTED]>
---
 drivers/input/serio/i8042.c |   10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

Index: linux/drivers/input/serio/i8042.c
===
--- linux.orig/drivers/input/serio/i8042.c
+++ linux/drivers/input/serio/i8042.c
@@ -553,7 +553,8 @@ static int __devinit i8042_check_aux(voi
  */
 
 	param = 0x5a;
-	if (i8042_command(&param, I8042_CMD_AUX_LOOP) || param != 0x5a) {
+	retval = i8042_command(&param, I8042_CMD_AUX_LOOP);
+	if (retval || param != 0x5a) {
 
 /*
  * External connection test - filters out AT-soldered PS/2 i8042's
@@ -567,7 +568,12 @@ static int __devinit i8042_check_aux(voi
 		    (param && param != 0xfa && param != 0xff))
 			return -1;
 
-		aux_loop_broken = 1;
+/*
+ * If AUX_LOOP completed without error but returned unexpected data
+ * mark it as broken
+ */
+		if (!retval)
+			aux_loop_broken = 1;
 	}
 
 /*
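To make the behavioural change in the patch explicit, here is a hypothetical userspace model of the condition that sets aux_loop_broken before and after the fix. It is not kernel code: it ignores the external-connection test (which can return -1 first) and the real i8042_command() I/O; `retval` and `param` simply stand for the command status and the byte read back, and the function names are invented for illustration.

```c
#include <assert.h>
#include <stdbool.h>

#define AUX_LOOP_PATTERN 0x5a	/* byte the loopback test sends and expects back */

/* Before the patch: any failure of the loopback test marked AUX_LOOP broken. */
static bool old_sets_aux_loop_broken(int retval, unsigned char param)
{
	return retval != 0 || param != AUX_LOOP_PATTERN;
}

/* After the patch: only a command that completed (retval == 0) but echoed
 * the wrong byte marks AUX_LOOP as broken. */
static bool new_sets_aux_loop_broken(int retval, unsigned char param)
{
	return retval == 0 && param != AUX_LOOP_PATTERN;
}
```

The interesting case is a command that fails outright (say retval = -1 with param still 0x5a): the old rule flags it as broken, the new rule does not.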

Re: [patch 2/6 -rt] powerpc 2.6.20-rt8: to convert spinlocks to raw ones.

2007-03-07 Thread hui

On Thu, Mar 08, 2007 at 02:26:47PM +1100, Paul Mackerras wrote:
> Bill Huey (hui) writes:
> 
> > The places that need to be reverted to raw spinlocks are generally either
> > acquired by function calls that allocate the spinlock at a terminal of the
> > kernel's lock graph or isolated from other callers completely (parts of the
> > timer for logic for instance). It's all about the collision of various lock
> > (preemptive and non-preemptive) subtrees and how to avoid scheduling within
> > atomic violations that lead to deadlocks. The -rt patch gets arbitrary
> > preemption abilities by shrinking the non-preemptive sub-tree bit to the
> > bare essentials of what will let a system to run yet still preserve all of
> > the expected locking semantics of a critical section.
> 
> Thanks; that's an interesting explanation.
> 
> It misses the point of what I was saying to Sergei, though, which was
> *not* "I don't understand your patch", it was "if this patch goes into
> a git tree, someone coming along in 3 years time won't understand the
> patch."  In other words I was ranting about the need for a decent
> description to accompany the patch itself, so it would go into the
> permanent record.

Yeah, I think it's fear and uncertainty about the technical details of
the patch. That is why folks CC Ingo and company: to get some kind of
confirmation that this is OK, along with comments. There are very few folks
in this community who really understand the basic principles of the patch,
and that's not going to change any time soon. The mystery, paranoia (FUD)
and criticism surrounding it can make folks a bit shy.

I'll talk to you and Ben about it if we all get to OLS again. :)

bill


Re: [PATCH] tcp_cubic: use 32 bit math

2007-03-07 Thread David Miller

From: Stephen Hemminger <[EMAIL PROTECTED]>
Date: Wed, 07 Mar 2007 19:10:47 -0800

> David Miller wrote:
> > What about Willy Tarreau's supposedly even faster variant?
> > Or does this incorporate that set of improvements?
> >   
> That's what this is:
> x = (2 * x + (uint32_t)div64_64(a, (uint64_t)x*(uint64_t)x)) / 3;

Great, thanks for the clarification.
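For readers checking the quoted formula: it is one Newton-Raphson step for the cube root of a. The derivation below is standard; the bit-width lines restate Stephen's "22 bits max" remark, and the last one assumes the estimate is already close to the true root.

```latex
\begin{align*}
f(x) &= x^3 - a, \qquad f'(x) = 3x^2 \\
x_{k+1} &= x_k - \frac{x_k^3 - a}{3x_k^2} = \frac{2x_k + a/x_k^2}{3} \\
a < 2^{64} &\;\Rightarrow\; \sqrt[3]{a} < 2^{64/3} < 2^{22}
  \;\Rightarrow\; x_k^2 < 2^{44} \quad \text{(needs 64 bits)} \\
x_k \approx \sqrt[3]{a} &\;\Rightarrow\; 2x_k + a/x_k^2 \approx 3x_k < 3 \cdot 2^{22} < 2^{32}
  \quad \text{(the divide by 3 fits in 32 bits)}
\end{align*}
```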

trailing whitespace killing (Re: [PATCH -mm] Blackfin: blackfin i2c driver)

2007-03-07 Thread Oleg Verych

> From: Andrew Morton
> Newsgroups: gmane.linux.kernel
> Subject: Re: [PATCH -mm] Blackfin: blackfin i2c driver
> Date: Tue, 6 Mar 2007 23:45:29 -0800
[]
> On Wed, 07 Mar 2007 15:39:27 +0800 "Wu, Bryan" <[EMAIL PROTECTED]> wrote:
>
>> Thanks a lot, could you please give me a script just to kill this
>> whitespace? So I can do it before sending you patches.
>
>
> Is pretty simple:
>
> #!/bin/sh
> #
> # Strip any trailing whitespace which a unified diff adds.
> #
>
> strip1()
> {
>   TMP=$(mktemp /tmp/XXXXXX)
>   cp "$1" "$TMP"
>   sed -e '/^+/s/[ ]*$//' < "$TMP" > "$1"
>   rm "$TMP"
> }
>
> for i in "$@"
> do
>   strip1 "$i"
> done
>
>
> that'll be in
> http://www.zip.com.au/~akpm/linux/patches/patch-scripts-0.20/patch-scripts-0.20.tar.gz
> too

It doesn't work for me. Maybe I don't understand what you are trying to
do, anyway.

A more general suggestion:

   sed -e 's_[ \t]*$__'

(i.e. trailing spaces/tabs are stripped from every line read on stdin,
and the result is written to stdout)

You can use it as a wrapper for diff, for sending patch bombs, etc.
(very nice with pipes):

shell$ diff -Npu2 old new | sed -e 's_[ \t]*$__' > patch.diff
shell$ < patch-set.mbox sed -e 's_[ \t]*$__' | formail -s /usr/sbin/sendmail -bm -t

It works the same way in scripts; quilt (the patch-set manager) also warns
about trailing whitespace.
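The two approaches differ in scope: Andrew's script only strips lines beginning with '+' (the lines a unified diff adds, plus the '+++' header), while the sed one-liner strips every line, including context and '-' lines. Rewriting context or removed lines can stop a patch from applying if the original file really had trailing whitespace there, which may be why the script touches only added lines. A minimal C sketch of both rules; the function names are hypothetical.

```c
#include <assert.h>
#include <string.h>

/* Remove trailing spaces and tabs from a NUL-terminated line, in place
 * (what the sed one-liner does to every line). */
static void strip_trailing_blanks(char *line)
{
	size_t len = strlen(line);

	while (len > 0 && (line[len - 1] == ' ' || line[len - 1] == '\t'))
		line[--len] = '\0';
}

/* Andrew's rule: only touch lines starting with '+', as /^+/ does. */
static void strip_if_added(char *line)
{
	if (line[0] == '+')
		strip_trailing_blanks(line);
}
```

Context lines (leading space) and removed lines (leading '-') pass through `strip_if_added()` unchanged, so the patch still matches the file it was made against.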


Re: [PATCH 0/20] x86_64 Relocatable bzImage support (V4)

2007-03-07 Thread Vivek Goyal

On Wed, Mar 07, 2007 at 07:07:39AM -0800, Arjan van de Ven wrote:
> On Wed, 2007-03-07 at 12:27 +0530, Vivek Goyal wrote:
> > Hi,
> > 
> > Here is another attempt on x86_64 relocatable bzImage patches(V4). This
> > patchset makes a bzImage relocatable and same kernel binary can be loaded
> > and run from different physical addresses.
> 
> 
> have these patches been extensively tested with various suspend
> scenarios? (S1,S3,S4 in acpi speak or s2ram and s2disk in Linux speak)

Hi Arjan,

I have tested these patches for suspend to RAM and suspend to disk and they
work fine. In the past we had a few issues with suspend to disk, and these
have now been resolved in this patchset.

Thanks
Vivek


Re: [ckrm-tech] [PATCH 0/2] resource control file system - aka containers on top of nsproxy!

2007-03-07 Thread Sam Vilain

Paul Menage wrote:
> I made sure to check [...]wikipedia.org[...] when this argument started ... 
> :-)
>   

Wikipedia?!  That's not a referen[...]

oh bugger it.  I've vented enough today and we're on the same page now I
think.

>> This is the classic terminology problem between substance and function.
>> ie, some things share characteristics but does that mean they are the
>> same thing?
>> 
>
> Aren't you arguing my side here? My point is that what I'm trying to
> add with "containers" (or whatever name we end up using) can't easily
> be subsumed into the "namespace" concept, and you're arguing that they
> should go into nsproxy because they share some characteristics.
>   

Ok, they share this characteristic with namespaces: that they group
processes.  So, they conceptually hang off task_struct.  But we put them
on ns_proxy because we've got this vague notion that things might be
better that way.

>> about this you still insist on calling this sub-system specific stuff
>> the "container",
>> 
> Uh, no. I'm trying to call a *grouping* of processes a container.
>   

Ok, so is this going to supplant the namespaces too?

>> and then go screaming that I am wrong and you are right
>> on terminology.
>> 
>
> Actually I asked if you/Eric had better suggestions.
>   

Cool, let's review them.

Me, 07921311:38+12:
> This would suggesting re-write this patchset, part 2 as a "CPUSet
> namespace", part 4 as a "CPU scheduling namespace", parts 5 and 6 as
> "Resource Limits Namespace" (drop this "BeanCounter" brand), and of
> course part 7 falls away.
Me, 07022110:58+12:
> Did you like the names I came up with in my original reply?
>  - CPUset namespace for CPU partitioning
>  - Resource namespaces:
>- cpusched namespace for CPU
>- ulimit namespace for memory
>- quota namespace for disk space
>- io namespace for disk activity
>- etc

Ok, there's nothing original or useful there; I'm obviously quite deliberately 
still punting on the issue.

Eric, 07030718:32-07:
> Pretty much.  For most of the other cases I think we are safe referring
> to them as resource controls or resource limits.  I know that roughly
> covers what cpusets and beancounters and ckrm currently do.

Let's go back in time to the thread I referred to:

Me, 06032209:08+12 and nearby posts
>  - "vserver" spelt in full
>  - family
>  - container
>  - jail
>  - task_ns (sort for namespace)
> Using the term "box" and ID term "boxid":
> create_space - creates a new space and "hashes" it

Kirill, 06032418:36+03:
> I propose to use "namespace" naming.
> 1. This is already used in fs.
> 2. This is what IMHO suites at least OpenVZ/Eric
> 3. it has good acronym "ns".

Right.  So, now I'll also throw into the mix:

  - resource groups (I get a strange feeling of déjà vu there)
  - supply chains (think supply and demand)
  - accounting classes

Do any of those sound remotely close?  If not, your turn :)

And do we bother changing IPC namespaces or let that one slide?

Sam.


Re: [patch 2/6 -rt] powerpc 2.6.20-rt8: to convert spinlocks to raw ones.

2007-03-07 Thread Paul Mackerras

Bill Huey (hui) writes:

> The places that need to be reverted to raw spinlocks are generally either
> acquired by function calls that allocate the spinlock at a terminal of the
> kernel's lock graph or isolated from other callers completely (parts of the
> timer for logic for instance). It's all about the collision of various lock
> (preemptive and non-preemptive) subtrees and how to avoid scheduling within
> atomic violations that lead to deadlocks. The -rt patch gets arbitrary
> preemption abilities by shrinking the non-preemptive sub-tree bit to the bare
> essentials of what will let a system to run yet still preserve all of
> the expected locking semantics of a critical section.

Thanks; that's an interesting explanation.

It misses the point of what I was saying to Sergei, though, which was
*not* "I don't understand your patch", it was "if this patch goes into
a git tree, someone coming along in 3 years time won't understand the
patch."  In other words I was ranting about the need for a decent
description to accompany the patch itself, so it would go into the
permanent record.

Regards,
Paul.

Re: [patch] epoll use a single inode ...

2007-03-07 Thread Linus Torvalds

On Wed, 7 Mar 2007, Michael K. Edwards wrote:
> 
> People's prejudices against prefetch instructions are sometimes
> traceable to the 3DNow! prefetch(w) botch, which some processors
> "support" as no-ops and others are too aggressive about (Opteron
> prefetches are reputed to be "strong", i. e., not dropped on DTLB
> miss).

No, I just checked, and Intel's own optimization manual makes it clear 
that you should be careful. They talk about performance penalties due to 
resource constraints - which makes tons of sense with a core that is good 
at handling its own resources and could quite possibly use those resources 
better to actually execute the loads and stores deeper down the 
instruction pipeline.

So it's not just 3DNow! making AMD look bad, or Intel would obviously 
suggest people use it out of the wazoo ;)

> XScale gets it right.

Blah. XScale isn't even an OoO CPU, *of*course* it needs prefetching. 
Calling that "getting it right" is ludicrous. If anything, it gets things 
so wrong that prefetching is *required* for good performance.

I'm talking about real CPU's with real memory pipelines that already do 
prefetching in hardware. The better the core is, the less the prefetch 
helps (and often the more it hurts in comparison to how much it helps).

But if you mean "doesn't try to fill the TLB on data prefetches", then 
yes, that's generally the right thing to do.

> (Oddly, Prescott seems to have initiated a page table walk on DTLB miss 
> during software prefetch -- just one of many weird Prescott flaws.)  

Netburst in general is *very* happy to do speculative TLB fills, I think.

> I'm guessing Pentium M and its descendants (Core Solo and Duo) get it 
> right but I'm having a hell of a time finding out for sure.  Can any of 
> the x86 experts answer this?

I just suspect that the upside for Core 2 Duo is likely fairly low. The L2
cache is good, the memory re-ordering is working.. I doubt "prefetch"
helps that much in generic code for things like linked-list following; you
should probably limit it to code that has *known* access patterns, where you
know the data is not going to be in the cache.

(In other words, I bet prefetching can help a lot with MMX/media kind of 
code, I doubt it's a huge win for "for_each_entry()")
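As an illustration of the "known access pattern" case, here is a sketch using the GCC/Clang builtin `__builtin_prefetch(addr, rw, locality)`. The prefetch distance of 16 elements is an arbitrary assumption. Note that a plain sequential walk like this is exactly what hardware prefetchers already catch, so on a good out-of-order core the hint may well buy nothing, which is the point made above; only measurement on the target CPU can tell.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical prefetch distance, in elements; real values need measuring. */
#define PREFETCH_DIST 16

/* Sum an array, hinting the element PREFETCH_DIST ahead.
 * rw = 0 means "prefetch for read"; locality = 1 means low temporal reuse. */
static long sum_with_prefetch(const long *a, size_t n)
{
	long sum = 0;
	size_t i;

	assert(a != NULL || n == 0);
	for (i = 0; i < n; i++) {
		/* Stay inside the array; the hint itself never faults. */
		if (i + PREFETCH_DIST < n)
			__builtin_prefetch(&a[i + PREFETCH_DIST], 0, 1);
		sum += a[i];
	}
	return sum;
}
```

The prefetch only changes timing, never the result, so the sum is identical with or without it.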

Linus

Re: [PATCH] tcp_cubic: use 32 bit math

2007-03-07 Thread Stephen Hemminger

David Miller wrote:

From: Stephen Hemminger <[EMAIL PROTECTED]>
Date: Wed, 7 Mar 2007 17:07:31 -0800

The basic calculation has to be done in 32 bits to avoid
doing 64 bit divide by 3. The value x is only 22bits max
so only need full 64 bits only for x^2.

Signed-off-by: Stephen Hemminger <[EMAIL PROTECTED]>

Applied, thanks Stephen.

What about Willy Tarreau's supposedly even faster variant?
Or does this incorporate that set of improvements?

That's what this is:
   x = (2 * x + (uint32_t)div64_64(a, (uint64_t)x*(uint64_t)x)) / 3;


Re: [PATCH 1/2] rcfs core patch

2007-03-07 Thread Eric W. Biederman

Srivatsa Vaddagiri <[EMAIL PROTECTED]> writes:

> Heavily based on Paul Menage's (inturn cpuset) work. The big difference
> is that the patch uses task->nsproxy to group tasks for resource control
> purpose (instead of task->containers).
>
> The patch retains the same user interface as Paul Menage's patches. In
> particular, you can have multiple hierarchies, each hierarchy giving a 
> different composition/view of task-groups.
>
> (Ideally this patch should have been split into 2 or 3 sub-patches, but
> will do that on a subsequent version post)

After looking at the discussion that happened immediately after this was
posted this feels like the right general direction to get the different
parties talking to each other.  I'm not convinced about the whole idea
yet but this looks like a step in a useful direction.

I have a big request.

Please, next time this kind of patch is posted, add a description of
what is happening and why.  I have yet to see anyone explain why
this is a good idea, or why the current semantics were chosen.

The review is still largely happening at the "why" level, but no
one is addressing that yet.  So please, can we have a why?

I have a question: what does rcfs look like if we start with
the code that is already in the kernel?  That is, start with namespaces
and nsproxy and just build a filesystem to display/manipulate them,
with the code built so that it will support adding resource controllers
when they are ready?

> Signed-off-by : Srivatsa Vaddagiri <[EMAIL PROTECTED]>
> Signed-off-by : Paul Menage <[EMAIL PROTECTED]>
>
>
> ---
>
>  linux-2.6.20-vatsa/include/linux/init_task.h |4 
>  linux-2.6.20-vatsa/include/linux/nsproxy.h   |5 
>  linux-2.6.20-vatsa/init/Kconfig  |   22 
>  linux-2.6.20-vatsa/init/main.c   |1 
>  linux-2.6.20-vatsa/kernel/Makefile   |1 
>
>
> ---
>

> diff -puN include/linux/nsproxy.h~rcfs include/linux/nsproxy.h
> --- linux-2.6.20/include/linux/nsproxy.h~rcfs 2007-03-01 14:20:47.0 +0530
> +++ linux-2.6.20-vatsa/include/linux/nsproxy.h 2007-03-01 14:20:47.0 +0530
> @@ -28,6 +28,10 @@ struct nsproxy {
We probably want to rename this struct task_proxy.
And then we can rename most of its users, e.g.:
dup_task_proxy, clone_task_proxy, get_task_proxy, free_task_proxy,
put_task_proxy, exit_task_proxy, init_task_proxy

>   struct ipc_namespace *ipc_ns;
>   struct mnt_namespace *mnt_ns;
>   struct pid_namespace *pid_ns;
> +#ifdef CONFIG_RCFS
> + struct list_head list;

This extra list of nsproxy's is unneeded and a performance problem the
way it is used.  In general we want to talk about the individual resource
controllers not the nsproxy.

> + void *ctlr_data[CONFIG_MAX_RC_SUBSYS];

I still don't understand why these pointers are so abstract,
and why we need an array lookup into them?

> +#endif
>  };
>  extern struct nsproxy init_nsproxy;
>  
> @@ -35,6 +39,12 @@ struct nsproxy *dup_namespaces(struct ns
>  int copy_namespaces(int flags, struct task_struct *tsk);
>  void get_task_namespaces(struct task_struct *tsk);
>  void free_nsproxy(struct nsproxy *ns);
> +#ifdef CONFIG_RCFS
> +struct nsproxy *find_nsproxy(struct nsproxy *ns);
> +int namespaces_init(void);
> +#else
> +static inline int namespaces_init(void) { return 0;}
> +#endif
>  
>  static inline void put_nsproxy(struct nsproxy *ns)
>  {
> diff -puN /dev/null include/linux/rcfs.h
> --- /dev/null 2006-02-25 03:06:56.0 +0530
> +++ linux-2.6.20-vatsa/include/linux/rcfs.h 2007-03-01 14:20:47.0 +0530
> @@ -0,0 +1,72 @@
> +#ifndef _LINUX_RCFS_H
> +#define _LINUX_RCFS_H
> +
> +#ifdef CONFIG_RCFS
> +
> +/* struct cftype:
> + *
> + * The files in the container filesystem mostly have a very simple read/write
> + * handling, some common function will take care of it. Nevertheless some cases
> + * (read tasks) are special and therefore I define this structure for every
> + * kind of file.

I'm still inclined to think this should be part of /proc, instead of a purely
separate fs.  But I might be missing something.
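A small userspace sketch can make the question concrete by contrasting the patch's opaque, index-based `ctlr_data[]` lookup with one named, typed pointer per controller. The controller structs, the `SUBSYS_*` ids and both helpers are hypothetical, and `MAX_RC_SUBSYS` merely stands in for `CONFIG_MAX_RC_SUBSYS`; none of this is code from the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical subsystem ids; MAX_RC_SUBSYS stands in for CONFIG_MAX_RC_SUBSYS. */
enum { SUBSYS_CPU, SUBSYS_MEM, MAX_RC_SUBSYS };

/* Hypothetical per-controller state. */
struct cpu_ctlr { int shares; };
struct mem_ctlr { long limit; };

/* Generic layout, as in the patch: opaque pointers indexed by subsystem id. */
struct proxy_generic {
	void *ctlr_data[MAX_RC_SUBSYS];
};

/* Typed layout: one named pointer per controller. */
struct proxy_typed {
	struct cpu_ctlr *cpu;
	struct mem_ctlr *mem;
};

static int cpu_shares_generic(const struct proxy_generic *p)
{
	/* Index lookup plus an implicit conversion from void * that the
	 * compiler cannot check: a wrong index still compiles. */
	const struct cpu_ctlr *c = p->ctlr_data[SUBSYS_CPU];

	assert(c != NULL);
	return c->shares;
}

static int cpu_shares_typed(const struct proxy_typed *p)
{
	/* Direct, type-checked member access. */
	assert(p->cpu != NULL);
	return p->cpu->shares;
}
```

The usual trade-off applies: the array lets generic code loop over every registered controller, while typed members get compile-time checking at the cost of editing the struct for each new controller.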

Eric
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [patch 097/101] revert "drivers/net/tulip/dmfe: support basic carrier detection"

2007-03-07 Thread Linus Torvalds



On Wed, 7 Mar 2007, Dan Williams wrote:
> 
> Definitely right.  If it doesn't work for your card, it needs to be
> fixed for your card.

Well, regressions are regressions. And they are a *lot* more important 
than any new features. If it doesn't work, it gets reverted.

Linus

[PATCH] Loop device - Tracking page writes made to a loop device through mmap

2007-03-07 Thread Kandan Venkataraman



All comments have been taken care of.

Description:

A  file_operations structure variable called loop_fops is initialised with 
the default block device file operations (def_blk_fops).

The mmap operation is overridden with a new function called loop_file_mmap.

A vm_operations structure variable called loop_file_vm_ops is initialised 
with the default operations for a disk file.
The page_mkwrite operation in this variable is initialised to a new 
function called loop_track_pgwrites.


In the function lo_open, the file operations pointer of the device file is 
initialised with the address of loop_fops.


The function loop_file_mmap simply calls generic_file_mmap and then 
initialises the vm_ops of the vma with address of loop_file_vm_ops.


The function loop_track_pgwrites stores the page offset of the page that
is being written to in a red-black tree within the loop device.


A flag lo_track_pgwrite has been added to the structs loop_device and 
loop_info64 to turn on/off tracking of page writes.


Two new ioctls have been added.

The ioctl cmd LOOP_GET_PGWRITES retrieves the page offsets of pages that 
have been written to.

The ioctl cmd LOOP_CLR_PGWRITES empties the red-black tree.

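The tracking semantics described above (remember each written page once, hand the offsets back, empty the set) can be modelled in userspace. This is a sketch under assumptions, not the patch: a sorted array of fixed capacity stands in for the kernel red-black tree, and `struct pgwrite_set`, `MAX_TRACKED` and the three helpers are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical capacity of the model (the kernel tree has no fixed limit). */
#define MAX_TRACKED 64

struct pgwrite_set {
	unsigned long pgoff[MAX_TRACKED];	/* kept sorted, no duplicates */
	size_t n;
};

/* Model of loop_track_pgwrites(): record that page 'pgoff' was written.
 * Returns 0 on success, -1 if the model's fixed capacity is exhausted. */
static int track_pgwrite(struct pgwrite_set *s, unsigned long pgoff)
{
	size_t i = 0;

	assert(s != NULL);
	while (i < s->n && s->pgoff[i] < pgoff)
		i++;
	if (i < s->n && s->pgoff[i] == pgoff)
		return 0;			/* already recorded */
	if (s->n == MAX_TRACKED)
		return -1;
	for (size_t j = s->n; j > i; j--)	/* shift up to keep the order */
		s->pgoff[j] = s->pgoff[j - 1];
	s->pgoff[i] = pgoff;
	s->n++;
	return 0;
}

/* Model of LOOP_GET_PGWRITES: copy up to 'max' offsets, return the count. */
static size_t get_pgwrites(const struct pgwrite_set *s, unsigned long *out, size_t max)
{
	size_t k = s->n < max ? s->n : max;

	for (size_t i = 0; i < k; i++)
		out[i] = s->pgoff[i];
	return k;
}

/* Model of LOOP_CLR_PGWRITES: forget all recorded writes. */
static void clr_pgwrites(struct pgwrite_set *s)
{
	s->n = 0;
}
```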
This functionality would allow us to have a read-only version and a write
version of memory by doing the following:
Associate a normal file as backing storage for the loop device and mmap
the loop device. Call this mmapped address space area1.
Mmap a normal file of identical size. Call this mmapped address space
area2.


Changes made to area1 can be periodically copied to area2 using the ioctl
cmds (retrieve the dirty page offsets, then copy the dirty pages from area1
to area2). This facility would provide a quick way of updating the read-only
version.

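The copy step can be sketched as a loop over the offsets returned by the ioctl. This assumes each offset is a page index and that both areas span the same number of pages; `copy_dirty_pages` is a hypothetical helper, and in the real setup `src` and `dst` would be the two mmapped areas.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copy only the pages listed in pgoff[] from src (area1) to dst (area2).
 * Both areas must be at least npages * page_size bytes long. */
static void copy_dirty_pages(char *dst, const char *src, size_t page_size,
			     size_t npages, const unsigned long *pgoff, size_t count)
{
	for (size_t i = 0; i < count; i++) {
		assert(pgoff[i] < npages);	/* offset must lie inside both areas */
		memcpy(dst + pgoff[i] * page_size,
		       src + pgoff[i] * page_size, page_size);
	}
}
```

Pages that were not written are never touched, which is where the saving over copying the whole area comes from.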

Motivation for new ioctls:

Imagine a business server application which processes messages from 
clients as they come in (say over a TCP connection).
Some of those messages may be transactions, i.e. they cause data changes 
in the application.
The rest of those messages may be queries, i.e. they get information from
the application.
The application can consist of two processes. One process will handle the 
transactions.
The other process will handle the queries. Each process will have its own 
copy of the business data.
The process handling transactions can mmap the loop device for its copy
of the memory. The loop device must have a normal file for its backing 
storage.
The process handling queries can mmap another normal file for its copy 
of the memory.  Both these memories have identical data at the beginning.
Queries and transactions can now be handled simultaneously by the 
respective processes.
The query process can update its memory periodically by obtaining the 
changes that have happened to the loop device.
By using the ioctl call to retrieve the dirty page offsets, only the dirty 
pages need to be copied over to the query process's copy of memory. We can 
in fact have multiple processes handling queries, sharing the same memory.
During this copy-over, the transaction process will hold off processing 
transactions until the update is complete.


This would be very useful for high speed in-memory transaction systems, 
where the query load can be passed off to other processes. An example of
such a system would be a stock trading system, where clients buy and sell
stock (equity, options, etc.).
At the same time a lot of clients would be downloading market data, and
this can be done independently of the transactions.


This new facility will provide a way of tracking changes made to business 
data, independent of the application domain.



Test program:

Before you run the test program, please create the backing storage file
for the loop device as follows

dd if=/dev/zero of=/root/file bs=4K count=10

Set bs to whatever the page size is on your machine. On my machine it was 
4K.

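The same backing file can also be created from C, which avoids hard-coding the page size. A minimal sketch with a hypothetical helper name: `ftruncate()` leaves a sparse file instead of writing zero blocks as `dd` does, but the contents read back as zeros either way.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Create (or truncate) 'path' and size it to npages system pages.
 * Returns 0 on success, -1 on error (errno is set by the failing call). */
static int make_backing_file(const char *path, long npages)
{
	long page_size = sysconf(_SC_PAGESIZE);
	int fd;

	assert(npages > 0);
	if (page_size <= 0)
		return -1;
	fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
	if (fd < 0)
		return -1;
	/* Extends the file with a hole; unwritten bytes read back as zeros. */
	if (ftruncate(fd, (off_t)(npages * page_size)) < 0) {
		close(fd);
		return -1;
	}
	return close(fd);
}
```

For the test program this would be `make_backing_file("/root/file", 10)`.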


#include <stdio.h>
#include <stdlib.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <linux/loop.h>	/* patched header: struct loop_pgoff_array, LOOP_GET_PGWRITES */

int main()
{
	int maxPages = 10;
	char* start = 0;
	int fd;
	int dfd;
	int *array = 0;
	int pageSize;
	int elemsPerPage;
	struct loop_info64 info;
	struct loop_pgoff_array pgarray;
	pgarray.max = maxPages;

	pgarray.pgoff = calloc(maxPages, sizeof(long));

	if (pgarray.pgoff == NULL) {
		fprintf(stderr, "can't create pgarray\n");
		exit(1);
	}
	pageSize = getpagesize();

	elemsPerPage = pageSize/sizeof(int);

	/* open the device file */
	if ((fd = open ("/dev/loop0", O_RDWR, S_IRWXU)) < 0) {
		fprintf(stderr, "can't open device file /dev/loop0 for writing\n");
		goto out5;
	}
	/* open the disk file to set as backing storage */
	if ((dfd = open ("/root/file", O_RDWR, S_IRWXU)) < 0) {
		fprintf(stderr, "can't open backing file /root/file for writing\n");

Re: SATA resume slowness, e1000 MSI warning

2007-03-07 Thread Andrew Morton

On Wed, 07 Mar 2007 12:28:11 -0700 [EMAIL PROTECTED] (Eric W. Biederman) wrote:

> Below is an additional set of warnings that should help debug this.
> The old code just got lucky that it triggered a warning when this happens.
> 
> diff --git a/drivers/pci/msi.c b/drivers/pci/msi.c
> index 01869b1..5113913 100644
> --- a/drivers/pci/msi.c
> +++ b/drivers/pci/msi.c
> @@ -613,6 +613,7 @@ int pci_enable_msi(struct pci_dev* dev)
>   return -EINVAL;
>  
>   WARN_ON(!!dev->msi_enabled);
> + WARN_ON(!hlist_empty(&dev->saved_cap_space));
>  
>   /* Check whether driver already requested for MSI-X irqs */
>   if (dev->msix_enabled) {
> @@ -638,6 +639,8 @@ void pci_disable_msi(struct pci_dev* dev)
>   if (!dev->msi_enabled)
>   return;
>  
> + WARN_ON(!hlist_empty(&dev->saved_cap_space));
> +
>   msi_set_enable(dev, 0);
>   pci_intx(dev, 1);   /* enable intx */
>   dev->msi_enabled = 0;
> @@ -739,6 +742,7 @@ int pci_enable_msix(struct pci_dev* dev, struct msix_entry *entries, int nvec)
>   }
>   }
>   WARN_ON(!!dev->msix_enabled);
> + WARN_ON(!hlist_empty(&dev->saved_cap_space));
>  
>   /* Check whether driver already requested for MSI irq */
>   if (dev->msi_enabled) {
> @@ -763,6 +767,8 @@ void pci_disable_msix(struct pci_dev* dev)
>   if (!dev->msix_enabled)
>   return;
>  
> + WARN_ON(!hlist_empty(&dev->saved_cap_space));
> +
>   msix_set_enable(dev, 0);
>   pci_intx(dev, 1);   /* enable intx */
>   dev->msix_enabled = 0;
> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> index bd44a48..4418839 100644
> --- a/drivers/pci/pci.c
> +++ b/drivers/pci/pci.c
> @@ -677,6 +677,7 @@ pci_restore_state(struct pci_dev *dev)
>   }
>   pci_restore_pcix_state(dev);
>   pci_restore_msi_state(dev);
> + WARN_ON(!hlist_empty(&dev->saved_cap_space));
>  
>   return 0;
>  }

Got a hit on a powerpc g5:

PM: Writing back config space on device 0001:05:04.0 at offset 1 (was 2b0, writing 2b6)
[ cut here ]
Badness at drivers/pci/pci.c:679
Call Trace:
[C80F7410] [C0011EFC] .show_stack+0x50/0x1cc (unreliable)
[C80F74C0] [C01AD610] .report_bug+0xa0/0x110
[C80F7550] [C00256E4] .program_check_exception+0xb4/0x670
[C80F7630] [C00046F4] program_check_common+0xf4/0x100
--- Exception: 700 at .pci_restore_state+0x310/0x340
LR = .pci_restore_state+0x2e0/0x340
[C80F79D0] [C026A174] .tg3_chip_reset+0x19c/0xa04
[C80F7A90] [C026D948] .tg3_reset_hw+0xa4/0x2718
[C80F7BA0] [C0270030] .tg3_init_hw+0x74/0x94
[C80F7C30] [C0270BE0] .tg3_open+0x4c8/0x854
[C80F7CF0] [C03A74A4] .dev_open+0x100/0x12c
[C80F7D90] [C03BAEA8] .netpoll_setup+0x2dc/0x3ec
[C80F7E40] [C0283450] .init_netconsole+0x64/0x8c
[C80F7EC0] [C05C0BE4] .init+0x1d0/0x390
[C80F7F90] [C00271F8] .kernel_thread+0x4c/0x68
tg3: eth0: Link is up at 1000 Mbps, full duplex.
tg3: eth0: Flow control is on for TX and on for RX.
Scheduler bitmap error - bitmap being reconstructed..
netconsole: network logging started
Calling initcall 0xc06bd180: .macio_module_init+0x0/0x3c()

That's:

pci_restore_pcix_state(dev);
pci_restore_msi_state(dev);
WARN_ON(!hlist_empty(&dev->saved_cap_space));

return 0;


Re: [ckrm-tech] [PATCH 0/2] resource control file system - aka containers on top of nsproxy!

2007-03-07 Thread Paul Menage


On 3/7/07, Sam Vilain <[EMAIL PROTECTED]> wrote:


> Sorry, I didn't realise I was talking with somebody qualified enough to
> speak on behalf of the Generally Established Principles of Computer Science.


I made sure to check

http://en.wikipedia.org/wiki/Namespace
http://en.wikipedia.org/wiki/Namespace_%28computer_science%29

when this argument started ... :-)



> This is the classic terminology problem between substance and function.
> ie, some things share characteristics but does that mean they are the
> same thing?


Aren't you arguing my side here? My point is that what I'm trying to
add with "containers" (or whatever name we end up using) can't easily
be subsumed into the "namespace" concept, and you're arguing that they
should go into nsproxy because they share some characteristics.



> Look, I already agreed in the earlier thread that the term "namespace"
> was being stretched beyond belief, yet instead of trying to be useful
> about this you still insist on calling this sub-system specific stuff
> the "container",


Uh, no. I'm trying to call a *grouping* of processes a container.


> and then go screaming that I am wrong and you are right
> on terminology.


Actually I asked if you/Eric had better suggestions.

Paul

Re: [PATCH] tcp_cubic: use 32 bit math

2007-03-07 Thread David Miller

From: Stephen Hemminger <[EMAIL PROTECTED]>
Date: Wed, 7 Mar 2007 17:07:31 -0800

> The basic calculation has to be done in 32 bits to avoid
> doing a 64 bit divide by 3. The value x is only 22 bits max,
> so full 64 bits are only needed for x^2.
> 
> Signed-off-by: Stephen Hemminger <[EMAIL PROTECTED]>

Applied, thanks Stephen.

What about Willy Tarreau's supposedly even faster variant?
Or does this incorporate that set of improvements?
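Stephen's description can be illustrated with a userspace integer cube root that keeps the Newton iterate in 32 bits, so that only x*x is formed in 64 bits and the division by 3 is a 32-bit one (the quotient a / x^2 is still a 64-bit division). This is a sketch under stated assumptions, not the tcp_cubic.c code: the function name, the power-of-two starting guess and the stopping rule are choices made for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Returns floor(cbrt(a)) using integer Newton iterations
 * x' = (2x + a/x^2) / 3, started from a guess >= cbrt(a). */
static uint32_t cubic_root_sketch(uint64_t a)
{
	unsigned bits = 0;
	uint32_t x;

	if (a == 0)
		return 0;
	for (uint64_t t = a; t; t >>= 1)
		bits++;
	/* 2^ceil(bits/3) >= cbrt(a); at most 2^22 for a 64-bit input */
	x = (uint32_t)1 << ((bits + 2) / 3);
	for (;;) {
		uint64_t x2 = (uint64_t)x * x;			/* full 64 bits only here */
		uint32_t y;

		assert(x2 != 0);				/* x >= 1 throughout */
		y = (2 * x + (uint32_t)(a / x2)) / 3;		/* 32-bit divide by 3 */
		if (y >= x)
			return x;				/* stopped decreasing: x is the floor root */
		x = y;
	}
}
```

Starting above the root makes the iterates decrease monotonically, so the loop can stop the first time they do not shrink.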

Re: [PATCH] [RSDL-mm 0/6] RSDL cpu scheduler for -mm

2007-03-07 Thread Andrew Morton

On Wed, 7 Mar 2007 17:43:45 -0800 Andrew Morton <[EMAIL PROTECTED]> wrote:

> On Wed, 7 Mar 2007 12:26:42 +1100
> Con Kolivas <[EMAIL PROTECTED]> wrote:
> 
> > What follows is the same patch series that constitutes the RSDL "Rotating 
> > Staircase DeadLine" cpu scheduler resynced for 2.6.21-rc2-mm2.
> 
> Big oops early in boot on x86_64 SMP, in rq_bitmap_error+0x97/0x9f.
> 
> I stubbed it out with a `return MAX_RT_PRIO;' (I think) but it then oopsed
> differently.  Before netconsole had come up, no serial console, no digital
> camera.
> 
> There's stuff in http://userweb.kernel.org/~akpm/ck/ - you can probably
> boot that kernel on your own machine.
> 
> I need to do rc3-mm1 now.  I might find some time to poke at this
> further after that, but I have to leave for a week in .jp and it'll be
> squeezy, sorry.

Well, it boots on dual PIII and quad PowerPC.

The powerpc says

Scheduler bitmap error - bitmap being reconstructed..

during bootup.  But it didn't crash like the Nocona machine.

Re: [patch] epoll use a single inode ...

2007-03-07 Thread Kyle Moffett


On Mar 07, 2007, at 20:25:14, Michael K. Edwards wrote:

> On 3/7/07, Linus Torvalds <[EMAIL PROTECTED]> wrote:
>
>> In general, using software prefetching is just a stupid idea, unless
>>
>>  - the prefetch really is very strict (ie for a linked list you do
>> exactly the above kinds of things to make sure that you don't try
>> to prefetch the non-existent end entry)
>>
>> AND
>>  - the CPU is stupid (in-order in particular).
>>
>> I think Intel even suggests in their optimization manuals to *not*
>> do software prefetching, because hw can usually simply do better
>> without it.
>
> Not the XScale -- it performs quite poorly without prefetch, as
> people who have run ARMv5-optimized binaries on it can testify.
>
> The Intel XScale(r) core prefetch load instruction is a true
> prefetch instruction because the load destination is the data or
> mini-data cache and not a register. Compilers for processors which
> have data caches, but do not support prefetch, sometimes use a load
> instruction to preload the data cache. This technique has the
> disadvantages of using a register to load data and requiring
> additional registers for subsequent preloads and thus increasing
> register pressure. By contrast, the prefetch can be used to reduce
> register pressure instead of increasing it.
>
> The prefetch load is a hint instruction and does not guarantee that
> the data will be loaded. Whenever the load would cause a fault or a
> table walk, then the processor will ignore the prefetch
> instruction, the fault or table walk, and continue processing the
> next instruction. This is particularly advantageous in the case
> where a linked list or recursive data structure is terminated by a
> NULL pointer. Prefetching the NULL pointer will not fault program
> flow.


Prefetching is also fairly critical on a Power4 or G5 PowerPC system  
as they have a long memory latency; an L2-cache miss can cost 200+  
cycles.  On such systems the "dcbt" prefetch instruction brings in a  
single 128-byte cacheline and has no serializing effects whatsoever,  
making it ideal for use in a linked-list-traversal inner loop.
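The "strict" linked-list prefetch Linus describes, which Kyle argues pays off on long-latency machines, looks roughly like this in C. A sketch only: `__builtin_prefetch` is a GCC/Clang builtin that emits the target's data prefetch instruction where one exists and nothing otherwise, and `struct node` and `sum_list` are hypothetical names. The NULL check keeps it from prefetching the non-existent end entry.

```c
#include <assert.h>
#include <stddef.h>

struct node {
	struct node *next;
	long value;
};

/* Sum a singly linked list, prefetching the next node while the current
 * one is being processed. The prefetch is only a hint. */
static long sum_list(const struct node *n)
{
	long sum = 0;

	for (; n != NULL; n = n->next) {
		if (n->next != NULL)	/* never prefetch past the end entry */
			__builtin_prefetch(n->next, 0 /* read */, 3 /* keep cached */);
		sum += n->value;
	}
	return sum;
}
```

With only one node of lookahead the prefetch hides little latency on its own; the benefit grows with the amount of work done per node.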


Cheers,
Kyle Moffett


Re: [ckrm-tech] [PATCH 0/2] resource control file system - aka containers on top of nsproxy!

2007-03-07 Thread Sam Vilain

Paul Menage wrote:
> Sorry, I think this statement is wrong, by the generally established
> meaning of the term namespace in computer science.
>   

Sorry, I didn't realise I was talking with somebody qualified enough to
speak on behalf of the Generally Established Principles of Computer Science.

>> Trying to extend the well-known term namespace to refer to things that
>> are semantically equivalent namespaces is a useful approach, IMHO.
>>
>> 
> Yes, that would be true. But the kinds of groupings that we're talking
> about are supersets of namespaces, not semantically equivalent to
> them. To use Eric's "shoe" analogy from earlier, it's like insisting
> that we use the term "sneaker" to refer to all footware, including ski
> boots and birkenstocks ...
>   

I see it more like insisting that we use the term "clothing" to also
refer to "weapons" because for both of them you tell your body to "wear"
them in some game.

This is the classic terminology problem between substance and function. 
ie, some things share characteristics but does that mean they are the
same thing?

Look, I already agreed in the earlier thread that the term "namespace"
was being stretched beyond belief, yet instead of trying to be useful
about this you still insist on calling this sub-system specific stuff
the "container", and then go screaming that I am wrong and you are right
on terminology.

I've normally recognised[1] these three things as the primary feature
groups of vserver:

  - isolation
  - resource limiting
  - resource sharing

So I've got no problem with "clothing" remaining the term for isolation and
"weapons" for resource sharing and limiting.  Or some other suitable terms.

Sam.

1. e.g., http://utsl.gen.nz/talks/vserver/slide4c.html

Re: [patch 097/101] revert "drivers/net/tulip/dmfe: support basic carrier detection"

2007-03-07 Thread Dan Williams

On Wed, 2007-03-07 at 10:14 -0800, Stephen Hemminger wrote:
> On Wed, 07 Mar 2007 09:12:12 -0800
> Greg KH <[EMAIL PROTECTED]> wrote:
> 
> > 
> > From: Andrew Morton <[EMAIL PROTECTED]>
> > 
> > Revert 7628b0a8c01a02966d2228bdf741ddedb128e8f8.  Thomas Bachler
> > reports:
> > 
> >   Commit 7628b0a8c01a02966d2228bdf741ddedb128e8f8 (drivers/net/tulip/dmfe:
> >   support basic carrier detection) breaks networking on my Davicom DM9009. 
> >   ethtool always reports there is no link.  tcpdump shows incoming packets,
> >   but TX is disabled.  Reverting the above patch fixes the problem.
> > 
> 
> Carrier detection support is important and should be fixed rather than 
> removed.

Definitely right.  If it doesn't work for your card, it needs to be
fixed for your card.

Dan



Re: [patch 0/6 -rt] powerpc 2.6.20-rt8: fix boot/runtime errors/warnings for PowerPC(ppc64)

2007-03-07 Thread Tsutomu OWA


At Wed, 07 Mar 2007 17:26:50 +0300,
Sergei Shtylyov wrote:
> 
> Tsutomu OWA wrote:

> >   CONFIG_MCOUNT, CONFIG_LATENCY_TRACE and other tracing options nor
> > CONFIG_GENERIC_TIME,
> 
> There is PowerPC genTOD patch and it's incorporated into -rt (don't know if
> it works for Cell) but it breaks TOD vsyscalls. Several months ago I
> posted patches removing them for the time being:

> > clockevents etc are not yet ported.
> 
> Note that there *is* PowerPC clockevents driver already (don't know if it 
> works for Cell) -- it just never got merged to -rt:

  I should have written something like "... are not yet ported by myself."

  Anyway, thanks for the info.
-- owa

Re: [ckrm-tech] [PATCH 0/2] resource control file system - aka containers on top of nsproxy!

2007-03-07 Thread Eric W. Biederman

"Paul Menage" <[EMAIL PROTECTED]> writes:

> On 3/7/07, Eric W. Biederman <[EMAIL PROTECTED]> wrote:
>> The real trick is that I believe these groupings are designed to be something
>> you can setup on login and then not be able to switch out of.
>
> That's going to be the case for most resource controllers - is that
> the case for namespaces? (e.g. can any task unshare say its mount
> namespace?)

With namespaces there are secondary issues with unsharing.  Weird things
like a simple unshare might allow you to replace /etc/shadow and thus
mess up a suid root application.

Once people have worked through those secondary issues unsharing of
namespaces is likely allowable (for someone without CAP_SYS_ADMIN).
Although if you pick the truly hierarchical namespaces the pid
namespace unsharing will simply give you a parent of the current
namespace.

For resource controls I expect unsharing is likely to be like the pid
namespace.  You might allow it, but if you do you are forced to be a
child, and possibly there will be hierarchy depth restrictions.
Assuming you can implement hierarchical accounting without too much
expense.

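The hierarchy anticipated here (an unshared resource group becomes a child of the current one, possibly with a depth limit, and charges are accounted all the way up) can be modelled in a few lines. Everything in this sketch is hypothetical: `struct rgroup`, `MAX_DEPTH` and the check-then-commit charge are illustration only. The walk up the ancestor chain on every charge is exactly the expense in question.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical maximum nesting depth for child groups. */
#define MAX_DEPTH 4

struct rgroup {
	struct rgroup *parent;	/* NULL for the root group */
	int depth;		/* 0 for the root group */
	long usage;
	long limit;
};

/* Create a child group; fails (returns -1) past the depth restriction. */
static int rgroup_init_child(struct rgroup *child, struct rgroup *parent, long limit)
{
	assert(parent != NULL);
	if (parent->depth + 1 > MAX_DEPTH)
		return -1;
	child->parent = parent;
	child->depth = parent->depth + 1;
	child->usage = 0;
	child->limit = limit;
	return 0;
}

/* Hierarchical charge: the amount must fit within the limit of the group
 * and of every ancestor, so a child can never escape its parent's limit.
 * The first pass checks, the second commits. */
static int rgroup_charge(struct rgroup *g, long amount)
{
	for (struct rgroup *p = g; p != NULL; p = p->parent)
		if (p->usage + amount > p->limit)
			return -1;
	for (struct rgroup *p = g; p != NULL; p = p->parent)
		p->usage += amount;
	return 0;
}
```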
Eric

Re: PAGE_SIZE Availability Inconsistency

2007-03-07 Thread Roman Zippel

Hi,

On Tuesday 06 March 2007 10:29, Christoph Hellwig wrote:

> PAGE_SIZE should not be available at all.  Please use getpagesize()
> instead.

I agree, but NBPG is a bit of a problem: although it's only needed for aout
coredumps AFAICT, it's still needed to compile e.g. gdb.

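Following Christoph's advice, a userspace program can ask for the page size at run time rather than compiling in `PAGE_SIZE` (or `NBPG`). A minimal sketch with a hypothetical helper name; it uses the POSIX `sysconf(_SC_PAGESIZE)`, and `getpagesize()` returns the same value on Linux.

```c
#include <assert.h>
#include <unistd.h>

/* Page size in bytes, determined at run time; -1 if it cannot be determined. */
static long runtime_page_size(void)
{
	long ps = sysconf(_SC_PAGESIZE);

	return ps > 0 ? ps : -1;
}
```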
bye, Roman

Re: f_owner.lock and file->pos updates

2007-03-07 Thread Michael K. Edwards


I wrote:

> I didn't see any clean way to intersperse overwrites and appends to a
> record-structured file without using vfs_llseek, which steps on f_pos.


The context, of course, is an attempt to fix -ENOPATCH with regard to
the netlink-based AIO submission scheme I outlined a couple of days
ago.  :-)

Maybe f_pos should be advanced atomically by the number of bytes
expected to be read/written, before entering the vfs_(read|write)(|v)
call?  And then if the read/write doesn't complete normally, f_pos
should be decremented by the number of bytes we failed to read/write?
Or do we have to make absolutely, positively sure that sampling f_pos
from another thread never returns any value outside (before)..(before
+ bytes read/written)?  If so, the only way to cure the worst symptom
of the append race appears to be to hold a per-fd lock for the
duration of the sys_(read|write).
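The first option, reserving the range before the I/O and giving back any shortfall afterwards, can be sketched with C11 atomics. This is a hypothetical userspace model (`struct shared_pos` and both helpers are invented names), not kernel code. It also shows the catch behind the final question: the give-back is only exact if no other thread reserved a range in between; otherwise it rewinds the position into a range another thread has already claimed.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical model of a shared file position (not the kernel's f_pos). */
struct shared_pos {
	_Atomic long long pos;
};

/* Reserve [start, start + len) before doing the I/O; returns start. */
static long long pos_reserve(struct shared_pos *p, long long len)
{
	assert(len >= 0);
	return atomic_fetch_add(&p->pos, len);
}

/* After a short transfer of 'done' out of 'len' bytes, give back the rest.
 * Exact only if nobody reserved after us in the meantime. */
static void pos_release_shortfall(struct shared_pos *p, long long len, long long done)
{
	assert(done >= 0 && done <= len);
	atomic_fetch_sub(&p->pos, len - done);
}
```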

Cheers,
- Michael

Re: f_owner.lock and file->pos updates

2007-03-07 Thread Michael K. Edwards


On 3/7/07, Alan Cox <[EMAIL PROTECTED]> wrote:

> The right way IMHO would be to do the work that was done for pread/pwrite
> and implement preadv/pwritev. The moment you want to do atomic things
> with the file->f_pos instead of doing it with a local passed pos value it
> gets ugly.. why do you need to do it with f_pos ?


I didn't see any clean way to intersperse overwrites and appends to a
record-structured file without using vfs_llseek, which steps on f_pos.
Actually, we may already have a problem with append races in
sys_write/sys_writev.  If two threads write() to the same file
concurrently (both intending to append), they may wind up passing the
same "pos" value into vfs_write().  Or does fget_light/fput_light do
some sort of locking that I'm not seeing?

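Alan's suggestion is to pass an explicit position instead of relying on f_pos. From userspace the same idea can be sketched for fixed-size records: overwrite with `pwrite()` at an explicit offset, and append through a separate `O_APPEND` descriptor. The helper names are hypothetical, and the append guarantee assumes a local filesystem.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Open one descriptor for in-place overwrites and a second one with O_APPEND
 * for appends. They must be separate: on Linux, pwrite() on an O_APPEND
 * descriptor appends regardless of the offset it is given. */
static int open_record_fds(const char *path, int *fd, int *afd)
{
	*fd = open(path, O_RDWR | O_CREAT, 0600);
	if (*fd < 0)
		return -1;
	*afd = open(path, O_WRONLY | O_APPEND);
	if (*afd < 0) {
		close(*fd);
		return -1;
	}
	return 0;
}

/* Overwrite fixed-size record 'idx'. pwrite() takes an explicit offset and
 * neither reads nor moves the descriptor's file position. */
static int put_record(int fd, size_t recsz, off_t idx, const void *rec)
{
	assert(recsz > 0);
	return pwrite(fd, rec, recsz, idx * (off_t)recsz) == (ssize_t)recsz ? 0 : -1;
}

/* Append a record: with O_APPEND the offset is moved to end-of-file as part
 * of each write(), so concurrent appenders do not overwrite each other. */
static int append_record(int afd, size_t recsz, const void *rec)
{
	assert(recsz > 0);
	return write(afd, rec, recsz) == (ssize_t)recsz ? 0 : -1;
}
```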
Cheers,
- Michael

Re: [PATCH] [RSDL-mm 0/6] RSDL cpu scheduler for -mm

2007-03-07 Thread Andrew Morton

On Wed, 7 Mar 2007 12:26:42 +1100
Con Kolivas <[EMAIL PROTECTED]> wrote:

> What follows is the same patch series that constitutes the RSDL "Rotating 
> Staircase DeadLine" cpu scheduler resynced for 2.6.21-rc2-mm2.

Big oops early in boot on x86_64 SMP, in rq_bitmap_error+0x97/0x9f.

I stubbed it out with a `return MAX_RT_PRIO;' (I think) but it then oopsed
differently.  Before netconsole had come up, no serial console, no digital
camera.

There's stuff in http://userweb.kernel.org/~akpm/ck/ - you can probably
boot that kernel on your own machine.

I need to do rc3-mm1 now.  I might find some time to poke at this
further after that, but I have to leave for a week in .jp and it'll be
squeezy, sorry.

Re: [PATCH] Replace misspelled "PRINTK" with "CONFIG_PRINTK".

2007-03-07 Thread Neil Brown

On Wednesday March 7, [EMAIL PROTECTED] wrote:
> 
>   Replace the apparently misspelled preprocessor variable "PRINTK"
> with "CONFIG_PRINTK".

No, it is meant to be "PRINTK".
It dates way way back before my time, but presumably the idea was you
could -DPRINTK=something and if you didn't do that, it would figure
out what it thought you wanted.

Definitely not meant to be CONFIG_PRINTK.
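The `#ifndef PRINTK` guard described here is a common C pattern: a macro the builder may predefine on the compiler command line, with a fallback chosen by the file otherwise. A userspace sketch of that pattern, modelled on the drivers/md/bitmap.c hunk quoted in the patch; `printf` stands in for `printk(KERN_DEBUG ...)`, standard `__VA_ARGS__` replaces the GNU `x...` form, and the no-op fallback is an assumption because the quoted hunk cuts off before its `#else` branch.

```c
#include <assert.h>
#include <stdio.h>

#ifndef DEBUG
#define DEBUG 0
#endif

/* Fallback used only when PRINTK was not predefined, e.g. with
 *   cc -D'PRINTK(...)=fprintf(stderr, __VA_ARGS__)' ...
 * which is why the guard tests PRINTK itself, not CONFIG_PRINTK. */
#ifndef PRINTK
#  if DEBUG > 0
#    define PRINTK(...) printf(__VA_ARGS__)
#  else
#    define PRINTK(...) do { } while (0)
#  endif
#endif
```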

NeilBrown


> 
> Signed-off-by: Robert P. J. Day <[EMAIL PROTECTED]>
> 
> ---
> 
>   not sure who the official maintainer here is, sorry.
> 
> diff --git a/drivers/md/bitmap.c b/drivers/md/bitmap.c
> index 5554ada..0c09772 100644
> --- a/drivers/md/bitmap.c
> +++ b/drivers/md/bitmap.c
> @@ -53,7 +53,7 @@
>  //#define DPRINTK PRINTK /* set this NULL to avoid verbose debug output */
>  #define DPRINTK(x...) do { } while(0)
> 
> -#ifndef PRINTK
> +#ifndef CONFIG_PRINTK
>  #  if DEBUG > 0
>  #define PRINTK(x...) printk(KERN_DEBUG x)
>  #  else
> 
> -- 
> 
> Robert P. J. Day
> Linux Consulting, Training and Annoying Kernel Pedantry
> Waterloo, Ontario, CANADA
> 
> http://fsdev.net/wiki/index.php?title=Main_Page
> 

Re: [ckrm-tech] [PATCH 0/2] resource control file system - aka containers on top of nsproxy!

2007-03-07 Thread Paul Menage


On 3/7/07, Eric W. Biederman <[EMAIL PROTECTED]> wrote:

> Pretty much.  For most of the other cases I think we are safe referring
> to them as resource controls or resource limits.  I know that roughly covers
> what cpusets and beancounters and ckrm currently do.


Plus resource monitoring (which may often be a subset of resource
control/limits).



> The real trick is that I believe these groupings are designed to be something
> you can setup on login and then not be able to switch out of.


That's going to be the case for most resource controllers - is that
the case for namespaces? (e.g. can any task unshare say its mount
namespace?)

Paul

Re: 2.6.21-rc2-mm2 hang

2007-03-07 Thread Siddha, Suresh B

On Wed, Mar 07, 2007 at 02:12:16PM -0800, Dave Hansen wrote:
> I'm seeing weird hangs running ltp on 2.6.21-rc2-mm2.  It manifests
> itself by the waitpid06 test in LTP hanging.  This is very, very
> reproducible in about 5 seconds by adding '-s wait' to the ltp command
> line.
> 
> I see 4 waitpid06 processes on my 4-way machine spinning in userspace.
> But, the weird part is that I can't ssh in once this happens, but I can
> log in to the console.  I've bisected it down to:
> 
> sched-fix-idle-load-balancing-in-softirqd-context

[having some mailer issues. Please ignore if this is a duplicate]

This sounds like an issue in a merge we recently had, and 2.6.21-rc2-mm2
already has a fix for it:

sched-fix-idle-load-balancing-in-softirqd-context-fix.patch  

Can you please apply both
sched-fix-idle-load-balancing-in-softirqd-context
sched-fix-idle-load-balancing-in-softirqd-context-fix.patch
and see if you still see this problem?

thanks,
suresh

Re: [ckrm-tech] [PATCH 0/2] resource control file system - aka containers on top of nsproxy!

2007-03-07 Thread Eric W. Biederman

"Paul Menage" <[EMAIL PROTECTED]> writes:

> On 3/7/07, Sam Vilain <[EMAIL PROTECTED]> wrote:
>> But "namespace" has well-established historical semantics too - a way
>> of changing the mappings of local * to global objects. This
>> accurately describes things like resource controllers, cpusets, resource
>> monitoring, etc.
>
> Sorry, I think this statement is wrong, by the generally established
> meaning of the term namespace in computer science.
>
>>
>> Trying to extend the well-known term namespace to refer to things that
>> are semantically equivalent namespaces is a useful approach, IMHO.
>>
>
> Yes, that would be true. But the kinds of groupings that we're talking
> about are supersets of namespaces, not semantically equivalent to
> them. To use Eric's "shoe" analogy from earlier, it's like insisting
> that we use the term "sneaker" to refer to all footwear, including ski
> boots and birkenstocks ...

Pretty much.  For most of the other cases I think we are safe referring
to them as resource controls or resource limits.  I know that roughly covers
what cpusets and beancounters and ckrm currently do.

The real trick is that I believe these groupings are designed to be something
you can setup on login and then not be able to switch out of.  Which means
we can't use sessions and process groups as the grouping entities as those 
have different semantics.

Eric

Re: [patch] epoll use a single inode ...

2007-03-07 Thread Michael K. Edwards


On 3/7/07, Linus Torvalds <[EMAIL PROTECTED]> wrote:

> Yeah, I'm not at all surprised. Any implementation of "prefetch" that
> doesn't just turn into a no-op if the TLB entry doesn't exist (which makes
> them weaker for *actual* prefetching) will generally have a hard time with
> a NULL pointer. Exactly because it will try to do a totally unnecessary
> TLB fill - and since most CPUs will not cache negative TLB entries, that
> unnecessary TLB fill will be done over and over and over again.


Data prefetch instructions should indeed avoid page table walks.
(Instruction prefetch mechanisms often do induce table walks on ITLB
miss.)  Not just because of the null pointer case, but because it's
quite normal to run off the end of an array in a loop with an embedded
prefetch instruction.  If you have an extra instruction issue unit
that shares the same DTLB, and you know you will really want that
data, you can sometimes use it to force DTLB preloads by doing an
actual data fetch from the foreseeable page.  This is potentially one
of the best uses of chip multi-threading on an architecture like Sun's
Niagara.

(I don't think Intel's hyper-threading works for this purpose; the
DTLB is shared but the entries are marked as owned by one thread or
the other.  HT can be used for L2 cache prefetching, although the
results so far seem to be mixed:
http://www.cgo.org/cgo2004/papers/02_80_Kim_D_REVISED.pdf)


> In general, using software prefetching is just a stupid idea, unless
>
>  - the prefetch really is very strict (ie for a linked list you do exactly
>    the above kinds of things to make sure that you don't try to prefetch
>    the non-existent end entry)
> AND
>  - the CPU is stupid (in-order in particular).
>
> I think Intel even suggests in their optimization manuals to *not* do
> software prefetching, because hw can usually simply do better without it.
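A minimal sketch of the strict prefetch pattern described above, assuming GCC or Clang: `__builtin_prefetch()` is a compiler builtin rather than the kernel's prefetch(), and `struct node` and `list_sum_prefetch()` are hypothetical names. The next node is prefetched only after checking that the pointer is non-NULL, so the loop never asks the CPU to fetch a non-existent end entry:

```c
#include <stddef.h>

/* Hypothetical singly linked list node, for illustration only. */
struct node {
    struct node *next;
    long value;
};

/* Sum the list, starting the fetch of the next node while the current
 * one is processed.  The NULL check keeps the prefetch strict: nothing
 * is prefetched past the last entry. */
static long list_sum_prefetch(const struct node *n)
{
    long sum = 0;

    while (n != NULL) {
        const struct node *next = n->next;

        if (next != NULL)
            __builtin_prefetch(next, 0 /* read */, 3 /* high locality */);
        sum += n->value;
        n = next;
    }
    return sum;
}
```

The explicit check only matters on CPUs whose prefetch can trigger a TLB fill for a bad address; on a CPU that silently drops such prefetches (the XScale PLD behaviour described below) it is redundant.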


Not the XScale -- it performs quite poorly without prefetch, as people
who have run ARMv5-optimized binaries on it can testify.  From the
XScale Core Developer's Manual:


The Intel XScale(r) core has a true prefetch load instruction (PLD).
The purpose of this instruction is to preload data into the data and
mini-data caches. Data prefetching allows hiding of memory transfer
latency while the processor continues to execute instructions. The
prefetch is important to compiler and assembly code because judicious
use of the prefetch instruction can enormously improve throughput
performance of the core. Data prefetch can be applied not only to
loops but also to any data references within a block of code. Prefetch
also applies to data writing when the memory type is enabled as write
allocate.

The Intel XScale(r) core prefetch load instruction is a true prefetch
instruction because the load destination is the data or mini-data
cache and not a register. Compilers for processors which have data
caches, but do not support prefetch, sometimes use a load instruction
to preload the data cache. This technique has the disadvantages of
using a register to load data and requiring additional registers for
subsequent preloads and thus increasing register pressure. By
contrast, the prefetch can be used to reduce register pressure instead
of increasing it.

The prefetch load is a hint instruction and does not guarantee that
the data will be loaded. Whenever the load would cause a fault or a
table walk, then the processor will ignore the prefetch instruction,
the fault or table walk, and continue processing the next instruction.
This is particularly advantageous in the case where a linked list or
recursive data structure is terminated by a NULL pointer. Prefetching
the NULL pointer will not fault program flow.


People's prejudices against prefetch instructions are sometimes
traceable to the 3DNow! prefetch(w) botch, which some processors
"support" as no-ops and others are too aggressive about (Opteron
prefetches are reputed to be "strong", i.e., not dropped on DTLB
miss).  XScale gets it right.  So do most Pentium 4s using the SSE
prefetches, according to the IA-32 optimization manual.  (Oddly,
Prescott seems to have initiated a page table walk on DTLB miss during
software prefetch -- just one of many weird Prescott flaws.)  I'm
guessing Pentium M and its descendants (Core Solo and Duo) get it
right but I'm having a hell of a time finding out for sure.  Can any
of the x86 experts answer this?

Cheers,
- Michael

Re: [BUG] Linux 2.6.20.1 - unable to handle kernel paging request - accessing freed memory?

2007-03-07 Thread Chris Rankin

--- Pekka J Enberg <[EMAIL PROTECTED]> wrote:
> It should give us a better clue which sysfs file is causing the oops.

This BUG happened during boot-up! The only USB device I have is a pwc webcam:

$ /sbin/lsusb
Bus 004 Device 001: ID :
Bus 003 Device 001: ID :
Bus 002 Device 001: ID :
Bus 001 Device 003: ID 046d:08b4 Logitech, Inc. QuickCam Zoom
Bus 001 Device 001: ID :

Linux version 2.6.20.1 ([EMAIL PROTECTED]) (gcc version 4.1.1 20070105 (Red Hat 4.1.1-51)) #3 SMP PREEMPT Thu Mar 1 12:06:59 GMT 2007
BIOS-provided physical RAM map:
sanitize start
sanitize end
copy_e820_map() start:  size: 000a end: 000a type: 1
copy_e820_map() type is E820_RAM
copy_e820_map() start: 000f size: 0001 end: 0010 type: 2
copy_e820_map() start: 0010 size: 7fe75000 end: 7ff75000 type: 1
copy_e820_map() type is E820_RAM
copy_e820_map() start: 7ff75000 size: 2000 end: 7ff77000 type: 4
copy_e820_map() start: 7ff77000 size: 00021000 end: 7ff98000 type: 3
copy_e820_map() start: 7ff98000 size: 00068000 end: 8000 type: 2
copy_e820_map() start: fec0 size: 0009 end: fec9 type: 2
copy_e820_map() start: fee0 size: 0001 end: fee1 type: 2
copy_e820_map() start: ffb0 size: 0050 end: 0001 type: 2
 BIOS-e820:  - 000a (usable)
 BIOS-e820: 000f - 0010 (reserved)
 BIOS-e820: 0010 - 7ff75000 (usable)
 BIOS-e820: 7ff75000 - 7ff77000 (ACPI NVS)
 BIOS-e820: 7ff77000 - 7ff98000 (ACPI data)
 BIOS-e820: 7ff98000 - 8000 (reserved)
 BIOS-e820: fec0 - fec9 (reserved)
 BIOS-e820: fee0 - fee1 (reserved)
 BIOS-e820: ffb0 - 0001 (reserved)
1151MB HIGHMEM available.
896MB LOWMEM available.
found SMP MP-table at 000fe710
Entering add_active_range(0, 0, 524149) 0 entries of 256 used
Zone PFN ranges:
  DMA 0 -> 4096
  Normal   4096 ->   229376
  HighMem229376 ->   524149
early_node_map[1] active PFN ranges
0:0 ->   524149
On node 0 totalpages: 524149
  DMA zone: 32 pages used for memmap
  DMA zone: 0 pages reserved
  DMA zone: 4064 pages, LIFO batch:0
  Normal zone: 1760 pages used for memmap
  Normal zone: 223520 pages, LIFO batch:31
  HighMem zone: 2302 pages used for memmap
  HighMem zone: 292471 pages, LIFO batch:31
DMI 2.3 present.
ACPI: RSDP (v000 DELL  ) @ 0x000febc0
ACPI: RSDT (v001 DELLWS 650  0x0009 ASL  0x0061) @ 0x000fd4f1
ACPI: FADT (v001 DELLWS 650  0x0009 ASL  0x0061) @ 0x000fd529
ACPI: SSDT (v001   DELLst_ex 0x1000 MSFT 0x010d) @ 0xfffefafa
ACPI: MADT (v001 DELLWS 650  0x0009 ASL  0x0061) @ 0x000fd59d
ACPI: BOOT (v001 DELLWS 650  0x0009 ASL  0x0061) @ 0x000fd621
ACPI: ASF! (v016 DELLWS 650  0x0009 ASL  0x0061) @ 0x000fd649
ACPI: DSDT (v001   DELLdt_ex 0x1000 MSFT 0x010d) @ 0x
ACPI: PM-Timer IO Port: 0x808
ACPI: Local APIC address 0xfee0
ACPI: LAPIC (acpi_id[0x01] lapic_id[0x00] enabled)
Processor #0 15:2 APIC version 20
ACPI: LAPIC (acpi_id[0x02] lapic_id[0x06] enabled)
Processor #6 15:2 APIC version 20
ACPI: LAPIC (acpi_id[0x03] lapic_id[0x01] enabled)
Processor #1 15:2 APIC version 20
ACPI: LAPIC (acpi_id[0x04] lapic_id[0x07] enabled)
Processor #7 15:2 APIC version 20
ACPI: IOAPIC (id[0x08] address[0xfec0] gsi_base[0])
IOAPIC[0]: apic_id 8, version 32, address 0xfec0, GSI 0-23
ACPI: IOAPIC (id[0x09] address[0xfec8] gsi_base[24])
IOAPIC[1]: apic_id 9, version 32, address 0xfec8, GSI 24-47
ACPI: IOAPIC (id[0x0a] address[0xfec80800] gsi_base[48])
IOAPIC[2]: apic_id 10, version 32, address 0xfec80800, GSI 48-71
ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl)
ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 high level)
ACPI: IRQ0 used by override.
ACPI: IRQ2 used by override.
ACPI: IRQ9 used by override.
Enabling APIC mode:  Flat.  Using 3 I/O APICs
Using ACPI (MADT) for SMP configuration information
Allocating PCI resources starting at 8800 (gap: 8000:7ec0)
Detected 2658.187 MHz processor.
Built 1 zonelists.  Total pages: 520055
Kernel command line: ro root=LABEL=/ nmi_watchdog=1 elevator=cfq console=ttyS0,115200n8 console=tty0 acpi_pm_good
mapped APIC to d000 (fee0)
mapped IOAPIC to c000 (fec0)
mapped IOAPIC to b000 (fec8)
mapped IOAPIC to a000 (fec80800)
Enabling fast FPU save and restore... done.
Enabling unmasked SIMD FPU exception support... done.
Initializing CPU#0
CPU 0 irqstacks, hard=c0345000 soft=c033d000
PID hash table entries: 4096 (order: 12, 16384 bytes)
Console: colour VGA+ 80x25
Dentry

Re: + stupid-hack-to-make-mainline-build.patch added to -mm tree

2007-03-07 Thread Jeremy Fitzhardinge

Daniel Arai wrote:
> But more importantly, we want a kernel that can run both on native hardware and
> in a paravirtualized environment.  Linux doesn't really provide abstractions for
> replacing the appropriate code.  We tried to hook into the source code at a
> level that seemed possible.
>

Xen doesn't support any kind of apic emulation, so we'll need to hook
anything which relies on an apic.  The ipi code you quote below will
probably be one of those.

My opinion is that pv_ops shouldn't have raw apic operations, but
instead have appropriate high-level interfaces to achieve the same
ends.  Zach's counter-argument was basically yours: that the VMI code
will use a lot of the native code except for the actual apic operations.

I can live with VMI emulating apics if it wants, so long as it does it
in private and doesn't make a big scene about it.  We'll need the
high-level interfaces regardless.

J

[patch 5/5] signalfd v2 - compat code ...

2007-03-07 Thread Davide Libenzi

This patch implements the necessary compat code for the signalfd system calls.


Signed-off-by: Davide Libenzi 


- Davide



Index: linux-2.6.20.ep2/fs/compat.c
===
--- linux-2.6.20.ep2.orig/fs/compat.c   2007-03-07 13:28:39.0 -0800
+++ linux-2.6.20.ep2/fs/compat.c   2007-03-07 13:42:18.0 -0800
@@ -46,6 +46,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 
@@ -2235,3 +2236,41 @@
return sys_ni_syscall();
 }
 #endif
+
+asmlinkage long compat_sys_signalfd(int ufd,
+   const compat_sigset_t __user *sigmask,
+   compat_size_t sigsetsize)
+{
+   compat_sigset_t ss32;
+   sigset_t tmp;
+   sigset_t __user *ksigmask;
+
+   if (sigsetsize != sizeof(compat_sigset_t))
+       return -EINVAL;
+   if (copy_from_user(&ss32, sigmask, sizeof(ss32)))
+       return -EFAULT;
+   sigset_from_compat(&tmp, &ss32);
+   ksigmask = compat_alloc_user_space(sizeof(sigset_t));
+   if (copy_to_user(ksigmask, &tmp, sizeof(sigset_t)))
+       return -EFAULT;
+
+   return sys_signalfd(ufd, ksigmask, sizeof(sigset_t));
+}
+
+asmlinkage long compat_sys_signalfd_dequeue(int fd,
+   struct compat_siginfo __user *info,
+   long timeo)
+{
+   siginfo_t kinfo;
+   long ret;
+   mm_segment_t old_fs = get_fs();
+
+   set_fs(KERNEL_DS);
+   ret = sys_signalfd_dequeue(fd, (siginfo_t __user *) &kinfo, timeo);
+   set_fs(old_fs);
+   if (!ret)
+       ret = copy_siginfo_to_user32(info, &kinfo);
+
+   return ret;
+}
+
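For readers unfamiliar with the compat path: `sigset_from_compat()` rebuilds the native `sigset_t` from the 32-bit task's array of 32-bit words. A user-space sketch of that repacking, assuming the x86_64 layout (64 signals, held in one 64-bit word natively and in two 32-bit words in the compat ABI); the type and function names below are hypothetical stand-ins, not the kernel's definitions:

```c
#include <stdint.h>

/* Hypothetical stand-ins for the two layouts (64 signals each). */
struct compat_sigset_sketch { uint32_t sig[2]; };  /* 32-bit task's view */
struct native_sigset_sketch { uint64_t sig[1]; };  /* 64-bit kernel's view */

/* Signal n lives in bit (n - 1): word 0 holds signals 1..32 and
 * word 1 holds signals 33..64, so word 1 becomes the high half. */
static void sigset_from_compat_sketch(struct native_sigset_sketch *set,
                                      const struct compat_sigset_sketch *compat)
{
    set->sig[0] = (uint64_t)compat->sig[0] | ((uint64_t)compat->sig[1] << 32);
}
```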

[patch 1/5] signalfd v2 - anonymous inode source ...

2007-03-07 Thread Davide Libenzi

This patch adds an anonymous inode source, to be used for files that need
an inode only in order to create a file*. We do not care about having an
inode for each file, and we do not even care about having different names in
the associated dentries (dentry names will be the same for classes of file*).
This allows code reuse, and will be used by epoll, signalfd and timerfd
(and whatever else there'll be).

(Andrew already has this in -mm)


Signed-off-by: Davide Libenzi 



- Davide



Index: linux-2.6.20.ep2/fs/anon_inodes.c
===
--- /dev/null   1970-01-01 00:00:00.0 +
+++ linux-2.6.20.ep2/fs/anon_inodes.c   2007-03-07 15:58:01.0 -0800
@@ -0,0 +1,203 @@
+/*
+ *  fs/anon_inodes.c
+ *
+ *  Copyright (C) 2007  Davide Libenzi 
+ *
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include 
+
+
+
+static int ainofs_delete_dentry(struct dentry *dentry);
+static struct inode *aino_getinode(void);
+static struct inode *aino_mkinode(void);
+static int ainofs_get_sb(struct file_system_type *fs_type, int flags,
+const char *dev_name, void *data, struct vfsmount *mnt);
+
+
+
+static struct vfsmount *aino_mnt __read_mostly;
+static struct inode *aino_inode;
+static struct file_operations aino_fops = { };
+static struct file_system_type aino_fs_type = {
+   .name   = "ainofs",
+   .get_sb = ainofs_get_sb,
+   .kill_sb= kill_anon_super,
+};
+static struct dentry_operations ainofs_dentry_operations = {
+   .d_delete   = ainofs_delete_dentry,
+};
+
+
+
+int aino_getfd(int *pfd, struct inode **pinode, struct file **pfile,
+  char const *name, const struct file_operations *fops, void *priv)
+{
+   struct qstr this;
+   struct dentry *dentry;
+   struct inode *inode;
+   struct file *file;
+   int error, fd;
+
+   error = -ENFILE;
+   file = get_empty_filp();
+   if (!file)
+   goto eexit_1;
+
+   inode = aino_getinode();
+   if (IS_ERR(inode)) {
+   error = PTR_ERR(inode);
+   goto eexit_2;
+   }
+
+   error = get_unused_fd();
+   if (error < 0)
+   goto eexit_3;
+   fd = error;
+
+   /*
+* Link the inode to a directory entry using the caller-supplied
+* name; all files of the same class share that name.
+*/
+   error = -ENOMEM;
+   this.name = name;
+   this.len = strlen(name);
+   this.hash = 0;
+   dentry = d_alloc(aino_mnt->mnt_sb->s_root, &this);
+   if (!dentry)
+   goto eexit_4;
+   dentry->d_op = &ainofs_dentry_operations;
+   /* Do not publish this dentry inside the global dentry hash table */
+   dentry->d_flags &= ~DCACHE_UNHASHED;
+   d_instantiate(dentry, inode);
+
+   file->f_path.mnt = mntget(aino_mnt);
+   file->f_path.dentry = dentry;
+   file->f_mapping = inode->i_mapping;
+
+   file->f_pos = 0;
+   file->f_flags = O_RDONLY;
+   file->f_op = fops;
+   file->f_mode = FMODE_READ;
+   file->f_version = 0;
+   file->private_data = priv;
+
+   fd_install(fd, file);
+
+   *pfd = fd;
+   *pinode = inode;
+   *pfile = file;
+   return 0;
+
+eexit_4:
+   put_unused_fd(fd);
+eexit_3:
+   iput(inode);
+eexit_2:
+   put_filp(file);
+eexit_1:
+   return error;
+}
+
+
+static int ainofs_delete_dentry(struct dentry *dentry)
+{
+   /*
+* We faked vfs to believe the dentry was hashed when we created it.
+* Now we restore the flag so that dput() will work correctly.
+*/
+   dentry->d_flags |= DCACHE_UNHASHED;
+   return 1;
+}
+
+
+static struct inode *aino_getinode(void)
+{
+   return igrab(aino_inode);
+}
+
+
+/*
+ * A single inode exists for all aino files. Unlike pipes,
+ * aino inodes have no per-instance data associated, so we can avoid
+ * allocating more than one of them.
+ */
+static struct inode *aino_mkinode(void)
+{
+   int error = -ENOMEM;
+   struct inode *inode = new_inode(aino_mnt->mnt_sb);
+
+   if (!inode)
+   goto eexit_1;
+
+   inode->i_fop = &aino_fops;
+
+   /*
+* Mark the inode dirty from the very beginning,
+* that way it will never be moved to the dirty
+* list because mark_inode_dirty() will think
+* that it already _is_ on the dirty list.
+*/
+   inode->i_state = I_DIRTY;
+   inode->i_mode = S_IRUSR | S_IWUSR;
+   inode->i_uid = current->fsuid;
+   inode->i_gid = current->fsgid;
+   inode->i_atime = inode->i_mtime = inode->i_ctime = CURRENT_TIME;
+   return inode;
+
+eexit_1:
+   return ERR_PTR(error);
+}
+
+
+static int ainofs_get_sb(struct file_system_type *fs_type, int flags,
+const char *dev_name, void *data, struct vfsmount *mnt)
+{
+   return get_sb_pseudo(fs_type, "aino:",

[patch 2/5] signalfd v2 - signalfd core ...

2007-03-07 Thread Davide Libenzi


This patch series implements the new signalfd() and signalfd_dequeue()
system calls. I took part of the original Linus code (and you know how
badly it can be broken :), and I added even more breakage ;)
The patch had to be almost completely changed. This patch allows multiple
signalfds to listen for signals on the same sighand without racing with
dequeue_signal(). Plus other changes that I don't remember (see here for the
original patch http://tinyurl.com/3yuna5 ).
This seems to be working fine on my Dual Opteron machine. I made a quick 
test program for it:

http://www.xmailserver.org/signafd-test.c

The signalfd() system call implements signal delivery into a file
descriptor receiver. The signalfd file descriptor is created with the
following API:

int signalfd(int ufd, const sigset_t *mask, size_t masksize);

The "ufd" parameter allows changing the sigmask of an existing signalfd
without going through a close/create cycle (Linus' idea). Use "ufd" == -1
if you want a brand new signalfd file.
The "mask" parameter specifies the set of signals that we are
interested in. The "masksize" parameter is the size of "mask".
Note that signalfd delivery and standard signal delivery can go in
parallel. So you can receive signals on the signalfd file, and on the
signal handlers. This makes the system more flexible IMO. If you don't
want to see standard delivery, just pass the same "mask" to
sigprocmask(SIG_BLOCK).
The signalfd fd supports the poll(2) system call. The poll(2) will return
POLLIN when signals are available to be dequeued. As a direct consequence
of supporting the Linux poll subsystem, the signalfd fd can be used
together with epoll(2) too.
A new system call has also been introduced to allow signal dequeueing:

int signalfd_dequeue(int fd, siginfo_t *info, long timeo);

The "fd" parameter must be a signalfd file descriptor. The "info" parameter
is a pointer to the siginfo that will receive the dequeued signal, and
"timeo" is a timeout in milliseconds, or -1 for infinite.
The signalfd_dequeue function returns 0 if successful.


Signed-off-by: Davide Libenzi 



- Davide



Index: linux-2.6.20.ep2/fs/signalfd.c
===
--- /dev/null   1970-01-01 00:00:00.0 +
+++ linux-2.6.20.ep2/fs/signalfd.c  2007-03-07 17:06:07.0 -0800
@@ -0,0 +1,369 @@
+/*
+ *  fs/signalfd.c
+ *
+ *  Copyright (C) 2003  Linus Torvalds
+ *
+ *  Mon Mar 5, 2007: Davide Libenzi 
+ *  Changed signal delivery and de-queueing.
+ *  Now using anonymous inode source.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include 
+
+
+
+#define MAX_MSTIMEO min(1000ULL * MAX_SCHEDULE_TIMEOUT / HZ, (LONG_MAX - 999ULL) / HZ)
+
+
+
+struct signalfd_ctx {
+   struct list_head lnk;
+   wait_queue_head_t wqh;
+   sigset_t sigmask;
+   sigset_t pending;
+   struct list_head squeue[_NSIG];
+   long lost_sigs;
+   struct task_struct *tsk;
+   struct sighand_struct *sighand;
+};
+
+struct signalfd_sq {
+   struct list_head lnk;
+   siginfo_t info;
+};
+
+
+
+static void signalfd_cleanup(struct signalfd_ctx *ctx);
+static int signalfd_close(struct inode *inode, struct file *file);
+static unsigned int signalfd_poll(struct file *filp, poll_table *wait);
+static struct signalfd_sq *signalfd_fetchsig(struct signalfd_ctx *ctx);
+
+
+
+static const struct file_operations signalfd_fops = {
+   .release= signalfd_close,
+   .poll   = signalfd_poll,
+};
+static struct kmem_cache *signalfd_ctx_cachep;
+static struct kmem_cache *signalfd_sq_cachep;
+
+
+/*
+ * This must be called with the sighand lock held.
+ */
+int signalfd_deliver(struct sighand_struct *sighand, int sig, struct siginfo *info)
+{
+   int nsig = 0;
+   struct list_head *pos;
+   struct signalfd_ctx *ctx;
+   struct signalfd_sq *sq;
+
+   list_for_each(pos, &sighand->sfdlist) {
+   ctx = list_entry(pos, struct signalfd_ctx, lnk);
+   /*
+* We use a negative signal value as a way to broadcast that the
+* sighand has been orphaned, so that we can notify all the
+* listeners about this.
+*/
+   if (sig < 0)
+   __wake_up_locked(&ctx->wqh, TASK_UNINTERRUPTIBLE | TASK_INTERRUPTIBLE);
+   else if (sigismember(&ctx->sigmask, sig) &&
+(sig >= SIGRTMIN || !sigismember(&ctx->pending, sig))) {
+   sigaddset(&ctx->pending, sig);
+   sq = kmem_cache_alloc(signalfd_sq_cachep, GFP_ATOMIC);
+   if (sq) {
+   signal_fill_info(&sq->info, sig, info);
+   list_add_tail(&sq->lnk, &ctx->squeue[sig - 1]);
+   } else
+   ctx->lost_sigs++;
+   __wake_up_locked(&ctx->wqh,

[patch 3/5] signalfd v2 - wire i386 syscall ...

2007-03-07 Thread Davide Libenzi

This patch wires the signalfd system calls to the i386 architecture.



Signed-off-by: Davide Libenzi 


- Davide



Index: linux-2.6.20.ep2/arch/i386/kernel/syscall_table.S
===
--- linux-2.6.20.ep2.orig/arch/i386/kernel/syscall_table.S  2007-03-07 11:07:45.0 -0800
+++ linux-2.6.20.ep2/arch/i386/kernel/syscall_table.S   2007-03-07 12:34:33.0 -0800
@@ -319,3 +319,5 @@
.long sys_move_pages
.long sys_getcpu
.long sys_epoll_pwait
+   .long sys_signalfd  /* 320 */
+   .long sys_signalfd_dequeue
Index: linux-2.6.20.ep2/include/asm-i386/unistd.h
===
--- linux-2.6.20.ep2.orig/include/asm-i386/unistd.h 2007-03-07 11:07:45.0 -0800
+++ linux-2.6.20.ep2/include/asm-i386/unistd.h  2007-03-07 12:34:02.0 -0800
@@ -325,10 +325,12 @@
 #define __NR_move_pages    317
 #define __NR_getcpu        318
 #define __NR_epoll_pwait   319
+#define __NR_signalfd  320
+#define __NR_signalfd_dequeue  321
 
 #ifdef __KERNEL__
 
-#define NR_syscalls 320
+#define NR_syscalls 322
 
 #define __ARCH_WANT_IPC_PARSE_VERSION
 #define __ARCH_WANT_OLD_READDIR

[patch 4/5] signalfd v2 - wire x86_64 syscall ...

2007-03-07 Thread Davide Libenzi

This patch wires the signalfd system calls to the x86_64 architecture.



Signed-off-by: Davide Libenzi 


- Davide



Index: linux-2.6.20.ep2/include/asm-x86_64/unistd.h
===
--- linux-2.6.20.ep2.orig/include/asm-x86_64/unistd.h   2007-03-07 13:28:41.0 -0800
+++ linux-2.6.20.ep2/include/asm-x86_64/unistd.h   2007-03-07 13:42:12.0 -0800
@@ -619,8 +619,12 @@
 __SYSCALL(__NR_vmsplice, sys_vmsplice)
 #define __NR_move_pages  279
 __SYSCALL(__NR_move_pages, sys_move_pages)
+#define __NR_signalfd  280
+__SYSCALL(__NR_signalfd, sys_signalfd)
+#define __NR_signalfd_dequeue  281
+__SYSCALL(__NR_signalfd_dequeue, sys_signalfd_dequeue)
 
-#define __NR_syscall_max __NR_move_pages
+#define __NR_syscall_max __NR_signalfd_dequeue
 
 #ifndef __NO_STUBS
 #define __ARCH_WANT_OLD_READDIR
Index: linux-2.6.20.ep2/arch/x86_64/ia32/ia32entry.S
===
--- linux-2.6.20.ep2.orig/arch/x86_64/ia32/ia32entry.S  2007-03-07 13:28:41.0 -0800
+++ linux-2.6.20.ep2/arch/x86_64/ia32/ia32entry.S   2007-03-07 13:42:12.0 -0800
@@ -714,8 +714,11 @@
.quad compat_sys_get_robust_list
.quad sys_splice
.quad sys_sync_file_range
-   .quad sys_tee
+   .quad sys_tee   /* 315 */
.quad compat_sys_vmsplice
.quad compat_sys_move_pages
.quad sys_getcpu
-ia32_syscall_end:  
+   .quad sys_epoll_pwait
+   .quad sys_signalfd  /* 320 */
+   .quad sys_signalfd_dequeue
+ia32_syscall_end:

[PATCH] tcp_cubic: use 32 bit math

2007-03-07 Thread Stephen Hemminger

The basic calculation has to be done in 32 bits to avoid
doing a 64-bit divide by 3. The value x is at most 22 bits,
so the full 64 bits are only needed for x^2.

Signed-off-by: Stephen Hemminger <[EMAIL PROTECTED]>

---
 net/ipv4/tcp_cubic.c |8 
 1 file changed, 4 insertions(+), 4 deletions(-)

--- net-2.6.22.orig/net/ipv4/tcp_cubic.c   2007-03-07 15:51:37.0 -0800
+++ net-2.6.22/net/ipv4/tcp_cubic.c 2007-03-07 17:06:02.0 -0800
@@ -96,7 +96,7 @@
  */
 static u32 cubic_root(u64 a)
 {
-   u64 x;
+   u32 x;
 
/* Initial estimate is based on:
 * cbrt(x) = exp(log(x) / 3)
@@ -104,9 +104,9 @@
x = 1u << (fls64(a)/3);
 
/* converges to 32 bits in 3 iterations */
-   x = (2 * x + div64_64(a, x*x)) / 3;
-   x = (2 * x + div64_64(a, x*x)) / 3;
-   x = (2 * x + div64_64(a, x*x)) / 3;
+   x = (2 * x + (u32)div64_64(a, (u64)x*(u64)x)) / 3;
+   x = (2 * x + (u32)div64_64(a, (u64)x*(u64)x)) / 3;
+   x = (2 * x + (u32)div64_64(a, (u64)x*(u64)x)) / 3;
 
return x;
 }
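A user-space sketch of the Newton iteration above, x <- (2x + a/x^2) / 3, assuming plain 64-bit division in place of the kernel's div64_64() and a hypothetical fls64_sketch() helper in place of fls64(). It keeps x in 32 bits and widens only the x*x product, which is the point of the patch: without the widening, x*x would overflow 32 bits once x exceeds 65535. The a == 0 guard is an addition for the sketch, since otherwise the first step yields x == 0 and the next step divides by zero:

```c
#include <stdint.h>

/* Hypothetical stand-in for the kernel's fls64(): 1-based index of the
 * most significant set bit, or 0 when no bit is set. */
static int fls64_sketch(uint64_t a)
{
    int n = 0;

    while (a) {
        n++;
        a >>= 1;
    }
    return n;
}

/* Integer cube root by three Newton steps, as in the patched cubic_root(). */
static uint32_t cubic_root_sketch(uint64_t a)
{
    uint32_t x;

    if (a == 0)
        return 0;   /* sketch-only guard against dividing by x*x == 0 */

    /* Initial estimate: cbrt(a) = 2^(log2(a) / 3) */
    x = 1u << (fls64_sketch(a) / 3);

    /* x stays 32-bit; only x*x is computed in 64 bits. */
    x = (2 * x + (uint32_t)(a / ((uint64_t)x * x))) / 3;
    x = (2 * x + (uint32_t)(a / ((uint64_t)x * x))) / 3;
    x = (2 * x + (uint32_t)(a / ((uint64_t)x * x))) / 3;

    return x;
}
```

For a = 2^60 the estimate starts at x = 2^20, so x*x = 2^40 only fits because of the 64-bit product.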

Re: [PATCH] Fix get_unmapped_area and fsync for hugetlb shm segments

2007-03-07 Thread Eric W. Biederman

Bill Irwin <[EMAIL PROTECTED]> writes:

> On Wed, Mar 07, 2007 at 04:03:17PM -0700, Eric W. Biederman wrote:
>> I think the right answer is most likely to add an extra file method or
>> two so we can remove the need for is_file_hugepages.
>> There are still 4 calls to is_file_hugepages in ipc/shm.c and
>> 2 calls in mm/mmap.c not counting the one in is_file_shm_hugepages.
>> The special cases make it difficult to properly wrap hugetlbfs files
>> with another file, which is why we have the weird special case above.
>
> It's not clear to me that the core can be insulated from hugetlb's
> distinct pagecache and memory mapping granularities in a Linux-native
> manner, but if you come up with something new or manage to get the
> known methods past Linus, akpm, et al, more power to you.

I agree that there are limits on what can be achieved.
However, looking at where we have tests for is_file_hugepages, most of
those tests don't appear to have anything inherently to do with huge
pages, so it wouldn't surprise me if we could generalize things a
little more.

> I'm not entirely sure what you're up to, but I'm mostly here to sanction
> others' design notions since my own are far too extreme, and, of course,
> review and ack patches, take bugreports and write fixes (not that I've
> managed to get to any of them first in a long while, if ever), and so on.
> I say killing the is_whatever_hugepages() checks with whatever abstraction
> is good, since I don't like them myself, provided it's sane. Go for it.

Mostly I had reference counting and consistency problems with
ipc/shm.c that had horrible leak potential when I exited an ipc
namespace.  Implementing everything as stacked files made the code
simpler and more maintainable. (shm_nattach stopped being a special
case, yay!)

I'm happy to stop here but if someone cares to proceed with removing
is_file_hugepages I want to encourage that.  I don't see any other
cleanups short of that are really worth doing.

Everything in ipc/shm.c could be considered a weird special case, so
I'm not going to worry about it too much.  Although removing those
special cases is good.

There is some odd accounting logic in mm/mmap.c based on
is_file_hugepages, and there is the get_unmapped_area case.  For
get_unmapped_area I see no reason to presume that huge pages are the only
kind of file that must live at a specific address (even if that is the
only such case we have today).  So generalizing that check should be
relatively straightforward.

Eric

Re: + stupid-hack-to-make-mainline-build.patch added to -mm tree

2007-03-07 Thread Daniel Arai


Thomas Gleixner wrote:


> You managed to avoid the usage of other code (i.e. PIT / HPET) already,
> so why is it sooo desirable to emulate apics instead of substituting it
> by a small and sane replacement ? Just because you happen to have an
> LAPIC emulator ? That's no reason to wire yourself into the kernel code
> and make it harder to change and maintain.


There are several reasons why it's desirable to emulate the APIC.  As you 
mentioned, we already have APIC emulation, and APIC emulation isn't a huge 
bottleneck on most workloads.  Our code works, the Linux code works, and 
replacing both pieces of code with something "small and sane" isn't going to 
improve performance very much, so why bother?  Any hypervisor implementation is 
going to be a tradeoff between what's easy to implement in the hypervisor, 
what's easy to implement in the guest operating system, and what's performance 
critical.


Secondly, not all (para-)virtualized operating systems will want to use 
abstracted devices.  Some virtual operating systems will be given direct access 
to hardware devices, and will need to run the actual driver for that device and 
not some abstracted device driver.  So I don't buy your argument that every 
piece of the kernel that interacts with a paravirtualized driver should have a 
"small and sane replacement."


But more importantly, we want a kernel that can run both on native hardware and 
in a paravirtualized environment.  Linux doesn't really provide abstractions for 
 replacing the appropriate code.  We tried to hook into the source code at a 
level that seemed possible.


For example, take smp_call_function().  What this essentially does is call 
send_IPI_allbutself().


void fastcall send_IPI_self(int vector)
{
    __send_IPI_shortcut(APIC_DEST_SELF, vector);
}

void __send_IPI_shortcut(unsigned int shortcut, int vector)
{
    /*
     * Subtle. In the case of the 'never do double writes' workaround
     * we have to lock out interrupts to be safe.  As we don't care
     * of the value read we use an atomic rmw access to avoid costly
     * cli/sti.  Otherwise we use an even cheaper single atomic write
     * to the APIC.
     */
    unsigned int cfg;

    /*
     * Wait for idle.
     */
    apic_wait_icr_idle();

    /*
     * No need to touch the target chip field
     */
    cfg = __prepare_ICR(shortcut, vector);

    /*
     * Send the IPI. The write to APIC_ICR fires this off.
     */
    apic_write_around(APIC_ICR, cfg);
}


There's no good way to override __send_IPI_shortcut.  I suppose we could add 
paravirt ops for __send_IPI_shortcut and every other op that touches the APIC. 
But there are dozens of functions in apic.c that would need to be included in 
paravirt ops.  And for our implementation, we really just want to override 
apic_read and apic_write, since we can make these faster when done through 
hypercalls than through memory accesses.  If we were to make these paravirt ops, 
their implementations would be the same, except with a different apic_read and 
apic_write.  This is a whole lot of useless code duplication.
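To make the trade-off concrete: the approach described here keeps the shared APIC logic and swaps only the register accessors, chosen once at boot. A self-contained user-space sketch of that shape, in which every name, register offset and bit value is hypothetical (the "native" backend writes a simulated register array and the "hypercall" backend only counts its calls before doing the same); it is not the VMI or paravirt_ops code:

```c
#include <stdint.h>

/* Hypothetical register layout, for illustration only. */
#define FAKE_APIC_ICR    0x300u
#define FAKE_APIC_SPACE  0x400u          /* bytes of simulated registers */
#define FAKE_ICR_BUSY    (1u << 12)

/* The only hooks a backend has to supply. */
struct apic_access_ops {
    uint32_t (*read)(uint32_t reg);
    void (*write)(uint32_t reg, uint32_t val);
};

/* "Native" backend: a simulated memory-mapped register file. */
static uint32_t fake_mmio[FAKE_APIC_SPACE / 4];
static uint32_t native_read(uint32_t reg) { return fake_mmio[reg / 4]; }
static void native_write(uint32_t reg, uint32_t val) { fake_mmio[reg / 4] = val; }

/* "Hypercall" backend: counts calls, then touches the same registers. */
static unsigned hypercalls;
static uint32_t hv_read(uint32_t reg) { hypercalls++; return fake_mmio[reg / 4]; }
static void hv_write(uint32_t reg, uint32_t val) { hypercalls++; fake_mmio[reg / 4] = val; }

static const struct apic_access_ops native_ops = { native_read, native_write };
static const struct apic_access_ops hv_ops = { hv_read, hv_write };

/* Selected once at boot; native by default. */
static const struct apic_access_ops *apic_ops = &native_ops;

/* Shared higher-level logic, written once against the two hooks. */
static void send_ipi_shortcut(uint32_t shortcut, uint32_t vector)
{
    while (apic_ops->read(FAKE_APIC_ICR) & FAKE_ICR_BUSY)
        ;   /* wait for idle */
    apic_ops->write(FAKE_APIC_ICR, shortcut | vector);
}
```

The alternative argued for elsewhere in the thread would instead put send_ipi_shortcut() itself behind the ops table, trading duplicated logic for a higher-level interface.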


Most of the interrupt system is not written in such a way that one of several APIC
implementations can be selected at boot time.  This is an absolute
requirement so that the same kernel can boot on native and in a paravirtualized 
environment.  While this could be implemented, it seems like a waste of time, 
since we can just emulate something similar to a real interrupt system and not 
change things very much.


Dan Arai
VMware, Inc.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [ckrm-tech] [PATCH 0/2] resource control file system - aka containers on top of nsproxy!

2007-03-07 Thread Paul Menage


On 3/7/07, Sam Vilain <[EMAIL PROTECTED]> wrote:

> But "namespace" has well-established historical semantics too - a way
> of changing the mappings of local * to global objects. This
> accurately describes things like resource controllers, cpusets, resource
> monitoring, etc.

Sorry, I think this statement is wrong, by the generally established
meaning of the term namespace in computer science.

> Trying to extend the well-known term namespace to refer to things that
> are semantically equivalent namespaces is a useful approach, IMHO.

Yes, that would be true. But the kinds of groupings that we're talking
about are supersets of namespaces, not semantically equivalent to
them. To use Eric's "shoe" analogy from earlier, it's like insisting
that we use the term "sneaker" to refer to all footwear, including ski
boots and Birkenstocks ...

Paul

Re: [patch v2] epoll use a single inode ...

2007-03-07 Thread Sami Farin

On Tue, Mar 06, 2007 at 21:20:33 +0100, Eric Dumazet wrote:
...
> I rewrote the reciprocal_div() for i386 so that one multiply is used.
> 
> static inline u32 reciprocal_divide(u32 A, u32 R)
> {
> #if __i386
> unsigned int edx, eax;
> asm("mul %2":"=a" (eax), "=d" (edx):"rm" (R), "0" (A));
   ^^^

mul does not work if R is a memory operand;
mull should be used instead.
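As we read Sami's point, with the "rm" constraint the compiler may hand R to the instruction as a memory operand, and in AT&T syntax a bare mul with only a memory operand gives the assembler no operand size; the l suffix makes it an explicit 32x32->64 multiply. The sketch below pairs a mull version with the portable C form ((u64)A * R) >> 32, which is what we understand the generic reciprocal_divide() and reciprocal_value() of this period to compute. The _c/_asm suffixes and the x86_64 guard are our own additions for a standalone test, not Eric's patch.

```c
#include <stdint.h>
#include <assert.h>

typedef uint32_t u32;
typedef uint64_t u64;

/* R = ceil(2^32 / k), as we understand lib/reciprocal_div.c computes it. */
static inline u32 reciprocal_value(u32 k)
{
	u64 val = (1ULL << 32) + (k - 1);
	return (u32)(val / k);
}

/* Portable form: the high 32 bits of the 64-bit product A * R. */
static inline u32 reciprocal_divide_c(u32 A, u32 R)
{
	return (u32)(((u64)A * R) >> 32);
}

#if defined(__i386__) || defined(__x86_64__)
/* One 32x32->64 multiply; EDX receives the high half.  The explicit "l"
 * suffix keeps the instruction unambiguous even when R lands in memory. */
static inline u32 reciprocal_divide_asm(u32 A, u32 R)
{
	u32 eax, edx;

	__asm__("mull %2" : "=a" (eax), "=d" (edx) : "rm" (R), "0" (A) : "cc");
	(void)eax;	/* low half is discarded */
	return edx;
}
#endif
```

For example, reciprocal_divide_c(100, reciprocal_value(7)) yields 14, matching 100 / 7.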


the usage of DEBUG_DRIVER seems ambiguous

2007-03-07 Thread Robert P. J. Day


  the usage of the DEBUG_DRIVER preprocessor variable is a bit
confusing:

$ grep -rw DEBUG_DRIVER *
drivers/net/sunlance.c:#undef DEBUG_DRIVER
drivers/net/a2065.c:#ifdef DEBUG_DRIVER
drivers/net/a2065.c:#ifdef DEBUG_DRIVER
drivers/net/7990.c:#ifdef DEBUG_DRIVER
drivers/net/7990.c:#ifdef DEBUG_DRIVER
drivers/base/Kconfig:config DEBUG_DRIVER
...

  it's clearly a configuration variable, but it's also being used by
itself in a few drivers/net/ source files.  is that deliberate?
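As far as we can tell, Kconfig exposes "config DEBUG_DRIVER" to C code only as CONFIG_DEBUG_DRIVER (through the generated autoconf header), so a bare #ifdef DEBUG_DRIVER in a driver is a separate, file-local switch that the Kconfig option never sets. A standalone sketch of the distinction, with CONFIG_DEBUG_DRIVER defined by hand to stand in for the generated header; the two helper functions are hypothetical.

```c
#include <assert.h>

/* Stand-in for what the generated config header would contain
 * when DEBUG_DRIVER is enabled in Kconfig. */
#define CONFIG_DEBUG_DRIVER 1

/* What drivers/base code would test: the Kconfig-derived symbol. */
static int kconfig_debug_driver(void)
{
#ifdef CONFIG_DEBUG_DRIVER
	return 1;
#else
	return 0;
#endif
}

/* What a2065.c/7990.c-style code tests: only set if the file itself
 * (or the compiler command line) defines DEBUG_DRIVER. */
static int local_debug_driver(void)
{
#ifdef DEBUG_DRIVER
	return 1;
#else
	return 0;
#endif
}
```

If the overlap is accidental, renaming the driver-local macro to something file-specific would remove the ambiguity without touching Kconfig.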

rday

-- 

Robert P. J. Day
Linux Consulting, Training and Annoying Kernel Pedantry
Waterloo, Ontario, CANADA

http://fsdev.net/wiki/index.php?title=Main_Page


Re: system call time increase when turning on CONFIG_PARAVIRT

2007-03-07 Thread Tim Chen

On Fri, 2007-03-02 at 16:16 -0800, Jeremy Fitzhardinge wrote:

> 
> Yes, the intent is that running a CONFIG_PARAVIRT kernel on native
> hardware will have negligible performance hit compared to running a
> non-paravirt kernel.
> 
> J

It turned out that the VDSO was turned off by the CONFIG_PARAVIRT option,
causing system calls to use the inefficient int 0x80 path, which led to
the increased system_call time I was seeing.  I noticed that Ingo has
caught this problem and proposed a patch to correct it in another mail
thread.
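Whether a process got a vDSO at all can be checked from user space. A minimal sketch, assuming Linux with glibc 2.16 or later (getauxval() did not exist in 2007): a nonzero AT_SYSINFO_EHDR auxiliary-vector entry means the kernel mapped a vDSO into the process. Whether the C library then uses it for the syscall entry path is architecture-specific and is not checked here; have_vdso() is a hypothetical helper.

```c
#include <sys/auxv.h>   /* getauxval() */
#include <elf.h>        /* AT_SYSINFO_EHDR */
#include <assert.h>

/* Returns 1 if the kernel mapped a vDSO into this process, else 0. */
static int have_vdso(void)
{
	return getauxval(AT_SYSINFO_EHDR) != 0;
}
```

Comparing this result between a CONFIG_PARAVIRT and a non-paravirt kernel would show the regression Tim describes without any timing noise.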

Tim

Re: system call time increase when turning on CONFIG_PARAVIRT

2007-03-07 Thread Jeremy Fitzhardinge

Tim Chen wrote:
> It turned out that VDSO was turned off by CONFIG_PARAVIRT option,
> causing the system call to use inefficient int 0x80 which led to the
> increase system_call time I was seeing.  I noted that Ingo has caught
> this problem and proposed a patch to correct this issue in another mail
> thread.

Thanks for identifying this.  We'll be posting a more general fix for
COMPAT_VDSO soon which will address this.

J

Re: [PATCH 0/2] resource control file system - aka containers on top of nsproxy!

2007-03-07 Thread Sam Vilain

Paul Menage wrote:
> But "namespace" has well-established historical semantics too - a way
> of changing the mappings of local names to global objects. This
> doesn't describe things like resource controllers, cpusets, resource
> monitoring, etc.
>
> Trying to extend the well-known term namespace to refer to things that
> aren't namespaces isn't a useful approach, IMO.
>
> Paul
>   

But "namespace" has well-established historical semantics too - a way
of changing the mappings of local * to global objects. This
accurately describes things like resource controllers, cpusets, resource
monitoring, etc.

Trying to extend the well-known term namespace to refer to things that
are semantically equivalent to namespaces is a useful approach, IMHO.

Sam.


Re: [PATCH] cifs: remove useless cargo-cult checks

2007-03-07 Thread Steve French

Christoph Hellwig <[EMAIL PROTECTED]> wrote on 03/07/2007 04:17:46 PM:
> On Wed, Mar 07, 2007 at 12:51:04PM -0600, Steven French wrote:
> > Is there an easy way to mirror particular patches going into the
> > cifs-2.6.git tree (which is pulled into mm) to lkml?
>  
> Maybe some git expert can comment on that.

What I would be looking for is a way via e.g. "git commit" (to my
project tree on kernel.org) to pass it an option to send a copy of the
patch to lkml or some list (or perhaps the reverse, set a flag that says
don't bother mirroring the patch for review to fsdevel or lkml).  With
Samba, some people just watch all commits, but for the kernel that is
way too many.

> > The cifs patches go in mm for at least a week before they go into kernel
> > but some of them I would like to post again to lkml.
>
> polling -mm is a little hard as it's an enormous blob, so posting to
> lkml or -fsdevel would definitely be quite helpful.

Yes, agreed (watching fsdevel is easier than scanning every new -mm
patch), but I would rather not bore people and make them waste time on
fsdevel or lkml looking at every single cifs patch.  Only about three of
the past 10 cifs patches were interesting enough to ask for detailed
review (and I would have loved an easier way to get review on those, as
I would love to get more review of Q's interesting DFS patch, but it is
hard in practice to make this easy).

Re: [PATCH 0/2] resource control file system - aka containers on top of nsproxy!

2007-03-07 Thread Sam Vilain

Srivatsa Vaddagiri wrote:
> container structure in your patches provides for these things:
>
> a.  A way to group tasks
> b.  A way to maintain several hierarchies of such groups
>
> If you consider just a. then I agree that container abstraction is
> redundant, esp for vserver resource control (nsproxy can already be used
> to group tasks).
>
> What nsproxy doesn't provide is b - a way to represent hierarchies of
> groups. 
>   

Well, that's like saying you can't put hierarchical data in a relational
database.

The hierarchy question is an interesting one, though. However I believe
it first needs to be broken down into subsystems and considered on a
subsystem-by-subsystem basis again, and if general patterns are
observed, then a common solution should stand out.

Let's go back to the namespaces we know about and discuss how
hierarchies apply to them. Please those able to brainstorm, do so - I
call green hat time.

1. UTS namespaces

Can a UTS namespace set any value it likes?

Can you inspect or set the UTS namespace values of a subservient UTS
namespace?

2. IPC namespaces

Can a process in an IPC namespace send a signal to those in a
subservient namespace?

3. PID namespaces

Can a process in a PID namespace see the processes in a subservient
namespace?

Do the processes in a subservient namespace appear in a higher level
namespace mapped to a different set of PIDs?

4. Filesystem namespaces

Can we see all of the mounts in a subservient namespace?

Does our namespace receive updates when their namespace mounts change?
(perhaps under a sub-directory)

5. L2 network namespaces

Can we see or alter the subservient network namespace's
interfaces/iptables/routing?

Are any of the subservient network namespace's interfaces visible in our
namespace, and by which mapping?

6. L3 network namespaces

Can we bind to a subservient network namespace's addresses?

Can we give addresses to, or remove addresses from, the subservient
network namespace?

Can we allow the namespace access to modify particular IP tables?

7. resource namespaces

Is the subservient namespace's resource usage counting against ours too?

Can we dynamically alter the subservient namespace's resource allocations?

8. anyone else?

So, we can see some general trends here - but it's never quite the same
question, and I think the best answers will come from a tailored
approach for each subsystem.
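As one tailored answer, take question 7 ("is the subservient namespace's resource usage counting against ours too?"). A minimal sketch, assuming the answer is yes: each charge is checked against every ancestor's limit and, only if all pass, added all the way up the tree. The res_ns structure and functions are hypothetical, and there is no locking, so this is a single-threaded model only.

```c
#include <stddef.h>
#include <stdbool.h>
#include <assert.h>

/* Hypothetical resource namespace node. */
struct res_ns {
	struct res_ns *parent;   /* NULL for the root */
	unsigned long usage;
	unsigned long limit;     /* 0 means unlimited in this sketch */
};

/* Charge amt to ns and all of its ancestors.  Fails without side
 * effects if any level would exceed its limit. */
static bool res_charge(struct res_ns *ns, unsigned long amt)
{
	struct res_ns *p;

	for (p = ns; p; p = p->parent)
		if (p->limit && p->usage + amt > p->limit)
			return false;
	for (p = ns; p; p = p->parent)
		p->usage += amt;
	return true;
}

/* Release a previous charge from ns and all of its ancestors. */
static void res_uncharge(struct res_ns *ns, unsigned long amt)
{
	for (struct res_ns *p = ns; p; p = p->parent)
		p->usage -= amt;
}
```

A child with a generous limit is still stopped once its parent is full, which is the "counts against ours too" semantics; a different subsystem could reasonably answer the same question differently.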

Each one *does* have some common questions - for instance, "is the
namespace allowed to create more namespaces of this type". That's
probably a capability bit for each, though.

So let's bring this back to your patches. If they are providing
visibility of ns_proxy, then it should be called namesfs or some such.
It doesn't really matter if processes disappear from namespace
aggregates, because that's what's really happening anyway. The only
problem is that if you try to freeze a namespace that has visibility of
things at this level, you might not be able to reconstruct the
filesystem in the same way. This may or may not be considered a problem,
but open filehandles and directory handles etc surviving a freeze/thaw
is part of what we're trying to achieve. Then again, perhaps some
visibility is better than none for the time being.

If they are restricted entirely to resource control, then don't use the
nsproxy directly - use the structure or structures which hang off the
nsproxy (or even task_struct) related to resource control.

Sam.

Re: [patch 2/6 -rt] powerpc 2.6.20-rt8: to convert spinlocks to raw ones.

2007-03-07 Thread hui

On Thu, Mar 08, 2007 at 08:30:43AM +1100, Paul Mackerras wrote:
> Sergei Shtylyov writes:
> 
> > I've followed up to my patch with such an explanation. In the context of
> > the -rt patch itself, it was just too clear, hence I didn't go into
> > explanations in the patch itself. :-)
> 
> Well, it might be clear, to you, now, with the context in your head.
> But if such a patch is to go into a git tree, and somebody comes along
> in 3 years time and wants to know exactly why you made that change
> (and maybe that somebody is you :), then they will need more detail -
> such as how you came to the conclusion that those locks and no others
> needed to be changed, for instance.
> 
> At least give some of the reasoning behind your choice of which locks
> to convert, so that in future, if the patch turns out to have
> introduced a bug somehow, the person debugging it can either identify
> that there was a flaw in your logic, or else understand something that
> you have seen that they missed.

Paul,

It has to do with how locking is done in the -rt patch itself. It's probably
ahead of what general maintainers have had to deal with, since the -rt patch
hasn't been fully merged, but I agree a document needs to be written outlining
which locks need to be changed to raw spinlocks and which can be emulated with
the rtmutex.c/rt.c logic. There aren't that many people who know specifically
unless they've tried to map out chunks of the Linux kernel for this purpose in
the first place. I only know because of my own parallel effort to get the
kernel to be preemptive (the old mmLinux project that I abandoned for Ingo's
stuff).

Generally, locks taken in interrupt context need to be real spinlocks. The
interrupt controller is obviously one of those; so is the timer interrupt, for
practical reasons such as performance, along with other places where the
locking has to stay outside the direct control and scope of the scheduler. Of
course the scheduler's runqueues need to be spinlocked for the reasons above;
otherwise your system is stuck with a kind of chicken-and-egg problem when
interacting with the scheduler.

The places that need to be reverted to raw spinlocks are generally either
taken by functions that acquire the spinlock at a terminal (leaf) of the
kernel's lock graph, or isolated from other callers completely (parts of the
timer logic, for instance). It's all about the collision of the various lock
subtrees (preemptive and non-preemptive) and how to avoid the
scheduling-while-atomic violations that lead to deadlocks. The -rt patch gets
arbitrary preemption abilities by shrinking the non-preemptive subtree to the
bare essentials that will let a system run, yet still preserves all of the
expected locking semantics of a critical section.

Everything else is backed by default by a blocking rtmutex to provide correct
preemption behavior within critical sections. That is why these reverts are
needed to restore the mathematical correctness of the kernel's locking
structures.
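The rule described above can be modeled in a few lines of user-space C. This is a hypothetical sketch, not -rt code: raw locks disable preemption (tracked by atomic_depth), sleeping locks stand in for rtmutex-backed spinlock_t, and acquiring a sleeping lock while any raw lock is held is counted as a scheduling-while-atomic violation. That is why raw locks must sit at the leaves of the lock graph.

```c
#include <stdbool.h>
#include <assert.h>

/* Hypothetical model of the -rt lock split. */
enum lock_kind { LOCK_RAW, LOCK_SLEEPING };

struct rt_lock_model {
	enum lock_kind kind;
	bool held;
};

static int atomic_depth;   /* >0 while a raw lock is held (preemption off) */
static int violations;     /* sleeping-lock acquisitions in atomic context */

static void acquire(struct rt_lock_model *l)
{
	/* A sleeping lock may block, which is illegal while non-preemptible. */
	if (l->kind == LOCK_SLEEPING && atomic_depth > 0)
		violations++;
	if (l->kind == LOCK_RAW)
		atomic_depth++;
	l->held = true;
}

static void release(struct rt_lock_model *l)
{
	if (l->kind == LOCK_RAW)
		atomic_depth--;
	l->held = false;
}
```

Nesting a raw lock inside a sleeping one is fine; the reverse order is exactly the collision between preemptive and non-preemptive subtrees that bill describes.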

I hope this is helpful.

bill


Re: [PATCH] swsusp: Disable nonboot CPUs before entering platform suspend

2007-03-07 Thread Rafael J. Wysocki

On Thursday, 8 March 2007 01:20, Dave Jones wrote:
> On Thu, Mar 08, 2007 at 12:13:05AM +0100, Rafael J. Wysocki wrote:
> 
>  > > > Well, the WARN_ON() in arch/x86_64/kernel/acpi/sleep.c:init_low_mapping()
>  > > > triggers every time an SMP x86_64 box is suspended to disk using the platform
>  > > > mode (default), which is quite annoying IMHO and users think something wrong is
>  > > > going on.  This will probably cause them to report the problem and I'd rather
>  > > > like to avoid handling these reports. ;-)
>  > > 
>  > > Well sure - if patches were always error-free, we'd always apply them
>  > > immediately.
>  > > 
>  > > The question is: is the risk of this patch breaking things exceeded by the
>  > > benefit which you describe?
>  > 
>  > Well, it has survived some testing (http://lkml.org/lkml/2007/3/7/16).  Also,
>  > before the code ordering in 2.6.21-rc* we had been running on one CPU
>  > here, so I think the risk is small.
>  > 
>  > We could remove the WARN_ON() as Pavel has just suggested, but first I'd like
>  > to know who put it there and why.
> 
> It was introduced as part of ..
> 
> commit 55b2355eefc2f160246226d4d69fed431173a4d5
> Author: Shaohua Li <[EMAIL PROTECTED]>
> Date:   Fri Jun 23 02:04:49 2006 -0700
> 
> [PATCH] don't use flush_tlb_all in suspend time
> 
> flush_tlb_all uses on_each_cpu, which will disable/enable interrupt.
> In suspend/resume time, this will make interrupt wrongly enabled.
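The failure mode in that commit message can be shown with a tiny single-CPU model. In this hypothetical sketch a helper that brackets its work with disable/enable (as the commit message says on_each_cpu() does) always leaves interrupts on, even when the caller, such as the suspend path, had them off; the save/restore pattern preserves the caller's state. All names here are illustrative, not kernel APIs.

```c
#include <stdbool.h>
#include <assert.h>

/* Hypothetical single-CPU model of the interrupt-enable flag. */
static bool irqs_on = true;

static void irq_disable(void) { irqs_on = false; }
static void irq_enable(void)  { irqs_on = true; }
static bool irq_save(void)    { bool was = irqs_on; irqs_on = false; return was; }
static void irq_restore(bool was) { irqs_on = was; }

/* on_each_cpu()-style bracket: unconditionally re-enables afterwards. */
static void run_disable_enable(void (*fn)(void))
{
	irq_disable();
	fn();
	irq_enable();
}

/* Safer bracket: restores whatever state the caller had. */
static void run_save_restore(void (*fn)(void))
{
	bool flags = irq_save();

	fn();
	irq_restore(flags);
}

static void flush_stub(void) { }
```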

Ah, thanks.

So the question is what can go wrong if we ignore the TLBs of the other
CPUs that may be on-line when init_low_mapping() is executed.

Frankly, I don't know.

Re: [PATCH 0/2] resource control file system - aka containers on top of nsproxy!

2007-03-07 Thread Paul Menage

On 3/7/07, Sam Vilain <[EMAIL PROTECTED]> wrote:

> Paul Menage wrote:
> >> In the namespace world when we say container we mean roughly at the level
> >> of nsproxy and container_group.
> >>
> > So you're saying that a task can only be in a single system-wide container.
> >
>
> Nope, we didn't make the mistake of nailing down what a "container" was
> too far before it is implemented.  We talked before about
> containers-within-containers because, inevitably if you provide a
> feature you'll end up having to deal with virtualising systems that in
> turn use that feature.

Sure, my approach allows containers hierarchically as children of other
containers too.

> > My patch provides multiple potentially-independent ways of dividing up
> > the tasks on the system - if the "container" is the set of all
> > divisions that the process is in, what's an appropriate term for the
> > sub-units?
> >
>
> namespace, since 2.4.x
>
> > That assumes the viewpoint that your terminology is "correct" and
> > other people's needs "fixing". :-)
> >
>
> Absolutely.  Please respect the semantics established so far; changing
> them adds nothing at the cost of much confusion.

But "namespace" has well-established historical semantics too - a way
of changing the mappings of local names to global objects. This
doesn't describe things like resource controllers, cpusets, resource
monitoring, etc.

Trying to extend the well-known term namespace to refer to things that
aren't namespaces isn't a useful approach, IMO.

Paul

Re: Sleeping thread not receive signal until it wakes up

2007-03-07 Thread Luong Ngo

On 3/7/07, Lee Revell <[EMAIL PROTECTED]> wrote:

> On 3/7/07, linux-os (Dick Johnson) <[EMAIL PROTECTED]> wrote:
> > Interruptible_sleep_on is interruptible, but for your task to
> > actually be awakened and your alarm handler to get some CPU,
> > it needs to be scheduled. If the BKL (big kernel lock) is
> > held, it won't be scheduled until it is released.
>
> You can schedule while holding the BKL and it will be dropped and reacquired.
>
> Lee

My hardware is PowerPC architecture. Does it have anything to do with
the kernel locking? Also, I saw CONFIG_LOCK_KERNEL, CONFIG_PREEMPT_BKL
and CONFIG_SMP in the file include/linux/smp_lock.h, and CONFIG_PREEMPT
in lib/kernel_lock.c, and I don't have any of these macros defined.
Could that be the reason? I could not find these options when running
make menuconfig either.

Thanks,
LNgo

Re: + stupid-hack-to-make-mainline-build.patch added to -mm tree

2007-03-07 Thread Jeremy Fitzhardinge

Thomas Gleixner wrote:
> Sigh. The cut zero hairball is already in mainline. :(
>   

Yes, there were a couple of unfortunate patches in that series, but they
got fast-tracked in with the promise they would get fixed asap.

> Sure. If the clockevent API is changed, then the users get fixed. This
> is not my main concern. The "oh we reuse the PIT interrupt" reachout is
> what makes life hard. VMI does this already extensive and I'm frightened
> by it.
>   

Well, I think they know what's expected of them now.

J

Bad regression v 2.6.19 from the ATA ACPI merge

2007-03-07 Thread Alan Cox

Every single non-PCI controller has been broken by this code.

pata_get_dev_handle() assumes that the passed ata_port is PCI. The
libata-core code does not do any checking. This causes everyone to
experience oopses with pata_pcmcia for example.

We have multiple examples of this bug in the FC7test tree, reported by
end users trying the new libata and kernels.
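The kind of check Alan says is missing can be sketched as a user-space model: before treating a port's parent device as a PCI device, compare its bus against the PCI bus type and bail out otherwise. The structures below are simplified stand-ins for struct device and struct pci_dev, and get_pci_devfn() is a hypothetical substitute for pata_get_dev_handle(), not the actual fix.

```c
#include <stddef.h>
#include <errno.h>
#include <assert.h>

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Simplified stand-ins for the driver-model structures. */
struct bus_type { const char *name; };
struct device { const struct bus_type *bus; };
struct pci_dev { unsigned int devfn; struct device dev; };

static const struct bus_type pci_bus_type = { "pci" };
static const struct bus_type pcmcia_bus_type = { "pcmcia" };

/* Refuse non-PCI devices instead of blindly converting every
 * parent device into a struct pci_dev. */
static int get_pci_devfn(struct device *dev, unsigned int *devfn)
{
	struct pci_dev *pdev;

	if (dev->bus != &pci_bus_type)
		return -ENODEV;
	pdev = container_of(dev, struct pci_dev, dev);
	*devfn = pdev->devfn;
	return 0;
}
```

Without the bus comparison, a pata_pcmcia device would be reinterpreted as a struct pci_dev and its neighbouring memory read as PCI fields, which is consistent with the oopses reported.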


Re: e1000 oops on boot [Re: 2.6.21-rc2-mm2]

2007-03-07 Thread Kok, Auke


Randy Dunlap wrote:

> On Wed, 7 Mar 2007 16:23:15 -0800 Andrew Morton wrote:
>
>> The below will appear in -rc3-mm1 (hopefully later today) and it will
>> hopefully fix that crash.
>>
>> From: Auke Kok <[EMAIL PROTECTED]>
>>
>> ---
>>
>>  drivers/net/e1000/e1000_main.c |   66 +--
>>  1 files changed, 45 insertions(+), 21 deletions(-)
>>
>> diff -puN drivers/net/e1000/e1000_main.c~e1000-fix-be-ready-for-incoming-irq-at-pci_request_irq drivers/net/e1000/e1000_main.c
>> --- a/drivers/net/e1000/e1000_main.c~e1000-fix-be-ready-for-incoming-irq-at-pci_request_irq
>> +++ a/drivers/net/e1000/e1000_main.c
>> @@ -522,14 +522,15 @@ e1000_release_manageability(struct e1000
>> }
>>  }
>
> Auke:
>
> Below, please s/@adapter =/@adapter:/ to make it be correct
> kernel-doc notation.


ah, sorry about that :)

I'll adjust it later when doing some more cleanups. Thanks for the pointer.

Auke

Re: [PATCH 0/2] resource control file system - aka containers on top of nsproxy!

2007-03-07 Thread Sam Vilain

Paul Menage wrote:
>> In the namespace world when we say container we mean roughly at the level
>> of nsproxy and container_group.
>> 
> So you're saying that a task can only be in a single system-wide container.
>   

Nope, we didn't make the mistake of nailing down what a "container" was
too far before it is implemented.  We talked before about
containers-within-containers because, inevitably if you provide a
feature you'll end up having to deal with virtualising systems that in
turn use that feature.

> My patch provides multiple potentially-independent ways of dividing up
> the tasks on the system - if the "container" is the set of all
> divisions that the process is in, what's an appropriate term for the
> sub-units?
>   

namespace, since 2.4.x

> That assumes the viewpoint that your terminology is "correct" and
> other people's needs "fixing". :-)
>   

Absolutely.  Please respect the semantics established so far; changing
them adds nothing at the cost of much confusion.

> But as I've said I'm not particularly wedded to the term "container"
> if that really turned out to be what's blocking acceptance from people
> like Andrew or Linus. Do you have a suggestion for a better name? To
> me, "process container" seems like the ideal name, since it's an
> abstraction that "contains" processes and associates them with some
> (subsystem-provided) state.
>   

It's not even really the term, it's the semantics.

Sam.

Re: e1000 oops on boot [Re: 2.6.21-rc2-mm2]

2007-03-07 Thread Randy Dunlap

On Wed, 7 Mar 2007 16:23:15 -0800 Andrew Morton wrote:

> The below will appear in -rc3-mm1 (hopefully later today) and it will
> hopefully fix that crash.
> 
> 
> From: Auke Kok <[EMAIL PROTECTED]>
> 
> ---
> 
>  drivers/net/e1000/e1000_main.c |   66 +--
>  1 files changed, 45 insertions(+), 21 deletions(-)
> 
> diff -puN drivers/net/e1000/e1000_main.c~e1000-fix-be-ready-for-incoming-irq-at-pci_request_irq drivers/net/e1000/e1000_main.c
> --- a/drivers/net/e1000/e1000_main.c~e1000-fix-be-ready-for-incoming-irq-at-pci_request_irq
> +++ a/drivers/net/e1000/e1000_main.c
> @@ -522,14 +522,15 @@ e1000_release_manageability(struct e1000
>   }
>  }

Auke:

Below, please s/@adapter =/@adapter:/ to make it be correct
kernel-doc notation.

> -int
> -e1000_up(struct e1000_adapter *adapter)
> +/**
> + * e1000_configure - configure the hardware for RX and TX
> + * @adapter = private board structure
> + **/
> +static void e1000_configure(struct e1000_adapter *adapter)
>  {


---
~Randy
*** Remember to use Documentation/SubmitChecklist when testing your code ***

Re: + stupid-hack-to-make-mainline-build.patch added to -mm tree

2007-03-07 Thread Thomas Gleixner

On Wed, 2007-03-07 at 15:33 -0800, Jeremy Fitzhardinge wrote:
> > On the other hand we yet see things like:
> >
> > /* We use normal irq0 handler on cpu0. */
> > time_init_hook();
> >
> > Which is just reaching into the kernel code directly and does not handle
> > the clock event interrupt self contained. clockevents is not bound to
> > IRQ0 and this kind of hackery is exactly what we need to avoid in order
> > to get this maintainable.
> >   
> 
> Yes, I'm definitely not arguing with you about this.  I think the first
> cut vmi time code was pretty questionable, but I have confidence they'll
> fix it up before submission.

Sigh. The cut zero hairball is already in mainline. :(

> The point is that when you put the xen and vmi implementations next to
> each other you find that 1) in each case there's a pretty small
> abstraction distance between the clock interface and the hypercall
> interface, and 2) there's very little code which can be shared between
> the two.  Which means that adding another layer of abstraction to
> protect the clock code from paravirtualized time devices is just going
> to add fat without much benefit.

Fair enough.

> > Yes, if they are used in a sane and self contained way without reaching
> > all over the place and expecting that those functions, which are not
> > part of the paravirt interfaces will work for ever.
> >   
> 
> 100% agree.  If the interfaces change, then we'll change the code using
> them like any other kernel code would.  If the new interfaces are hard
> to make work then that's a problem, but one would hope that would get
> shaken out as part of the normal kernel development process.

Sure. If the clockevent API is changed, then the users get fixed. This
is not my main concern. The "oh we reuse the PIT interrupt" reachout is
what makes life hard. VMI already does this extensively and I'm frightened
by it.

> The point is that this code under and around the paravirt_ops interface
> is just normal Linux code, and we expect to participate in the normal
> kernel development process, with all the usual
> discussions/arguments/negotiations over interface changes.  If the code
> loses all its maintainers and becomes orphaned, unresponsive to
> interface changes, then it's like any other dead driver: mark it
> CONFIG_BROKEN and wait for someone to fix it.  But for now and the
> foreseeable future these are going to be actively supported and
> maintained pieces of code.

Ack.

> > You are not increasing the entanglement with the rest of the system,
> > when you use a self contained device on top of an existing core kernel
> > infrastructure, which has a paravirt backend. Quite the contrary, you
> > have one piece of virtual hardware which is connected to the kernel and
> > interacts with the various incarnations on the other side, which can as
> > well live inside the kernel code. Granted it is another level of
> > indirection, but I'd be happy to have only to deal with one of those
> > beasts.
> >   
> 
> Right.  But at that point the interface doesn't really have much of a
> technical basis.  It's really a political border at which you can hand
> off responsibility and make it ours.  I quite understand your
> motivation, but I think you're solving a problem that hasn't happened
> yet, and one that we'd all like to avoid.

Granted.

> I know the vmi time code has coloured your view here, but I surely hope
> it can be got into a better state before posting.  I'm biased of course,
> but I would rather hope that all these drivers we're talking about will
> be as stylistically clean as the Xen time code (which has room for
> improvement, of course).
> 
> There is, however, a median solution which keeps the number of clock
> drivers down but also doesn't involve extending pv_ops.  We can just
> create paravirt_clocksource/paravirt_clockevent helper wrappers, with
> their own internal interfaces to act as a facade for the
> hypervisor-specific code.  I don't think there's much point in doing
> this now, but maybe it will become appealing once we start dealing with
> things like stolen time.
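A rough picture of the "median solution" Jeremy describes: one generic paravirt clock driver that calls through a small hypervisor-specific backend, so Xen or VMI code fills in a struct instead of reaching into the PIT or IRQ0 paths. This is a hypothetical user-space sketch; none of these names exist in the kernel, and a real version would sit on top of the clocksource and clockevents APIs.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <assert.h>

/* Hypothetical per-hypervisor backend. */
struct pv_clock_backend {
	const char *name;
	uint64_t (*read_ns)(void);                /* current time in ns */
	int (*set_next_event_ns)(uint64_t delta); /* arm a one-shot timer */
};

static const struct pv_clock_backend *pv_backend;

/* Register the backend once at boot; reject incomplete ones. */
static int paravirt_clock_register(const struct pv_clock_backend *b)
{
	if (!b || !b->read_ns || !b->set_next_event_ns)
		return -EINVAL;
	pv_backend = b;
	return 0;
}

/* The generic clocksource/clockevent glue would call only these. */
static uint64_t paravirt_clock_read(void)
{
	return pv_backend->read_ns();
}

static int paravirt_clock_next_event(uint64_t delta_ns)
{
	return pv_backend->set_next_event_ns(delta_ns);
}

/* A fake backend standing in for hypervisor-specific code. */
static uint64_t fake_now = 1000, fake_deadline;
static uint64_t fake_read(void) { return fake_now; }
static int fake_set(uint64_t delta) { fake_deadline = fake_now + delta; return 0; }
static const struct pv_clock_backend fake_hv = { "fake", fake_read, fake_set };
```

The facade adds one indirection, which matches Jeremy's caveat that it mainly pays off once shared concerns such as stolen time accounting appear.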

We'll see.

tglx



Re: [PATCH] Replace misspelled "PRINTK" with "CONFIG_PRINTK".

2007-03-07 Thread Robert P. J. Day

On Wed, 7 Mar 2007, Dave Jones wrote:

> On Wed, Mar 07, 2007 at 06:38:32PM -0500, Robert P. J. Day wrote:
>  >
>  >   Replace the apparently misspelled preprocessor variable "PRINTK"
>  > with "CONFIG_PRINTK".
>
> this looks wrong.
>
>  > diff --git a/drivers/md/bitmap.c b/drivers/md/bitmap.c
>  > index 5554ada..0c09772 100644
>  > --- a/drivers/md/bitmap.c
>  > +++ b/drivers/md/bitmap.c
>  > @@ -53,7 +53,7 @@
>  >  //#define DPRINTK PRINTK /* set this NULL to avoid verbose debug output */
>  >  #define DPRINTK(x...) do { } while(0)
>  >
>  > -#ifndef PRINTK
>  > +#ifndef CONFIG_PRINTK
>  >  #  if DEBUG > 0
>  >  #define PRINTK(x...) printk(KERN_DEBUG x)
>  >  #  else
>
> the intention here is to only define 'PRINTK' if no-one else
> has defined it already.
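The idiom Dave describes is a default macro guarded by #ifndef on the macro's own name, so an earlier definition (from a header or a -DPRINTK=... compiler flag) wins. It is unrelated to the Kconfig symbol CONFIG_PRINTK. A standalone sketch using C99 variadic macros instead of the GNU x... form; debug_printf() and its counter are hypothetical.

```c
#include <stdarg.h>
#include <stdio.h>
#include <assert.h>

static int debug_lines;   /* counts messages emitted by the default PRINTK */

static void debug_printf(const char *fmt, ...)
{
	va_list ap;

	va_start(ap, fmt);
	vfprintf(stderr, fmt, ap);
	va_end(ap);
	debug_lines++;
}

#ifndef DEBUG
#define DEBUG 1           /* a real file would set its own debug level */
#endif

/* Supply a default PRINTK only if nothing earlier has defined one. */
#ifndef PRINTK
#  if DEBUG > 0
#    define PRINTK(...) debug_printf(__VA_ARGS__)
#  else
#    define PRINTK(...) do { } while (0)
#  endif
#endif
```

Changing the guard to #ifndef CONFIG_PRINTK would make the default depend on whether printk support is configured, which is a different question entirely.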

oops, sorry, i misread that.

rday
-- 

Robert P. J. Day
Linux Consulting, Training and Annoying Kernel Pedantry
Waterloo, Ontario, CANADA

http://fsdev.net/wiki/index.php?title=Main_Page

