Re: [linux-usb-devel] ThinkPad T41 - Strange USB 2.0 behaviour

2007-06-11 Thread Robert de Rooy

Alan Stern wrote:

On Tue, 12 Jun 2007, Robert de Rooy wrote:

  

Yes that works.
I tried to plug and unplug the device repeatedly and each time it came 
up in full-speed mode.



Good!  I'm glad that "companion" attribute file has come in handy for 
someone.  :-)


Alan Stern
  


Any way of passing this as a boot parameter? Because I also encountered 
this when trying to boot Linux from a USB CD-ROM drive. The BIOS part 
worked fine, but the moment Linux loaded USB support it failed, due to 
the same problem. This is also why I asked if this failure mode could be 
handled automatically.


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: sata_nv adma issues

2007-06-11 Thread Robert Hancock

Charles Shannon Hendrix wrote:


My system has issues running adma mode with sata_nv.  I have an nforce4 
motherboard.


What is the current status of this problem?


You'll have to be a bit more specific than that. A few problems have 
arisen in the past, but currently I don't think there is anything known 
outstanding other than one hotplug issue which currently is lacking in 
information to debug.




Is there any information I can provide to help debug it?

I gave up trying various fixes about 6 months ago, and put 
"sata_nv.adma=0" on the kernel command line in LILO, and that fixed the 
problem.


However, recently I changed distributions and went back to 2.6.20 
(kubuntu 7.04).


This kernel says that sata_nv.adma=0 is an invalid kernel option.

I'm pretty puzzled by that, because it is supposed to disable adma mode 
in the sata_nv driver.


/proc/cmdline says:

root= ro sata_nv.adma=0 quiet splash

..so it seems I did give the parameter properly.

Any ideas appreciated.


If sata_nv is built modular, then you may need to put:

options sata_nv adma=0

in /etc/modprobe.conf instead.

However, I should point out that adma=0 is a poor workaround; it would be 
better to find the real cause of the problem.
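
One way to confirm which value a loaded module actually picked up is to read it back from sysfs. The following is only a sketch: the helper name read_module_param is hypothetical, and it assumes the driver exports the parameter with read permission under /sys/module/<module>/parameters/, which may not hold for every build.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: read /sys/module/<module>/parameters/<param>
 * into buf (without the trailing newline).
 * Returns 0 on success, -1 if the file is missing or unreadable.
 */
int read_module_param(const char *module, const char *param,
                      char *buf, size_t len)
{
    char path[256];
    FILE *f;
    int n;

    if (len == 0)
        return -1;

    n = snprintf(path, sizeof(path), "/sys/module/%s/parameters/%s",
                 module, param);
    if (n < 0 || (size_t)n >= sizeof(path))
        return -1;

    f = fopen(path, "r");
    if (!f)
        return -1;      /* module not loaded, or parameter not exported */

    if (!fgets(buf, (int)len, f)) {
        fclose(f);
        return -1;
    }
    fclose(f);

    buf[strcspn(buf, "\n")] = '\0';     /* strip the newline sysfs appends */
    return 0;
}
```

Calling read_module_param("sata_nv", "adma", value, sizeof(value)) after boot would then show whether the option took effect, assuming the parameter is visible there at all.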




Is there a better way to deal with this?

Also, one more: does it hurt to wait until the sata_nv driver fails a 
few times (at which point it stops bitching) and use the machine?  Once it
fails about 6 times, I no longer have any issues, and speed is still 
good enough to use until a real fix can be had.


Please post the dmesg output from when this happens. If it starts 
working after the kernel disables NCQ, then it might mean that your 
drive has some problems with NCQ.


--
Robert Hancock  Saskatoon, SK, Canada
To email, remove "nospam" from [EMAIL PROTECTED]
Home Page: http://www.roberthancock.com/



Re: [stable] [PATCH] INPUT: Sanitize PIT locking in pcspkr

2007-06-11 Thread Dmitry Torokhov
On Monday 11 June 2007 20:27, Chris Wright wrote:
> * Thomas Gleixner ([EMAIL PROTECTED]) wrote:
> > The PC-speaker code has a quite creative method to serialize access to
> > the PIT: It uses a local lock.
> > 
> > On i386 and x86_64 the access to the PIT is serialized by a lock in the
> > architecture code. The separate locking in the PC-speaker code ignores
> > the global lock and creates a nasty race between the PC-speaker and the
> > PIT clock source / events code on SMP machines.
> > 
> > Use the global i8253_lock instead of the local i8253_beep_lock, when
> > compiled for i386/x86_64.
> 
> Seems this one got lost?
>

Yep... Here is the patch I'd like to get into 2.6.22, but as it touches a
couple of arches I can't test on, I am a bit hesitant to push it
through my tree. Thomas OKed it but nobody else responded.
 
-- 
Dmitry

Input: pcspkr - use proper lock

On i386 and x86_64 the access to the PIT is serialized by a lock
in the architecture code. The separate locking in the PC-speaker
code ignores the global lock and creates a nasty race between the
PC-speaker and the PIT clock source/events code on SMP machines.

To fix this the architecture code attaches the proper lock to the
pcspkr platform device and the driver uses it instead of its
own private lock.

Noticed by Thomas Gleixner <[EMAIL PROTECTED]>

Also restore uevent generation for pcspkr devices so that the
driver can be loaded automatically by udev.

Signed-off-by: Dmitry Torokhov <[EMAIL PROTECTED]>
---

 arch/i386/kernel/pcspeaker.c       |   20 ---
 arch/alpha/kernel/setup.c          |    9 +++-
 arch/i386/kernel/Makefile          |    1
 arch/i386/kernel/i8253.c           |   23 ++
 arch/mips/kernel/i8253.c           |    9 +++-
 arch/powerpc/kernel/setup-common.c |    9 +++-
 arch/x86_64/kernel/Makefile        |    2 -
 arch/x86_64/kernel/time.c          |   38 +
 drivers/input/misc/pcspkr.c        |   10 +
 9 files changed, 83 insertions(+), 38 deletions(-)

Index: work/arch/alpha/kernel/setup.c
===
--- work.orig/arch/alpha/kernel/setup.c
+++ work/arch/alpha/kernel/setup.c
@@ -1491,6 +1491,8 @@ alpha_panic_event(struct notifier_block 
 return NOTIFY_DONE;
 }
 
+static DEFINE_SPINLOCK(i8253_lock);
+
 static __init int add_pcspkr(void)
 {
struct platform_device *pd;
@@ -1500,9 +1502,14 @@ static __init int add_pcspkr(void)
if (!pd)
return -ENOMEM;
 
+   pd->dev.platform_data = &i8253_lock;
+   pd->dev.uevent_suppress = 0;
+
ret = platform_device_add(pd);
-   if (ret)
+   if (ret) {
+   pd->dev.platform_data = NULL;   /* so we don't try to free it */
platform_device_put(pd);
+   }
 
return ret;
 }
Index: work/arch/mips/kernel/i8253.c
===
--- work.orig/arch/mips/kernel/i8253.c
+++ work/arch/mips/kernel/i8253.c
@@ -10,6 +10,8 @@
 
 #include 
 
+static DEFINE_SPINLOCK(i8253_lock);
+
 static __init int add_pcspkr(void)
 {
struct platform_device *pd;
@@ -19,9 +21,14 @@ static __init int add_pcspkr(void)
if (!pd)
return -ENOMEM;
 
+   pd->dev.platform_data = &i8253_lock;
+   pd->dev.uevent_suppress = 0;
+
ret = platform_device_add(pd);
-   if (ret)
+   if (ret) {
+   pd->dev.platform_data = NULL;   /* so we don't try to free it */
platform_device_put(pd);
+   }
 
return ret;
 }
Index: work/arch/x86_64/kernel/time.c
===
--- work.orig/arch/x86_64/kernel/time.c
+++ work/arch/x86_64/kernel/time.c
@@ -23,6 +23,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -185,7 +186,7 @@ void main_timer_handler(void)
set_rtc_mmss(xtime.tv_sec);
rtc_update = xtime.tv_sec + 660;
}
- 
+
write_sequnlock(&xtime_lock);
 }
 
@@ -226,7 +227,7 @@ static unsigned long get_cmos_time(void)
/*
 * We know that x86-64 always uses BCD format, no need to check the
 * config register.
-*/
+*/
 
BCD_TO_BIN(sec);
BCD_TO_BIN(min);
@@ -239,11 +240,11 @@ static unsigned long get_cmos_time(void)
BCD_TO_BIN(century);
year += century * 100;
printk(KERN_INFO "Extended CMOS year: %d\n", century * 100);
-   } else { 
+   } else {
/*
 * x86-64 systems only exists since 2002.
 * This will work up to Dec 31, 2100
-*/
+*/
year += 2000;
}
 
@@ -321,7 +322,7 @@ static unsigned int __init pit_calibrate
end = get_cycles_sync();
 
spin_unlock_irqrestore(&i8253_lock, flags);
-   
+
return (end - start) / 50;
 }
 
@@ -366,7 +367,7 @@ 

Re: [PATCH 1/1] UML: fix missing non-blocking I/O, now DEBUG_SHIRQ works

2007-06-11 Thread Eduard-Gabriel Munteanu

Jeff Dike wrote:

No it won't.  UML builds without warnings here on x86_64.


Okay, I don't have an x86_64, sparc64 or something similar, as my 
computer is an x86, so I can't contradict this. If everything is fine on 
such arches, no fix is needed when nothing's broken... though I still 
think it's kind of ugly (though it's somehow ingenious as a hack) :)


Do you want me to resubmit the patch without these changes? I'll be back 
in a few hours (got to go now) and trim this out from the patch if 
that's necessary.



Re: [RFC][PATCH 0/6] Add group fairness to CFS - v1

2007-06-11 Thread Srivatsa Vaddagiri
On Mon, Jun 11, 2007 at 09:37:35PM +0200, Ingo Molnar wrote:
> > Patches 1-3 introduce the essential changes in CFS core to support 
> > this concept. They rework existing code w/o any (intended!) change in 
> > functionality.
> 
> i currently have these 3 patches applied to the CFS queue and it's 
> looking pretty good so far! If it continues to be problem-free i'll 
> release them as part of -v17, just to check that they truly have no bad 
> side-effects (they shouldnt). Then #4 can go into -v18.

ok. I am also most concerned about not upsetting current performance of
CFS when CONFIG_FAIR_GROUP_SCHED is turned off. Staging these patches in
incremental versions of CFS is a good idea to test that.

> i've attached my current -v17 tree - it should apply mostly cleanly 
> ontop of the -mm queue (with a minor number of fixups). Could you 
> refactor the remaining 3 patches ontop of this base? There's some 
> rejects in the last 3 patches due to the update_load_fair() change.

sure, i will rework them on this -v17 snapshot.

> > Patch 4 fixes some bad interaction between SCHED_RT and SCHED_NORMAL
> > tasks in current CFS.
> 
> btw., the plan here is to turn off 'bit 0' in sched_features: i.e. to 
> use the precise statistics to calculate lrq->cpu_load[], not the 
> timer-irq-sampled imprecise statistics. Dmitry has fixed a couple of 
> bugs in it that made it not work too well in previous CFS versions, but 
> now we are ready to turn it on for -v17. (indeed in my tree it's already 
> turned on - i.e. sched_features defaults to '14')

On Mon, Jun 11, 2007 at 09:39:31PM +0200, Ingo Molnar wrote:
> i mean bit 6, value 64. I flipped around its meaning in -v17-rc4, so the
> new precise stats code there is now default-enabled - making SMP
> load-balancing more accurate.

I must be missing something here. AFAICS, cpu_load calculation still is
timer-interrupt driven in the -v17 snapshot you sent me. Besides, there
is no change in default value of bit 6 b/n v16 and v17:

-unsigned int sysctl_sched_features __read_mostly = 1 | 2 | 4 | 8 | 0 | 0;
+unsigned int sysctl_sched_features __read_mostly = 0 | 2 | 4 | 8 | 0 | 0;

So where's this precise stats based calculation of cpu_load?
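
For reference, the two defaults in the diff above decode as follows. This is only a sketch of the bit arithmetic; the names features_v16, features_v17 and feature_bit_set are hypothetical and do not come from the CFS patches.

```c
#include <stdio.h>

/* Defaults copied from the two diff lines quoted above. */
const unsigned int features_v16 = 1 | 2 | 4 | 8 | 0 | 0;
const unsigned int features_v17 = 0 | 2 | 4 | 8 | 0 | 0;

/* Returns 1 if bit number `bit` (0-based) is set in `features`. */
int feature_bit_set(unsigned int features, unsigned int bit)
{
    return (int)((features >> bit) & 1u);
}

/* Print every bit in 0..6 whose value differs between the two masks. */
void print_changed_bits(unsigned int before, unsigned int after)
{
    unsigned int bit;

    for (bit = 0; bit <= 6; bit++) {
        int was = feature_bit_set(before, bit);
        int now = feature_bit_set(after, bit);

        if (was != now)
            printf("bit %u (value %u): %d -> %d\n",
                   bit, 1u << bit, was, now);
    }
}
```

With these values, print_changed_bits(features_v16, features_v17) reports only bit 0 changing from 1 to 0; bit 6 (value 64) is clear in both defaults.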

Anyway, do you agree that splitting the cpu_load/nr_running fields so that:

rq->nr_running= total count of -all- tasks in runqueue
rq->raw_weighted_load = total weight of -all- tasks in runqueue
rq->lrq.nr_running= total count of SCHED_NORMAL/BATCH tasks in runqueue
rq->lrq.raw_weighted_load = total weight of SCHED_NORMAL/BATCH tasks in runqueue

is a good thing to avoid SCHED_RT<->SCHED_NORMAL/BATCH mixup (as accomplished 
in Patch #4)?

If you don't agree, then I will make this split dependent on
CONFIG_FAIR_GROUP_SCHED 

> > Patch 5 introduces basic changes in CFS core to support group 
> > fairness.
> >
> > Patch 6 hooks up scheduler with container patches in mm (as an 
> > interface for task-grouping functionality).

Just to be clear, by container patches, I am referring to "process" container
patches from Paul Menage [1]. They aren't necessarily tied to
"virtualization-related" container support in -mm tree, although I
believe that "virtualization-related" container patches will make use of the 
same "process-related" container patches for their task-grouping requirements. 
Phew ..we need better names!

> ok. Kirill, how do you like Srivatsa's current approach? Would be nice 
> to kill two birds with the same stone, if possible :-)

One thing the current patches don't support is the notion of virtual
cpus (which Kirill and other "virtualization-related" container folks would 
perhaps want). IMHO, the current patches can still be useful for
containers to load balance between those virtual cpus (as and when they are 
introduced).

> you'll get the best hackbench results by using SCHED_BATCH:
> 
>chrt -b 0 ./hackbench 10

thanks for this tip. Will try out and let you know how it fares for me.

> or indeed increasing the runtime_limit would work too.


References:

1.  https://lists.linux-foundation.org/pipermail/containers/2007-May/005261.html

-- 
Regards,
vatsa


Re: 2.6.22-rc[23]: blinking capslock led, stuck keys?

2007-06-11 Thread Dmitry Torokhov
On Monday 04 June 2007 16:57, Pavel Machek wrote:
> On Mon 2007-06-04 13:46:45, Dmitry Torokhov wrote:
> > On 6/4/07, Henrique de Moraes Holschuh <[EMAIL PROTECTED]> wrote:
> > >On Mon, 04 Jun 2007, Dmitry Torokhov wrote:
> > >> >...but I'm not quite sure it is a buggy keyboard. It happens _way_ too
> > >> >often. Launch the line above and try to do some typing...
> > >>
> > >> This used to work fine on my box last time I tried it (the switch
> > >> itself is offloaded to a keventd and shoud not get in the way) but
> > >> then they push all kind of ACPI/SMM crap together with KBC so who
> > >> knows... I should try it again when I get home.
> > >
> > >Err... in laptops, almost *always* the KDC is emulated by the embedded
> > >controller, so I bet you're right on the money, there.  It is not "a buggy
> > >KDC", it is a buggy EC firmware and/or buggy SMBIOS which is a lot more
> > >common.
> > >
> > >And DoS'ing the EC is very high on the Don't Do That list on a laptop.  If
> > >the X60 is only losing keypresses and producing no bigger fireworks, that's
> > >outstanding behavior (as far as I trust ThinkPad firmware, anyway).
> > >
> > >So please throttle anything that might access the KDC way too much (as
> > >compared to normal keyboard operation by an user).
> > 
> > What would be reasonable throttling? Once every 100 ms?
> 
> Well... this thread began with me having problems with leds blinking
> once per ten seconds. I do not think throttling is going to help.

For what it's worth, I finally tried that setleds loop on my laptop. I am
not getting any lost keypresses/releases. But then I don't have an EC
(or at least it is not exported via ACPI). This is an old Dell notebook.

-- 
Dmitry


Re: [PATCH] Input: Support for a less exclusive grab.

2007-06-11 Thread Zephaniah E. Hull
On Tue, Jun 12, 2007 at 01:35:05AM -0400, Dmitry Torokhov wrote:
> On Tuesday 12 June 2007 01:23, Zephaniah E. Hull wrote:
> > On Tue, Jun 12, 2007 at 01:19:59AM -0400, Dmitry Torokhov wrote:
> > > 
> > > Like I said I would love if xf86-input-evdev did not grab the
> > > device at all.
> > 
> > We have to disable the legacy input handlers somehow, not doing so
> > simply isn't an option.
> 
> I do not follow. If user's xorg.conf does not use /dev/input/mice and
> does not use "kbd" driver then grabbing is not required, is it? Now,
> as far as I understand, lack of hotplug support in X is the main
> obstacle for removing "mouse" and "kbd" drivers, correct?

Sadly, not quite.

The problem is that if the user is not using the mouse and kbd drivers
at all, but is instead using xf86-input-evdev, and no grabbing is done,
then all key presses end up going to the console.

Consider the effects of this when using things like alt-f1 or ctrl-c in
a program in X.

We have to keep the console itself from getting the events in question,
which means either unbinding the kbd interface, or some other sort of
grab, otherwise xf86-input-evdev is completely unusable for keyboards.

Grab support was my initial approach to the problem. In hindsight it
wasn't the right one, but it worked, and it's still needed for the
multi-seat people.
>  
> > > 
> > > But rfkill-input is not a legacy handler. My objection is that with your
> > > solution you still will rob handlers such rfkill-input of events.
> > 
> > Urgh.
> > 
> > So, any thoughts on how to identify legacy input handlers in the input
> > system?
> 
> I guess keyboard and mousedev will have to be flagged as such in kernel.

Ugly, but it works.
> 
> -- 
> Dmitry
> 

-- 
  1024D/E65A7801 Zephaniah E. Hull <[EMAIL PROTECTED]>
   92ED 94E4 B1E6 3624 226D  5727 4453 008B E65A 7801
CCs of replies from mailing lists are requested.

"Microsoft is a cross between the Borg and the Ferengi.  Unfortunately,
they use Borg to do their marketing and Ferengi to do their
programming."
  -- Simon Slavin in asr




Re: [PATCH] Input: Support for a less exclusive grab.

2007-06-11 Thread Dmitry Torokhov
On Tuesday 12 June 2007 01:23, Zephaniah E. Hull wrote:
> On Tue, Jun 12, 2007 at 01:19:59AM -0400, Dmitry Torokhov wrote:
> > 
> > Like I said I would love if xf86-input-evdev did not grab the
> > device at all.
> 
> We have to disable the legacy input handlers somehow, not doing so
> simply isn't an option.

I do not follow. If user's xorg.conf does not use /dev/input/mice and
does not use "kbd" driver then grabbing is not required, is it? Now,
as far as I understand, lack of hotplug support in X is the main
obstacle for removing "mouse" and "kbd" drivers, correct?
 
> > 
> > But rfkill-input is not a legacy handler. My objection is that with your
> > solution you still will rob handlers such rfkill-input of events.
> 
> Urgh.
> 
> So, any thoughts on how to identify legacy input handlers in the input
> system?

I guess keyboard and mousedev will have to be flagged as such in kernel.

-- 
Dmitry


Re: [KJ PATCH] Replacing memcpy(dest,src,PAGE_SIZE) with copy_page(dest,src) in arch/i386/kernel/machine_kexec.c

2007-06-11 Thread Eric W. Biederman
Shani Moideen <[EMAIL PROTECTED]> writes:

> Hi,
> Replacing memcpy(dest,src,PAGE_SIZE) with copy_page(dest,src) in
> arch/i386/kernel/machine_kexec.c.

Please no.

People get creative in copy_page (especially mmx_copy_page),
and this code path needs something simple and stupid that
will work every time, especially when things are messed up
elsewhere.

Ideally we would actually do all of the setup before this point,
but that is another issue entirely.

Eric


Re: [PATCH] Fix possible leakage of blocks in UDF (try 2)

2007-06-11 Thread Cyrill Gorcunov
[Jan Kara - Mon, Jun 11, 2007 at 12:49:11PM +0200]
|   Hi Andrew,
| 
|   attached is a new version of the patch fixing possible leakage of
[SNIP]

Thanks, Jan

Cyrill



Re: [PATCH] input: fix broken behaviour of Dell Latitude special keys

2007-06-11 Thread Dmitry Torokhov
Hi,

On Sunday 10 June 2007 13:42, Giel de Nijs wrote:
> Hi,
> 
> Following up on http://thread.gmane.org/gmane.linux.kernel.input/1375 here's
> a new patch to fix the fact that most Fn+F? special keys on (at least) the
> Dell Latitude laptops don't generate a key release event.

Thank you for the patch. Is there any way I could see the data coming from i8042
when you press these special keys? If you could do:

echo 1 > /sys/module/i8042/parameters/debug
then press and release all these keys
echo 0 > /sys/module/i8042/parameters/debug

and send me the dmesg output, I would appreciate that.

-- 
Dmitry


Re: [PATCH] input: make 2 macros in gameport.c TSC-aware

2007-06-11 Thread Dmitry Torokhov
Hi,

On Monday 11 June 2007 00:12, Miltiadis Margaronis wrote:
> 
>   This makes DELTA and GET_TIME in drivers/input/gameport/gameport.c
>   similar to the ones in drivers/input/joystick/analog.c . Worked on
>   2.6.22-rc4-git2.
> 

I was told that, with the introduction of tickless kernels and such, the best
option is to convert gameport to use hires timers.

-- 
Dmitry


Re: [PATCH] Input: Support for a less exclusive grab.

2007-06-11 Thread Zephaniah E. Hull
*googles briefly for rfkill-input, looks for his brown paper bag*

On Tue, Jun 12, 2007 at 01:19:59AM -0400, Dmitry Torokhov wrote:
> On Tuesday 12 June 2007 01:12, Zephaniah E. Hull wrote:
> > On Tue, Jun 12, 2007 at 01:07:13AM -0400, Dmitry Torokhov wrote:
> > > Hi Zephaniah,
> > > 
> > > On Saturday 09 June 2007 04:48, Zephaniah E. Hull wrote:
> > > > EVIOCGRAB is nice and very useful, however over time I've gotten
> > > > multiple requests to make it possible for applications to get events
> > > > straight from the event device while xf86-input-evdev is getting events
> > > > from the same device.
> > > > 
> > > > Here is the least invasive patch I could think of, it changes the
> > > > behavior of EVIOCGRAB in some cases, specificly behavior is identical if
> > > > the argument is 0 or 1, however if the argument is true and != 1, then
> > > > it does a 'non exclusive grab', a better name might be handy.
> > > > 
> > > > What this does is allow the events to go to everything that's using
> > > > evdev to get events, but grabs it from anything else.  About as close to
> > > > what people want as I can get, and fairly non-invasive.
> > > 
> > > Unfortunately this also robs non-legacy input handlers (such as
> > > rfkill-input) of input events. Does xf86-input-evdev really needs to
> > > grab devices exclusively? I guess we can't abandon the standard
> > > keyboard driver until X supports hotplugging. How close is it to
> > > support devices coming and going?
> > 
> > Er, to explain.
> > 
> > The current EVIOCGRAB does an exclusive grab that prohibits rfkill-input
> > and friends from working.
> > 
> 
> I understand that.
> 
> > As it is the only way to disable the legacy input handlers,
> > xf86-input-evdev has been using it since we added it.
> >
> 
> Like I said I would love if xf86-input-evdev did not grab the
> device at all.

We have to disable the legacy input handlers somehow, not doing so
simply isn't an option.
>  
> > The patch is to let us cause only things that use /dev/input/event to
> > get events, thus, a non-exclusive grab.
> > 
> > This basicly disables the legacy input handlers, and it's the least
> > invasive patch I could think of.
> > 
> 
> But rfkill-input is not a legacy handler. My objection is that with your
> solution you still will rob handlers such rfkill-input of events.

Urgh.

So, any thoughts on how to identify legacy input handlers in the input
system?

This is a tricky case I had not even been aware of.
> 
> > Going for a separate ioctl would also work, but in some ways it would
> > make supporting it more of a pain.
> > 
> > I don't care _that_ much either way, as long as we can get a way to
> > disable the legacy events while allowing other things to get the events
> > too.
> > 
> > Zephaniah E. Hull.
> > >  
> > > If we can't remain as is until X hotplug is ready then I'd rather had
> > > a separate ioctl that disables legacy input handlers (keyboard, mousedev)
> > > for a given input device.
> > > 
> > > -- 
> > > Dmitry
> > > 
> > 
> 
> -- 
> Dmitry
> 

-- 
  1024D/E65A7801 Zephaniah E. Hull <[EMAIL PROTECTED]>
   92ED 94E4 B1E6 3624 226D  5727 4453 008B E65A 7801
CCs of replies from mailing lists are requested.

>> kinds of numbers the tobacco industry wishes it had, and Dell is very
>> very happy with the results.
>
>Do they come with a Surgeon General warning on the box?

The new ones have "Designed for Windows XP". Yes.
  -- Satya, Paul Martin, and Derek Balling in the Scary Devil Monastery.




Re: [PATCH] Input: Support for a less exclusive grab.

2007-06-11 Thread Dmitry Torokhov
On Tuesday 12 June 2007 01:12, Zephaniah E. Hull wrote:
> On Tue, Jun 12, 2007 at 01:07:13AM -0400, Dmitry Torokhov wrote:
> > Hi Zephaniah,
> > 
> > On Saturday 09 June 2007 04:48, Zephaniah E. Hull wrote:
> > > EVIOCGRAB is nice and very useful, however over time I've gotten
> > > multiple requests to make it possible for applications to get events
> > > straight from the event device while xf86-input-evdev is getting events
> > > from the same device.
> > > 
> > > Here is the least invasive patch I could think of, it changes the
> > > behavior of EVIOCGRAB in some cases, specificly behavior is identical if
> > > the argument is 0 or 1, however if the argument is true and != 1, then
> > > it does a 'non exclusive grab', a better name might be handy.
> > > 
> > > What this does is allow the events to go to everything that's using
> > > evdev to get events, but grabs it from anything else.  About as close to
> > > what people want as I can get, and fairly non-invasive.
> > 
> > Unfortunately this also robs non-legacy input handlers (such as
> > rfkill-input) of input events. Does xf86-input-evdev really needs to
> > grab devices exclusively? I guess we can't abandon the standard
> > keyboard driver until X supports hotplugging. How close is it to
> > support devices coming and going?
> 
> Er, to explain.
> 
> The current EVIOCGRAB does an exclusive grab that prohibits rfkill-input
> and friends from working.
> 

I understand that.

> As it is the only way to disable the legacy input handlers,
> xf86-input-evdev has been using it since we added it.
>

Like I said I would love if xf86-input-evdev did not grab the
device at all.
 
> The patch is to let us cause only things that use /dev/input/event to
> get events, thus, a non-exclusive grab.
> 
> This basicly disables the legacy input handlers, and it's the least
> invasive patch I could think of.
> 

But rfkill-input is not a legacy handler. My objection is that with your
solution you still will rob handlers such rfkill-input of events.

> Going for a separate ioctl would also work, but in some ways it would
> make supporting it more of a pain.
> 
> I don't care _that_ much either way, as long as we can get a way to
> disable the legacy events while allowing other things to get the events
> too.
> 
> Zephaniah E. Hull.
> >  
> > If we can't remain as is until X hotplug is ready then I'd rather had
> > a separate ioctl that disables legacy input handlers (keyboard, mousedev)
> > for a given input device.
> > 
> > -- 
> > Dmitry
> > 
> 

-- 
Dmitry


Re: [BUG] via c3/2.6.20 -- ps/2 keyboard doesn't work with console

2007-06-11 Thread Dmitry Torokhov
On Monday 11 June 2007 14:54, Paul Albrecht wrote:
> 
> Is i8042.noaux a workaround or a fix?
> 

Just a workaround. Do you have a PS/2 mouse you could test with? If so, could
you check whether both keyboard and mouse work with the mouse plugged in and
without i8042.noaux?

Also, could you please try the patch below (again, with the mouse if you have one
and without)?

Thanks a lot!

-- 
Dmitry

Input: i8042 - give more trust to PNP data on i386

On some boxes that don't have PS/2 mice connected at startup BIOS
completely disables AUX port and attempts to access it result in
hosed keyboard. Historically we do not trust ACPI/PNP data on
i386 and try to poke AUX port even if we did not find an active
PNP node for it. However in cases when BIOS writers got KBD port
properly described we can assume that they did the right thing
for AUX port as well.

Signed-off-by: Dmitry Torokhov <[EMAIL PROTECTED]>
---
 drivers/input/serio/i8042-x86ia64io.h |   36 +++---
 1 file changed, 29 insertions(+), 7 deletions(-)

Index: linux/drivers/input/serio/i8042-x86ia64io.h
===
--- linux.orig/drivers/input/serio/i8042-x86ia64io.h
+++ linux/drivers/input/serio/i8042-x86ia64io.h
@@ -356,6 +356,7 @@ static void i8042_pnp_exit(void)
 static int __init i8042_pnp_init(void)
 {
char kbd_irq_str[4] = { 0 }, aux_irq_str[4] = { 0 };
+   int pnp_data_busted = 0;
int err;
 
if (i8042_nopnp) {
@@ -403,27 +404,48 @@ static int __init i8042_pnp_init(void)
 #endif
 
if (((i8042_pnp_data_reg & ~0xf) == (i8042_data_reg & ~0xf) &&
- i8042_pnp_data_reg != i8042_data_reg) || !i8042_pnp_data_reg) {
-   printk(KERN_WARNING "PNP: PS/2 controller has invalid data port %#x; using default %#x\n",
+ i8042_pnp_data_reg != i8042_data_reg) ||
+   !i8042_pnp_data_reg) {
+   printk(KERN_WARNING
+   "PNP: PS/2 controller has invalid data port %#x; "
+   "using default %#x\n",
i8042_pnp_data_reg, i8042_data_reg);
i8042_pnp_data_reg = i8042_data_reg;
+   pnp_data_busted = 1;
}
 
if (((i8042_pnp_command_reg & ~0xf) == (i8042_command_reg & ~0xf) &&
- i8042_pnp_command_reg != i8042_command_reg) || !i8042_pnp_command_reg) {
-   printk(KERN_WARNING "PNP: PS/2 controller has invalid command port %#x; using default %#x\n",
+ i8042_pnp_command_reg != i8042_command_reg) ||
+   !i8042_pnp_command_reg) {
+   printk(KERN_WARNING
+   "PNP: PS/2 controller has invalid command port %#x; "
+   "using default %#x\n",
i8042_pnp_command_reg, i8042_command_reg);
i8042_pnp_command_reg = i8042_command_reg;
+   pnp_data_busted = 1;
}
 
if (!i8042_nokbd && !i8042_pnp_kbd_irq) {
-   printk(KERN_WARNING "PNP: PS/2 controller doesn't have KBD irq; using default %d\n", i8042_kbd_irq);
+   printk(KERN_WARNING
+   "PNP: PS/2 controller doesn't have KBD irq; "
+   "using default %d\n", i8042_kbd_irq);
i8042_pnp_kbd_irq = i8042_kbd_irq;
+   pnp_data_busted = 1;
}
 
if (!i8042_noaux && !i8042_pnp_aux_irq) {
-   printk(KERN_WARNING "PNP: PS/2 controller doesn't have AUX irq; using default %d\n", i8042_aux_irq);
-   i8042_pnp_aux_irq = i8042_aux_irq;
+   if (!pnp_data_busted && i8042_pnp_kbd_irq) {
+   printk(KERN_WARNING
+   "PNP: PS/2 appears to have AUX port disabled, "
+   "if this is incorrect please boot with "
+   "i8042.nopnp\n");
+   i8042_noaux = 1;
+   } else {
+   printk(KERN_WARNING
+   "PNP: PS/2 controller doesn't have AUX irq; "
+   "using default %d\n", i8042_aux_irq);
+   i8042_pnp_aux_irq = i8042_aux_irq;
+   }
}
 
i8042_data_reg = i8042_pnp_data_reg;
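
The AUX decision made by the last changed hunk above can be modelled outside the kernel as a small pure function. This is a sketch for reasoning about the branches only: struct pnp_ports and resolve_aux are hypothetical names, not part of the driver, and the fields merely stand in for the i8042_* variables used in the patch.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the state i8042_pnp_init() works with. */
struct pnp_ports {
    bool data_busted;   /* a PNP port or the KBD irq had to be defaulted */
    int  kbd_irq;       /* KBD irq after any fallback; 0 if none */
    int  aux_irq;       /* AUX irq reported by PNP; 0 if none */
    bool noaux;         /* AUX port is disabled */
};

/*
 * Models the patched branch: when PNP reports no AUX irq, trust that
 * and disable AUX only if the rest of the PNP data looked sane;
 * otherwise fall back to the default AUX irq as before.
 */
void resolve_aux(struct pnp_ports *p, int default_aux_irq)
{
    if (p->noaux || p->aux_irq)
        return;                         /* nothing to decide */

    if (!p->data_busted && p->kbd_irq)
        p->noaux = true;                /* BIOS disabled AUX; leave it alone */
    else
        p->aux_irq = default_aux_irq;   /* PNP data unreliable; old behaviour */
}
```

Under this model, a box with a sane KBD description and no AUX node ends up with AUX disabled, while any defaulted port or irq keeps the previous probing behaviour.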


Re: [PATCH] Input: Support for a less exclusive grab.

2007-06-11 Thread Zephaniah E. Hull
On Tue, Jun 12, 2007 at 01:07:13AM -0400, Dmitry Torokhov wrote:
> Hi Zephaniah,
> 
> On Saturday 09 June 2007 04:48, Zephaniah E. Hull wrote:
> > EVIOCGRAB is nice and very useful, however over time I've gotten
> > multiple requests to make it possible for applications to get events
> > straight from the event device while xf86-input-evdev is getting events
> > from the same device.
> > 
> > Here is the least invasive patch I could think of, it changes the
> > behavior of EVIOCGRAB in some cases, specificly behavior is identical if
> > the argument is 0 or 1, however if the argument is true and != 1, then
> > it does a 'non exclusive grab', a better name might be handy.
> > 
> > What this does is allow the events to go to everything that's using
> > evdev to get events, but grabs it from anything else.  About as close to
> > what people want as I can get, and fairly non-invasive.
> 
> Unfortunately this also robs non-legacy input handlers (such as
> rfkill-input) of input events. Does xf86-input-evdev really needs to
> grab devices exclusively? I guess we can't abandon the standard
> keyboard driver until X supports hotplugging. How close is it to
> support devices coming and going?

Er, to explain.

The current EVIOCGRAB does an exclusive grab that prohibits rfkill-input
and friends from working.

As it is the only way to disable the legacy input handlers,
xf86-input-evdev has been using it since we added it.

The patch is to let us cause only things that use /dev/input/event to
get events, thus, a non-exclusive grab.

This basically disables the legacy input handlers, and it's the least
invasive patch I could think of.

Going for a separate ioctl would also work, but in some ways it would
make supporting it more of a pain.

I don't care _that_ much either way, as long as we can get a way to
disable the legacy events while allowing other things to get the events
too.
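
For reference, a rough userspace sketch of how a client drives EVIOCGRAB
today.  The 0/1 argument values are the existing semantics; the value 2
for a 'non exclusive grab' exists only in the proposal above and is an
assumption here, as are the helper names evdev_set_grab() and
evdev_open_grabbed().

```c
#include <errno.h>
#include <fcntl.h>
#include <linux/input.h>	/* EVIOCGRAB */
#include <sys/ioctl.h>
#include <unistd.h>

/*
 * Grab modes.  0 and 1 are what EVIOCGRAB understands today; 2 is only
 * what the proposed patch would treat as a non-exclusive grab (any true
 * value other than 1) and is hypothetical.
 */
enum grab_mode {
	GRAB_RELEASE       = 0,
	GRAB_EXCLUSIVE     = 1,
	GRAB_NON_EXCLUSIVE = 2,	/* hypothetical, from the proposal only */
};

/* Returns 0 on success, -1 with errno set on failure. */
static int evdev_set_grab(int fd, enum grab_mode mode)
{
	return ioctl(fd, EVIOCGRAB, (unsigned long)mode);
}

/* Open an event node and grab it; returns the fd, or -1 with errno set. */
static int evdev_open_grabbed(const char *path, enum grab_mode mode)
{
	int fd = open(path, O_RDONLY);

	if (fd < 0)
		return -1;
	if (evdev_set_grab(fd, mode) < 0) {
		int saved = errno;	/* close() may clobber errno */

		close(fd);
		errno = saved;
		return -1;
	}
	return fd;
}
```

A caller would pass something like /dev/input/eventN and release the grab
with evdev_set_grab(fd, GRAB_RELEASE) before closing the descriptor.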

Zephaniah E. Hull.
>  
> If we can't remain as is until X hotplug is ready then I'd rather had
> a separate ioctl that disables legacy input handlers (keyboard, mousedev)
> for a given input device.
> 
> -- 
> Dmitry
> 

-- 
  1024D/E65A7801 Zephaniah E. Hull <[EMAIL PROTECTED]>
   92ED 94E4 B1E6 3624 226D  5727 4453 008B E65A 7801
CCs of replies from mailing lists are requested.

Welcome to [telco] hell. [...] You are in a maze of twisty little PVC's,
all alike.  A switching engineer throws a dart at you!
-- Chris Saunderson <[EMAIL PROTECTED]> in the scary.devil.monastery



Re: [PATCH] Input: Support for a less exclusive grab.

2007-06-11 Thread Dmitry Torokhov
Hi Zephaniah,

On Saturday 09 June 2007 04:48, Zephaniah E. Hull wrote:
> EVIOCGRAB is nice and very useful, however over time I've gotten
> multiple requests to make it possible for applications to get events
> straight from the event device while xf86-input-evdev is getting events
> from the same device.
> 
> Here is the least invasive patch I could think of, it changes the
> behavior of EVIOCGRAB in some cases, specificly behavior is identical if
> the argument is 0 or 1, however if the argument is true and != 1, then
> it does a 'non exclusive grab', a better name might be handy.
> 
> What this does is allow the events to go to everything that's using
> evdev to get events, but grabs it from anything else.  About as close to
> what people want as I can get, and fairly non-invasive.

Unfortunately this also robs non-legacy input handlers (such as
rfkill-input) of input events. Does xf86-input-evdev really needs to
grab devices exclusively? I guess we can't abandon the standard
keyboard driver until X supports hotplugging. How close is it to
support devices coming and going?
 
If we can't remain as is until X hotplug is ready then I'd rather had
a separate ioctl that disables legacy input handlers (keyboard, mousedev)
for a given input device.

-- 
Dmitry


Re: [PATCH 2/3] [CRYPTO] Add optimized SHA-1 implementation for i486+

2007-06-11 Thread linux
> I got this code from Nettle, originally, and I never looked at the SHA-1 
> round structure very closely.  I'll give that approach a try.

Attached is some (tested, working, and public domain) assembly code for
three different sha_transform implementations.  Compared to C code, the
timings to hash 10 MiB on a 600 MHz PIII are:

  One: e02e77e3 bb7af36d bbf79d4a 46956044 2aaea172 -- 564819 us
 Four: e02e77e3 bb7af36d bbf79d4a 46956044 2aaea172 -- 391086 us
  Two: e02e77e3 bb7af36d bbf79d4a 46956044 2aaea172 -- 399134 us
Three: e02e77e3 bb7af36d bbf79d4a 46956044 2aaea172 -- 345986 us
 Five: e02e77e3 bb7af36d bbf79d4a 46956044 2aaea172 -- 301152 us

  One: e02e77e3 bb7af36d bbf79d4a 46956044 2aaea172 -- 558652 us
 Four: e02e77e3 bb7af36d bbf79d4a 46956044 2aaea172 -- 390980 us
  Two: e02e77e3 bb7af36d bbf79d4a 46956044 2aaea172 -- 407661 us
Three: e02e77e3 bb7af36d bbf79d4a 46956044 2aaea172 -- 412434 us
 Five: e02e77e3 bb7af36d bbf79d4a 46956044 2aaea172 -- 266809 us

  One: e02e77e3 bb7af36d bbf79d4a 46956044 2aaea172 -- 559053 us
 Four: e02e77e3 bb7af36d bbf79d4a 46956044 2aaea172 -- 396506 us
  Two: e02e77e3 bb7af36d bbf79d4a 46956044 2aaea172 -- 401661 us
Three: e02e77e3 bb7af36d bbf79d4a 46956044 2aaea172 -- 349668 us
 Five: e02e77e3 bb7af36d bbf79d4a 46956044 2aaea172 -- 265861 us

  One: e02e77e3 bb7af36d bbf79d4a 46956044 2aaea172 -- 556082 us
 Four: e02e77e3 bb7af36d bbf79d4a 46956044 2aaea172 -- 392967 us
  Two: e02e77e3 bb7af36d bbf79d4a 46956044 2aaea172 -- 406381 us
Three: e02e77e3 bb7af36d bbf79d4a 46956044 2aaea172 -- 338959 us
 Five: e02e77e3 bb7af36d bbf79d4a 46956044 2aaea172 -- 274712 us

Um.. some more runs, nice --19, that come out a bit more stable:
  One: e02e77e3 bb7af36d bbf79d4a 46956044 2aaea172 -- 552971 us
 Four: e02e77e3 bb7af36d bbf79d4a 46956044 2aaea172 -- 388167 us
  Two: e02e77e3 bb7af36d bbf79d4a 46956044 2aaea172 -- 398721 us
Three: e02e77e3 bb7af36d bbf79d4a 46956044 2aaea172 -- 337220 us
 Five: e02e77e3 bb7af36d bbf79d4a 46956044 2aaea172 -- 259790 us

  One: e02e77e3 bb7af36d bbf79d4a 46956044 2aaea172 -- 551240 us
 Four: e02e77e3 bb7af36d bbf79d4a 46956044 2aaea172 -- 387812 us
  Two: e02e77e3 bb7af36d bbf79d4a 46956044 2aaea172 -- 398519 us
Three: e02e77e3 bb7af36d bbf79d4a 46956044 2aaea172 -- 336903 us
 Five: e02e77e3 bb7af36d bbf79d4a 46956044 2aaea172 -- 260161 us

  One: e02e77e3 bb7af36d bbf79d4a 46956044 2aaea172 -- 551934 us
 Four: e02e77e3 bb7af36d bbf79d4a 46956044 2aaea172 -- 387639 us
  Two: e02e77e3 bb7af36d bbf79d4a 46956044 2aaea172 -- 398094 us
Three: e02e77e3 bb7af36d bbf79d4a 46956044 2aaea172 -- 335860 us
 Five: e02e77e3 bb7af36d bbf79d4a 46956044 2aaea172 -- 259805 us

This is hot-cache testing; I haven't got around to writing macro
tricks that expand to a megabyte of object code.  The challenge is to
purge not only the I- and D-caches, but also the branch predictor!


The names are the order they were written in.  "One" is the lib/sha1.c
code (547 bytes with -Os).  "Four" is a 5x unrolled C version (1106 bytes).

"Two" is a space-optimized ASM version, 266 bytes long.  "Three" is 5x
unrolled, 722 bytes long.  "Five" is a fully unrolled version, 3558
bytes long.

(Further space savings are possible, but it doesn't seem worth it.)

I have noticed that every caller of sha_transform in the kernel tree
allocates the W[] array on the stack, so we might as well do that inside
sha_transform.  The point of passing in the buffer is to amortize the
wiping afterwards, but see sha_stackwipe for ideas on how to do that.
(It can even be done mostly portably in C, given a good guess about the
C function's stack usage.)


I also noticed a glaring BUG in the folding at the end of extract_buf at
drivers/char/random.c:797.  That should be:

/*
 * In case the hash function has some recognizable
 * output pattern, we fold it in half.
 */

buf[0] ^= buf[4];
buf[1] ^= buf[3];
buf[2] ^= rol32(buf[2], 16);// <--- Bug was here
memcpy(out, buf, EXTRACT_SIZE);
memset(buf, 0, sizeof(buf));

if the code is to match the comment.
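
To make the intent concrete, here is a small standalone sketch of the
fold.  It assumes EXTRACT_SIZE is 10 bytes (half of the 160-bit digest)
and uses a local rol32(); the function name fold_digest() is made up for
illustration and is not from random.c.

```c
#include <stdint.h>
#include <string.h>

#define EXTRACT_SIZE 10	/* bytes; assumed to be half of a 160-bit digest */

/* Rotate left; the masks keep shift == 0 well defined. */
static uint32_t rol32(uint32_t word, unsigned int shift)
{
	return (word << (shift & 31)) | (word >> ((32 - shift) & 31));
}

/*
 * Fold five 32-bit words into EXTRACT_SIZE bytes: words 3 and 4 are
 * XORed onto words 1 and 0, and the middle word is XORed with itself
 * rotated by 16, so each half of it ends up as (low16 ^ high16).
 * The working buffer is wiped afterwards.
 */
static void fold_digest(uint32_t buf[5], unsigned char out[EXTRACT_SIZE])
{
	buf[0] ^= buf[4];
	buf[1] ^= buf[3];
	buf[2] ^= rol32(buf[2], 16);
	memcpy(out, buf, EXTRACT_SIZE);
	memset(buf, 0, 5 * sizeof(buf[0]));
}
```

Because both halves of the folded middle word are identical, the two
bytes copied from it are the same on little- and big-endian machines.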



=== sha1asm.S ===
#define A %eax
#define B %ebx
#define C %ecx
#define D %edx
#define E %ebp
#define I %esi
#define T %edi

# f1(x,y,z) = bitwise x ? y : z = (z ^ (x & (y ^ z)))
#define F1(x,y,z,dest)	\
	movl	z,T;	\
	xorl	y,T;	\
	andl	x,T;	\
	xorl	z,T

# f2(x,y,z) = x ^ y ^ z
#define F2(x,y,z,dest)	\
	movl	z,T;	\
	xorl	x,T;	\
	xorl	y,T

# f3(x,y,z) = majority(x,y,z) = ((x & z) + (y & (x ^ z)))
#define F3(x,y,z,dest)	\
	movl	z,T;	\
	andl	x,T;	\
	addl	T,dest;	\
	movl	z,T;	\
	xorl	x,T;	\
	andl	y,T

#define K1  0x5A827999  /* Rounds  0-19: sqrt(2) * 2^30 */
#define K2  0x6ED9EBA1  /* Rounds 20-39: sqrt(3) * 2^30 */
#define K3  0x8F1BBCDC  /* Rounds 40-59: sqrt(5) * 2^30 */
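
As a cross-check on the comments in the macros, the three round functions
can be written in C next to their textbook forms.  This is only a sketch;
the names ch_ref(), maj_ref() and forms_agree() are made up here.  The
addition in f3 is safe because (x & z) and (y & (x ^ z)) can never have
the same bit set, so no carry is ever produced.

```c
#include <stdint.h>

/* Textbook forms of "choose" and "majority". */
static uint32_t ch_ref(uint32_t x, uint32_t y, uint32_t z)
{
	return (x & y) | (~x & z);
}

static uint32_t maj_ref(uint32_t x, uint32_t y, uint32_t z)
{
	return (x & y) | (x & z) | (y & z);
}

/* The cheaper forms given in the assembly comments. */
static uint32_t f1(uint32_t x, uint32_t y, uint32_t z)
{
	return z ^ (x & (y ^ z));
}

static uint32_t f2(uint32_t x, uint32_t y, uint32_t z)
{
	return x ^ y ^ z;
}

static uint32_t f3(uint32_t x, uint32_t y, uint32_t z)
{
	return (x & z) + (y & (x ^ z));
}

/*
 * The functions are bitwise, so one triple of words whose bit columns
 * cover all eight (x,y,z) combinations is enough to compare the forms.
 */
static int forms_agree(void)
{
	const uint32_t x = 0xF0F0F0F0u, y = 0xCCCCCCCCu, z = 0xAAAAAAAAu;

	return f1(x, y, z) == ch_ref(x, y, z) &&
	       f3(x, y, z) == maj_ref(x, y, z);
}
```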

Re: [shm][hugetlb] Fix get_policy for stacked shared memory files

2007-06-11 Thread William Lee Irwin III
On Mon, Jun 11, 2007 at 09:30:20PM -0700, Andrew Morton wrote:
> Can we just double-check the refcounting please?

The refcounting for mpol's doesn't look good in general. I'm more
curious as to what releases the refcounts. alloc_page_vma(), for
instance, does get_vma_policy() which eventually takes a reference,
without ever releasing the reference it acquires. get_vma_policy()
itself uses a similar idiom to that used in aglitke's patch. I think
mpol refcounting needs to be addressed elsewhere besides this patch.


-- wli


Re: [patch 0/3] AHCI Link Power Management

2007-06-11 Thread Jeff Garzik
We will do AHCI link PM -- presuming that I can be convinced that it 
does not repeatedly park the hard drive heads, or something similarly 
annoying on PATA<->SATA bridges and similar setups.


IF it works as advertised -- a big if considering all the AHCI silicon 
implementations out there -- we definitely want to use it.


Jeff





Re: [patch 0/3] AHCI Link Power Management

2007-06-11 Thread Tejun Heo
Arjan van de Ven wrote:
>>> The data we have from this patch is that it saves typically a Watt of
>>> power (depends on the machine of course, but the range is 0.5W to
>>> 1.5W). If you want to also have an even more agressive thing where
>>> you want to start disabling the entire controller... I don't see how
>>> this is in conflict with saving power on the link level by "just"
>>> enabling a hardware feature 
>>
>> Well, both implement about the same thing.  I prefer software
>> implementation because it's more generic and ALPE/ASP seems too
>> aggressive to me. 
> 
> Too aggressive in what way?

There are devices which lock up hard if PHY enters PS mode (only
physical power removal can reset it) and I wouldn't be surprised if some
devices aren't happy with PS being too aggressive.  Well, I actually
expect to see such devices.  It's ATA after all.  This is unknown
territory and that's why I was using 'seems ... to me'.

> There are tradeoffs on either side. Doing things in software is more
> work for the cpu, and depending on the implementation, will consume more
> power on the CPU side. (for example if you need regular timers that just
> consumes the power you are saving back up). The hardware can obviously
> switch very fast (because it's independent of any software), yet of
> course the software has higher level knowledge about how idle the link
> really is (like it knows if any files are open etc etc).
> 
> To be honest, I would be surprised if software could do significantly
> better than hardware though; it seems a simple problem: Idle -> go to
> low power, and estimating idle isn't all that hard on a link level...
> there's not all THAT much the kernel can estimate better I suspect.

I don't think the end result will vary in any significant way.  My
biggest argument for sw implementation is it can be used for other
controllers.

> This debate is very similar to the cpufreq debate from 4 years ago,
> where there were 3 levels: do it in the CPU, do it in the kernel or do
> it in userspace. All three are valid; whichever is best depends on the
> exact hardware that you have...
> (and you can argue that first everyone started in userspace, then the
> hardware improved that made a kernelspace implementation better
> (ondemand) and now Turbo Mode is more or less moving this to the
> hardware... I wouldn't be surprised if the sata side will show a similar
> trend)

Currently, ahci is the only one which has controller-side automatic PS
but some ATA devices (hdds) implement device initiated PS (DIPS).  The
sw implementation supports SW HIPS and DIPS.  We can add HW HIPS support
and hook ALPE/ASP support there but I don't think it would have benefits
over SW implementation.

I think it's a bit different from cpufreq.  ATA is cheaper and more
broken and much more diverse.

Thanks.

-- 
tejun


Re: [PATCH 5/9] readahead: on-demand readahead logic

2007-06-11 Thread Rusty Russell
On Thu, 2007-05-17 at 06:47 +0800, Fengguang Wu wrote:
> +static unsigned long
> +ondemand_readahead(struct address_space *mapping,
> +struct file_ra_state *ra, struct file *filp,
> +struct page *page, pgoff_t offset,
> +unsigned long req_size)
> +{
> + unsigned long max;  /* max readahead pages */
> + pgoff_t ra_index;   /* readahead index */
> + unsigned long ra_size;  /* readahead size */
> + unsigned long la_size;  /* lookahead size */
> + int sequential;
> +
> + max = ra->ra_pages;
> + sequential = (offset - ra->prev_index <= 1UL) || (req_size > max);

Hi again!

This <= 1UL seems weird.  prev_index is end of last request, so I'd
expect offset == prev_index + 1 for sequential reads?  Does offset ==
ra->prev_index happen?  If not, this would be clearer as (offset ==
ra->prev_index + 1).

(prev_index is not a great name either, but that's not your patch 8).
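
To spell out what the two forms accept, here is a tiny standalone
comparison.  It leaves out the (req_size > max) half of the test, and the
function names are made up for illustration.  Since pgoff_t is unsigned,
a backward seek makes the subtraction wrap to a huge value, so the
patch's form rejects it without an explicit check.

```c
#include <stdbool.h>

typedef unsigned long pgoff_t;

/* The comparison as written in the patch. */
static bool seq_as_patched(pgoff_t offset, pgoff_t prev_index)
{
	/* true for offset == prev_index and offset == prev_index + 1 */
	return offset - prev_index <= 1UL;
}

/* The stricter form suggested above. */
static bool seq_strict(pgoff_t offset, pgoff_t prev_index)
{
	return offset == prev_index + 1;
}
```

The only input where the two disagree is offset == prev_index, which is
exactly the case the question above is about.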

> + /*
> +  * Lookahead/readahead hit, assume sequential access.
> +  * Ramp up sizes, and push forward the readahead window.
> +  */
> + if (offset && (offset == ra->lookahead_index ||
> + offset == ra->readahead_index)) {
> + ra_index = ra->readahead_index;
> + ra_size = get_next_ra_size2(ra, max);
> + la_size = ra_size;
> + goto fill_ra;
> + }

Will offset hit lookahead_index or readahead_index exactly?  Should this
be checking the range from offset to offset + req_size?

> + ra_index = offset;
> + ra_size = get_init_ra_size(req_size, max);
> + la_size = ra_size > req_size ? ra_size - req_size : ra_size;

So if we're doing a big sequential read, ra_size < req_size, so next
time offset will be > ra->readahead_index and the "ramp up sizes" code
won't get run?

> + /*
> +  * Hit on a lookahead page without valid readahead state.
> +  * E.g. interleaved reads.
> +  * Not knowing its readahead pos/size, bet on the minimal possible one.
> +  */
> + if (page) {
> + ra_index++;
> + ra_size = min(4 * ra_size, max);
> + }

If I understand correctly, it's expected to happen when we have multiple
streams: we previously marked the lookahead page, but then the other
stream changed the ra to somewhere else in the file.  We now change it
back to our stream, but we've lost information so we make it up.

This seems a little like two functions crammed into one.  Do you think
page_cache_readahead_ondemand() should be split into
"page_cache_readahead()" which doesn't take a page*, and
"page_cache_check_readahead_page()" which is an inline which does the
PageReadahead(page) check as well?

Thanks,
Rusty.



Re: [shm][hugetlb] Fix get_policy for stacked shared memory files

2007-06-11 Thread Andrew Morton
On Mon, 11 Jun 2007 16:34:54 -0500 Adam Litke <[EMAIL PROTECTED]> wrote:

> Here's another breakage as a result of shared memory stacked files :(
> 
> The NUMA policy for a VMA is determined by checking the following (in the 
> order
> given):
> 
> 1) vma->vm_ops->get_policy() (if defined)
> 2) vma->vm_policy (if defined)
> 3) task->mempolicy (if defined)
> 4) Fall back to default_policy
> 
> By switching to stacked files for shared memory, get_policy() is now always 
> set
> to shm_get_policy which is a wrapper function.  This causes us to stop at step
> 1, which yields NULL for hugetlb instead of task->mempolicy which was the
> previous (and correct) result.
> 
> This patch modifies the shm_get_policy() wrapper to maintain steps 1-3 for the
> wrapped vm_ops.  Andi and Christoph, does this look right to you?
> 

Can we just double-check the refcounting please?

> index 4fefbad..8d2672d 100644
> --- a/ipc/shm.c
> +++ b/ipc/shm.c
> @@ -254,8 +254,10 @@ struct mempolicy *shm_get_policy(struct vm_area_struct 
> *vma, unsigned long addr)
>  
>   if (sfd->vm_ops->get_policy)
>   pol = sfd->vm_ops->get_policy(vma, addr);

afacit this takes a ref on the underlying policy

> - else
> + else if (vma->vm_policy)
>   pol = vma->vm_policy;
> + else
> + pol = current->mempolicy;

but these two do not.

>   return pol;
>  }
>  #endif

Is it all correct?
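
For anyone following along, the lookup order and the effect of the
wrapper can be modelled in userspace roughly like this.  All of the
structs are simplified stand-ins, current_task replaces "current", and
refcounting (the open question above) is deliberately not modelled.

```c
#include <stddef.h>

struct mempolicy { const char *name; };
struct vma;
typedef struct mempolicy *(*get_policy_fn)(struct vma *vma, unsigned long addr);

struct vma {
	get_policy_fn get_policy;	/* stands in for vma->vm_ops->get_policy */
	get_policy_fn inner_get_policy;	/* stands in for sfd->vm_ops->get_policy */
	struct mempolicy *vm_policy;
};

struct task { struct mempolicy *mempolicy; };

static struct mempolicy default_policy = { "default" };
static struct task *current_task;	/* stands in for "current" */

/* Steps 1-4: a get_policy() hook, when present, bypasses steps 2 and 3. */
static struct mempolicy *lookup_policy(struct vma *vma, unsigned long addr)
{
	struct mempolicy *pol = current_task->mempolicy;	/* step 3 */

	if (vma->get_policy)
		pol = vma->get_policy(vma, addr);		/* step 1 */
	else if (vma->vm_policy)
		pol = vma->vm_policy;				/* step 2 */
	return pol ? pol : &default_policy;			/* step 4 */
}

/* Wrapper before the patch: no inner hook means vma->vm_policy, maybe NULL. */
static struct mempolicy *wrapper_old(struct vma *vma, unsigned long addr)
{
	if (vma->inner_get_policy)
		return vma->inner_get_policy(vma, addr);
	return vma->vm_policy;
}

/* Wrapper after the patch: repeats steps 1-3 for the wrapped object. */
static struct mempolicy *wrapper_new(struct vma *vma, unsigned long addr)
{
	if (vma->inner_get_policy)
		return vma->inner_get_policy(vma, addr);
	if (vma->vm_policy)
		return vma->vm_policy;
	return current_task->mempolicy;
}
```

With a task policy set and neither an inner hook nor a vm_policy, the old
wrapper ends at default_policy and the new one at the task policy.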


Re: [patch 0/3] AHCI Link Power Management

2007-06-11 Thread Arjan van de Ven

Tejun Heo wrote:
>> do you have data to support this?
> 
> Yeah, it was some Lenovo notebook.  Pavel is more familiar with the
> hardware.  Pavel, what was the notebook which didn't save much power
> with standard SATA power save but needed port to be completely turned off?

Pavel, if you have time, could you measure this with Kristen's patch?

>> The data we have from this patch is that it saves typically a Watt of
>> power (depends on the machine of course, but the range is 0.5W to
>> 1.5W). If you want to also have an even more agressive thing where
>> you want to start disabling the entire controller... I don't see how
>> this is in conflict with saving power on the link level by "just"
>> enabling a hardware feature 
> 
> Well, both implement about the same thing.  I prefer software
> implementation because it's more generic and ALPE/ASP seems too
> aggressive to me. 

Too aggressive in what way?

There are tradeoffs on either side. Doing things in software is more 
work for the cpu, and depending on the implementation, will consume 
more power on the CPU side. (for example if you need regular timers 
that just consumes the power you are saving back up). The hardware can 
obviously switch very fast (because it's independent of any software), 
yet of course the software has higher level knowledge about how idle 
the link really is (like it knows if any files are open etc etc).


To be honest, I would be surprised if software could do significantly 
better than hardware though; it seems a simple problem: Idle -> go to 
low power, and estimating idle isn't all that hard on a link level... 
there's not all THAT much the kernel can estimate better I suspect.



This debate is very similar to the cpufreq debate from 4 years ago, 
where there were 3 levels: do it in the CPU, do it in the kernel or do 
it in userspace. All three are valid; whichever is best depends on the 
exact hardware that you have...
(and you can argue that first everyone started in userspace, then the 
hardware improved that made a kernelspace implementation better 
(ondemand) and now Turbo Mode is more or less moving this to the 
hardware... I wouldn't be surprised if the sata side will show a 
similar trend)



[PATCH]: check_region cleanup in sbpcd.c

2007-06-11 Thread Surya
> > + *
> > + *
> 
> Why two blank comment lines?  Isn't one enough?
cleaned it up.
> 
>  sti();
> Just in case you are bored, at some point those cli()/sti() need to go
> away as well and be replaced with proper locking. But that's a
> different patch :-)
sure will take it up in the next patch.
> 
> > Kindly update if this is Ok. I will proceed with the original patch.
> This looks a lot better to me. I have only given it a quick look, but
> I don't see any major problems.
> 
I am sending it with all the corrections; is it OK to acknowledge it?

> Please tell your mail client not to include crap like that when
> sending to public mailing lists.
Really sorry, I have no control over this.

Attaching the new patch with all the corrections.. thanks.


Signed-off-by: Surya Prabhakar <[EMAIL PROTECTED]>
--- 

diff --git a/drivers/cdrom/sbpcd.c b/drivers/cdrom/sbpcd.c
index a1283b1..5414172 100644
--- a/drivers/cdrom/sbpcd.c
+++ b/drivers/cdrom/sbpcd.c
@@ -358,6 +358,9 @@
  * Add bio/kdev_t changes for 2.5.x required to make it work again. 
  * Still room for improvement in the request handling here if anyone
  * actually cares.  Bring your own chainsaw.Paul G.  02/2002
+ *
+ * Deprecated check_region cleanup to request_region 
+ * -Surya Prabhakar N08/07/2007
  */
 
 
@@ -555,6 +558,7 @@ static struct cdrom_read_audio read_audio;
 static unsigned char msgnum;
 static char msgbuf[80];
 
+static int addr[2] = {1, CDROM_PORT};
 static int max_drives = MAX_DRIVES;
 module_param(max_drives, int, 0);
 #ifndef MODULE
@@ -5638,7 +5642,6 @@ int __init sbpcd_init(void)
 #endif
 {
int i=0, j=0;
-   int addr[2]={1, CDROM_PORT};
int port_index;
 
sti();
@@ -5670,9 +5673,9 @@ int __init sbpcd_init(void)
{
addr[1]=sbpcd[port_index];
if (addr[1]==0) break;
-   if (check_region(addr[1],4))
+   if (!request_region(addr[1],4, "sbpcd driver"))
{
-   msg(DBG_INF,"check_region: %03X is not free.\n",addr[1]);
+   msg(DBG_INF,"request_region: %03X is not free.\n",addr[1]);
continue;
}
if (sbpcd[port_index+1]==2) type=str_sp;
@@ -5699,6 +5702,7 @@ int __init sbpcd_init(void)
if (ndrives==0)
{
msg(DBG_INF, "No drive found.\n");
+   release_region(addr[1],4);
 #ifdef MODULE
return -EIO;
 #else
@@ -5775,6 +5779,7 @@ int __init sbpcd_init(void)
if (!request_region(CDo_command,4,major_name))
{
printk(KERN_WARNING "sbpcd: Unable to request region 0x%x\n", CDo_command);
+   release_region(addr[1],4);
return -EIO;
}
 
@@ -5788,6 +5793,8 @@ int __init sbpcd_init(void)
 #endif /* SOUND_BASE */
 
if (register_blkdev(MAJOR_NR, major_name)) {
+   release_region(CDo_command,4);
+   release_region(addr[1],4);
 #ifdef MODULE
return -EIO;
 #else
@@ -5801,6 +5808,7 @@ int __init sbpcd_init(void)
sbpcd_queue = blk_init_queue(do_sbpcd_request, &sbpcd_lock);
if (!sbpcd_queue) {
release_region(CDo_command,4);
+   release_region(addr[1],4);
unregister_blkdev(MAJOR_NR, major_name);
return -ENOMEM;
}
@@ -5834,6 +5842,7 @@ int __init sbpcd_init(void)
printk("Can't unregister %s\n", major_name);
}
release_region(CDo_command,4);
+   release_region(addr[1],4);
blk_cleanup_queue(sbpcd_queue);
return -EIO;
}
@@ -5850,6 +5859,7 @@ int __init sbpcd_init(void)
if (sbpcd_infop == NULL)
{
 release_region(CDo_command,4);
+   release_region(addr[1],4);
blk_cleanup_queue(sbpcd_queue);
 return -ENOMEM;
}
@@ -5894,6 +5904,7 @@ static void sbpcd_exit(void)
return;
}
release_region(CDo_command,4);
+   release_region(addr[1],4);
blk_cleanup_queue(sbpcd_queue);
for (j=0;j
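
The pattern the patch is after can be shown with a small userspace model.
This is not kernel code: the region table and the model_* functions are
stand-ins invented for illustration.  The point is that claiming a region
is a single step that can fail, and that every error path after a
successful claim has to give the region back.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_REGIONS 8

struct region {
	unsigned long start, len;
	const char *owner;
	bool used;
};

static struct region region_table[MAX_REGIONS];

static bool overlaps(const struct region *r, unsigned long start,
		     unsigned long len)
{
	return r->used && start < r->start + r->len && r->start < start + len;
}

/* Claim [start, start+len) in one step; NULL if any part is already taken. */
static struct region *model_request_region(unsigned long start,
					    unsigned long len,
					    const char *owner)
{
	struct region *free_slot = NULL;

	for (size_t i = 0; i < MAX_REGIONS; i++) {
		if (overlaps(&region_table[i], start, len))
			return NULL;
		if (!region_table[i].used && !free_slot)
			free_slot = &region_table[i];
	}
	if (!free_slot)
		return NULL;
	*free_slot = (struct region){ start, len, owner, true };
	return free_slot;
}

static void model_release_region(unsigned long start, unsigned long len)
{
	for (size_t i = 0; i < MAX_REGIONS; i++) {
		struct region *r = &region_table[i];

		if (r->used && r->start == start && r->len == len)
			r->used = false;
	}
}

/* Probe one port: claim it, and release it again if no drive answers. */
static int model_probe(unsigned long base, bool drive_found)
{
	if (!model_request_region(base, 4, "sbpcd"))
		return -EBUSY;
	if (!drive_found) {
		model_release_region(base, 4);	/* undo the claim on failure */
		return -EIO;
	}
	return 0;
}
```

A failed probe leaves the port free for the next attempt, while a
successful one blocks any later claim that overlaps it.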


Re: [RFC][PATCH 3/6] core changes in CFS

2007-06-11 Thread Srivatsa Vaddagiri
On Tue, Jun 12, 2007 at 07:59:22AM +0530, Balbir Singh wrote:
> > +#define entity_is_task(se) 1
> 
> Could you add some comments as to what this means?

sure. Basically this macro tests whether a given schedulable entity is
task or not. Other possible schedulable entities could be process, user,
container etc. These various entities form a hierarchy with task being
at the bottom of the hierarchy.

> Should be it boolean instead (true)

I don't have a strong opinion on this. Would it make the code sparse-friendly?

> > + * Enqueue a entity into the rb-tree:
> 
> Enqueue an entity

yes

> 
> > -static void limit_wait_runtime(struct rq *rq, struct task_struct *p)
> > +static void limit_wait_runtime(struct lrq *lrq, struct sched_entity *p)
> 
> p is a general convention for tasks in the code, could we use something
> different -- may be "e"?

'se' perhaps as is used elsewhere. I avoided making that change so that
people will see less diff o/p in the patch :) I agree though a better
name is needed.

> >  static s64 div64_s(s64 divident, unsigned long divisor)
> > @@ -183,49 +219,51 @@
> >   * Update the current task's runtime statistics. Skip current tasks that
> >   * are not in our scheduling class.
> >   */
> > -static inline void update_curr(struct rq *rq, u64 now)
> > +static inline void update_curr(struct lrq *lrq, u64 now)
> >  {
> > -   unsigned long load = rq->lrq.raw_weighted_load;
> > +   unsigned long load = lrq->raw_weighted_load;
> > u64 delta_exec, delta_fair, delta_mine;
> > -   struct task_struct *curr = rq->curr;
> > +   struct sched_entity *curr = lrq_curr(lrq);
> 
> How about curr_entity?

I prefer its current name, but will consider your suggestion in next
iteration.

> > +   struct rq *rq = lrq_rq(lrq);
> > +   struct task_struct *curtask = rq->curr;
> > 
> > -   if (curr->sched_class != &fair_sched_class || curr == rq->idle || !load)
> > +   if (!curr || curtask == rq->idle || !load)
> 
> Can !curr ever be true? shoudn't we look into the sched_class of the task
> that the entity belongs to?

Couple of cases that we need to consider here:

CONFIG_FAIR_GROUP_SCHED disabled:

lrq_curr() essentially returns NULL if the currently running task
doesn't belong to fair_sched_class, else it returns &rq->curr->se.
So the check for fair_sched_class is taken care of in that
function.

CONFIG_FAIR_GROUP_SCHED enabled:

lrq_curr() returns lrq->curr. I introduced ->curr field in lrq
to optimize on not having to update lrq's fair_clock
(update_curr upon enqueue/dequeue task) if it was not currently 
"active".

Let's say that there are two groups 'vatsa' and 'guest'
with their own lrqs on each cpu. If CPU0 is currently running a
task from group 'vatsa', then lrq_vatsa->curr will point to
the currently running task, while lrq_guest->curr will be 
NULL. While the task from 'vatsa' is running, if we were to
enqueue/dequeue task from group 'guest', we need not 
update lrq_guest's fair_clock (as it is not active currently).
This optimization in update_curr is made possible by maintaining
a 'curr' field in lrq.
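
In code form, the two cases look roughly like the userspace sketch below.
The structs are cut-down stand-ins, the two lrq_curr variants are given
separate names only so they can sit side by side, and the idle-task check
from the patch is left out.

```c
#include <stddef.h>

struct sched_class { const char *name; };
static const struct sched_class fair_sched_class = { "fair" };
static const struct sched_class rt_sched_class = { "rt" };

struct sched_entity { int id; };

struct task_struct {
	const struct sched_class *sched_class;
	struct sched_entity se;
};

struct rq { struct task_struct *curr; };

struct lrq {
	struct rq *rq;
	struct sched_entity *curr;	/* only used in the group case */
};

/* CONFIG_FAIR_GROUP_SCHED disabled: derive the entity from rq->curr. */
static struct sched_entity *lrq_curr_nogroup(struct lrq *lrq)
{
	struct task_struct *t = lrq->rq->curr;

	if (t->sched_class != &fair_sched_class)
		return NULL;
	return &t->se;
}

/* CONFIG_FAIR_GROUP_SCHED enabled: each lrq remembers its own entity. */
static struct sched_entity *lrq_curr_group(struct lrq *lrq)
{
	return lrq->curr;
}

/* Mirrors the early-return test in update_curr(): 1 means "do the update". */
static int update_curr_would_run(struct sched_entity *curr, unsigned long load)
{
	return curr != NULL && load != 0;
}
```

With a 'vatsa' task running, update_curr_would_run() is true for
lrq_vatsa and false for lrq_guest, which is the saving described above.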

Hope this answers your question.

-- 
Regards,
vatsa


Re: [patch 0/3] AHCI Link Power Management

2007-06-11 Thread Tejun Heo
Arjan van de Ven wrote:
>> I'm not sure about this.  We need better PM framework to support 
>> powersaving in other controllers and some ahcis don't save much
>> when only link power management is used,
> 
> do you have data to support this?

Yeah, it was some Lenovo notebook.  Pavel is more familiar with the
hardware.  Pavel, what was the notebook which didn't save much power
with standard SATA power save but needed port to be completely turned off?

> The data we have from this patch is that it saves typically a Watt of
> power (depends on the machine of course, but the range is 0.5W to
> 1.5W). If you want to also have an even more agressive thing where
> you want to start disabling the entire controller... I don't see how
> this is in conflict with saving power on the link level by "just"
> enabling a hardware feature 

Well, both implement about the same thing.  I prefer software
implementation because it's more generic and ALPE/ASP seems too
aggressive to me.  Here are reasons why sw implementation wasn't merged.

1. It didn't have proper interface with userland.  This was mainly
because of missing ATA sysfs nodes.  I'm not sure whether adding this to
scsi node is a good idea.

2. It was focused on SATA link PS and couldn't cover the Lenovo case.

I think we need something at the block layer.

Thanks.

-- 
tejun


Re: [patch 3/3] Enable Aggressive Link Power management for AHCI controllers.

2007-06-11 Thread Arjan van de Ven

Henrique de Moraes Holschuh wrote:
> On Mon, 11 Jun 2007, Jeff Garzik wrote:
> > >>on/off doesn't really make sense if the question is "do you favor power
> > >>or do you favor performance"...
> 
> Actually, it does if you think of it as "do you need hotplug right now or
> not?".


that's a temporary shortcoming; even with these power savings you can 
do hotplug as long as you're willing to poll for it at a reasonable 
interval and are willing to wait the time between polls for a hotplug 
to take effect..



Re: [patch 3/3] Enable Aggressive Link Power management for AHCI controllers.

2007-06-11 Thread Henrique de Moraes Holschuh
On Mon, 11 Jun 2007, Jeff Garzik wrote:
> >>on/off doesn't really make sense if the question is "do you favor power
> >>or do you favor performance"...

Actually, it does if you think of it as "do you need hotplug right now or
not?".

> >How about just making it a numeric scale with 0 meaning no power saving
> >and then some fixed number of levels (e.g 0-9)?
> 
> The original proposal seems far more intuitive than these alternatives.

There is nothing intuitive about the text or the levels.  All cases need
proper documentation.  I'd never expect link powersaving to kill hotplug,
unless I read the AHCI docs.

And enable/disable ain't intuitive either :(  But enable/disable is useful
to get stuff like SATA bay hotplug, dock/undock and other stuff that needs
hotplug to be working right, unless we can make that automatic so that power
saving is always disabled in all situations we'd need hotplug to be working?

-- 
  "One disk to rule them all, One disk to find them. One disk to bring
  them all and in the darkness grind them. In the Land of Redmond
  where the shadows lie." -- The Silicon Valley Tarot
  Henrique Holschuh


Re: [RFC][AGPGART]intel-agp: save whole config space in suspend/resume

2007-06-11 Thread Dave Jones
On Tue, Jun 12, 2007 at 11:34:25AM +0800, Wang Zhenyu wrote:

 > I understand. Before James reported his problem on i915, I have thought
 > the basic restore on that chip should already be enough, but he proved I was
 > wrong and I'm not sure if this also happens on other i915 board with
 > different BIOS. 
 > 
 > And with my patch it has already removed the restore cases for 440BX like
 > type, coz it's gmch_chip_id == 0 and intel_private.pcidev is NULL, so it
 > won't save
 > extra space on those chips.

The 440BX was one example, for all we know there are similar ordering
issues with other chipsets.  We hit this problem with the code that
restores the first 64 bytes first of all. Then we found out we had
to restore them in reverse order to be safe.  We were able to do
this generically, because those bytes are standardised across devices.
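
As an illustration of the reverse-order restore, here is a userspace
sketch over a mock device.  The struct and the mock_* names are invented
for this example; the only thing it is meant to show is that walking the
16 standard dwords from the top down writes the BARs (offset 0x10 and up)
before the command register at offset 0x04.

```c
#include <stdint.h>

#define STD_CONFIG_DWORDS 16	/* the 64 standardised header bytes */

struct mock_pci_dev {
	uint32_t config[STD_CONFIG_DWORDS];	/* pretend hardware registers */
	uint32_t saved[STD_CONFIG_DWORDS];
	int write_log[STD_CONFIG_DWORDS];	/* byte offsets, in write order */
	int nr_writes;
};

static void mock_save_state(struct mock_pci_dev *dev)
{
	for (int i = 0; i < STD_CONFIG_DWORDS; i++)
		dev->saved[i] = dev->config[i];
}

/* Restore from the highest dword down to the lowest. */
static void mock_restore_state(struct mock_pci_dev *dev)
{
	dev->nr_writes = 0;
	for (int i = STD_CONFIG_DWORDS - 1; i >= 0; i--) {
		dev->config[i] = dev->saved[i];
		dev->write_log[dev->nr_writes++] = i * 4;
	}
}
```

Nothing comparable can be written generically for the upper config
space, since the required ordering there differs from chip to chip.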

The upper config space isn't standardised, so we have to obey the
per-device rules as to what order we read/write things.
Writing back an "enable" bit somewhere before we've written back
addresses in later registers for example may result in really
bizarre things happening.  These are the kind of bugs that aren't
obvious, and turn out to be "that weird reboot that happens
every 3rd Tuesday" six months after we've merged the changes
and everyone's forgotten all about the potential problems.

The AGP code has had more than its fair share of really nasty
bugs like this to track down, so I'm strongly opposed to introducing
hacks that may trip us up later.

Whilst I'm not a huge fan of the 815 patch in -mm as it stands,
I think it's a better direction to go in to have per-chipset
save/restore routines.

Dave

-- 
http://www.codemonkey.org.uk


Re: [shm][hugetlb] Fix get_policy for stacked shared memory files

2007-06-11 Thread William Lee Irwin III
On Mon, Jun 11, 2007 at 04:34:54PM -0500, Adam Litke wrote:
> Here's another breakage as a result of shared memory stacked files :(
> The NUMA policy for a VMA is determined by checking the following (in
> the order given):
> 1) vma->vm_ops->get_policy() (if defined)
> 2) vma->vm_policy (if defined)
> 3) task->mempolicy (if defined)
> 4) Fall back to default_policy
> By switching to stacked files for shared memory, get_policy() is now
> always set to shm_get_policy which is a wrapper function.  This
> causes us to stop at step 1, which yields NULL for hugetlb instead of
> task->mempolicy which was the previous (and correct) result.
> This patch modifies the shm_get_policy() wrapper to maintain steps 1-3 for the
> wrapped vm_ops.  Andi and Christoph, does this look right to you?
> Signed-off-by: Adam Litke <[EMAIL PROTECTED]>

Thanks for fielding this. The fix is certainly clear enough.
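
For anyone following along, the lookup order quoted above and the effect of the wrapper can be modelled in userspace roughly as follows. This is only a sketch: the struct layouts, the `wrapped_ops` field, and the functions `lookup_policy()`, `wrapper_broken()` and `wrapper_fixed()` are hypothetical stand-ins, not the kernel's definitions.

```c
#include <stddef.h>

struct mempolicy { int mode; };
struct vm_area;
struct task;

struct vm_ops {
    struct mempolicy *(*get_policy)(struct vm_area *vma, struct task *task);
};

struct vm_area {
    const struct vm_ops *vm_ops;       /* ops installed on the VMA (the wrapper) */
    const struct vm_ops *wrapped_ops;  /* hypothetical: ops of the underlying file */
    struct mempolicy *vm_policy;
};

struct task { struct mempolicy *mempolicy; };

static struct mempolicy default_policy = { 0 };

/* Steps 1-4 from the description: the first defined source wins. */
static struct mempolicy *lookup_policy(struct vm_area *vma, struct task *task)
{
    struct mempolicy *pol;

    if (vma->vm_ops && vma->vm_ops->get_policy)
        pol = vma->vm_ops->get_policy(vma, task);   /* step 1 */
    else if (vma->vm_policy)
        pol = vma->vm_policy;                       /* step 2 */
    else
        pol = task->mempolicy;                      /* step 3 */

    return pol ? pol : &default_policy;             /* step 4 */
}

/* Always defined, so step 1 always fires; yields NULL for hugetlb. */
static struct mempolicy *wrapper_broken(struct vm_area *vma, struct task *task)
{
    if (vma->wrapped_ops && vma->wrapped_ops->get_policy)
        return vma->wrapped_ops->get_policy(vma, task);
    return NULL;
}

/* Repeats steps 1-3 on behalf of the wrapped ops. */
static struct mempolicy *wrapper_fixed(struct vm_area *vma, struct task *task)
{
    if (vma->wrapped_ops && vma->wrapped_ops->get_policy)
        return vma->wrapped_ops->get_policy(vma, task);
    if (vma->vm_policy)
        return vma->vm_policy;
    return task->mempolicy;
}
```

With a wrapped file that defines no get_policy, the broken wrapper ends at the default policy, while the fixed one returns the task's policy again.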

Acked-by: William Irwin <[EMAIL PROTECTED]>


-- wli


Re: [ckrm-tech] [RFC][PATCH 1/6] Introduce struct sched_entity and struct lrq

2007-06-11 Thread Srivatsa Vaddagiri
On Tue, Jun 12, 2007 at 07:45:59AM +0530, Balbir Singh wrote:
> > +/* CFS-related fields in a runqueue */
> > +struct lrq {
> > +   unsigned long raw_weighted_load;
> > +   #define CPU_LOAD_IDX_MAX 5
> > +   unsigned long cpu_load[CPU_LOAD_IDX_MAX];
> > +   unsigned long nr_load_updates;
> > +
> > +   u64 fair_clock, delta_fair_clock;
> > +   u64 exec_clock, delta_exec_clock;
> > +   s64 wait_runtime;
> > +   unsigned long wait_runtime_overruns, wait_runtime_underruns;
> > +
> > +   struct rb_root tasks_timeline;
> > +   struct rb_node *rb_leftmost;
> > +   struct rb_node *rb_load_balance_curr;
> > +};
> > +
> 
> Shouldn't the rq->lock move into lrq?

Right now, the per-cpu rq lock protects all (local) runqueues attached with the 
cpu. At some point, for scalability reasons, we may want to split that to
be per-cpu per-local runqueue (as you point out). I will put that in my todo
list of things to consider. Thanks for the review!

-- 
Regards,
vatsa


Re: [shm][hugetlb] Fix get_policy for stacked shared memory files

2007-06-11 Thread dean gaudet
On Mon, 11 Jun 2007, Adam Litke wrote:

> Here's another breakage as a result of shared memory stacked files :(
> 
> The NUMA policy for a VMA is determined by checking the following (in the
> order given):
> 
> 1) vma->vm_ops->get_policy() (if defined)
> 2) vma->vm_policy (if defined)
> 3) task->mempolicy (if defined)
> 4) Fall back to default_policy
> 
> By switching to stacked files for shared memory, get_policy() is now always
> set to shm_get_policy which is a wrapper function.  This causes us to stop at
> step 1, which yields NULL for hugetlb instead of task->mempolicy which was the
> previous (and correct) result.
> 
> This patch modifies the shm_get_policy() wrapper to maintain steps 1-3 for the
> wrapped vm_ops.  Andi and Christoph, does this look right to you?

thanks for the patch -- seems to do the trick for me.  it seems like it 
would be a candidate for stable series as well.

-dean


Re: [RFC][AGPGART]intel-agp: save whole config space in suspend/resume

2007-06-11 Thread Wang Zhenyu
On 2007.06.11 22:23:21 +, Dave Jones wrote:
> 
> I'd feel much safer if we only did this on chipsets where we know we 
> have to do it.   Doing this for *every* Intel chipset ever made _will_
> bite us.  There are some early chipsets (440BX era iirc) that would just
> hang the box when you tried to read from various write-only registers.
> 
> But even on the chipsets where we _do_ need to save/restore something in
> the upper part of the config space, surely it'd be a lot safer
> to just save/restore what we need to rather than risk all sorts
> of mayhem by writing to bits that may trigger hardware events.
> 

I understand. Before James reported his problem on i915, I had thought
the basic restore on that chip should already be enough, but he proved I was
wrong, and I'm not sure whether this also happens on other i915 boards with a
different BIOS.

And my patch already removes the restore case for 440BX-like chipsets,
because gmch_chip_id == 0 and intel_private.pcidev is NULL there, so it won't
save extra space on those chips.


Re: [PATCH 4/9] readahead: data structure and routines

2007-06-11 Thread Rusty Russell
On Thu, 2007-05-17 at 06:47 +0800, Fengguang Wu wrote: 
> /*
>   * Track a single file's readahead state
> + *
> + *  #|==#==|
> + *  ^^  ^  ^
> + *  file_ra_state.la_index.ra_index   .lookahead_index   .readahead_index
>   */
>  struct file_ra_state {
>   unsigned long start;/* Current window */
> @@ -711,6 +715,12 @@ struct file_ra_state {
>   unsigned long prev_index;   /* Cache last read() position */
>   unsigned long ahead_start;  /* Ahead window */
>   unsigned long ahead_size;
> +
> + pgoff_t la_index;   /* enqueue time */
> + pgoff_t ra_index;   /* begin offset */
> + pgoff_t lookahead_index;/* time to do next readahead */
> + pgoff_t readahead_index;/* end offset */
> +

Hi Fengguang,

I found these variables a little confusing.  la_index is the last offset
passed to ondemand_readahead, so perhaps "last_request_start" is a
better name?  The comment "enqueue time" seems strange, too.

ra_index seems ok, although "readahead_start" might be better.  Perhaps
readahead_index should be expressed as readahead_size, which is how it
seems to be used.  Perhaps "lookahead_index" is best expressed as a
buffer at the end of the readahead zone (readahead_min?).

ie:
pgoff_t last_request_start; /* start of req which triggered readahead */
pgoff_t readahead_start;    /* Where readahead started */
pgoff_t readahead_size;     /* PAGE_CACHE_SIZE units of readahead */
pgoff_t readahead_min;      /* readahead_size left before we recalc */

This gets rid of many of the accessors, I think, and avoids introducing
a new term to understand (lookahead).

> +/*
> + * Where is the old read-ahead and look-ahead?
> + */
> +static inline void ra_set_index(struct file_ra_state *ra,
> + pgoff_t la_index, pgoff_t ra_index)
> +{
> + ra->la_index = la_index;
> + ra->ra_index = ra_index;
> +}
> +
> +/*
> + * Where is the new read-ahead and look-ahead?
> + */
> +static inline void ra_set_size(struct file_ra_state *ra,
> + unsigned long ra_size, unsigned long la_size)
> +{
> + ra->readahead_index = ra->ra_index + ra_size;
> + ra->lookahead_index = ra->ra_index + ra_size - la_size;
> +}

These are only called in one place, so I think it's clearer to do this
there directly.  But I see you exported ra_submit, too, even though it's
only used in the same file.  Are there plans for other users?
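
To illustrate how the renamed fields suggested earlier in this mail would relate to the values ra_set_size() computes, here is a small userspace sketch. It assumes that readahead_min plays the role of la_size; the `struct ra_sketch` layout and the helpers `ra_end()` and `ra_trigger()` are hypothetical, not part of the patch.

```c
typedef unsigned long pgoff_t;  /* stand-in for the kernel type */

/* Hypothetical layout using the names proposed above. */
struct ra_sketch {
    pgoff_t last_request_start; /* start of req which triggered readahead */
    pgoff_t readahead_start;    /* where readahead started */
    pgoff_t readahead_size;     /* pages of readahead */
    pgoff_t readahead_min;      /* readahead_size left before we recalc */
};

/* End of the window: what the patch stores as readahead_index. */
static pgoff_t ra_end(const struct ra_sketch *ra)
{
    return ra->readahead_start + ra->readahead_size;
}

/* Offset that should trigger the next readahead: lookahead_index. */
static pgoff_t ra_trigger(const struct ra_sketch *ra)
{
    return ra->readahead_start + ra->readahead_size - ra->readahead_min;
}
```

With a window starting at page 100, 32 pages long and a readahead_min of 8, the window ends at 132 and the next readahead is due at 124.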

Thanks,
Rusty.



Re: [PATCH 4/6] include linux pci_id-h add amd northbridge defines

2007-06-11 Thread Dave Jones
On Mon, Jun 11, 2007 at 04:49:47PM -0700, Greg Kroah-Hartman wrote:
 > On Mon, Jun 11, 2007 at 04:30:11PM -0700, Doug Thompson wrote:
 > > I am working with the k8 driver and its dealing with a race with the
 > > mcelog device as both access the K8 NB. The K8 driver does use these regs
 > > and it currently has #ifndefs in it for both of them.
 > > 
 > > I guess I could have submitted the patch when the K8 driver was submitted.
 > 
 > That would be preferable, thanks.

Even better (IMO), if they're not used by any other driver (which seems
to be the case), keep the defines local to the driver.

Dave

-- 
http://www.codemonkey.org.uk


Re: libata and legacy ide pcmcia failure

2007-06-11 Thread Tejun Heo
Mark Lord wrote:
> Robert de Rooy wrote:
>> (after applying the ide-polling experimental patch)
>>
>> With this I can declare success!! I was able to read and write to the
>> card without any problems, although I did not try to stress it.
>>
>> Jun 12 00:19:42 localhost kernel: pccard: PCMCIA card inserted into
>> slot 0
>> Jun 12 00:19:42 localhost kernel: cs: memory probe
>> 0xe800-0xefff: excluding 0xe800-0xefff
>> Jun 12 00:19:42 localhost kernel: cs: memory probe
>> 0xc020-0xcfff: excluding 0xc020-0xc11f
>> 0xc1a0-0xc21f 0xc2a0-0xc31f 0xc3a0-0xcc1f
>> 0xcca0-0xcd1f 0xcda0-0xce1f 0xcea0-0xcf1f
>> 0xcfa0-0xd01f
>> Jun 12 00:19:42 localhost kernel: pcmcia: registering new device
>> pcmcia0.0
>> Jun 12 00:19:42 localhost kernel: Uniform Multi-Platform E-IDE driver
>> Revision: 7.00alpha2
>> Jun 12 00:19:42 localhost kernel: ide: Assuming 33MHz system bus speed
>> for PIO modes; override with idebus=xx
>> Jun 12 00:19:45 localhost kernel: hda: Memory Card Adapter, CFA DISK
>> drive
>> Jun 12 00:19:45 localhost kernel: ide0 at 0x4100-0x4107,0x410e on irq 3
>> Jun 12 00:19:45 localhost kernel: ide-cs: hda: Vpp = 0.0
>> Jun 12 00:19:45 localhost udevd-event[20730]: udev_rules_apply_format:
>> unknown format variable '$modalias'
>> Jun 12 00:19:45 localhost kernel: hda: max request size: 128KiB
>> Jun 12 00:19:45 localhost kernel: hda: 253696 sectors (129 MB) w/1KiB
>> Cache, CHS=991/16/16
>> Jun 12 00:19:45 localhost kernel:  hda: hda1
>> Jun 12 00:19:48 localhost hald: mounted /dev/hda1 on behalf of uid 0
> 
> Okay, Tejun / Bart / Alan:
> 
> This proves that the device does work correctly in most respects
> except for interrupt delivery.  The status bits are working and
> it can be probed for, configured, and used.

libata can do most of this too by using ATA_FLAG_PIO_POLLING (doesn't
cover nodata commands tho).

-- 
tejun


Re: 2.6.22-rc4-mm2: kvm compile breakage with X86_CMPXCHG64=n

2007-06-11 Thread Dave Jones
On Tue, Jun 12, 2007 at 02:07:18AM +0200, Adrian Bunk wrote:
 
 > I'm getting the following compile error with CONFIG_X86_CMPXCHG64=n 
 > (with -Werror-implicit-function-declaration - otherwise it would be a 
 > link error):

We really should just get that flag into mainline so that it breaks
for people before they submit patches.  We run into this constantly.


Add -Werror-implicit-function-declaration
This makes builds fail sooner if something is implicitly defined instead
of having to wait half an hour for it to fail at the linking stage.

Signed-off-by: Dave Jones <[EMAIL PROTECTED]>

--- linux-2.6/Makefile~ 2007-06-04 16:46:24.0 -0400
+++ linux-2.6/Makefile  2007-06-04 16:46:53.0 -0400
@@ -313,7 +313,8 @@ LINUXINCLUDE:= -Iinclude \
 CPPFLAGS:= -D__KERNEL__ $(LINUXINCLUDE)
 
 CFLAGS  := -Wall -Wundef -Wstrict-prototypes -Wno-trigraphs \
-   -fno-strict-aliasing -fno-common
+  -fno-strict-aliasing -fno-common \
+  -Werror-implicit-function-declaration
 AFLAGS  := -D__ASSEMBLY__
 
 # Read KERNELRELEASE from include/config/kernel.release (if it exists)

-- 
http://www.codemonkey.org.uk


[KJ PATCH] Replacing memcpy(dest,src,PAGE_SIZE) with copy_page(dest,src) in arch/i386/mm/init.c

2007-06-11 Thread Shani Moideen
Hi,
Replacing memcpy(dest,src,PAGE_SIZE) with copy_page(dest,src) in 
arch/i386/mm/init.c.

Signed-off-by: Shani Moideen <[EMAIL PROTECTED]>



diff --git a/arch/i386/mm/init.c b/arch/i386/mm/init.c
index ae43688..7dc3d46 100644
--- a/arch/i386/mm/init.c
+++ b/arch/i386/mm/init.c
@@ -397,7 +397,7 @@ char __nosavedata swsusp_pg_dir[PAGE_SIZE]
 
 static inline void save_pg_dir(void)
 {
-   memcpy(swsusp_pg_dir, swapper_pg_dir, PAGE_SIZE);
+   copy_page(swsusp_pg_dir, swapper_pg_dir);
 }
 #else
 static inline void save_pg_dir(void)

-- 
Regards,
Shani 


Re: mm: memory/cpu hotplug section mismatch.

2007-06-11 Thread Paul Mundt
On Tue, Jun 12, 2007 at 10:50:33AM +0900, Yasunori Goto wrote:
> > > 
> > > If CONFIG_MEMORY_HOTPLUG=n __meminit == __init, and if
> > > CONFIG_HOTPLUG_CPU=n __cpuinit == __init. However, with one set and the
> > > other disabled, you end up with a reference between __init and a regular
> > > non-init function.
> > 
> > My plan is to define dedicated sections for both __devinit and __meminit.
> > Then we can apply the checks no matter the definition of CONFIG_HOTPLUG*
> 
> I prefer defining "__nodeinit" for __cpuinit and __meminit case to
> __devinit.   __devinit is used many devices like I/O, and it is
> useful for many desktop users. But, cpu/memory hotpluggable box
> is very rare. And it should be in init section for many people.
> 
> This kind of issue is caused by initialization of pgdat/zone.
> I think __nodeinit is enough and desirable.
> 
A #define __nodeinit __devinit is probably reasonable for clarity
purposes. But whatever we want to call it, the current __cpuinit for
zone_batchsize() has to be changed, as it will be freed with the rest of
the init code if CPU hotplug is disabled. If we want to do something
cleaner in the long run, that's fine, but changing to __devinit now at
least gets the semantics right for both the memory and cpu hotplug cases.
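
The mismatch discussed above can be modelled in userspace with plain strings standing in for section names. This is a sketch only: the `*_SECTION` macros and `section_mismatch()` are hypothetical, and the real annotations place functions in sections rather than naming them.

```c
#include <string.h>

/* Example configuration: memory hotplug on, CPU hotplug off. */
#define CONFIG_MEMORY_HOTPLUG 1
/* CONFIG_HOTPLUG_CPU intentionally left undefined */

#define SECTION_INIT ".init.text"   /* discarded after boot */
#define SECTION_TEXT ".text"        /* kept for the kernel's lifetime */

/* Models __meminit: init-only unless memory hotplug is enabled. */
#ifdef CONFIG_MEMORY_HOTPLUG
#define MEMINIT_SECTION SECTION_TEXT
#else
#define MEMINIT_SECTION SECTION_INIT
#endif

/* Models __cpuinit: init-only unless CPU hotplug is enabled. */
#ifdef CONFIG_HOTPLUG_CPU
#define CPUINIT_SECTION SECTION_TEXT
#else
#define CPUINIT_SECTION SECTION_INIT
#endif

/* A reference is a mismatch when resident code calls discarded code. */
static int section_mismatch(const char *caller, const char *callee)
{
    return strcmp(caller, SECTION_TEXT) == 0 &&
           strcmp(callee, SECTION_INIT) == 0;
}
```

With this configuration a __meminit caller stays resident while a __cpuinit callee such as zone_batchsize() is discarded, which is exactly the case that needs fixing.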


Re: [patch 0/3] AHCI Link Power Management

2007-06-11 Thread Jeff Garzik

Arjan van de Ven wrote:

Jeff Garzik wrote:
SATA standard defines lower power phy states.  So the same argument 
you're using for AHCI applies there too -- "just" enabling an existing 
hardware feature.


yes I'm not arguing against that. I was trying to find out (and 
suggest-unless-proven-otherwise) that the 2 are not exclusive or 
conflicting... in fact I assume both are wanted concurrently.


Yes and no.  As I understand it, AHCI's capability is an automatic 
version of what standard SATA phys provide manually.  In AHCI's case, 
the hardware automatically manages the link power, possibly cycling it 
hundreds of times per second.  In the standard case, software must 
determine when a different power state is appropriate based on current 
conditions, and update the phy appropriately.


Jeff





Re: kernel BUG at mm/slub.c:3689!

2007-06-11 Thread Christoph Lameter
> and I can't do that over VPN. I'll test it first thing in the morning.

Here is a more general fix



SLUB: minimum alignment fixes

If ARCH_KMALLOC_MIN_ALIGN is set to a value greater than 8 (SLUB's smallest
kmalloc cache) then SLUB may generate duplicate slabs in sysfs (yes, again).
No arch sets ARCH_KMALLOC_MINALIGN larger than 8 though, except mips, which
needs a 128 byte cache.

This patch increases the size of the smallest cache if ARCH_KMALLOC_MINALIGN
is greater than 8. In that case more and more of the smallest caches are
disabled.

Signed-off-by: Christoph Lameter <[EMAIL PROTECTED]>

---
 include/linux/slub_def.h |   11 +--
 mm/slub.c|   18 ++
 2 files changed, 19 insertions(+), 10 deletions(-)

Index: vps/include/linux/slub_def.h
===
--- vps.orig/include/linux/slub_def.h   2007-06-11 15:56:37.0 -0700
+++ vps/include/linux/slub_def.h2007-06-11 19:31:15.0 -0700
@@ -28,7 +28,7 @@ struct kmem_cache {
int size;   /* The size of an object including meta data */
int objsize;/* The size of an object without meta data */
int offset; /* Free pointer offset. */
-   unsigned int order;
+   int order;
 
/*
 * Avoid an extra cache line for UP, SMP and for the node local to
@@ -56,7 +56,11 @@ struct kmem_cache {
 /*
  * Kmalloc subsystem.
  */
-#define KMALLOC_SHIFT_LOW 3
+#ifdef ARCH_KMALLOC_MIN_ALIGN
+#define KMALLOC_MIN_SIZE max(8, ARCH_KMALLOC_MIN_ALIGN)
+#else
+#define KMALLOC_MIN_SIZE 8
+#endif
 
 /*
  * We keep the general caches in an array of slab caches that are used for
@@ -76,6 +80,9 @@ static inline int kmalloc_index(size_t s
if (size > KMALLOC_MAX_SIZE)
return -1;
 
+   if (size <= KMALLOC_MIN_SIZE)
+   return ilog2(KMALLOC_MIN_SIZE);
+
if (size > 64 && size <= 96)
return 1;
if (size > 128 && size <= 192)
Index: vps/mm/slub.c
===
--- vps.orig/mm/slub.c  2007-06-11 15:56:37.0 -0700
+++ vps/mm/slub.c   2007-06-11 19:13:41.0 -0700
@@ -2193,11 +2193,11 @@ EXPORT_SYMBOL(kmem_cache_destroy);
  * Kmalloc subsystem
  ***/
 
-struct kmem_cache kmalloc_caches[KMALLOC_SHIFT_HIGH + 1] __cacheline_aligned;
+struct kmem_cache kmalloc_caches[ilog2(KMALLOC_MAX_SIZE) + 1] __cacheline_aligned;
 EXPORT_SYMBOL(kmalloc_caches);
 
 #ifdef CONFIG_ZONE_DMA
-static struct kmem_cache *kmalloc_caches_dma[KMALLOC_SHIFT_HIGH + 1];
+static struct kmem_cache *kmalloc_caches_dma[ilog2(KMALLOC_MAX_SIZE) + 1];
 #endif
 
 static int __init setup_slub_min_order(char *str)
@@ -2284,7 +2284,7 @@ static struct kmem_cache *get_slab(size_
if (!x)
panic("Unable to allocate memory for dma cache\n");
 
-   if (index <= KMALLOC_SHIFT_HIGH)
+   if (index <= ilog2(KMALLOC_MAX_SIZE))
realsize = 1 << index;
else {
if (index == 1)
@@ -2529,19 +2529,21 @@ void __init kmem_cache_init(void)
slab_state = PARTIAL;
 
/* Caches that are not of the two-to-the-power-of size */
-   create_kmalloc_cache(&kmalloc_caches[1],
+   if (KMALLOC_MIN_SIZE < 96)
+   create_kmalloc_cache(&kmalloc_caches[1],
"kmalloc-96", 96, GFP_KERNEL);
-   create_kmalloc_cache(&kmalloc_caches[2],
+   if (KMALLOC_MIN_SIZE < 192)
+   create_kmalloc_cache(&kmalloc_caches[2],
"kmalloc-192", 192, GFP_KERNEL);
 
-   for (i = KMALLOC_SHIFT_LOW; i <= KMALLOC_SHIFT_HIGH; i++)
+   for (i = ilog2(KMALLOC_MIN_SIZE); i <= ilog2(KMALLOC_MAX_SIZE); i++)
create_kmalloc_cache(&kmalloc_caches[i],
"kmalloc", 1 << i, GFP_KERNEL);
 
slab_state = UP;
 
/* Provide the correct kmalloc names now that the caches are up */
-   for (i = KMALLOC_SHIFT_LOW; i <= KMALLOC_SHIFT_HIGH; i++)
+   for (i = ilog2(KMALLOC_MIN_SIZE); i <= ilog2(KMALLOC_MAX_SIZE); i++)
kmalloc_caches[i]. name =
kasprintf(GFP_KERNEL, "kmalloc-%d", 1 << i);
 
@@ -2554,7 +2556,7 @@ void __init kmem_cache_init(void)
 
printk(KERN_INFO "SLUB: Genslabs=%d, HWalign=%d, Order=%d-%d, MinObjects=%d,"
" Processors=%d, Nodes=%d\n",
-   KMALLOC_SHIFT_HIGH, cache_line_size(),
+   ilog2(KMALLOC_MAX_SIZE), cache_line_size(),
slub_min_order, slub_max_order, slub_min_objects,
nr_cpu_ids, nr_node_ids);
 }
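
As a rough illustration of the size-to-index mapping this patch aims for, here is a userspace sketch. The minimum and maximum sizes are passed as parameters instead of being compile-time macros, zero-size requests are not modelled, and `ilog2_sketch()` and `kmalloc_index_sketch()` are hypothetical names, not the kernel functions.

```c
#include <stddef.h>

/* floor(log2(n)) for n >= 1; stand-in for the kernel's ilog2(). */
static int ilog2_sketch(size_t n)
{
    int log = 0;

    while (n >>= 1)
        log++;
    return log;
}

/* Map a request size (>= 1) to a cache index, honouring a minimum size. */
static int kmalloc_index_sketch(size_t size, size_t min_size, size_t max_size)
{
    if (size > max_size)
        return -1;

    /* Everything up to the minimum size shares the smallest cache. */
    if (size <= min_size)
        return ilog2_sketch(min_size);

    /* The two caches that are not a power of two in size. */
    if (size > 64 && size <= 96)
        return 1;
    if (size > 128 && size <= 192)
        return 2;

    /* Otherwise round up to the next power of two. */
    return ilog2_sketch(size - 1) + 1;
}
```

With a minimum of 8 an 80 byte request lands in kmalloc-96 (index 1); with a minimum of 128 the same request goes to the 128 byte cache (index 7), so kmalloc-96 is never used.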

Re: [patch 0/3] AHCI Link Power Management

2007-06-11 Thread Arjan van de Ven

Jeff Garzik wrote:

Arjan van de Ven wrote:

Tejun Heo wrote:

Kristen Carlson Accardi wrote:

Hi,
This series of patches enables Aggressive Link Power Management for 
AHCI devices, as documented in the AHCI spec.  On my laptop (a 
Lenovo X60), this
saves me a full watt of power.  On other systems, reported power 
savings
range from .5-1.5 Watts.  It has been tested by the kind folks at 
#powertop
with similar results.  Please give it a try and let me know what you 
think.


I'm not sure about this.  We need better PM framework to support
powersaving in other controllers and some ahcis don't save much when
only link power management is used, 


do you have data to support this? The data we have from this patch is 
that it saves typically a Watt of power (depends on the machine of 
course, but the range is 0.5W to 1.5W). If you want to also have an 
even more aggressive thing where you want to start disabling the entire 
controller... I don't see how this is in conflict with saving power on 
the link level by "just" enabling a hardware feature 


SATA standard defines lower power phy states.  So the same argument 
you're using for AHCI applies there too -- "just" enabling an existing 
hardware feature.


yes I'm not arguing against that. I was trying to find out (and 
suggest-unless-proven-otherwise) that the 2 are not exclusive or 
conflicting... in fact I assume both are wanted concurrently.



[PATCH] ia64: Scalability improvement of gettimeofday with jitter compensation

2007-06-11 Thread Hidetoshi Seto
Hi all,

This is a proposal with a patch to improve scalability
of time handling.

As described in comment at arch/ia64/kernel/time.c:

[arch/ia64/kernel/time.c]
> #ifdef CONFIG_SMP
>   /* On IA64 in an SMP configuration ITCs are never accurately synchronized.
>* Jitter compensation requires a cmpxchg which may limit
>* the scalability of the syscalls for retrieving time.
>* The ITC synchronization is usually successful to within a few
>* ITC ticks but this is not a sure thing. If you need to improve
>* timer performance in SMP situations then boot the kernel with the
>* "nojitter" option. However, doing so may result in time fluctuating (maybe
>* even going backward) if the ITC offsets between the individual CPUs
>* are too large.
>*/
>   if (!nojitter) itc_interpolator.jitter = 1;
> #endif

ia64 uses jitter compensation to prevent time from going backward.

This jitter compensation logic, which keeps track of the cycle value
most recently returned, is provided as generic code (and copied to
arch/ia64/kernel/fsys.S).
It seems that there is no user (setting jitter = 1) other than ia64.

[kernel/timer.c]
> static inline u64 time_interpolator_get_counter(int writelock)
> {
>   unsigned int src = time_interpolator->source;
>
>   if (time_interpolator->jitter)
>   {
> cycles_t lcycle;
> cycles_t now;
>
> do {
>   lcycle = time_interpolator->last_cycle;
>   now = time_interpolator_get_cycles(src);
>   if (lcycle && time_after(lcycle, now))
> return lcycle;
>
>   /* When holding the xtime write lock, there's no need
>* to add the overhead of the cmpxchg.  Readers are
>* force to retry until the write lock is released.
>*/
>   if (writelock) {
> time_interpolator->last_cycle = now;
> return now;
>   }
>   /* Keep track of the last timer value returned. The use of cmpxchg here
>* will cause contention in an SMP environment.
>*/
> } while (unlikely(cmpxchg(&time_interpolator->last_cycle, lcycle, now) != lcycle));
> return now;
>   }
>   else
> return time_interpolator_get_cycles(src);
> }

The current logic consists of a do-while loop with cmpxchg.

The cmpxchg is known to take a long time in an SMP environment, but
it is an easy way to guarantee an atomic operation.
I think this is acceptable as long as there is no better alternative.

OTOH, the do-while forces a retry if the cmpxchg fails (no exchange).
This means that if there are N threads trying to do the cmpxchg at the
same time, only 1 can get out of this loop and the N-1 others will be
trapped in it. This also means that a thread could loop
N times in the worst case.

Obviously this is a scalability issue.
To avoid this retry loop, I'd like to propose new logic that
removes do-while here.

The basic idea is "use winner's cycle instead of retrying."
Assuming that there are N threads trying to do the cmpxchg, it can also
be assumed that each is trying to update last_cycle with its own
new value, while all of those values are almost the same.
Therefore, it should work to treat the threads as a group and
decide the group's return value by picking one value from the group.

Fortunately, the cmpxchg mechanism can help with this logic. Only the first
thread in the group can exchange last_cycle successfully, so this
"winner" gets the previous last_cycle as the return value of cmpxchg.
The rest of the group will fail to exchange since last_cycle has
already been updated by the winner, so these "losers" get the current
last_cycle as cmpxchg's return value. This means that every thread in
the group can learn the winner's cycle.

  ret = cmpxchg(&last_cycle, last, new);
  if (ret == last)
return new; /* you win! */
  else
return ret; /* you lose. ret is winner's new */
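
For readers who want to try the idea outside the kernel, the fragment above can be sketched with C11 atomics. This is only an approximation under stated assumptions: `get_cycle_no_retry()` is a hypothetical name, the caller passes in its earlier snapshot of last_cycle, and `atomic_compare_exchange_strong()` stands in for the kernel's cmpxchg.

```c
#include <stdatomic.h>
#include <stdint.h>

/*
 * last_cycle: shared cell holding the most recently returned cycle value
 * last:       this thread's earlier snapshot of *last_cycle
 * now:        the cycle value this thread just read from its counter
 */
static uint64_t get_cycle_no_retry(_Atomic uint64_t *last_cycle,
                                   uint64_t last, uint64_t now)
{
    uint64_t expected = last;

    /* Same test as time_after(last, now): never hand out an older value. */
    if (last && (int64_t)(last - now) > 0)
        return last;

    /* One attempt only; there is no retry loop. */
    if (atomic_compare_exchange_strong(last_cycle, &expected, now))
        return now;       /* winner: our value is now the shared one */

    /* On failure, expected has been overwritten with the current value. */
    return expected;      /* loser: adopt the winner's value */
}
```

A thread whose snapshot has gone stale simply returns whatever the winner stored, which is the behaviour described above.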

I ran a test with processes calling gettimeofday() on a 1.5GHz 4-way box.
It shows the following:

 - x1 process:
0.15us / 1 gettimeofday() call
0.15us / 1 gettimeofday() call with patch
 - x2 process:
0.31us / 1 gettimeofday() call
0.24us / 1 gettimeofday() call with patch
 - x3 process:
1.59us / 1 gettimeofday() call
1.11us / 1 gettimeofday() call with patch
 - x4 process:
2.34us / 1 gettimeofday() call
1.29us / 1 gettimeofday() call with patch

I know that this patch will not help very large systems, since
a system with, say, 1024 CPUs should use a better clocksource
instead of doing cmpxchg. Even so, this patch should work well
on middle-sized boxes (4~8way, possibly 16~64way?).

Thanks,
H.Seto

Signed-off-by: Hidetoshi Seto <[EMAIL PROTECTED]>

-
 arch/ia64/kernel/fsys.S |   10 +++---
 kernel/timer.c  |   45 +
 2 files changed, 32 insertions(+), 23 deletions(-)

Index: linux-2.6.21/arch/ia64/kernel/fsys.S
===
--- linux-2.6.21.orig/arch/ia64/kernel/fsys.S
+++ linux-2.6.21/arch/ia64/kernel/fsys.S
@@ -271,18 +271,22 @@
 (p6)   sub r10 = r25,r26   // time we got was less than last_cycle
 (p7)   mov ar.ccv = r25

Allow softlockup to be runtime disabled.

2007-06-11 Thread Dave Jones
It's useful sometimes to disable the softlockup checker at boottime.
Especially if it triggers during a distro install.

Signed-off-by: Dave Jones <[EMAIL PROTECTED]>

--- linux-2.6/init/main.c~  2006-03-05 00:45:51.0 -0500
+++ linux-2.6/init/main.c   2006-03-05 00:49:41.0 -0500
@@ -732,6 +732,15 @@ static void __init do_basic_setup(void)
do_initcalls();
 }
 
+static int __initdata nosoftlockup;
+
+static int __init nosoftlockup_setup(char *str)
+{
+   nosoftlockup = 1;
+   return 1;
+}
+__setup("nosoftlockup", nosoftlockup_setup);
+
 static void __init do_pre_smp_initcalls(void)
 {
extern int spawn_ksoftirqd(void);
@@ -649,7 +657,8 @@ static void do_pre_smp_initcalls(void)
migration_init();
 #endif
spawn_ksoftirqd();
-   spawn_softlockup_task();
+   if (!nosoftlockup)
+   spawn_softlockup_task();
 }
 
 static void run_init_process(char *init_filename)
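
To show the intended effect of the boot flag without booting a kernel, here is a simplified userspace sketch that looks for a bare word on a command line string. It is an assumption-laden model: `cmdline_has_flag()` is a hypothetical helper that matches whole space-separated words, and it is not how the kernel's own parameter parser works.

```c
#include <string.h>

/* Returns 1 if param appears as a whole space-separated word in cmdline. */
static int cmdline_has_flag(const char *cmdline, const char *param)
{
    size_t len = strlen(param);
    const char *p = cmdline;

    while (*p) {
        const char *start;

        /* Skip separators, then measure the next word. */
        while (*p == ' ')
            p++;
        start = p;
        while (*p && *p != ' ')
            p++;

        if ((size_t)(p - start) == len && strncmp(start, param, len) == 0)
            return 1;
    }
    return 0;
}
```

A caller would then skip starting the watchdog when cmdline_has_flag(cmdline, "nosoftlockup") returns 1, mirroring the `if (!nosoftlockup)` test in the patch.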

-- 
http://www.codemonkey.org.uk


Re: [patch 0/3] AHCI Link Power Management

2007-06-11 Thread Jeff Garzik

Arjan van de Ven wrote:

Tejun Heo wrote:

Kristen Carlson Accardi wrote:

Hi,
This series of patches enables Aggressive Link Power Management for 
AHCI devices, as documented in the AHCI spec.  On my laptop (a Lenovo 
X60), this

saves me a full watt of power.  On other systems, reported power savings
range from .5-1.5 Watts.  It has been tested by the kind folks at 
#powertop
with similar results.  Please give it a try and let me know what you 
think.


I'm not sure about this.  We need better PM framework to support
powersaving in other controllers and some ahcis don't save much when
only link power management is used, 


do you have data to support this? The data we have from this patch is 
that it saves typically a Watt of power (depends on the machine of 
course, but the range is 0.5W to 1.5W). If you want to also have an even 
more aggressive thing where you want to start disabling the entire 
controller... I don't see how this is in conflict with saving power on 
the link level by "just" enabling a hardware feature 


SATA standard defines lower power phy states.  So the same argument 
you're using for AHCI applies there too -- "just" enabling an existing 
hardware feature.


Jeff




[PATCH 1/2] readahead: move synchronous readahead call out of splice loop

2007-06-11 Thread Fengguang Wu
Move synchronous page_cache_readahead_ondemand() call out of splice loop.

This avoids one pointless page allocation/insertion in case of non-zero
ra_pages, or many pointless readahead calls in case of zero ra_pages.

Note that if a user sets ra_pages to less than PIPE_BUFFERS=16 pages, he will
not get expected readahead behavior anyway.  The splice code works in batches
of 16 pages, which can be taken as another form of synchronous readahead.

Signed-off-by: Fengguang Wu <[EMAIL PROTECTED]>
---
 fs/splice.c |   11 ++-
 1 file changed, 6 insertions(+), 5 deletions(-)

--- linux-2.6.22-rc3-mm1.orig/fs/splice.c
+++ linux-2.6.22-rc3-mm1/fs/splice.c
@@ -299,12 +299,16 @@ __generic_file_splice_read(struct file *
 * Lookup the (hopefully) full range of pages we need.
 */
spd.nr_pages = find_get_pages_contig(mapping, index, nr_pages, pages);
+   index += spd.nr_pages;
 
/*
 * If find_get_pages_contig() returned fewer pages than we needed,
-* allocate the rest.
+* readahead/allocate the rest.
 */
-   index += spd.nr_pages;
+   if (spd.nr_pages < nr_pages)
+   page_cache_readahead_ondemand(mapping, &in->f_ra, in,
+   NULL, index, nr_pages - spd.nr_pages);
+
while (spd.nr_pages < nr_pages) {
/*
 * Page could be there, find_get_pages_contig() breaks on
@@ -312,9 +316,6 @@ __generic_file_splice_read(struct file *
 */
page = find_get_page(mapping, index);
if (!page) {
-   page_cache_readahead_ondemand(mapping, &in->f_ra, in,
-   NULL, index, nr_pages - spd.nr_pages);
-
/*
 * page didn't exist, allocate one.
 */

--


[PATCH 0/2] readahead update on splice reads

2007-06-11 Thread Fengguang Wu
Andrew,

These two patches optimize readahead invocations in splice reads:

readahead: move synchronous readahead call out of splice loop
readahead: pass real splice size

They can be appended to readahead-convert-splice-invocations.patch in -mm tree.


diffstat:

 fs/splice.c |   21 ++---
 1 file changed, 10 insertions(+), 11 deletions(-)

Regards,
Fengguang
--


[PATCH 2/2] readahead: pass real splice size

2007-06-11 Thread Fengguang Wu
Pass real splice size to page_cache_readahead_ondemand().

The splice code works in chunks of 16 pages internally.
The readahead code should be told of the overall splice size, instead of
the internal chunk size. Otherwise bad things may happen. Imagine some
17-page random splice reads. The code before this patch will result in
two readahead calls: readahead(16); readahead(1); That leads to one
16-page I/O and one 32-page I/O: one extra I/O and 31 readahead miss pages.

Signed-off-by: Fengguang Wu <[EMAIL PROTECTED]>
---
 fs/splice.c |   12 +---
 1 file changed, 5 insertions(+), 7 deletions(-)

--- linux-2.6.22-rc3-mm1.orig/fs/splice.c
+++ linux-2.6.22-rc3-mm1/fs/splice.c
@@ -267,7 +267,7 @@ __generic_file_splice_read(struct file *
   unsigned int flags)
 {
struct address_space *mapping = in->f_mapping;
-   unsigned int loff, nr_pages;
+   unsigned int loff, nr_pages, req_pages;
struct page *pages[PIPE_BUFFERS];
struct partial_page partial[PIPE_BUFFERS];
struct page *page;
@@ -284,10 +284,8 @@ __generic_file_splice_read(struct file *
 
index = *ppos >> PAGE_CACHE_SHIFT;
loff = *ppos & ~PAGE_CACHE_MASK;
-   nr_pages = (len + loff + PAGE_CACHE_SIZE - 1) >> PAGE_CACHE_SHIFT;
-
-   if (nr_pages > PIPE_BUFFERS)
-   nr_pages = PIPE_BUFFERS;
+   req_pages = (len + loff + PAGE_CACHE_SIZE - 1) >> PAGE_CACHE_SHIFT;
+   nr_pages = min(req_pages, (unsigned)PIPE_BUFFERS);
 
/*
 * Now fill in the holes:
@@ -307,7 +305,7 @@ __generic_file_splice_read(struct file *
 */
if (spd.nr_pages < nr_pages)
page_cache_readahead_ondemand(mapping, &in->f_ra, in,
-   NULL, index, nr_pages - spd.nr_pages);
+   NULL, index, req_pages - spd.nr_pages);
 
while (spd.nr_pages < nr_pages) {
/*
@@ -363,7 +361,7 @@ __generic_file_splice_read(struct file *
 
if (PageReadahead(page))
page_cache_readahead_ondemand(mapping, &in->f_ra, in,
-   page, index, nr_pages - page_nr);
+   page, index, req_pages - page_nr);
 
/*
 * If the page isn't uptodate, we may need to start io on it

--
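
A minimal sketch of the page-count arithmetic this patch separates, assuming
4 KiB pages. The names below (req_pages_for, batch_pages_for, the *_SKETCH
constants) are illustrative only and are not taken from fs/splice.c; the
point is just that the overall request size and the per-batch size are two
different numbers once a splice read exceeds 16 pages.

```c
#include <assert.h>

/* Illustrative constants; the real values come from the kernel headers. */
#define PAGE_SHIFT_SKETCH   12u                        /* assumes 4 KiB pages */
#define PAGE_SIZE_SKETCH    (1u << PAGE_SHIFT_SKETCH)
#define PIPE_BUFFERS_SKETCH 16u                        /* splice batch size */

/* Pages touched by a read of len bytes that starts loff bytes into a page. */
static unsigned int req_pages_for(unsigned int len, unsigned int loff)
{
	assert(loff < PAGE_SIZE_SKETCH);
	return (len + loff + PAGE_SIZE_SKETCH - 1) >> PAGE_SHIFT_SKETCH;
}

/* Pages handled by one internal splice batch for that request. */
static unsigned int batch_pages_for(unsigned int req_pages)
{
	return req_pages < PIPE_BUFFERS_SKETCH ? req_pages : PIPE_BUFFERS_SKETCH;
}
```

For the 17-page read from the description, req_pages_for(17 * 4096, 0) gives
17 while batch_pages_for(17) gives 16; passing the former to the readahead
code is what lets it see the whole request at once.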


[KJ PATCH] Replacing memcpy(dest,src,PAGE_SIZE) with copy_page(dest,src) in arch/i386/kernel/machine_kexec.c

2007-06-11 Thread Shani Moideen
Hi,
Replacing memcpy(dest,src,PAGE_SIZE) with copy_page(dest,src) in 
arch/i386/kernel/machine_kexec.c.

Signed-off-by: Shani Moideen <[EMAIL PROTECTED]>


diff --git a/arch/i386/kernel/machine_kexec.c b/arch/i386/kernel/machine_kexec.c
index 91966ba..ce79a44 100644
--- a/arch/i386/kernel/machine_kexec.c
+++ b/arch/i386/kernel/machine_kexec.c
@@ -110,7 +110,7 @@ NORET_TYPE void machine_kexec(struct kimage *image)
local_irq_disable();
 
control_page = page_address(image->control_code_page);
-   memcpy(control_page, relocate_kernel, PAGE_SIZE);
+   copy_page(control_page, relocate_kernel);
 
page_list[PA_CONTROL_PAGE] = __pa(control_page);
page_list[VA_CONTROL_PAGE] = (unsigned long)relocate_kernel;

-- 
Shani 


Re: [PATCH 1/9] readahead: introduce PG_readahead

2007-06-11 Thread Fengguang Wu
Hi Rusty,

On Tue, Jun 12, 2007 at 11:04:54AM +1000, Rusty Russell wrote:
> On Thu, 2007-05-17 at 06:47 +0800, Fengguang Wu wrote:
> > plain text document attachment (mm-introduce-pg_readahead.patch)
> > Introduce a new page flag: PG_readahead.
> > 
> > It acts as a look-ahead mark, which tells the page reader:
> > Hey, it's time to invoke the read-ahead logic.  For the sake of I/O pipelining,
> > don't wait until it runs out of cached pages!
> 
> Hi Fengguang!
> 
>   I've been reading your patches, and I have some (possibly dumb!)
> questions.
> 
> For this patch: why set a bit in the page, rather than keep a value
> inside the "struct file_ra_state"?

Good question. I should have documented it in the patch ;)

The short answer:
there can be multiple read streams per fd, i.e.  interleaved reads.

file_ra_state cannot easily track all of the streams. Solaris zfs
does it by managing a list in struct dnode, while PG_readahead plus
the context based readahead (http://lkml.org/lkml/2006/11/15/54) offers
another solution. The two solutions are comparable in complexity.
The context based readahead is a bit more feature rich: it is sensitive
to memory pressure, and supports an unlimited number of streams. Note
that random reads can be regarded as a huge number of one-shot streams,
which can interfere badly with zfs's list-based readahead states.

Fengguang
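
A toy model of the idea, not the kernel implementation: the look-ahead mark
lives in the page itself, so each of several interleaved streams on one fd
finds its own trigger without any per-file bookkeeping. Everything here
(toy_page, toy_read, the 4-page window, marking the middle of the window) is
an assumption made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define TOY_NPAGES 64u
#define TOY_WINDOW 4u   /* pages fetched per readahead call (assumed) */

struct toy_page {
	bool cached;
	bool readahead_mark;   /* stands in for PG_readahead */
};

static struct toy_page toy_cache[TOY_NPAGES];
static unsigned int toy_sync_misses;

/* Bring [start, start + TOY_WINDOW) into the cache and mark one page. */
static void toy_fetch(unsigned int start, unsigned int mark)
{
	for (unsigned int i = start; i < start + TOY_WINDOW && i < TOY_NPAGES; i++)
		toy_cache[i].cached = true;
	if (mark < TOY_NPAGES)
		toy_cache[mark].readahead_mark = true;
}

/* Read one page; only a cache miss counts as a synchronous stall. */
static void toy_read(unsigned int idx)
{
	assert(idx < TOY_NPAGES);
	if (!toy_cache[idx].cached) {
		toy_sync_misses++;
		toy_fetch(idx, idx + TOY_WINDOW / 2);
	} else if (toy_cache[idx].readahead_mark) {
		/* Marked page hit: start the next window before we run dry. */
		toy_cache[idx].readahead_mark = false;
		toy_fetch(idx + TOY_WINDOW / 2, idx + TOY_WINDOW);
	}
}

/* Two sequential streams of n pages each, read alternately. */
static unsigned int toy_interleaved_misses(unsigned int a, unsigned int b,
					   unsigned int n)
{
	memset(toy_cache, 0, sizeof(toy_cache));
	toy_sync_misses = 0;
	for (unsigned int i = 0; i < n; i++) {
		toy_read(a + i);
		toy_read(b + i);
	}
	return toy_sync_misses;
}
```

In this model two interleaved 8-page streams starting at pages 0 and 32 stall
only on their first page each; a single "next expected offset" kept in
per-file state would be overwritten by whichever stream read last.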



Re: [patch 3/3] Enable Aggressive Link Power management for AHCI controllers.

2007-06-11 Thread Dagfinn Ilmari Mannsåker
Arjan van de Ven <[EMAIL PROTECTED]> writes:

> Henrique de Moraes Holschuh wrote:
>> On Mon, 11 Jun 2007, Kristen Carlson Accardi wrote:
>>> Setting Effect
>>> --
>>> min_power   ALPM is enabled, and link set to enter  lowest
>>> power state (SLUMBER) when idle
>>> Hot plug not allowed.
>>>
>>> max_performance ALPM is disabled, Hot Plug is allowed
>>>
>>> medium_powerALPM is enabled, and link set to enter
>>> second lowest power state (PARTIAL) when
>>> idle.  Hot plug not allowed.
>> Just some food for thought:
>> If you split it into a enable/disable (0/1) attribute, and a level
>> attribute
>
> on/off doesn't really make sense if the question is "do you favor power
> or do you favor performance"...

How about just making it a numeric scale with 0 meaning no power saving
and then some fixed number of levels (e.g 0-9)?

-- 
ilmari
"A disappointingly low fraction of the human race is,
 at any given time, on fire." - Stig Sandbeck Mathisen


Re: [patch 0/3] AHCI Link Power Management

2007-06-11 Thread Arjan van de Ven

Tejun Heo wrote:

Kristen Carlson Accardi wrote:

Hi,
This series of patches enables Aggressive Link Power Management for AHCI 
devices, as documented in the AHCI spec.  On my laptop (a Lenovo X60), this

saves me a full watt of power.  On other systems, reported power savings
range from .5-1.5 Watts.  It has been tested by the kind folks at #powertop
with similar results.  Please give it a try and let me know what you think.


I'm not sure about this.  We need better PM framework to support
powersaving in other controllers and some ahcis don't save much when
only link power management is used, 


do you have data to support this? The data we have from this patch is 
that it saves typically a Watt of power (depends on the machine of 
course, but the range is 0.5W to 1.5W). If you want to also have an 
even more aggressive thing where you want to start disabling the entire 
controller... I don't see how this is in conflict with saving power on 
the link level by "just" enabling a hardware feature 



Re: [patch 0/3] AHCI Link Power Management

2007-06-11 Thread Jeff Garzik

Tejun Heo wrote:

Kristen Carlson Accardi wrote:

Hi,
This series of patches enables Aggressive Link Power Management for AHCI 
devices, as documented in the AHCI spec.  On my laptop (a Lenovo X60), this

saves me a full watt of power.  On other systems, reported power savings
range from .5-1.5 Watts.  It has been tested by the kind folks at #powertop
with similar results.  Please give it a try and let me know what you think.


I'm not sure about this.  We need better PM framework to support
powersaving in other controllers and some ahcis don't save much when
only link power management is used, they need to be turned off


A better PM framework would definitely be nice :)



completely && I don't think scsi sysfs is the right place to export this
interface.


That's about the only place we have right now.

Jeff





Re: [RFC][PATCH 3/6] core changes in CFS

2007-06-11 Thread Balbir Singh
Srivatsa Vaddagiri wrote:
> +#define entity_is_task(se)   1

Could you add a comment explaining what this means? Should it be a boolean
(true) instead?


>  /*
> - * Enqueue a task into the rb-tree:
> + * Enqueue a entity into the rb-tree:

Enqueue an entity

> -static void limit_wait_runtime(struct rq *rq, struct task_struct *p)
> +static void limit_wait_runtime(struct lrq *lrq, struct sched_entity *p)

p is the general convention for tasks in the code; could we use something
different, maybe "e"?

> 
>  static s64 div64_s(s64 divident, unsigned long divisor)
> @@ -183,49 +219,51 @@
>   * Update the current task's runtime statistics. Skip current tasks that
>   * are not in our scheduling class.
>   */
> -static inline void update_curr(struct rq *rq, u64 now)
> +static inline void update_curr(struct lrq *lrq, u64 now)
>  {
> - unsigned long load = rq->lrq.raw_weighted_load;
> + unsigned long load = lrq->raw_weighted_load;
>   u64 delta_exec, delta_fair, delta_mine;
> - struct task_struct *curr = rq->curr;
> + struct sched_entity *curr = lrq_curr(lrq);

How about curr_entity?

> + struct rq *rq = lrq_rq(lrq);
> + struct task_struct *curtask = rq->curr;
> 
> - if (curr->sched_class != &fair_sched_class || curr == rq->idle || !load)
> + if (!curr || curtask == rq->idle || !load)

Can !curr ever be true? Shouldn't we look into the sched_class of the task
that the entity belongs to?


-- 
Warm Regards,
Balbir Singh
Linux Technology Center
IBM, ISTL


Re: [RFC][AGPGART]intel-agp: save whole config space in suspend/resume

2007-06-11 Thread Dave Jones
On Tue, Jun 12, 2007 at 09:54:59AM +0800, Wang Zhenyu wrote:

 > It looks that config space save/restore for intel-agp still has problem
 > that might affect some chip models. Andreas Mohr's work on his i815
 > suspend/resume support showed that we need to save extra bits in config
 > space on this old chip type.
 > His patch is in -mm tree,
 > http://www.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.22-rc4/2.6.22-rc4-mm2/broken-out/working-3d-dri-intel-agpko-resume-for-i815-chip.patch
 >
 > And recently James Bottomley also reported that save/restore whole 256
 > bytes config space for gfx device can fix his s3 issue on Fujitsu P7120
 > i915.
 > http://lists.freedesktop.org/archives/xorg/2007-June/025346.html
 >
 > So here's my suggested patch for save whole 256 bytes config space for
 > intel-agp, which could fix these issues. I tested it on my 965GM, that
 > s3 works fine with X.

I'd feel much safer if we only did this on chipsets where we know we 
have to do it.   Doing this for *every* Intel chipset ever made _will_
bite us.  There are some early chipsets (440BX era iirc) that would just
hang the box when you tried to read from various write-only registers.

But even on the chipsets where we _do_ need to save/restore something in
the upper part of the config space, surely it'd be a lot safer
to just save/restore what we need to rather than risk all sorts
of mayhem by writing to bits that may trigger hardware events.

Dave

-- 
http://www.codemonkey.org.uk


Re: Can we get rid of zImage this time?

2007-06-11 Thread Bruce Ashfield

On 6/11/07, H. Peter Anvin <[EMAIL PROTECTED]> wrote:

I brought this up a few years ago, and had it shot down, because of a
few poorly substantiated claims of zImage-only machines; those claims
really need to be debugged since they might indicate A20-related failures.


These beasts are still alive and kicking. I boot a handful of
arm and ppc boards on a frequent basis that are zImage
only.  The kicker is I don't have the bootloader source for
most of them, so changing to a different image format is
tough at best.

Not a vote one way or the other, just some observations
from my day to day.

Bruce




Anyway...

Can we please kill zImage?  In addition to be completely useless for
modern kernels, it causes unnecessary complexity in boot loaders.

-hpa




--
"Thou shalt not follow the NULL pointer, for chaos and madness await
thee at its end"


Re: [ckrm-tech] [RFC][PATCH 2/6] task's cpu information needs to be always correct

2007-06-11 Thread Balbir Singh
Srivatsa Vaddagiri wrote:
> We rely very much on task_cpu(p) to be correct at all times, so that we
> can correctly find the runqueue from which the task has to be removed or
> added to.
> 
> There is however one place in the scheduler where this assumption of
> task_cpu(p) being correct is broken. This patch fixes that piece of
> code.
> 
> (Thanks to Balbir Singh for pointing this out to me)
> 
> Signed-off-by : Srivatsa Vaddagiri <[EMAIL PROTECTED]>
> 

Acked-by: Balbir Singh <[EMAIL PROTECTED]>

> ---
>  kernel/sched.c |8 +---
>  1 files changed, 5 insertions(+), 3 deletions(-)
> 
> Index: current/kernel/sched.c
> ===
> --- current.orig/kernel/sched.c   2007-06-09 15:07:17.0 +0530
> +++ current/kernel/sched.c2007-06-09 15:07:32.0 +0530
> @@ -4624,7 +4624,7 @@
>  static int __migrate_task(struct task_struct *p, int src_cpu, int dest_cpu)
>  {
>   struct rq *rq_dest, *rq_src;
> - int ret = 0;
> + int ret = 0, on_rq;
> 
>   if (unlikely(cpu_is_offline(dest_cpu)))
>   return ret;
> @@ -4640,9 +4640,11 @@
>   if (!cpu_isset(dest_cpu, p->cpus_allowed))
>   goto out;
> 
> - set_task_cpu(p, dest_cpu);
> - if (p->se.on_rq) {
> + on_rq = p->se.on_rq;
> + if (on_rq)
>   deactivate_task(rq_src, p, 0);
> + set_task_cpu(p, dest_cpu);
> + if (on_rq) {
>   activate_task(rq_dest, p, 0);
>   check_preempt_curr(rq_dest, p);
>   }


-- 
Warm Regards,
Balbir Singh
Linux Technology Center
IBM, ISTL
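
A toy model of why the ordering in the patch matters, under the assumption
(made for this sketch, not taken from kernel/sched.c) that the enqueue and
dequeue helpers locate the runqueue through the task's own cpu field. All
names are hypothetical. If the cpu field were updated before the dequeue,
the helper would decrement the destination queue instead of the source.

```c
#include <assert.h>

#define TOY_NR_CPUS 2

struct toy_rq { int nr_running; };
struct toy_task { int cpu; int on_rq; };

static struct toy_rq toy_runqueues[TOY_NR_CPUS];

/* Both helpers find the queue via p->cpu, so p->cpu must be accurate. */
static void toy_activate(struct toy_task *p)
{
	toy_runqueues[p->cpu].nr_running++;
	p->on_rq = 1;
}

static void toy_deactivate(struct toy_task *p)
{
	toy_runqueues[p->cpu].nr_running--;
	p->on_rq = 0;
}

/* Same order as the patch: dequeue, then change cpu, then enqueue. */
static void toy_migrate(struct toy_task *p, int dest_cpu)
{
	int on_rq = p->on_rq;

	assert(dest_cpu >= 0 && dest_cpu < TOY_NR_CPUS);
	if (on_rq)
		toy_deactivate(p);   /* still sees the source cpu */
	p->cpu = dest_cpu;
	if (on_rq)
		toy_activate(p);     /* now sees the destination cpu */
}
```

With a task queued on cpu 0, toy_migrate(&t, 1) leaves queue 0 empty and
queue 1 holding one task; swapping the first two steps would leave queue 0
at 1 and queue 1 at 0 after an unbalanced decrement and increment.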


Re: why does the macro "ZERO_PAGE" take an argument?

2007-06-11 Thread Nick Piggin

H. Peter Anvin wrote:

William Lee Irwin III wrote:


Robert P. J. Day wrote:


although it's not clear where in the source tree are the invocations
that would actually make a difference to a MIPS system, which is why
i've CC'ed ralf on this.  i'm sure he can clear this up. :-)


On Thu, Jun 07, 2007 at 10:32:29AM -0700, H. Peter Anvin wrote:


x86 could also benefit from coloured zeropages.  In fact, I thought it
already had them (K8 wants as many as 8.)


How would one demonstrate the beneficial effect of such?



Dean Gaudet at Transmeta did some benchmarking using SPEC.  If I recall
his numbers correctly (this is from memory, mind you) on Transmeta
Efficeon, which has 2-way virtual cache tagging with hardware recovery,
zeropage coloring was a 1.5% performance improvement.


I'm surprised that the benchmark made such use of zero pages so as to
be worthwhile. I'm sitting on a patch which removes the zero page from
the page fault fastpath completely which I'd like to try out in -mm...

--
SUSE Labs, Novell Inc.


Re: [ckrm-tech] [RFC][PATCH 1/6] Introduce struct sched_entity and struct lrq

2007-06-11 Thread Balbir Singh
Srivatsa Vaddagiri wrote:
> This patch introduces two new structures:
> 
> struct sched_entity
> stores essential attributes/execution-history used by CFS core
> to drive fairness between 'schedulable entities' (tasks, users etc)
> 
> struct lrq
> runqueue used to hold ready-to-run entities
> 
> These new structures are formed by grouping together existing fields in
> existing structures (task_struct and rq) and hence represents rework
> with zero functionality change.
> 
> Signed-off-by : Srivatsa Vaddagiri <[EMAIL PROTECTED]>
[snip]

> 
> +/* CFS-related fields in a runqueue */
> +struct lrq {
> + unsigned long raw_weighted_load;
> + #define CPU_LOAD_IDX_MAX 5
> + unsigned long cpu_load[CPU_LOAD_IDX_MAX];
> + unsigned long nr_load_updates;
> +
> + u64 fair_clock, delta_fair_clock;
> + u64 exec_clock, delta_exec_clock;
> + s64 wait_runtime;
> + unsigned long wait_runtime_overruns, wait_runtime_underruns;
> +
> + struct rb_root tasks_timeline;
> + struct rb_node *rb_leftmost;
> + struct rb_node *rb_load_balance_curr;
> +};
> +

Shouldn't the rq->lock move into lrq?

-- 
Warm Regards,
Balbir Singh
Linux Technology Center
IBM, ISTL


Re: [PATCH] driver core: multithreaded device matching with dependency

2007-06-11 Thread Huang, Ying
> > Index: linux-2.6.22-rc4/include/linux/device.h
> > ===
> > --- linux-2.6.22-rc4.orig/include/linux/device.h 2007-06-08 18:26:11.0 +0800
> > +++ linux-2.6.22-rc4/include/linux/device.h 2007-06-09 19:41:03.0 +0800
> > @@ -413,12 +413,17 @@
> > struct klist_node   knode_driver;
> > struct klist_node   knode_bus;
> > struct device   *parent;
> > +   struct device   *depend;
> 
> Is this core-internal now?  If not, add a comment what it does and how
> it has to be written and read.
> 
> I'm curious how this has to be used; perhaps it will turn out that the
> whole thing is over-engineered this way.  I.e. there might be easier
> options like:
>   - Let subsystems which don't want multithreaded probing at all
> disable multithreaded probing globally by a susbsystem flag.
> (Or rather, let subsystems which want multithreaded probing
> enable it globally by a subsystem flag. IOW make singlethreaded
> probing the default.  Multithreaded probing has to be tested
> thoroughly for each subsystem.)
>   - Let subsystems which want only partially multithreaded probing
> serialize the necessary regions on their own by subsystem-internal
> mutexes.
> 
> However, this really depends on what the actual cost of dev->depend is.

Maybe it is a little over-engineered. I think it may be used for
arbitrary dependencies between devices. For example, in some embedded
systems the power of the USB controller is controlled by a GPIO, so the
USB controller driver needs to access the GPIO driver.

I have now made singlethreaded probing the default, but the probing of
different subsystems is still parallelized.

Thanks for your comments. There are many problems in my patch, but for
now feasibility may be the biggest one. The patch can be seen as a
companion to my description.

The dependency rules are summarized as follows:

1. The following flag is added to struct device:
 unsigned multithreaded_probe:1;
If it is set, the device sub-tree rooted at this device is probed in
parallel with other device sub-trees. If it is clear, the device belongs
to the sub-tree of its parent and is probed serially with the other
devices in that sub-tree. Root devices (those without a parent) are
treated as having this flag set, regardless of its actual value. With
this flag, the PCI subsystem can be probed serially, while the IEEE 1394
subsystem can be probed in parallel across different device nodes but
serially across the units within a node.

2. The following field is added to struct device:
 struct device *depend;
The device will not start probing until probing of the device pointed to
by depend has finished. This is used to express arbitrary dependencies
between devices.

3. Probing of a device will not start until probing of its parent has
finished.

I revised my patch to reflect the rules above, so I am resending it.
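
The three rules can be sketched as two small predicates. This is a
stand-alone illustration, not code from the patch: struct toy_device, the
probe_done field and both function names are hypothetical, and the real
patch tracks completion differently.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct toy_device {
	struct toy_device *parent;
	struct toy_device *depend;          /* rule 2: extra dependency */
	unsigned multithreaded_probe:1;     /* rule 1: sub-tree probed in parallel */
	bool probe_done;                    /* hypothetical completion marker */
};

/* Rules 2 and 3: parent and depend must both have finished probing. */
static bool toy_can_start_probe(const struct toy_device *dev)
{
	assert(dev != NULL);
	if (dev->parent && !dev->parent->probe_done)
		return false;
	if (dev->depend && !dev->depend->probe_done)
		return false;
	return true;
}

/*
 * Rule 1: walk up to the root of the sub-tree this device is serialized
 * with. A device with the flag set, or with no parent, is its own root.
 */
static const struct toy_device *toy_probe_domain(const struct toy_device *dev)
{
	assert(dev != NULL);
	while (dev->parent && !dev->multithreaded_probe)
		dev = dev->parent;
	return dev;
}
```

Two devices whose toy_probe_domain() results are equal would be probed
serially with each other; devices in different domains may be probed in
parallel, subject to toy_can_start_probe().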

 drivers/base/base.h|5 +
 drivers/base/core.c|   35 +--
 drivers/base/dd.c  |  230 ++---
 include/linux/device.h |7 +
 init/main.c|3 
 5 files changed, 259 insertions(+), 21 deletions(-)
Index: linux-2.6.22-rc4/init/main.c
===
--- linux-2.6.22-rc4.orig/init/main.c   2007-06-11 11:24:27.0 +0800
+++ linux-2.6.22-rc4/init/main.c2007-06-11 11:25:07.0 +0800
@@ -652,6 +652,7 @@
initcall_t *call;
int count = preempt_count();
 
+   device_match_freeze();
for (call = __initcall_start; call < __initcall_end; call++) {
ktime_t t0, t1, delta;
char *msg = NULL;
@@ -703,6 +704,8 @@
}
}
 
+   device_match_thaw(0);
+
/* Make sure there is no pending stuff from the initcall sequence */
flush_scheduled_work();
 }
Index: linux-2.6.22-rc4/drivers/base/dd.c
===
--- linux-2.6.22-rc4.orig/drivers/base/dd.c 2007-06-11 11:24:27.0 +0800
+++ linux-2.6.22-rc4/drivers/base/dd.c  2007-06-11 21:51:06.0 +0800
@@ -25,6 +25,7 @@
 
 #define to_drv(node) container_of(node, struct device_driver, kobj.entry)
 
+static atomic_t device_match_freezed = ATOMIC_INIT(0);
 
 static void driver_bound(struct device *dev)
 {
@@ -220,6 +221,25 @@
return ret;
 }
 
+/* This function must be called with dev->sem held. */
+static inline int real_device_attach(struct device * dev)
+{
+   int ret = 0;
+
+   if (dev->driver) {
+   ret = device_bind_driver(dev);
+   if (ret == 0)
+   ret = 1;
+   else {
+   dev->driver = NULL;
+   ret = 0;
+   }
+   } else {
+   ret = 

Re: [Intel-IOMMU 02/10] Library routine for pre-allocat pool handling

2007-06-11 Thread Siddha, Suresh B
On Mon, Jun 11, 2007 at 06:55:46PM -0700, Arjan van de Ven wrote:
> Andrew Morton wrote:
> >On Mon, 11 Jun 2007 18:10:40 -0700 Arjan van de Ven 
> ><[EMAIL PROTECTED]> wrote:
> >
> >>Andrew Morton wrote:
> Where as resource pool is exactly opposite of mempool, where each 
> time it looks for an object in the pool and if it exist then we 
> return that object else we try to get the memory for OS while 
> scheduling the work to grow the pool objects. In fact, the  work
> is schedule to grow the pool when the low threshold point is hit.
> >>>I realise all that.  But I'd have thought that the mempool approach is
> >>>actually better: use the page allocator and only deplete your reserve 
> >>>pool
> >>>when the page allocator fails.
> >>the problem with that is that if anything downstream from the iommu 
> >>layer ALSO needs memory, we've now eaten up the last free page and 
> >>things go splat.
> >
> >If that happens, we still have the mempool reserve to fall back to.
> 
> we do, except that we just ate the memory the downstream code would 
> use and get ... so THAT can't get any.

Then this problem can happen irrespective of the changes we are
reviewing in this patch set. Isn't it?

thanks,
suresh


Re: [patch 3/3] Enable Aggressive Link Power management for AHCI controllers.

2007-06-11 Thread Jeff Garzik

Dagfinn Ilmari Mannsåker wrote:

Arjan van de Ven <[EMAIL PROTECTED]> writes:


Henrique de Moraes Holschuh wrote:

On Mon, 11 Jun 2007, Kristen Carlson Accardi wrote:

Setting Effect
--
min_power   ALPM is enabled, and link set to enter  lowest
power state (SLUMBER) when idle
Hot plug not allowed.

max_performance ALPM is disabled, Hot Plug is allowed

medium_powerALPM is enabled, and link set to enter
second lowest power state (PARTIAL) when
idle.  Hot plug not allowed.

Just some food for thought:
If you split it into a enable/disable (0/1) attribute, and a level
attribute

on/off doesn't really make sense if the question is "do you favor power
or do you favor performance"...


How about just making it a numeric scale with 0 meaning no power saving
and then some fixed number of levels (e.g 0-9)?


The original proposal seems far more intuitive than these alternatives.

Jeff
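
For comparison with the numeric-scale idea, a minimal sketch of how the
three named settings from the original proposal could map onto an enum. The
enum, its values and both helper names are hypothetical and do not come from
the patch; only the three setting strings and the hot plug behaviour are
taken from the table quoted above.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical policy values, ordered from least to most power saving. */
enum toy_link_pm_policy {
	TOY_LINK_PM_INVALID = -1,
	TOY_LINK_PM_MAX_PERFORMANCE,   /* ALPM disabled */
	TOY_LINK_PM_MEDIUM_POWER,      /* ALPM enabled, PARTIAL when idle */
	TOY_LINK_PM_MIN_POWER          /* ALPM enabled, SLUMBER when idle */
};

/* Map a setting string to a policy; unknown strings are rejected. */
static enum toy_link_pm_policy toy_parse_link_pm_policy(const char *s)
{
	assert(s != NULL);
	if (strcmp(s, "max_performance") == 0)
		return TOY_LINK_PM_MAX_PERFORMANCE;
	if (strcmp(s, "medium_power") == 0)
		return TOY_LINK_PM_MEDIUM_POWER;
	if (strcmp(s, "min_power") == 0)
		return TOY_LINK_PM_MIN_POWER;
	return TOY_LINK_PM_INVALID;
}

/* Per the quoted table, hot plug is only allowed with ALPM disabled. */
static int toy_hotplug_allowed(enum toy_link_pm_policy p)
{
	return p == TOY_LINK_PM_MAX_PERFORMANCE;
}
```

With named settings the side effect on hot plug is attached to a word the
user chose; on a 0-9 scale the same information would have to be documented
per level.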





Re: [patch 0/3] AHCI Link Power Management

2007-06-11 Thread Tejun Heo
Kristen Carlson Accardi wrote:
> Hi,
> This series of patches enables Aggressive Link Power Management for AHCI 
> devices, as documented in the AHCI spec.  On my laptop (a Lenovo X60), this
> saves me a full watt of power.  On other systems, reported power savings
> range from .5-1.5 Watts.  It has been tested by the kind folks at #powertop
> with similar results.  Please give it a try and let me know what you think.

I'm not sure about this.  We need better PM framework to support
powersaving in other controllers and some ahcis don't save much when
only link power management is used, they need to be turned off
completely && I don't think scsi sysfs is the right place to export this
interface.

-- 
tejun


[RFC][AGPGART]intel-agp: save whole config space in suspend/resume

2007-06-11 Thread Wang Zhenyu

Dave,

It looks like config space save/restore for intel-agp still has a problem
that might affect some chip models. Andreas Mohr's work on i815
suspend/resume support showed that we need to save extra bits in config
space on this old chip type. His patch is in the -mm tree:
http://www.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.22-rc4/2.6.22-rc4-mm2/broken-out/working-3d-dri-intel-agpko-resume-for-i815-chip.patch

Recently James Bottomley also reported that saving/restoring the whole
256 bytes of config space for the gfx device fixes his s3 issue on a
Fujitsu P7120 (i915):
http://lists.freedesktop.org/archives/xorg/2007-June/025346.html

So here is my suggested patch to save the whole 256 bytes of config space
for intel-agp, which could fix these issues. I tested it on my 965GM; s3
works fine with X.

This patch is based on the latest kernel git and my intel-agp patch set on
the agpgart.git tip:
http://git.kernel.org/?p=linux/kernel/git/davej/agpgart.git;a=summary


Signed-off-by: Wang Zhenyu <[EMAIL PROTECTED]>
---
 drivers/char/agp/intel-agp.c |   44 +-
 1 files changed, 43 insertions(+), 1 deletions(-)

diff --git a/drivers/char/agp/intel-agp.c b/drivers/char/agp/intel-agp.c
index d383168..bc18241 100644
--- a/drivers/char/agp/intel-agp.c
+++ b/drivers/char/agp/intel-agp.c
@@ -110,6 +110,7 @@ static struct _intel_private {
 * popup and for the GTT.
 */
int gtt_entries;/* i830+ */
+   u32 extra_saved_config[48]; /* suspend/resume */
 } intel_private;
 
 static int intel_i810_fetch_size(void)
@@ -1974,9 +1975,33 @@ static void __devexit agp_intel_remove(struct pci_dev *pdev)
 }
 
 #ifdef CONFIG_PM
+static int agp_intel_suspend (struct pci_dev *pdev, pm_message_t state)
+{
+   int i;
+
+   pci_save_state(pdev);
+
+   if (intel_private.pcidev) {
+   pci_save_state(intel_private.pcidev);
+
+   for (i = 0; i < 48; i++)
+   pci_read_config_dword(intel_private.pcidev, i*4+64,
+   &intel_private.extra_saved_config[i]);
+
+   pci_set_power_state(intel_private.pcidev,
+   pci_choose_state(intel_private.pcidev, state));
+   }
+
+   pci_set_power_state(pdev, pci_choose_state(pdev, state));
+
+   return 0;
+}
+
 static int agp_intel_resume(struct pci_dev *pdev)
 {
struct agp_bridge_data *bridge = pci_get_drvdata(pdev);
+   int i = 0;
+   u32 val;
 
pci_restore_state(pdev);
 
@@ -1984,8 +2009,24 @@ static int agp_intel_resume(struct pci_dev *pdev)
 * as host bridge (00:00) resumes before graphics device (02:00),
 * then our access to its pci space can work right.
 */
-   if (intel_private.pcidev)
+   if (intel_private.pcidev) {
+   pci_set_power_state(intel_private.pcidev, PCI_D0);
pci_restore_state(intel_private.pcidev);
+   for (i = 0; i < 48; i++) {
+   pci_read_config_dword(intel_private.pcidev, i*4+64,
+   &val);
+   if (val != intel_private.extra_saved_config[i]) {
+   printk(KERN_DEBUG "intel-agp: Writing back"
+   "config space at offset %x"
+   " (was %x, writing %x)\n",
+   i*4+64, val,
+   intel_private.extra_saved_config[i]);
+   pci_write_config_dword(intel_private.pcidev,
+   i*4+64,
+   intel_private.extra_saved_config[i]);
+   }
+   }
+   }
 
if (bridge->driver == &intel_generic_driver)
intel_configure();
@@ -2062,6 +2103,7 @@ static struct pci_driver agp_intel_pci_driver = {
.probe  = agp_intel_probe,
.remove = __devexit_p(agp_intel_remove),
 #ifdef CONFIG_PM
+   .suspend= agp_intel_suspend,
.resume = agp_intel_resume,
 #endif
 };
-- 
1.4.4.4
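
A user-space sketch of the save/compare/write-back loop in the patch, run
against a plain array instead of real PCI config space. The array model and
all toy_* names are assumptions for illustration; the 48-dword count and the
i*4+64 offset are the ones used in the patch, i.e. the 192 bytes from offset
64 up to 256.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_CFG_BYTES    256u
#define TOY_EXTRA_BASE   64u   /* byte offset where the extra range starts */
#define TOY_EXTRA_DWORDS ((TOY_CFG_BYTES - TOY_EXTRA_BASE) / 4u)   /* 48 */

/* Copy dwords at byte offsets 64, 68, ..., 252 into saved[]. */
static void toy_save_extra(const uint32_t *cfg, uint32_t *saved)
{
	assert(cfg && saved);
	for (unsigned int i = 0; i < TOY_EXTRA_DWORDS; i++)
		saved[i] = cfg[(i * 4 + TOY_EXTRA_BASE) / 4];
}

/*
 * Write back only the dwords that differ from the saved copy.
 * Returns how many dwords were rewritten.
 */
static unsigned int toy_restore_extra(uint32_t *cfg, const uint32_t *saved)
{
	unsigned int written = 0;

	assert(cfg && saved);
	for (unsigned int i = 0; i < TOY_EXTRA_DWORDS; i++) {
		uint32_t *reg = &cfg[(i * 4 + TOY_EXTRA_BASE) / 4];

		if (*reg != saved[i]) {
			*reg = saved[i];
			written++;
		}
	}
	return written;
}
```

An array cannot model registers that are write-only or that trigger hardware
events when written, so this sketch says nothing about whether restoring the
full range is safe on a given chipset.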


Re: [Intel-IOMMU 02/10] Library routine for pre-allocat pool handling

2007-06-11 Thread Arjan van de Ven

Andrew Morton wrote:
> On Mon, 11 Jun 2007 18:10:40 -0700 Arjan van de Ven <[EMAIL PROTECTED]> wrote:
>
>> Andrew Morton wrote:
>>>> Where as resource pool is exactly opposite of mempool, where each 
>>>> time it looks for an object in the pool and if it exist then we 
>>>> return that object else we try to get the memory for OS while 
>>>> scheduling the work to grow the pool objects. In fact, the  work
>>>> is schedule to grow the pool when the low threshold point is hit.
>>>
>>> I realise all that.  But I'd have thought that the mempool approach is
>>> actually better: use the page allocator and only deplete your reserve pool
>>> when the page allocator fails.
>>
>> the problem with that is that if anything downstream from the iommu 
>> layer ALSO needs memory, we've now eaten up the last free page and 
>> things go splat.
>
> If that happens, we still have the mempool reserve to fall back to.

we do, except that we just ate the memory the downstream code would 
use and get ... so THAT can't get any.



Re: [BUG] fs/buffer.c:1821 in 2.6.22-rc4-mm2

2007-06-11 Thread Nick Piggin

Andrew Morton wrote:

On Sun, 10 Jun 2007 17:57:14 +0200 Eric Sesterhenn / Snakebyte <[EMAIL PROTECTED]> wrote:



hi,

i got the following BUG while running the syscalls.sh
from ltp-full-20070531 on an ext3 partition, it is easily reproducible
for me

[  476.338068] [ cut here ]
[  476.338223] kernel BUG at fs/buffer.c:1821!
[  476.338324] invalid opcode:  [#1]
[  476.338423] PREEMPT 
[  476.338665] Modules linked in:

[  476.338833] CPU:0
[  476.338836] EIP:0060:[]Not tainted VLI
[  476.338840] EFLAGS: 00010202   (2.6.22-rc4-mm2 #1)
[  476.339206] EIP is at __block_prepare_write+0x64/0x410
[  476.339311] eax: 0001   ebx: c136fbb8   ecx: c07faf28   edx: 0001
[  476.339417] esi: c1dc9040   edi: c32d2dfc   ebp: c3733db8   esp: c3733d50
[  476.339584] ds: 007b   es: 007b   fs:   gs: 0033  ss: 0068
[  476.339690] Process vmsplice01 (pid: 7680, ti=c3733000 task=c351ed60 task.ti=c3733000)
[  476.339796] Stack: c3733d70 c0143e76 c1a0eab0 0046 c2509d64 0cd8 c136fbb8 
[  476.340675]c32d2dfc 0296 c02313b6 c1086088 0050 c02313b6 c1dc9040 c2509d50 
[  476.341491]c1dc9054 c3733dc4 c02313e9 c3733dbc c015728d c32d2f0c  c136fbb8 
[  476.342371] Call Trace:

[  476.342565]  [] block_write_begin+0x83/0xf0
[  476.342804]  [] ext3_write_begin+0xc8/0x1c0
[  476.342987]  [] pagecache_write_begin+0x4f/0x150
[  476.343243]  [] pipe_to_file+0x9b/0x170
[  476.343418]  [] __splice_from_pipe+0x70/0x260
[  476.343654]  [] splice_from_pipe+0x48/0x70
[  476.343828]  [] generic_file_splice_write+0x88/0x130
[  476.344066]  [] do_splice_from+0xb7/0xc0
[  476.344240]  [] sys_splice+0x1a1/0x230
[  476.344474]  [] sysenter_past_esp+0x5f/0x99
[  476.344656]  [] 0xe410
[  476.344882]  ===
[  476.344984] INFO: lockdep is turned off.
[  476.345084] Code: 00 0f 97 c2 e8 ee 2f 22 00 85 c0 74 04 0f 0b eb fe 31 d2 b8 28 af 7f c0 81 7d 08 00 10 00 00 0f 97 c2 e8 d0 2f 22 00 85 c0 74 04 <0f> 0b eb fe 8b 55 08 39 55 b0 0f 97 c0 0f b6 d0 b8 0c af 7f c0 
[  476.350365] EIP: [] __block_prepare_write+0x64/0x410 SS:ESP 0068:c3733d50



Yep, vmsplice01 is not supported on -mm kernels ;)

Nick has a protofix but I don't think it's been tested yet.


Yeah, sorry I didn't catch that after you merged :P
This should be the correct bugfix attached -- it is just a typo.

--
SUSE Labs, Novell Inc.
Index: linux-2.6/fs/splice.c
===
--- linux-2.6.orig/fs/splice.c
+++ linux-2.6/fs/splice.c
@@ -570,7 +570,7 @@ static int pipe_to_file(struct pipe_inod
if (this_len + offset > PAGE_CACHE_SIZE)
this_len = PAGE_CACHE_SIZE - offset;
 
-   ret = pagecache_write_begin(file, mapping, sd->pos, sd->len,
+   ret = pagecache_write_begin(file, mapping, sd->pos, this_len,
AOP_FLAG_UNINTERRUPTIBLE, &page, &fsdata);
if (unlikely(ret))
goto out;
@@ -583,11 +583,12 @@ static int pipe_to_file(struct pipe_inod
char *dst = kmap_atomic(page, KM_USER1);
 
memcpy(dst + offset, src + buf->offset, this_len);
+   flush_dcache_page(page);
kunmap_atomic(dst, KM_USER1);
buf->ops->unmap(pipe, buf, src);
}
 
-   ret = pagecache_write_end(file, mapping, sd->pos, sd->len, sd->len, page, fsdata);
+   ret = pagecache_write_end(file, mapping, sd->pos, this_len, this_len, page, fsdata);
 
 out:
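
The fix above comes down to handing the write helpers the per-page
length (this_len) rather than the whole request length (sd->len).  As
a rough userspace sketch of that clamp, assuming a 4096-byte page and
that the in-page offset is simply pos modulo the page size:
SKETCH_PAGE_SIZE and clamp_to_page() are illustrative names, not
kernel symbols.

```c
#include <stddef.h>

/* Illustrative page size; the kernel code uses PAGE_CACHE_SIZE. */
#define SKETCH_PAGE_SIZE 4096u

/*
 * Return how many of 'len' bytes starting at byte position 'pos' fit
 * into the page that contains 'pos'.  The if-statement mirrors the
 * this_len clamp visible in the patch context; the offset
 * computation is an assumption made for this sketch.
 */
static size_t clamp_to_page(unsigned long long pos, size_t len)
{
	size_t offset = (size_t)(pos & (SKETCH_PAGE_SIZE - 1));
	size_t this_len = len;

	if (this_len + offset > SKETCH_PAGE_SIZE)
		this_len = SKETCH_PAGE_SIZE - offset;
	return this_len;
}
```

A request that starts near the end of a page is cut at the page
boundary, so the length passed on never reaches past the page.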
 


Re: mm: memory/cpu hotplug section mismatch.

2007-06-11 Thread Yasunori Goto
> > 
> > If CONFIG_MEMORY_HOTPLUG=n __meminit == __init, and if
> > CONFIG_HOTPLUG_CPU=n __cpuinit == __init. However, with one set and the
> > other disabled, you end up with a reference between __init and a regular
> > non-init function.
> 
> My plan is to define dedicated sections for both __devinit and __meminit.
> Then we can apply the checks no matter the definition of CONFIG_HOTPLUG*

For the __cpuinit and __meminit case I would rather define a new
"__nodeinit" than reuse __devinit.  __devinit is used by many devices,
such as I/O devices, and is useful to many desktop users.  Boxes with
hotpluggable CPUs or memory, however, are very rare, so for most people
this code should stay in the init section.

This kind of issue is caused by the initialization of pgdat/zone.
I think __nodeinit is sufficient and preferable.

Bye.

-- 
Yasunori Goto 




Re: [Intel-IOMMU 02/10] Library routine for pre-allocat pool handling

2007-06-11 Thread Andrew Morton
On Mon, 11 Jun 2007 18:10:40 -0700 Arjan van de Ven <[EMAIL PROTECTED]> wrote:

> Andrew Morton wrote:
> >> Where as resource pool is exactly opposite of mempool, where each 
> >> time it looks for an object in the pool and if it exist then we 
> >> return that object else we try to get the memory for OS while 
> >> scheduling the work to grow the pool objects. In fact, the  work
> >> is schedule to grow the pool when the low threshold point is hit.
> > 
> > I realise all that.  But I'd have thought that the mempool approach is
> > actually better: use the page allocator and only deplete your reserve pool
> > when the page allocator fails.
> 
> the problem with that is that if anything downstream from the iommu 
> layer ALSO needs memory, we've now eaten up the last free page and 
> things go splat.

If that happens, we still have the mempool reserve to fall back to.

I don't see why it is better to consume the reserves before going to the
page allocator instead of holding them, err, in reserve.
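
To make the two orderings under discussion concrete, here is a toy
model in C.  It only counts objects and is not the mempool or the
proposed resource-pool code; struct sketch_state and both function
names are made up for illustration, and the pool-refill work that the
resource pool schedules is left out.

```c
#include <stdbool.h>

/* Toy model: counts of free objects, not real memory. */
struct sketch_state {
	int system_free;   /* objects the general allocator can still supply */
	int reserve_free;  /* objects held back in the private reserve */
};

/* mempool-style order: general allocator first, reserve only on failure. */
static bool alloc_allocator_first(struct sketch_state *s)
{
	if (s->system_free > 0) {
		s->system_free--;
		return true;
	}
	if (s->reserve_free > 0) {
		s->reserve_free--;
		return true;
	}
	return false;
}

/* resource-pool-style order: private pool first, then the allocator. */
static bool alloc_pool_first(struct sketch_state *s)
{
	if (s->reserve_free > 0) {
		s->reserve_free--;
		return true;
	}
	if (s->system_free > 0) {
		s->system_free--;
		return true;
	}
	return false;
}
```

Starting from one free object on each side, the allocator-first order
leaves the reserve intact but the general allocator empty (the case
Arjan worries about for downstream code), while the pool-first order
leaves the general allocator with one object and the reserve empty.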



Re: [Intel-IOMMU 02/10] Library routine for pre-allocat pool handling

2007-06-11 Thread Christoph Lameter
On Mon, 11 Jun 2007, Arjan van de Ven wrote:

> the problem with that is that if anything downstream from the iommu layer ALSO
> needs memory, we've now eaten up the last free page and things go splat.

Hmmm... We need something like a reservation system right? Higher levels 
in an atomic context could register their future needs. Then we can avoid 
overallocating in the iommu layer.



Re: [patch 3/3] Enable Aggressive Link Power management for AHCI controllers.

2007-06-11 Thread Arjan van de Ven

Henrique de Moraes Holschuh wrote:
> On Mon, 11 Jun 2007, Kristen Carlson Accardi wrote:
>> Setting           Effect
>> --
>> min_power         ALPM is enabled, and link set to enter
>>                   lowest power state (SLUMBER) when idle
>>                   Hot plug not allowed.
>>
>> max_performance   ALPM is disabled, Hot Plug is allowed
>>
>> medium_power      ALPM is enabled, and link set to enter
>>                   second lowest power state (PARTIAL) when
>>                   idle.  Hot plug not allowed.
>
> Just some food for thought:
>
> If you split it into a enable/disable (0/1) attribute, and a level attribute

on/off doesn't really make sense if the question is "do you favor 
power or do you favor performance"...



Re: ext2 on flash memory

2007-06-11 Thread Bernd Eckenfels
In article <[EMAIL PROTECTED]> you wrote:
> All of the posts fail to address the question here: what is the
> correct file system, or does one exist yet, for wear leveling flash
> storage.  JFFS2 and logfs are nice for MTD, but for better flash
> memories that are likely to be used in the future like solid state
> hard disks, what is the answer?

FAT - you can stick it into Windows Boxes on the road.

Bernd


Re: [Intel-IOMMU 02/10] Library routine for pre-allocat pool handling

2007-06-11 Thread Arjan van de Ven

Andrew Morton wrote:
>> Where as resource pool is exactly opposite of mempool, where each 
>> time it looks for an object in the pool and if it exist then we 
>> return that object else we try to get the memory for OS while 
>> scheduling the work to grow the pool objects. In fact, the  work
>> is schedule to grow the pool when the low threshold point is hit.
>
> I realise all that.  But I'd have thought that the mempool approach is
> actually better: use the page allocator and only deplete your reserve pool
> when the page allocator fails.

the problem with that is that if anything downstream from the iommu 
layer ALSO needs memory, we've now eaten up the last free page and 
things go splat.

in terms of deadlock avoidance... I wonder if we need something 
similar to the swap token; once a process dips into the emergency 
pool, it becomes the only one that gets to use this pool, so that its 
entire chain of allocations will succeed, rather than each process 
only getting halfway through...


But yeah it's minute details and you can argue either way is the right 
approach.


You can even argue for the old highmem.c approach; go into half the 
pool before going to the VM, then to kmalloc() and if that fails dip 
into the second half of the pool.
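
The "half the pool" ordering mentioned above can be written down as a
toy model too.  This is only a sketch of the order of fallbacks as
described in this mail, not what highmem.c actually contained; struct
half_pool and half_pool_alloc() are hypothetical names, and
system_free stands in for whatever the VM/kmalloc() can supply.

```c
#include <stdbool.h>

struct half_pool {
	int pool_free;    /* objects currently in the private pool */
	int pool_size;    /* total capacity of the private pool */
	int system_free;  /* stand-in for what the general allocator has */
};

static bool half_pool_alloc(struct half_pool *p)
{
	/* 1. Use the pool while more than half of it remains. */
	if (p->pool_free > p->pool_size / 2) {
		p->pool_free--;
		return true;
	}
	/* 2. Then try the general allocator. */
	if (p->system_free > 0) {
		p->system_free--;
		return true;
	}
	/* 3. Only when that fails, dip into the second half. */
	if (p->pool_free > 0) {
		p->pool_free--;
		return true;
	}
	return false;
}
```

With a pool of four objects and one object available from the general
allocator, the first two allocations come from the pool, the third
from the allocator, and the last two from the second half of the pool.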



Re: [patch 3/3] Enable Aggressive Link Power management for AHCI controllers.

2007-06-11 Thread Henrique de Moraes Holschuh
On Mon, 11 Jun 2007, Kristen Carlson Accardi wrote:
> Setting           Effect
> --
> min_power         ALPM is enabled, and link set to enter
>                   lowest power state (SLUMBER) when idle
>                   Hot plug not allowed.
> 
> max_performance   ALPM is disabled, Hot Plug is allowed
> 
> medium_power      ALPM is enabled, and link set to enter
>                   second lowest power state (PARTIAL) when
>                   idle.  Hot plug not allowed.

Just some food for thought:

If you split it into an enable/disable (0/1) attribute, and a level attribute
(some sort of integer scale, or "min", "medium", "max" if you must use
strings.  You could use four levels to mimic the PCI device power state, for
example), it might make its usage more generic, and easier from userspace,
as it decouples the need to turn it on/off from the need to know which level
the user wants it set to when you turn it on.
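
A rough sketch of what that split could look like, in plain C rather
than sysfs code.  Everything here is hypothetical: the struct, the
enum and the function are illustrative names, and mapping the level to
PARTIAL/SLUMBER is just one possible reading of "level".

```c
#include <stdbool.h>

/* Hypothetical "level" attribute: which low-power link state to use. */
enum sketch_alpm_level {
	SKETCH_ALPM_PARTIAL,  /* second lowest power state */
	SKETCH_ALPM_SLUMBER   /* lowest power state */
};

/* Hypothetical pair of attributes replacing the single policy string. */
struct sketch_alpm_attrs {
	bool enabled;                  /* the 0/1 attribute */
	enum sketch_alpm_level level;  /* remembered even while disabled */
};

/* Map the two attributes back onto the three settings in the table. */
static const char *sketch_effective_policy(const struct sketch_alpm_attrs *a)
{
	if (!a->enabled)
		return "max_performance";
	return a->level == SKETCH_ALPM_SLUMBER ? "min_power" : "medium_power";
}
```

The point of the split shows up in the first branch: turning the
feature off does not lose the level, so turning it back on needs no
knowledge of which level was chosen earlier.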

-- 
  "One disk to rule them all, One disk to find them. One disk to bring
  them all and in the darkness grind them. In the Land of Redmond
  where the shadows lie." -- The Silicon Valley Tarot
  Henrique Holschuh


Re: [RFD] Documentation/HOWTO translated into Japanese

2007-06-11 Thread IKEDA Munehiro

Tsugikazu Shibata wrote:
> Here is a patch of Japanese translated HOWTO.
> Thank you very much for lots of comment and suggestions.>all
> Below is what I have done:
>
> - Added Hiroyuki's comment with my own modifications as top notes.
> - Character encoding is UTF-8
> - The file is put in Documentation/ja_JP
> - The patch is attached, not inlined because of MUA will sometimes
>   force change the character encoding. Sorry.
>
> I think We may need to discuss with JF team about much better JF
> headers (written in Japanese) later.


Next, here is a Japanese translation of stable_api_nonsense.txt, in
patch form.

For now it follows the same conventions as Tsugikazu Shibata's patch.

My earlier post translated the 2.6.19 version; this one translates
2.6.22-rc4.  I updated the translation to follow the change made in
2.6.20, which is the latest revision of the file.
(I also corrected the typo in "volatile".  Thanks Alistair!)


--
IKEDA, Munehiro

From 99e9f85036418b0f30825f4635bb224b592be57c Mon Sep 17 00:00:00 2001
From: IKEDA, Munehiro <[EMAIL PROTECTED]>
Date: Tue, 12 Jun 2007 09:46:04 +0900
Subject: [PATCH] add Japanese translated stable_api_nonsense.txt

This is a first attempt to merge translated documentation into mainline.
The directory in which the file should live is still open for discussion.
---
 Documentation/ja_JP/stable_api_nonsense.txt |  263 +++
 1 files changed, 263 insertions(+), 0 deletions(-)

diff --git a/Documentation/ja_JP/stable_api_nonsense.txt b/Documentation/ja_JP/stable_api_nonsense.txt
new file mode 100644
index 000..0b130e8
--- /dev/null
+++ b/Documentation/ja_JP/stable_api_nonsense.txt
@@ -0,0 +1,263 @@
+NOTE:
+This is a Japanese translated version of
+"Documentation/stable_api_nonsense.txt".
+This one is maintained by
+IKEDA, Munehiro <[EMAIL PROTECTED]>
+and JF Project team .
+If you find difference with original file or problem in translation,
+please contact the maintainer of this file or JF project.
+
+Please also note that purpose of this file is easier to read for non
+English natives and not to be intended to fork. So, if you have any
+comments or updates of this file, please try to update
+Original(English) file at first.
+
+==
+これは、
+linux-2.6.22-rc4/Documentation/stable_api_nonsense.txt の和訳
+です。
+翻訳団体: JF プロジェクト < http://www.linux.or.jp/JF/ >
+翻訳日 : 2007/06/11
+原著作者: Greg Kroah-Hartman < greg at kroah dot com >
+翻訳者 : 池田 宗広 < m-ikeda at ds dot jp dot nec dot com >
+校正者 : Masanori Kobayashi さん < zap03216 at nifty dot ne dot jp >
+  Seiji Kaneko さん < skaneko at a2 dot mbn dot or dot jp >
+==
+
+
+
+Linux カーネルのドライバインターフェース
+(あなたの質問すべてに対する回答とその他諸々)
+
+Greg Kroah-Hartman 
+
+
+この文書は、なぜ Linux ではバイナリカーネルインターフェースが定義
+されていないのか、またはなぜ不変のカーネルインターフェースを持たな
+いのか、ということを説明するために書かれた。ここでの話題は「カーネ
+ル内部の」インターフェースについてであり、ユーザー空間とのインター
+フェースではないことを理解してほしい。カーネルとユーザー空間とのイ
+ンターフェースとはアプリケーションプログラムが使用するものであり、
+つまりシステムコールのインターフェースがこれに当たる。これは今まで
+長きに渡り、かつ今後も「まさしく」不変である。私は確か 0.9 か何か
+より前のカーネルを使ってビルドした古いプログラムを持っているが、そ
+れは最新の 2.6 カーネルでもきちんと動作する。ユーザー空間とのイン
+ターフェースは、ユーザーとアプリケーションプログラマが不変性を信頼
+してよいものの一つである。
+
+
+要旨
+
+
+あなたは不変のカーネルインターフェースが必要だと考えているかもしれ
+ないが、実際のところはそうではない。あなたは必要としているものが分
+かっていない。あなたが必要としているものは安定して動作するドライバ
+であり、それはドライバがメインのカーネルツリーに含まれる場合のみ得
+ることができる。ドライバがメインのカーネルツリーに含まれていると、
+他にも多くの良いことがある。それは、Linux をより強固で、安定な、成
+熟したオペレーティングシステムにすることができるということだ。これ
+こそ、そもそもあなたが Linux を使う理由のはずだ。
+
+
+はじめに
+
+
+カーネル内部のインターフェース変更を心配しなければならないドライバ
+を書きたいなどというのは、変わり者だ

Re: divorce CONFIG_X86_PAE from CONFIG_HIGHMEM64G

2007-06-11 Thread Adrian Bunk
On Mon, Jun 11, 2007 at 05:00:26PM -0700, William Lee Irwin III wrote:
> On Thu, Jun 07, 2007 at 07:35:51PM -0700, William Lee Irwin III wrote:
> >> +PAE is required for NX support, and furthermore enables
> >> +larger swapspace support for non-overcommit purposes. It
> >> +has the cost of more pagetable lookup overhead, and also
> >> +consumes more pagetable space per process.
> 
> On Tue, Jun 12, 2007 at 01:52:35AM +0200, Adrian Bunk wrote:
> > It's not specific to this help text, but I am becoming a bit picky 
> > about these issues:
> > If you understand this help text after reading it, you don't need a help 
> > text for this option...  ;-)
> > What is "NX support"?
> > What are "non-overcommit purposes"?
> > What is "pagetable lookup overhead"?
> > And if in doubt, should I say Y or N?
> > "System administrator who knows which hardware components he put into 
> > the computer and which filesystems his data is on" might be a good 
> > description for the average kconfig user, and these are the people who 
> > should understand this help text.
> 
> I would like to have some place to explain issues such as those, but
> there are as of yet no designated places for tutorial-level information.
>  
> If such a place were provided, I would provide storybook commentary to
> explain all those. Similarly actually holds for kernel function docbook.

There's no 4 line limit for Kconfig entries. If it takes a 10 line 
paragraph to briefly explain what the NX bit is and when enabling 
NX support makes sense (because it will be used) then that's completely 
appropriate here. The same goes for the other parts.

The kconfig help should give anyone running "make oldconfig" a rough 
understanding of what this option is about and a clear understanding of 
what to answer for this option ("If unsure, say Y/N." is a standard text 
we use for the latter).

> -- wli

cu
Adrian

-- 

   "Is there not promise of rain?" Ling Tan asked suddenly out
of the darkness. There had been need of rain for many days.
   "Only a promise," Lao Er said.
   Pearl S. Buck - Dragon Seed



Re: [PATCH] PCI: also read revision ID for sparc64, ppc, read class at the same time

2007-06-11 Thread Michael Ellerman
On Sun, 2007-06-10 at 14:17 +0200, Segher Boessenkool wrote:
> >> +  dev->revision = get_int_prop(node, "revision-id", 0);
> >
> > It's not clear to me in the spec if nodes are required to have the
> > "revision-id" property.
> 
> It is required for every PCI node.

Yep. I was reading the wrong spec :)

cheers

-- 
Michael Ellerman
OzLabs, IBM Australia Development Lab

wwweb: http://michael.ellerman.id.au
phone: +61 2 6212 1183 (tie line 70 21183)

We do not inherit the earth from our ancestors,
we borrow it from our children. - S.M.A.R.T Person




[PATCH 001 of 2] md: Fix two raid10 bugs.

2007-06-11 Thread NeilBrown

1/ When resyncing a degraded raid10 which has more than 2 copies of each block,
  garbage can get synced on top of good data.

2/ We round the wrong way in part of the device size calculation, which
  can cause confusion.

Signed-off-by: Neil Brown <[EMAIL PROTECTED]>
Cc: [EMAIL PROTECTED]

### Diffstat output
 ./drivers/md/raid10.c |6 ++
 1 file changed, 6 insertions(+)

diff .prev/drivers/md/raid10.c ./drivers/md/raid10.c
--- .prev/drivers/md/raid10.c   2007-06-12 10:19:04.0 +1000
+++ ./drivers/md/raid10.c   2007-06-12 10:20:31.0 +1000
@@ -1866,6 +1866,7 @@ static sector_t sync_request(mddev_t *md
int d = r10_bio->devs[i].devnum;
bio = r10_bio->devs[i].bio;
bio->bi_end_io = NULL;
+   clear_bit(BIO_UPTODATE, &bio->bi_flags);
if (conf->mirrors[d].rdev == NULL ||
test_bit(Faulty, &conf->mirrors[d].rdev->flags))
continue;
@@ -2036,6 +2037,11 @@ static int run(mddev_t *mddev)
/* 'size' is now the number of chunks in the array */
/* calculate "used chunks per device" in 'stride' */
stride = size * conf->copies;
+
+   /* We need to round up when dividing by raid_disks to
+* get the stride size.
+*/
+   stride += conf->raid_disks - 1;
sector_div(stride, conf->raid_disks);
mddev->size = stride  << (conf->chunk_shift-1);
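
For the rounding part of this patch, a small worked example may help.
Take 10 chunks, 2 copies and 3 devices: that is 20 chunk copies to
place.  Truncating 20/3 gives 6 chunks per device, i.e. room for only
18 copies; rounding up gives 7.  The sketch below shows the same
"add divisor minus one, then divide" idiom in plain C; sketch_stride()
is an illustrative helper, not a function in raid10.c.

```c
#include <stdint.h>

/*
 * Chunks each device must hold: ceil(size * copies / raid_disks).
 * Adding (raid_disks - 1) before the integer division makes the
 * division round up instead of down.
 */
static uint64_t sketch_stride(uint64_t size, unsigned int copies,
			      unsigned int raid_disks)
{
	uint64_t stride = size * copies;

	stride += raid_disks - 1;
	return stride / raid_disks;
}
```

When the product divides evenly (for example 9 chunks, 2 copies,
3 devices) the added bias changes nothing and the result is still 6.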
 


[PATCH 000 of 2] md: Introduction - bugfixes for md/raid{1,10}

2007-06-11 Thread NeilBrown

Following are a couple of bugfixes for raid10 and raid1.  They only
affect fairly uncommon configurations (more than 2 mirrors) and can
cause data corruption.  They are suitable for 2.6.22 and 21-stable.

Thanks,
NeilBrown


 [PATCH 001 of 2] md: Fix two raid10 bugs.
 [PATCH 002 of 2] md: Fix bug in error handling during raid1 repair.


[PATCH 002 of 2] md: Fix bug in error handling during raid1 repair.

2007-06-11 Thread NeilBrown

From: Mike Accetta <[EMAIL PROTECTED]>

If raid1/repair (which reads all blocks and fixes any differences
it finds) hits a read error, it doesn't reset the bio for writing
before writing correct data back, so the read error isn't fixed,
and the device probably gets a zero-length write which it might
complain about.

Signed-off-by: Neil Brown <[EMAIL PROTECTED]>
Cc: [EMAIL PROTECTED]

### Diffstat output
 ./drivers/md/raid1.c |   21 ++---
 1 file changed, 14 insertions(+), 7 deletions(-)

diff .prev/drivers/md/raid1.c ./drivers/md/raid1.c
--- .prev/drivers/md/raid1.c2007-06-12 10:48:57.0 +1000
+++ ./drivers/md/raid1.c2007-06-12 10:49:05.0 +1000
@@ -1240,17 +1240,24 @@ static void sync_request_write(mddev_t *
}
r1_bio->read_disk = primary;
for (i=0; i<mddev->raid_disks; i++)
-   if (r1_bio->bios[i]->bi_end_io == end_sync_read &&
-   test_bit(BIO_UPTODATE, &r1_bio->bios[i]->bi_flags)) {
+   if (r1_bio->bios[i]->bi_end_io == end_sync_read) {
int j;
int vcnt = r1_bio->sectors >> (PAGE_SHIFT- 9);
struct bio *pbio = r1_bio->bios[primary];
struct bio *sbio = r1_bio->bios[i];
-   for (j = vcnt; j-- ; )
-   if (memcmp(page_address(pbio->bi_io_vec[j].bv_page),
-  page_address(sbio->bi_io_vec[j].bv_page),
-  PAGE_SIZE))
-   break;
+
+   if (test_bit(BIO_UPTODATE, &sbio->bi_flags)) {
+   for (j = vcnt; j-- ; ) {
+   struct page *p, *s;
+   p = pbio->bi_io_vec[j].bv_page;
+   s = sbio->bi_io_vec[j].bv_page;
+   if (memcmp(page_address(p),
+  page_address(s),
+  PAGE_SIZE))
+   break;
+   }
+   } else
+   j = 0;
if (j >= 0)
mddev->resync_mismatches += r1_bio->sectors;
if (j < 0 || test_bit(MD_RECOVERY_CHECK, &mddev->recovery)) {


Re: [stable] [PATCH] ACPI: Move timer broadcast and pmtimer access before C3 arbiter shutdown

2007-06-11 Thread Chris Wright
* Andrew Morton ([EMAIL PROTECTED]) wrote:
> hm, this needs a bit of help to get it to work against Len's current tree.

Here's some help, compile tested only.  Udo/Thomas, was this found to
be root cause of a real bug?  I didn't want this to get lost if it's
still meant to be relevant for -stable.

thanks,
-chris
--

Subject: ACPI: Move timer broadcast and pmtimer access before C3 arbiter shutdown

From: Udo A. Steinberg <[EMAIL PROTECTED]>

The chip set doc for ICH4 says:

1.In general, software should not attempt any non-posted accesses during
arbiter disable except to the ICH4's power management registers. This
implies that interrupt handlers for any unmasked hardware interrupts and
SMI/NMI should check ARB_DIS status before reading from ICH devices.

So it's not a good idea to access ICH devices after arbiter shut down. 

Signed-off-by: Udo A. Steinberg <[EMAIL PROTECTED]>
Signed-off-by: Thomas Gleixner <[EMAIL PROTECTED]>
[chrisw: rebase against Len's changes in -mm]
Signed-off-by: Chris Wright <[EMAIL PROTECTED]>

---

 drivers/acpi/processor_idle.c |9 +
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/drivers/acpi/processor_idle.c b/drivers/acpi/processor_idle.c
index 2c6a3cb..15db3e8 100644
--- a/drivers/acpi/processor_idle.c
+++ b/drivers/acpi/processor_idle.c
@@ -978,6 +978,11 @@ static int acpi_idle_enter_c3(struct cpuidle_device *dev,
return 0;
}
 
+   /* Get start time (ticks) */
+   t1 = inl(acpi_gbl_FADT.xpm_timer_block.address);
+   acpi_state_timer_broadcast(pr, cx, 1);
+   acpi_idle_do_entry(cx);
+
/* disable bus master */
if (pr->flags.bm_check) {
spin_lock(&c3_lock);
@@ -995,10 +1000,6 @@ static int acpi_idle_enter_c3(struct cpuidle_device *dev,
ACPI_FLUSH_CPU_CACHE();
}
 
-   /* Get start time (ticks) */
-   t1 = inl(acpi_gbl_FADT.xpm_timer_block.address);
-   acpi_state_timer_broadcast(pr, cx, 1);
-   acpi_idle_do_entry(cx);
t2 = inl(acpi_gbl_FADT.xpm_timer_block.address);
 
if (pr->flags.bm_check) {


Re: [PATCH 1/9] readahead: introduce PG_readahead

2007-06-11 Thread Rusty Russell
On Thu, 2007-05-17 at 06:47 +0800, Fengguang Wu wrote:
> plain text document attachment (mm-introduce-pg_readahead.patch)
> Introduce a new page flag: PG_readahead.
> 
> It acts as a look-ahead mark, which tells the page reader:
> Hey, it's time to invoke the read-ahead logic.  For the sake of I/O 
> pipelining,
> don't wait until it runs out of cached pages!

Hi Fengguang!

I've been reading your patches, and I have some (possibly dumb!)
questions.

For this patch: why set a bit in the page, rather than keep a value
inside the "struct file_ra_state"?

Thanks,
Rusty.




Re: [RFD] Documentation/HOWTO translated into Japanese

2007-06-11 Thread Tsugikazu Shibata
On Sun, 10 Jun 2007 23:07:59 -0700, gregkh wrote:
> On Sun, Jun 10, 2007 at 07:56:52PM +0200, Matthias Schniedermeyer wrote:
> >  Greg KH wrote:
> > > On Sun, Jun 10, 2007 at 02:24:51PM +0200, Jesper Juhl wrote:
> > >>  Since the common language of most kernel contributors is english I
> > >>  personally feel that we should stick to just that one language in the
> > >>  tree and then perhaps keep translations on a website somewhere. So the
> > >>  authoritative docs stay in the tree, in english, so that as many
> > >>  contributors as possible can read and update them. It would then be a
> > >>  separate project to generate translations and keep them updated
> > >>  according to what's in the tree.  Perhaps we could get the kernel.org
> > >>  people to create an official space for that and then place a pointer
> > >>  to that site in Documentation/ somewhere.
> > > No, I think the translated files should be in the tree proper, we have
> > > the space :)
> > 
> >  Frankly i don't see the difference between this and the annual reoccurring 
> >  "why can't the kernel messages be localized" discussion.
> >  (Which is a little overdue, but maybe this replaces it this time.)
> 
> It is _vastly_ different.
> 
> >  I could see the point in ONE "HOWTO" file per language to get people 
> >  started, but everything else is a pointless exercise.
> >  A developer/bug-reporter has to be able to express him-/herself in English 
> >  and understand English, otherwise you can not accomplish very much.
> 
> Yes, but this file, and the stable-api-nonsense.txt files are there to
> help people understand both the kernel's philosophy, as well as
> encourage them to help contribute.
> 
> That is totally different from internationalizing the internal kernel
> messages (which, btw, some people are working on...)  That I would not
> agree to as it's just too hard to keep up with and would be pointless in
> a way.
> 
> So I really do want to see a translated copy of the HOWTO,
> stable-api-nonsense.txt, and possibly a few other files in the main
> kernel tree (SubmittingPatches, CodingStyle, and SubmittingDrivers might
> all be good candidates for this.)  These files change relatively
> infrequently (the HOWTO file has had only 7 changes in 1 and 1/2 years,
> and they were very minor ones) and should be easy for the translators to
> keep up with.
> 
> So, Tsugikazu, care to resend this file as a patch that I can apply to
> the Documentation directory of the kernel tree?  I think it would be
> good to have there.

Here is a patch of Japanese translated HOWTO.
Thank you very much for lots of comment and suggestions.>all
Below is what I have done:

- Added Hiroyuki's comment with my own modifications as top notes.
- Character encoding is UTF-8
- The file is put in Documentation/ja_JP
- The patch is attached, not inlined, because the MUA will sometimes
  forcibly change the character encoding. Sorry.

I think We may need to discuss with JF team about much better JF
headers (written in Japanese) later.

Thanks,
Tsugikazu Shibata
Signed-off-by: Tsugikazu Shibata <[EMAIL PROTECTED]>

diff -uNpr linux-2.6.22-rc4.orig/Documentation/ja_JP/HOWTO linux-2.6.22-rc4/Documentation/ja_JP/HOWTO
--- linux-2.6.22-rc4.orig/Documentation/ja_JP/HOWTO 1970-01-01 09:00:00.0 +0900
+++ linux-2.6.22-rc4/Documentation/ja_JP/HOWTO  2007-06-12 09:35:50.0 +0900
@@ -0,0 +1,650 @@
+NOTE:
+This is Japanese translated version of "Documentation/HOWTO".
+This one is maintained by Tsugikazu Shibata <[EMAIL PROTECTED]>
+and JF Project team .
+If you find difference with original file or problem in translation,
+please contact maintainer of this file or JF project.
+
+Please also note that purpose of this file is easier to read for non
+English natives and not to be intended to fork. So, if you have any
+comments or updates of this file, please try to update Original(English)
+file at first.
+
+Last Updated: 2007/06/04
+==
+これは、
+linux-2.6.21/Documentation/HOWTO
+の和訳です。
+
+翻訳団体: JF プロジェクト < http://www.linux.or.jp/JF/ >
+翻訳日: 2007/06/04
+翻訳者: Tsugikazu Shibata 
+校正者: 松倉さん 
+ 小林 雅典さん (Masanori Kobayasi) 
+ 武井伸光さん、
+ かねこさん (Seiji Kaneko) 
+ 野口さん (Kenji Noguchi) 
+ 河内さん (Takayoshi Kochi) 
+ 岩本さん (iwamoto) 
+==
+
+Linux カーネル開発のやり方
+---
+
+これは上のトピック( Linux カーネル開発のやり方)の重要な事柄を網羅した
+ドキュメントです。ここには Linux カーネル開発者になるための方法と
+Linux カーネル開発コミュニティと共に活動するやり方を学ぶ方法が含まれて
+います。カーネルプログラミングに関する技術的な項目に関することは何も含
+めないようにしていますが、カーネル開発者となるための正しい方向に向かう
+手助けになります。
+
+もし、このドキュメントのどこかが古くなっていた場合には、このドキュメン
+トの最後にリストしたメンテナーにパッチを送ってください。
+
+はじめに
+-
+
+あなたは Linux カーネルの開発者になる方法を学びたいのでしょうか? そ
+れともあなたは上司から「このデバイスの Linux ドライバを書くように」と
+言われているのでしょうか? 
+この文書の目的は、あなたが踏むべき手順と、コミュニティと一緒にうまく働
+くヒントを書き下すことで、あなたが知るべき全てのことを教えることです。
+また、このコミュニティがなぜ今うまくまわっているのかという理由の一部も
+説明しようと試みています。
+
+カーネルは 

Re: [PATCHSET 2.6.22-rc4] sysfs: fix race conditions

2007-06-11 Thread Greg KH
On Mon, Jun 11, 2007 at 03:15:39PM +0900, Tejun Heo wrote:
> Andrew Morton wrote:
> >> This patchset contains three minimal backports of fixes in -mm.  With
> >> all patches in the patchset and sysfs-races.patch applied, kernel
> >> survived ~20 hours of stress test without any problem.
> > 
> > So these are being proposed for 2.6.22?
> 
> Yeap.
> 
> > I do wonder about Rafael's bug which he bisected down to
> > gregkh-driver-sysfs-use-singly-linked-list-for-sysfs_dirent-tree.patch.
> > 
> > If that won't be a problem in this patchset then I spose it's probably best
> > to go ahead with a 2.6.22 merge, but it's more a Greg thing than a me
> > thing.
> 
> I'm currently debugging that and it's irrelevant to these fixes.  The
> bug is introduced far after the fixes.
> 
> > I don't have a tree to merge these patches into, unless I drop all the
> > patches which are in Greg's tree.
> > 
> > Greg, can I leave it up to you to decide how we are to proceed here?

Ok, I'll test them out, and if they look sane, pass them to Linus.

> I can rebase all sysfs patches in -mm on top of linus#master + these
> fixes if necessary.

Yeah, I'll need that if these look good enough, otherwise my tree will
stop applying :)

Give me some time tonight to do this...

thanks,

greg k-h


Re: [PATCH 1/1] UML: fix missing non-blocking I/O, now DEBUG_SHIRQ works

2007-06-11 Thread Jeff Dike
On Tue, Jun 12, 2007 at 01:39:25AM +0300, Eduard-Gabriel Munteanu wrote:
> The cast isn't done right. Doing "fd = (long) dev_id;" doesn't help, 
> since you pass fd to mconsole_get_request() as is. And 
> mconsole_get_request() expects an integer:
> int mconsole_get_request(int fd, struct mc_request *req)

gcc will trim a long to an integer correctly.  You can pass a long to
an integer without casting.
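
To make the conversion under discussion concrete, here is a minimal sketch. It is not the UML code: the helper names fd_to_cookie() and cookie_to_fd() are hypothetical, and it assumes a target such as LP64 where long is as wide as a pointer. It packs a descriptor into a pointer-sized cookie through long and narrows it back to int.

```c
/* Hypothetical helpers for illustration; not taken from the UML sources. */

/* Pack a file descriptor into a pointer-sized cookie (e.g. a dev_id). */
void *fd_to_cookie(int fd)
{
	return (void *)(long)fd;
}

/*
 * Recover the descriptor.  The long-to-int narrowing preserves the value
 * because the cookie was produced from an int in the first place.
 */
int cookie_to_fd(void *cookie)
{
	long fd = (long)cookie;

	/* Implicit long -> int conversion, as when a long is passed
	 * to a function that takes an int parameter. */
	return fd;
}
```

gcc accepts the implicit narrowing without a diagnostic at default warning levels; -Wconversion would flag it.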

> This will generate at least a warning on arches where sizeof(int) != 
> sizeof(long). 

No it won't.  UML builds without warnings here on x86_64.

> And by the way, AFAIK, GCC has the habit of breaking compatibility with 
> some userspace apps when a new GCC major version is released. And, 
> AFAIK, they're moving towards standards, so relying on GCC's 
> "guarantees" may backfire.

The GCC guarantee I'm talking about is LP64 - I'm highly confident
that's not changing any time soon.

> >>You're calling glibc functions
> >>with that fd as a parameter. On some arches, compiling will issue
> >>warnings or simply fail. 
> >
> >Which ones?
> >
> 
> An example is sparc64:
> quote from
> >>http://lxr.linux.no/source/include/asm-sparc64/types.h#L49

What warnings does this produce?  These are the same sizes as x86_64
(and every other 64-bit arch that Linux runs on, I bet), where this
code compiles without warning.

> Really, a kmalloc() isn't such a big deal, it only happens once and 
> we're not in interrupt context.

It's not the runtime cost - it's the extra code.

> On the other hand, ensuring safety and 
> portability on other arches is something that needs to be taken care
> of.

You haven't demonstrated any safety or portability problems yet.

Jeff

-- 
Work email - jdike at linux dot intel dot com


Re: [Intel-IOMMU 02/10] Library routine for pre-allocat pool handling

2007-06-11 Thread Christoph Lameter
On Mon, 11 Jun 2007, Keshavamurthy, Anil S wrote:

> Slab allocators don't reserve the memory; in other words, this memory
> can be consumed by the VM under memory pressure, which we don't want in
> the IOMMU case.

So mempools

> Nope, they are exact opposites.
> mempool with GFP_ATOMIC first tries to get memory from the OS and,
> if that fails, looks for an object in the pool and returns it.

How does the difference matter? In both cases you get the memory you want.

> Whereas the resource pool is the exact opposite of mempool: each
> time, it looks for an object in the pool and, if one exists, we
> return that object; otherwise we try to get the memory from the OS while
> scheduling work to grow the pool. In fact, the work to grow the pool
> is scheduled as soon as the low threshold is hit.

Grow the mempool when the low level point is hit? Or equip mempools with 
the functionality that you want?


Re: [PATCH 001 of 2] Fix read/truncate race.

2007-06-11 Thread Andrew Morton
On Tue, 12 Jun 2007 10:16:22 +1000 Neil Brown <[EMAIL PROTECTED]> wrote:

> > please consider incorporating scripts/checkpatch.pl into your patch
> > preparation toolchain.
> 
> Done... Any reason that it isn't executable (chmod +x)?

It is executable now (Linus did a chmod).

However I think it was wrong to do this.  Because lots of people will lose
that X bit (say, people who download and use patch-2.6.22.gz).  But _some_
people will have their X bit set, so they will go write scripts which
assume X permissions, only to find that those scripts break on other
people's systems.

So to force the lowest-common-denominator, we should have left X unset.  Oh
well.



Re: ext2 on flash memory

2007-06-11 Thread Kevin Bowling

On 6/11/07, Kevin K <[EMAIL PROTECTED]> wrote:


> On Jun 11, 2007, at 5:13 AM, DervishD wrote:
>
> > Hi all :)
> >
> > I was wondering: is there any reason not to use ext2 on a USB
> > pendrive? Really my question is not only about USB pendrives, but any
> > device whose storage is flash based. Let's assume that the device has
> > a good quality flash memory with wear leveling and the like...
> >
> > Thanks a lot in advance :)
> >
> > Raúl Núñez de Arenas Coronado
>
> My opinion is that, unless the flash is really cheap or is being
> written to excessively, it probably doesn't matter too much.
> With the growth in size of flash, just how long do you think it will
> continue to be used before you go to something larger?
>
> A 256MB flash of a few years ago has been supplanted in many cases by
> today's 2-4GB memory.
>
> One suggestion with ext2 might be to mount it with the noatime
> option, so it doesn't update the last access time for directories and
> files.  Otherwise, you are doing a write even when you only plan to
> read a file.


All of the posts fail to address the question here: what is the
correct file system for wear-leveling flash storage, or does one
not exist yet?  JFFS2 and logfs are nice for MTD, but what is the
answer for the better flash memories that are likely to be used in
the future, like solid state hard disks?


Re: [Intel-IOMMU 02/10] Library routine for pre-allocat pool handling

2007-06-11 Thread Andrew Morton
On Mon, 11 Jun 2007 16:52:08 -0700 "Keshavamurthy, Anil S" <[EMAIL PROTECTED]> 
wrote:

> On Mon, Jun 11, 2007 at 02:14:49PM -0700, Andrew Morton wrote:
> > On Mon, 11 Jun 2007 13:44:42 -0700
> > "Keshavamurthy, Anil S" <[EMAIL PROTECTED]> wrote:
> > 
> > > In our first implementation, we used the mempool APIs to
> > > allocate memory, and we were told that mempool with GFP_ATOMIC is
> > > useless. Hence, in the second implementation, we came up with
> > > resource pools (which are preallocated pools), and again, as I understand
> > > it, the argument is: why create another allocator when slab allocation
> > > is similar to these resource pools?
> > 
> > Odd.  mempool with GFP_ATOMIC is basically equivalent to your
> > resource-pools, isn't it?: we'll try the slab allocator and if that failed,
> > fall back to the reserves.
> 
> Slab allocators don't reserve the memory; in other words, this memory
> can be consumed by the VM under memory pressure, which we don't want in
> the IOMMU case.
> 
> Nope, they are exact opposites.
> mempool with GFP_ATOMIC first tries to get memory from the OS and,
> if that fails, looks for an object in the pool and returns it.
> 
> Whereas the resource pool is the exact opposite of mempool: each
> time, it looks for an object in the pool and, if one exists, we
> return that object; otherwise we try to get the memory from the OS while
> scheduling work to grow the pool. In fact, the work to grow the pool
> is scheduled as soon as the low threshold is hit.

I realise all that.  But I'd have thought that the mempool approach is
actually better: use the page allocator and only deplete your reserve pool
when the page allocator fails.
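
The two allocation orders being compared can be sketched in userspace as follows. This is only a toy model under stated assumptions: struct toy_pool and every function name here are hypothetical, malloc() stands in for the slab/page allocator, and none of it is the kernel mempool API or the proposed resource-pool code.

```c
#include <stddef.h>
#include <stdlib.h>

#define RESERVE_MAX 4

/* Toy reserve: a small stack of preallocated objects. */
struct toy_pool {
	void *reserve[RESERVE_MAX];
	int nr_reserved;                /* objects currently held in reserve */
	void *(*backing_alloc)(size_t); /* stand-in for the slab/page allocator */
	size_t obj_size;
};

/* Pop one object from the reserve, or return NULL if it is empty. */
static void *pool_take_reserved(struct toy_pool *p)
{
	return p->nr_reserved > 0 ? p->reserve[--p->nr_reserved] : NULL;
}

/* mempool order: try the allocator first, dip into the reserve on failure. */
void *alloc_mempool_order(struct toy_pool *p)
{
	void *obj = p->backing_alloc(p->obj_size);

	return obj ? obj : pool_take_reserved(p);
}

/* Resource-pool order: take from the reserve first, allocator only if empty. */
void *alloc_respool_order(struct toy_pool *p)
{
	void *obj = pool_take_reserved(p);

	return obj ? obj : p->backing_alloc(p->obj_size);
}

/* Preallocate nr objects; returns 0 on success, -1 on failure. */
int toy_pool_init(struct toy_pool *p, size_t obj_size, int nr,
		  void *(*backing)(size_t))
{
	if (nr < 0 || nr > RESERVE_MAX)
		return -1;
	p->obj_size = obj_size;
	p->backing_alloc = backing;
	p->nr_reserved = 0;
	while (p->nr_reserved < nr) {
		void *obj = malloc(obj_size);

		if (!obj)
			return -1;
		p->reserve[p->nr_reserved++] = obj;
	}
	return 0;
}

/* A backing allocator that always fails, to model memory pressure. */
void *failing_alloc(size_t size)
{
	(void)size;
	return NULL;
}
```

In this model the mempool order leaves the reserve untouched for as long as the backing allocator succeeds, while the resource-pool order drains it on every allocation and depends on something else refilling it.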

The refill-the-pool-in-the-background feature sounds pretty worthless to
me.  On a uniprocessor machine (for example), the kernel thread may not get
scheduled for tens of milliseconds (easily), which is far, far more than is
needed for that reserve pool to become fully consumed.



call for more SD versus CFS comparisons (was: Re: [ck] Mainline plans)

2007-06-11 Thread Miguel Figueiredo
Hi all,

some results based on massive_intr.c by Satoru, which can be found at
http://people.redhat.com/mingo/cfs-scheduler/tools/massive_intr.c

It was run several times, like this:
$ massive_intr 5 2 >> results-kernel-5.2
$ massive_intr 300 300 >> results-kernel-300.300

To calculate average and standard deviation:

$ original-awk -f awkscript results-file

awkscript file included.
(for debian users: apt-get install original-awk)
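
For anyone without awk at hand, the same kind of summary can be sketched in C. This is not the awkscript mentioned above (its contents are not shown here); the function name mean_and_variance() is hypothetical, and using the population variance (dividing by n) is an assumption.

```c
#include <stddef.h>

/*
 * Compute the mean and the population variance of n samples.
 * Returns -1 for an empty input, 0 otherwise.
 */
int mean_and_variance(const double *x, size_t n, double *mean, double *variance)
{
	double sum = 0.0, sq = 0.0;
	size_t i;

	if (n == 0)
		return -1;

	for (i = 0; i < n; i++)
		sum += x[i];
	*mean = sum / n;

	/* Second pass: accumulate squared deviations from the mean. */
	for (i = 0; i < n; i++)
		sq += (x[i] - *mean) * (x[i] - *mean);
	*variance = sq / n;

	return 0;
}
```

The standard deviation is the square root of the returned variance, e.g. sqrt() from <math.h> when linking with -lm.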

Here's the data, values and facts:

kernel              run as   average  stddev
==================  =======  =======  ======
2.6.22-rc4-ck1      5 2       34      0
2.6.22-rc4-ck1      5 2       22      0
2.6.22-rc4-ck1      5 2       24.6    0.219
2.6.22-rc4-ck1      5 2       31.4    0.219
2.6.22-rc4-ck1      5 2       40      0

2.6.22-rc4-cfs-v16  5 2       36      0
2.6.22-rc4-cfs-v16  5 2       30      0
2.6.22-rc4-cfs-v16  5 2       27.6    0.219
2.6.22-rc4-cfs-v16  5 2       29.6    0.219
2.6.22-rc4-cfs-v16  5 2       42      0

2.6.22-rc4-cfs-v16  300 300  126.427  0.289
2.6.22-rc4-cfs-v16  300 300  125.35   0.275
2.6.22-rc4-cfs-v16  300 300  127.797  0.028
2.6.22-rc4-cfs-v16  300 300  125.367  0.028
2.6.22-rc4-cfs-v16  300 300  125.213  0.024

2.6.22-rc4-ck1      300 300  125.413  0.028
2.6.22-rc4-ck1      300 300  125.34   0.027
2.6.22-rc4-ck1      300 300  124.69   0.027
2.6.22-rc4-ck1      300 300  125.093  0.017
2.6.22-rc4-ck1      300 300  125.597  0.028

* "run as" lists the parameters passed to the program massive_intr.

All the files and data can be found on
http://www.debianpt.org/~elmig/pool/kernel/20070611/

Just one note: the first time this test was run, I got these values:
-cfs-v16: 44, 23, 19, 16, 42;
-2.6.21-debian: 29, 25, 22, 16, 32;
-ck1: 37, 37, 37, 37, 37

The machine was a Sempron64 3.0 GHz.


I know that other people who read lkml have also tested this way; it
would be nice if they posted their data as well.

-- 

Com os melhores cumprimentos/Best regards,

Miguel Figueiredo
http://www.DebianPT.org



Re: [stable] [PATCH] INPUT: Sanitize PIT locking in pcspkr

2007-06-11 Thread Chris Wright
* Thomas Gleixner ([EMAIL PROTECTED]) wrote:
> The PC-speaker code has a quite creative method to serialize access to
> the PIT: It uses a local lock.
> 
> On i386 and x86_64 the access to the PIT is serialized by a lock in the
> architecture code. The separate locking in the PC-speaker code ignores
> the global lock and creates a nasty race between the PC-speaker and the
> PIT clock source / events code on SMP machines.
> 
> Use the global i8253_lock instead of the local i8253_beep_lock, when
> compiled for i386/x86_64.

Seems this one got lost?

thanks,
-chris


sata_nv adma issues

2007-06-11 Thread Charles Shannon Hendrix


My system has issues running adma mode with sata_nv.  I have an nforce4 
motherboard.


What is the current status of this problem?

Is there any information I can provide to help debug it?

I gave up trying various fixes about 6 months ago, and put "sata_nv.adma=0" on 
the kernel command line in LILO, and that fixed the problem.


However, recently I changed distributions and went back to 2.6.20 (kubuntu 
7.04).

This kernel says that sata_nv.adma=0 is an invalid kernel option.

I'm pretty puzzled by that, because it is supposed to disable adma mode in the 
sata_nv driver.


/proc/cmdline says:

root= ro sata_nv.adma=0 quiet splash

...so it seems I did give the parameter properly.

Any ideas appreciated.

Is there a better way to deal with this?

Also, one more: does it hurt to wait until the sata_nv driver fails a few 
times (at which point it stops bitching) and use the machine?  Once it
fails about 6 times, I no longer have any issues, and speed is still good 
enough to use until a real fix can be had.


Thanks.



--
shannon   | An Irishman is never drunk as long as he can hold onto
  | one blade of grass and not fall off the face of the earth.


Re: ext2 on flash memory

2007-06-11 Thread Arnd Bergmann
On Monday 11 June 2007, Tomasz Chmielewski wrote:
> Also, ext2 provides a nice feature other filesystems lack: xip. 
> Especially, if a pendrive is used as a rootfs for a small device.

Well, xip cannot work on NAND flash media, including USB pen drives,
because the data is not mapped into the addressable memory space,
so that is not really an interesting argument.

Arnd <><


Re: [PATCH 001 of 2] Fix read/truncate race.

2007-06-11 Thread Neil Brown
On Thursday June 7, [EMAIL PROTECTED] wrote:
> On Fri, 8 Jun 2007 12:48:48 +1000 Neil Brown <[EMAIL PROTECTED]> wrote:
> 
> > The following patch will remove the extra seqlock except when we
> > actually need it and remove the extra arithmetic - but I haven't
> > tested it or reviewed it properly.  I can do that if you think it is
> > the right direction.
> 
> Yes, the optimisation is valid and looks useful.
> 
> >  ./mm/filemap.c |   34 --
> 
> It didn't apply - your tree seems different from mine.

Odd.  I had no other changes to that file in my tree.  I'll wait until
the next -mm(?).
It's just as well really, the patch was buggy - didn't even compile
and (as I said) totally untested.  I'll get you a tested patch after I
can rebase.

> 
> > +*
> > +* NOTE: This access of inode->i_size is not protected
> > +*  and if there is a concurrent update on a 32bit machine,
> > +*  it could return the wrong value.  This could only be a problem
> > +*  if i_size has actually changed to a smaller value before the
> > +*  page became uptodate, and at this point it still has a smaller
> > +*  value, but due to a race while reading, it appears unchanged.
> > +*  The chances of this happening are so small and the consequence
> > +*  sufficiently minor, that the cost of the seqlock seems
> > +*  not to be justified.
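
For context, the protection that the quoted comment decides to skip can be sketched in userspace with C11 atomics. This is a hedged illustration, not the kernel's seqlock code: struct seq_size and its functions are hypothetical names, a single writer is assumed, and the 64-bit size is split into two 32-bit halves to model what a 32-bit machine has to do.

```c
#include <stdatomic.h>
#include <stdint.h>

/* A 64-bit value stored as two 32-bit halves, guarded by a sequence count. */
struct seq_size {
	atomic_uint seq;	/* even: stable, odd: write in progress */
	_Atomic uint32_t lo;
	_Atomic uint32_t hi;
};

void seq_size_init(struct seq_size *s, uint64_t v)
{
	atomic_init(&s->seq, 0);
	atomic_init(&s->lo, (uint32_t)v);
	atomic_init(&s->hi, (uint32_t)(v >> 32));
}

/* Writer side (one writer assumed): bump to odd, update, bump to even. */
void seq_size_write(struct seq_size *s, uint64_t v)
{
	unsigned int seq = atomic_load_explicit(&s->seq, memory_order_relaxed);

	atomic_store_explicit(&s->seq, seq + 1, memory_order_relaxed);
	atomic_thread_fence(memory_order_release);
	atomic_store_explicit(&s->lo, (uint32_t)v, memory_order_relaxed);
	atomic_store_explicit(&s->hi, (uint32_t)(v >> 32), memory_order_relaxed);
	atomic_store_explicit(&s->seq, seq + 2, memory_order_release);
}

/* Reader side: retry until both halves were read under one even count. */
uint64_t seq_size_read(struct seq_size *s)
{
	unsigned int begin, end;
	uint32_t lo, hi;

	do {
		begin = atomic_load_explicit(&s->seq, memory_order_acquire);
		lo = atomic_load_explicit(&s->lo, memory_order_relaxed);
		hi = atomic_load_explicit(&s->hi, memory_order_relaxed);
		atomic_thread_fence(memory_order_acquire);
		end = atomic_load_explicit(&s->seq, memory_order_relaxed);
	} while ((begin & 1) || begin != end);

	return ((uint64_t)hi << 32) | lo;
}
```

Without the sequence count, a reader could pair the old high half with the new low half, which is the "wrong value" case the comment describes.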
> 
> please consider incorporating scripts/checkpatch.pl into your patch
> preparation toolchain.

Done... Any reason that it isn't executable (chmod +x)?

NeilBrown


Re: libata passthru: support PIO multi commands

2007-06-11 Thread Jeff Garzik

Alan Cox wrote:

> > +   if (is_multi_taskfile(tf)) {
> > +   unsigned int multi_count = 1 << (cdb[1] >> 5);
> > +
> > +   /* compare the passed through multi_count
> > +* with the cached multi_count of libata
> > +*/
> > +   if (multi_count != dev->multi_count)
> > +   ata_dev_printk(dev, KERN_WARNING,
> > +  "invalid multi_count %u ignored\n",
> > +  multi_count);
> > +   }
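
The decoding in the quoted hunk can be tried in isolation with the sketch below. The helper names passthru_multi_count() and multi_count_matches() are hypothetical and not part of libata; the only thing taken from the patch is the expression itself, which reads the top three bits of CDB byte 1 as a power-of-two exponent.

```c
/*
 * Decode the multiple count the way the quoted patch does:
 * the top three bits of CDB byte 1 hold log2 of the sector count.
 */
unsigned int passthru_multi_count(const unsigned char *cdb)
{
	return 1u << (cdb[1] >> 5);
}

/* Nonzero when the requested count matches the cached device setting. */
int multi_count_matches(const unsigned char *cdb, unsigned int dev_multi_count)
{
	return passthru_multi_count(cdb) == dev_multi_count;
}
```

With byte 1 set to 0x80 the exponent is 4, so the decoded count is 16 sectors; with the top three bits clear it is 1.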


> What limits log spamming here ?


Intelligence of the user with privs?



> Also shouldn't we error this
> situation not proceed and hope that enough data was supplied not
> to leave us stuck half way through a command having made a nasty
> mess on disk ?


Is that English?  Can you be more specific and more clear?

Jeff




2.6.22-rc4-mm2: kvm compile breakage with X86_CMPXCHG64=n

2007-06-11 Thread Adrian Bunk
On Wed, Jun 06, 2007 at 10:03:13PM -0700, Andrew Morton wrote:
>...
> Changes since 2.6.22-rc4-mm1:
>...
>  git-kvm.patch
>...
>  git trees
>...

I'm getting the following compile error with CONFIG_X86_CMPXCHG64=n 
(with -Werror-implicit-function-declaration - otherwise it would be a 
link error):

<--  snip  -->

...
  CC [M]  drivers/kvm/mmu.o
/home/bunk/linux/kernel-2.6/linux-2.6.22-rc4-mm2/drivers/kvm/mmu.c: In function ‘set_shadow_pte’:
/home/bunk/linux/kernel-2.6/linux-2.6.22-rc4-mm2/drivers/kvm/mmu.c:199: error: implicit declaration of function ‘set_64bit’
make[3]: *** [drivers/kvm/mmu.o] Error 1

<--  snip  -->

cu
Adrian

-- 

   "Is there not promise of rain?" Ling Tan asked suddenly out
of the darkness. There had been need of rain for many days.
   "Only a promise," Lao Er said.
   Pearl S. Buck - Dragon Seed



Re: [Patch 10/18] fs/logfs/inode.c

2007-06-11 Thread Jörn Engel
On Tue, 12 June 2007 01:51:34 +0200, Arnd Bergmann wrote:
> On Tuesday 12 June 2007, Jörn Engel wrote:
> > The initial storm of review comments has calmed down.  I get the
> > impression that people either lose interest or run out of simple things
> > to point out.  Maybe I should wait a bit before resending?
> 
> Your last series had a todo list of 11 items, plus more stuff
> found in the review. How about you submit logfs for inclusion in -mm
> when all easy stuff is gone from that list?

I was hoping to get more comments still.  But maybe you are right.
Half of your last comments looked more like test reports than code
review.

Jörn

-- 
Linux is more the core point of a concept that surrounds "open source"
which, in turn, is based on a false concept. This concept is that
people actually want to look at source code.
-- Rob Enderle


Re: ext2 on flash memory

2007-06-11 Thread Kevin K


On Jun 11, 2007, at 5:13 AM, DervishD wrote:

> Hi all :)
>
> I was wondering: is there any reason not to use ext2 on a USB
> pendrive? Really my question is not only about USB pendrives, but any
> device whose storage is flash based. Let's assume that the device has
> a good quality flash memory with wear leveling and the like...
>
> Thanks a lot in advance :)
>
> Raúl Núñez de Arenas Coronado

My opinion is that, unless the flash is really cheap or is being
written to excessively, it probably doesn't matter too much.
With the growth in size of flash, just how long do you think it will
continue to be used before you go to something larger?

A 256MB flash of a few years ago has been supplanted in many cases by
today's 2-4GB memory.

One suggestion with ext2 might be to mount it with the noatime
option, so it doesn't update the last access time for directories and
files.  Otherwise, you are doing a write even when you only plan to
read a file.
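
From a program, the same option can be requested through the mount(2) system call. The sketch below is Linux-specific and only illustrative: the helper name mount_ext2_noatime() is hypothetical, and the device and mount point are whatever the caller passes in.

```c
#include <stddef.h>
#include <sys/mount.h>

/*
 * Hypothetical helper: mount an ext2 filesystem without access-time
 * updates.  Returns 0 on success, -1 on error (with errno set).
 */
int mount_ext2_noatime(const char *device, const char *mountpoint)
{
	/* MS_NOATIME is the mount(2) flag behind the "noatime" option. */
	return mount(device, mountpoint, "ext2", MS_NOATIME, NULL);
}
```

The call needs mount privileges, so an unprivileged caller simply gets -1 back.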





Re: divorce CONFIG_X86_PAE from CONFIG_HIGHMEM64G

2007-06-11 Thread William Lee Irwin III
On Thu, Jun 07, 2007 at 07:35:51PM -0700, William Lee Irwin III wrote:
>> +  PAE is required for NX support, and furthermore enables
>> +  larger swapspace support for non-overcommit purposes. It
>> +  has the cost of more pagetable lookup overhead, and also
>> +  consumes more pagetable space per process.

On Tue, Jun 12, 2007 at 01:52:35AM +0200, Adrian Bunk wrote:
> It's not specific to this help text, but I'm starting to become a bit picky 
> about these issues:
> If you understand this help text after reading it, you don't need a help 
> text for this option...  ;-)
> What is "NX support"?
> What are "non-overcommit purposes"?
> What is "pagetable lookup overhead"?
> And if in doubt, should I say Y or N?
> "System administrator who knows which hardware components he put into 
> the computer and which filesystems his data is on" might be a good 
> description for the average kconfig user, and these are the people who 
> should understand this help text.

I would like to have some place to explain issues such as those, but
there are as of yet no designated places for tutorial-level information.

If such a place were provided, I would provide storybook commentary to
explain all those. The same actually holds for kernel function docbook.


-- wli


Re: [Patch 10/18] fs/logfs/inode.c

2007-06-11 Thread Arnd Bergmann
On Tuesday 12 June 2007, Jörn Engel wrote:
> The initial storm of review comments has calmed down.  I get the
> impression that people either lose interest or run out of simple things
> to point out.  Maybe I should wait a bit before resending?

Your last series had a todo list of 11 items, plus more stuff
found in the review. How about you submit logfs for inclusion in -mm
when all easy stuff is gone from that list?

Arnd <><

