date:20050711

Re: PREEMPT_RT and I-PIPE: the numbers, part 4

2005-07-11 Thread Ingo Molnar

* Karim Yaghmour <[EMAIL PROTECTED]> wrote:

> With ping floods, as with other things, there is room for improvement, 
> but keep in mind that these are standard tests [...]

the problem is that ping -f isn't what it used to be. If you are using a
recent distribution with an updated ping utility, these days the 
equivalent of 'ping -f' is something like:

ping -q -l 500 -A -s 10 

and even this variant (and the old variant) needs to be carefully
validated for the workload it actually generates. Note that this is true for
workloads against vanilla kernels too. (Also note that i did not claim
that the flood ping workload you used is invalid; you have not
published packet rates or interrupt rates that could help us judge how
constant the workload was. I only said that according to my measurements
it's quite unstable, and that you should double-check it. Just running
it and ACK-ing that the packet rates are stable and identical amongst
all of these kernels would be enough to put this concern to rest.)

to see why i think there might be something wrong with the measurement, 
just look at the raw numbers:

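 LMbench running times:
 +--------------------+-------+-------+-------+-------+-------+
 | Kernel             | plain | IRQ   | ping  | IRQ & | IRQ & |
 |                    |       | test  | flood | ping  |  hd   |
 +====================+=======+=======+=======+=======+=======+
 | Vanilla-2.6.12     | 152 s | 150 s | 188 s | 185 s | 239 s |
 +====================+=======+=======+=======+=======+=======+
 | with RT-V0.7.51-02 | 152 s | 153 s | 203 s | 201 s | 239 s |
 +====================+=======+=======+=======+=======+=======+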

note that both the 'IRQ' and 'IRQ & hd' tests involve interrupts, and
PREEMPT_RT shows overhead within statistical error there; only the 'flood
ping' workload created a ~8% slowdown.

my own testing (whatever it's worth) shows that during flood-pings, the 
maximum overhead PREEMPT_RT caused was 4%. I.e. PREEMPT_RT used 4% more 
system-time than the vanilla UP kernel when the CPU was 99% dedicated to 
handling ping replies. But in your tests the CPU was (of course) not fully
dedicated to flood ping replies. Your above numbers suggest that under
the vanilla kernel 23% of CPU time was used up by flood pinging.  
(188/152 == +23.6%)

Under PREEMPT_RT, my tentative guesstimation would be that it should go
from 23.6% to 24.8%, i.e. 1.2% less CPU time for lmbench, which
turns into roughly +1 second of lmbench wall-clock time slowdown. Not
15 seconds, like your test suggests. So there's more than an order of
magnitude difference in the numbers, which i felt was worth sharing :)
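The relative figures quoted above can be recomputed directly from the table. The following C sketch does only that arithmetic; `lmbench_slowdown()` and the wrapper functions are hypothetical helpers introduced for illustration, and the input times are the ones reported in the table, taken as given.

```c
#include <assert.h>

/* Relative slowdown of a run taking `t` seconds versus a baseline of
 * `base` seconds, e.g. 0.25 means "25% longer". Hypothetical helper. */
double lmbench_slowdown(double t, double base)
{
    assert(base > 0.0);
    return t / base - 1.0;
}

/* Ping-flood cost under each kernel, relative to its own plain run. */
double vanilla_ping_cost(void) { return lmbench_slowdown(188.0, 152.0); } /* about +23.7% */
double rt_ping_cost(void)      { return lmbench_slowdown(203.0, 152.0); } /* about +33.6% */

/* PREEMPT_RT versus vanilla under the same ping flood. */
double rt_vs_vanilla_ping(void) { return lmbench_slowdown(203.0, 188.0); } /* about +8.0% */
```

Applied to the IRQ column (153 s vs 150 s) the same helper gives about +2%, and to the IRQ & hd column (239 s vs 239 s) exactly 0, which is the contrast the message draws.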

_And_ your own hd and stable-rate irq workloads suggest that PREEMPT_RT 
and vanilla are very close to each other. Let me repeat the table, with 
only the numbers included where there was no flood pinging going on:

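 LMbench running times:
 +--------------------+-------+-------+-------+-------+-------+
 | Kernel             | plain | IRQ   |       |       | IRQ & |
 |                    |       | test  |       |       |  hd   |
 +====================+=======+=======+=======+=======+=======+
 | Vanilla-2.6.12     | 152 s | 150 s |       |       | 239 s |
 +====================+=======+=======+=======+=======+=======+
 | with RT-V0.7.51-02 | 152 s | 153 s |       |       | 239 s |
 +====================+=======+=======+=======+=======+=======+
 | with Ipipe-0.7     | 149 s | 150 s |       |       | 236 s |
 +====================+=======+=======+=======+=======+=======+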

these numbers suggest that outside of ping-flooding all IRQ overhead 
results are within statistical error.

So why do your "ping flood" results show such difference? It really is 
just another type of interrupt workload and has nothing special in it.

> but keep in mind that these are standard tests used as-is by others 
> [...]

are you suggesting this is not really a benchmark but a way to test how
well a particular system withstands extreme external load?

> For one thing, the heavy fluctuation in ping packets may actually 
> induce a state in the monitored kernel which is more akin to the one 
> we want to measure than if we had a steady flow of packets.

so you can see ping packet flow fluctuations in your tests? Then you 
cannot use those results as any sort of benchmark metric.

under PREEMPT_RT, if you wish to tone down the effects of an interrupt 
source then all you have to do is something like:

 P=$(pidof "IRQ "$(grep eth1 /proc/interrupts | cut -d: -f1 | xargs echo))

 chrt -o -p 0 $P   # net irq thread
 renice -n 19 $P
 chrt -o -p 0 5    # softirq-tx
 renice -n 19 5
 chrt -o -p 0 6    # softirq-rx
 renice -n 19 6

and from this point on you should see zero lmbench overhead from flood 
pinging. Can vanilla or I-PIPE do that?
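For readers who want to do the same from a program rather than with chrt(1) and renice(1): the commands above amount, roughly, to switching the target thread to SCHED_OTHER at static priority 0 and setting its nice value to 19. A minimal C sketch, assuming Linux and the POSIX `sched_setscheduler()` and `setpriority()` calls; `demote_thread()` is a hypothetical helper, and demoting kernel IRQ or softirq threads this way needs root, just like the shell version.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sched.h>
#include <sys/resource.h>
#include <sys/types.h>
#include <unistd.h>

/* Roughly `chrt -o -p 0 PID` followed by `renice -n 19 PID`:
 * put PID under SCHED_OTHER (which requires static priority 0)
 * and give it the weakest nice value. Returns 0 on success, -1 on
 * error, with errno as set by the failing call. Hypothetical helper. */
int demote_thread(pid_t pid)
{
    struct sched_param sp = { .sched_priority = 0 };

    assert(pid >= 0);
    if (sched_setscheduler(pid, SCHED_OTHER, &sp) == -1)
        return -1;
    if (setpriority(PRIO_PROCESS, (id_t)pid, 19) == -1)
        return -1;
    return 0;
}
```

Nice values only influence SCHED_OTHER-class threads, so a renice alone would do little for a thread still running under a real-time policy such as SCHED_FIFO; that is why the sketch, like the commands, changes the policy first.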

Ingo
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/major

2.6.12.2 tg3 driver doesn't ARP on 8021q 802.1q dot1q VLAN interfaces?

2005-07-11 Thread Marc Haber

Hi,

this morning, I tried upgrading a firewall from Debian woody to Debian
sarge, and in the course of that, from a locally compiled vanilla 2.4.30
kernel to a locally compiled vanilla 2.6.12.2. The box is an hp DL 140 which has
two tg3-based interfaces on board, and a dual-interface E1000 PCI card in
its PCI slot:

$ lspci
0000:00:00.0 Host bridge: ServerWorks GCNB-LE Host Bridge (rev 32)
0000:00:00.1 Host bridge: ServerWorks GCNB-LE Host Bridge
0000:00:03.0 VGA compatible controller: ATI Technologies Inc Rage XL (rev 27)
0000:00:0f.0 ISA bridge: ServerWorks CSB6 South Bridge (rev a0)
0000:00:0f.1 IDE interface: ServerWorks CSB6 RAID/IDE Controller (rev a0)
0000:00:0f.2 USB Controller: ServerWorks CSB6 OHCI USB Controller (rev 05)
0000:00:0f.3 Host bridge: ServerWorks GCLE-2 Host Bridge
0000:00:10.0 Host bridge: ServerWorks CIOB-E I/O Bridge with Gigabit Ethernet (rev 12)
0000:00:10.2 Host bridge: ServerWorks CIOB-E I/O Bridge with Gigabit Ethernet (rev 12)
0000:01:06.0 Ethernet controller: Intel Corp. 82546EB Gigabit Ethernet Controller (Copper) (rev 01)
0000:01:06.1 Ethernet controller: Intel Corp. 82546EB Gigabit Ethernet Controller (Copper) (rev 01)
0000:02:00.0 Ethernet controller: Broadcom Corporation NetXtreme BCM5704 Gigabit Ethernet (rev 02)
0000:02:00.1 Ethernet controller: Broadcom Corporation NetXtreme BCM5704 Gigabit Ethernet (rev 02)
$

The box uses dot1q VLANs on all interfaces.

After rebooting, the VLANs on the Intel-based interfaces worked fine, while
the tg3-based interfaces didn't answer tagged ARP requests. The untagged
VLAN on the tg3-based interfaces was fine as well. When tcpdumping the
subinterfaces, I saw all traffic on the network, and especially the
incoming ARP requests, but no ARP replies went out.
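As background for the symptom above: on a dot1q trunk each tagged frame carries a 4-byte 802.1Q header (TPID 0x8100 followed by a 16-bit TCI whose low 12 bits are the VLAN ID) between the source MAC address and the real EtherType. A minimal C sketch that recognises a tagged ARP request in a raw frame, of the kind tcpdump showed arriving; `is_tagged_arp_request()` is a hypothetical helper for illustration and says nothing about how tg3 itself handles tags.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_TPID_8021Q    0x8100u  /* 802.1Q tag protocol identifier */
#define DEMO_ETHERTYPE_ARP 0x0806u  /* EtherType of ARP */
#define DEMO_ARP_REQUEST   1u       /* ARP opcode for a request */

static unsigned int demo_be16(const unsigned char *p)
{
    return ((unsigned int)p[0] << 8) | p[1];  /* network byte order */
}

/* Layout: dst MAC (6) | src MAC (6) | TPID (2) | TCI (2) |
 * inner EtherType (2) | ARP header, whose opcode sits at ARP offset 6.
 * Returns 1 and stores the VLAN ID in *vid for a tagged ARP request. */
int is_tagged_arp_request(const unsigned char *frame, size_t len,
                          unsigned int *vid)
{
    assert(frame != NULL && vid != NULL);
    if (len < 26)  /* 18-byte tagged header + first 8 bytes of ARP */
        return 0;
    if (demo_be16(frame + 12) != DEMO_TPID_8021Q)
        return 0;
    if (demo_be16(frame + 16) != DEMO_ETHERTYPE_ARP)
        return 0;
    if (demo_be16(frame + 18 + 6) != DEMO_ARP_REQUEST)  /* ARP "oper" */
        return 0;
    *vid = demo_be16(frame + 14) & 0x0FFFu;  /* low 12 bits of the TCI */
    return 1;
}
```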

Rebooting back into 2.4 solved the issue immediately; all VLANs were OK
there.

Conclusion: Since the e1000-based VLANs work fine even with 2.6, the dot1q
code seems to be ok. So, the issue most probably lies with the tg3 driver.

I am pretty well aware that there are issues with tg3 on _Debian_ kernels,
which is one of the reasons why I use locally compiled kernels built from
vanilla kernel.org sources.

Any hints?

Greetings
Marc

-- 
-
Marc Haber | "I don't trust Computers. They | Mailadresse im Header
Mannheim, Germany  |  lose things."    Winona Ryder | Fon: *49 621 72739834
Nordisch by Nature |  How to make an American Quilt | Fax: *49 621 72739835


[PATCH] eventpoll : Suppress a short lived lock from struct file

2005-07-11 Thread Eric Dumazet


Hi Davide

I found in my tests that there is no need to have an f_ep_lock spinlock
attached to each struct file, using 8 bytes on 64-bit platforms. The
lock is held for a very short time and can be global, with almost
no change in performance for applications using epoll, and a gain for
all others.

Thank you
Eric Dumazet

[PATCH] eventpoll : Suppress a short lived lock from struct file

Signed-off-by: Eric Dumazet <[EMAIL PROTECTED]>

--- linux-2.6.12/fs/eventpoll.c 2005-06-17 21:48:29.0 +0200
+++ linux-2.6.12-ed/fs/eventpoll.c  2005-07-11 08:56:07.0 +0200
@@ -179,6 +179,8 @@
spinlock_t lock;
 };
 
+static DEFINE_SPINLOCK(f_ep_lock);
+
 /*
  * This structure is stored inside the "private_data" member of the file
  * structure and rapresent the main data sructure for the eventpoll
@@ -426,7 +428,6 @@
 {
 
INIT_LIST_HEAD(&file->f_ep_links);
-   spin_lock_init(&file->f_ep_lock);
 }
 
 
@@ -967,9 +968,9 @@
goto eexit_2;
 
/* Add the current item to the list of active epoll hook for this file */
-   spin_lock(&tfile->f_ep_lock);
+   spin_lock(&f_ep_lock);
list_add_tail(&epi->fllink, &tfile->f_ep_links);
-   spin_unlock(&tfile->f_ep_lock);
+   spin_unlock(&f_ep_lock);
 
/* We have to drop the new item inside our item list to keep track of it */
write_lock_irqsave(&ep->lock, flags);
@@ -1160,7 +1161,6 @@
 {
int error;
unsigned long flags;
-   struct file *file = epi->ffd.file;
 
/*
 * Removes poll wait queue hooks. We _have_ to do this without holding
@@ -1173,10 +1173,10 @@
ep_unregister_pollwait(ep, epi);
 
/* Remove the current item from the list of epoll hooks */
-   spin_lock(&file->f_ep_lock);
+   spin_lock(&f_ep_lock);
if (EP_IS_LINKED(&epi->fllink))
EP_LIST_DEL(&epi->fllink);
-   spin_unlock(&file->f_ep_lock);
+   spin_unlock(&f_ep_lock);
 
/* We need to acquire the write IRQ lock before calling ep_unlink() */
write_lock_irqsave(&ep->lock, flags);
--- linux-2.6.12/include/linux/fs.h 2005-06-17 21:48:29.0 +0200
+++ linux-2.6.12-ed/include/linux/fs.h  2005-07-11 08:58:02.0 +0200
@@ -597,7 +597,6 @@
 #ifdef CONFIG_EPOLL
/* Used by fs/eventpoll.c to link all the hooks to this file */
struct list_head        f_ep_links;
-   spinlock_t  f_ep_lock;
 #endif /* #ifdef CONFIG_EPOLL */
struct address_space    *f_mapping;
 };
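The locking change in this patch can be illustrated in user space. In this C sketch each object keeps its list head, but the per-object lock is replaced by a single global lock that serialises every link and unlink, which is the trade the patch makes: less memory per object in exchange for one shared lock on the rarely used add/remove path. All names are hypothetical, the list is a minimal stand-in for the kernel's list_head, and a C11 `atomic_flag` spinlock stands in for spinlock_t.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Minimal circular doubly linked list, standing in for list_head. */
struct demo_list { struct demo_list *next, *prev; };

void demo_list_init(struct demo_list *h) { h->next = h->prev = h; }
int demo_list_empty(const struct demo_list *h) { return h->next == h; }

static void demo_list_add_tail(struct demo_list *n, struct demo_list *h)
{
    n->prev = h->prev;
    n->next = h;
    h->prev->next = n;
    h->prev = n;
}

static void demo_list_del_init(struct demo_list *n)
{
    n->prev->next = n->next;
    n->next->prev = n->prev;
    demo_list_init(n);  /* self-linked again, i.e. "not linked" */
}

/* Before the patch: every file carries its own lock... */
struct demo_file_before { struct demo_list ep_links; atomic_flag ep_lock; };
/* ...after it, the per-file lock is gone... */
struct demo_file { struct demo_list ep_links; };
/* ...and one global lock protects every file's ep_links list. */
static atomic_flag demo_ep_lock = ATOMIC_FLAG_INIT;

static void demo_spin_lock(atomic_flag *l)
{
    while (atomic_flag_test_and_set_explicit(l, memory_order_acquire))
        ;  /* busy-wait: tolerable only because the critical section is tiny */
}

static void demo_spin_unlock(atomic_flag *l)
{
    atomic_flag_clear_explicit(l, memory_order_release);
}

void demo_ep_link(struct demo_file *f, struct demo_list *item)
{
    assert(f != NULL && item != NULL);
    demo_spin_lock(&demo_ep_lock);
    demo_list_add_tail(item, &f->ep_links);
    demo_spin_unlock(&demo_ep_lock);
}

void demo_ep_unlink(struct demo_list *item)
{
    assert(item != NULL);
    demo_spin_lock(&demo_ep_lock);
    if (item->next != item)  /* only unlink if linked, like EP_IS_LINKED() */
        demo_list_del_init(item);
    demo_spin_unlock(&demo_ep_lock);
}
```

Whether a single global lock holds up on large SMP/NUMA machines is the question raised in the replies to this patch; the sketch does not answer it.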

Re: aio-stress throughput regressions from 2.6.11 to 2.6.12

2005-07-11 Thread Sébastien Dugué

On Fri, 2005-07-01 at 13:26 +0530, Suparna Bhattacharya wrote:
> Has anyone else noticed major throughput regressions for random
> reads/writes with aio-stress in 2.6.12 ?
> Or have there been any other FS/IO regressions lately ?
> 
> On one test system I see a degradation from around 17+ MB/s to 11MB/s
> for random O_DIRECT AIO (aio-stress -o3 testext3/rwfile5) from 2.6.11
> to 2.6.12. It doesn't seem filesystem specific. Not good :(
> 
> BTW, Chris/Ben, it doesn't look like the changes to aio.c have had an impact
> (I copied those back to my 2.6.11 tree and tried the runs with no effect)
> So it is something else ...
> 
> Ideas/thoughts/observations ?
> 
> Regards
> Suparna
> 

  I'm also seeing a regression, but between 2.6.10 and 2.6.12, using
sysbench + MySQL (with POSIX AIO): 2.6.12 is roughly 8% slower.
I'm currently trying to trace it.

  aio-stress shows no difference, so it probably does not come from
kernel AIO.

  Sébastien.


-- 
--

  Sébastien Dugué    BULL/FREC:B1-247
  phone: (+33) 476 29 77 70  Bullcom: 229-7770

  mailto:[EMAIL PROTECTED]

  Linux POSIX AIO: http://www.bullopensource.org/posix
  
--


hwd accel framebuffer: Newbi question (sleep in sync)

2005-07-11 Thread Andrey Volkov

Hi all,

Could anyone explain to me whether I may or may not
use process sleep (e.g. wait_for..., sleep_on...)
in fb_info->fb_sync and/or in any hardware-accelerated routines (e.g. blit,
cursor and rectfill)?
The code currently in the kernel looks terrible to me (counter-based polling).
Must it be so?

-- 
Regards
Andrey Volkov

Re: [PATCH] eventpoll : Suppress a short lived lock from struct file

2005-07-11 Thread Peter Zijlstra

On Mon, 2005-07-11 at 09:18 +0200, Eric Dumazet wrote:
> Hi Davide
> 
> I found in my tests that there is no need to have an f_ep_lock spinlock
> attached to each struct file, using 8 bytes on 64-bit platforms. The
> lock is held for a very short time and can be global, with almost
> no change in performance for applications using epoll, and a gain for
> all others.
> 

Have you tested the impact of this change on big SMP/NUMA machines?
I hate to see an Altix crashing to its knees :-)

-- 
Peter Zijlstra <[EMAIL PROTECTED]>


Re: [PATCH] [30/48] Suspend2 2.1.9.8 for 2.6.12: 607-atomic-copy.patch

2005-07-11 Thread Nigel Cunningham

Hi.

On Mon, 2005-07-11 at 04:01, Pavel Machek wrote:
> Hi!
> 
> > --- 608-compression.patch-old/kernel/power/suspend2_core/compression.c  
> > 1970-01-01 10:00:00.0 +1000
> > +++
> > 608-compression.patch-new/kernel/power/suspend2_core/compression.c
>  ~
> 
> suspend2_core looks like an extremely bad name for a directory... And
> this is really plugin, not a core, no? Plus it would be nice to drop
> non-essential stuff for initial submit, so that it is not *that* big
> to review.

Suspend2_core was just to keep things nicely separated for the moment.
I've already shifted everything into kernel/power after Pekka's email.

Regarding non-essential stuff, the compression and encryption parts are
really quite small. LZF compression doubles the speed, and encryption is
considered important by people who care about security. They also help
you see why the plugin stuff is useful. Of course, plugin isn't really
the right name anymore; it's more of an internal API.

Regards,

Nigel
-- 
Evolution.
Enumerate the requirements.
Consider the interdependencies.
Calculate the probabilities.
Be amazed that people believe it happened. 


Re: reiser4 vs politics: linux misses out again

2005-07-11 Thread Erik Hensema

Horst von Brand ([EMAIL PROTECTED]):
[on reiserfs4]
>> >>   and _can_ do things
>> >> no other FS can
>
> Mostly useless things...

Depends on your point of view. If you define things to be useful
only when POSIX requires them, then yes, reiser4 contains a lot
of useless stuff.
However, it's the 'beyond POSIX' stuff that makes reiser4
interesting.

Multistream files have been useful on other OSes for years. They
might be useful on Linux too (Samba will surely like them).

The plugin architecture is very interesting. Sometimes you don't
need files to be in the POSIX namespace. Why would you want to
store a mysql database in files? Why not skip the overhead of the
VFS and POSIX rules and just store them in a more efficient way?
Maybe you can create a swapfile plugin. No need for a swapfile to
be in the POSIX namespace either.
It's just a fun thing to experiment with. It's not always
necessary to let demand create the means. Give programmers
some powerful tools and wait and see what wonderful things start
to evolve.

And yes, maybe in ten years' time POSIX will just be a subsystem in
Linux. Maybe commercial Unix vendors will start following Linux
as 'the' standard instead of the other way around. Seems fun to
me :-)

I think this debate will mostly boil down to 'do we want to
experiment with beyond-POSIX filesystems in linux?'.
Clearly we don't _need_ it now. There simply are no users. But
will users come when reiser4 is merged? Nobody knows.

IMHO reiser4 should be merged and be marked as experimental. It
should probably _always_ be marked as experimental, because we
_know_ we're going to need some other -- more generic -- API when
we decide we like the features of reiser4. The reiser4 APIs
should probably be implemented as generic VFS APIs. But since we
don't know yet what features we're going to use, let reiser4 be
self contained. Maybe reiser5 or reiser6 will follow standard
VFS-beyond-POSIX rules, with ext4 and JFS2 also implementing them.

It's just too damn hard to predict the future. IMHO it's better to just
merge reiser4 and make it clear to everybody that reiser4 is an
experiment.
As long as it doesn't affect the rest of the kernel and it's
clear to the users that reiser4 is *not* going to be the
standard, it's fine with me.

-- 
Erik Hensema <[EMAIL PROTECTED]>

Re: [PATCH] MIPS: [PATCH resend] C99 initializers for hw_interrupt_type structures

2005-07-11 Thread Ralf Baechle

On Sun, Jul 10, 2005 at 11:47:24PM +0200, [EMAIL PROTECTED] wrote:

> Convert the initializers of hw_interrupt_type structures to C99 initializers.
> 
> Signed-off-by: Thomas Gleixner <[EMAIL PROTECTED]>

None of these exist anymore in the MIPS CVS, at least according to
Rusty's nice little script.

(And yeah, I finally have time to cut a patch for you, Andrew.)

  Ralf
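For readers unfamiliar with the conversion in Thomas's patch: with positional initializers every member must be listed in declaration order, while C99 designated initializers name each member and leave omitted ones zero-initialized (NULL for pointers). A minimal C sketch; `struct demo_irq_type` is a simplified, hypothetical stand-in loosely modelled on the kernel's hw_interrupt_type, and its member list is an assumption.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for hw_interrupt_type (member list is an assumption). */
struct demo_irq_type {
    const char *typename;
    unsigned int (*startup)(unsigned int irq);
    void (*shutdown)(unsigned int irq);
    void (*enable)(unsigned int irq);
    void (*disable)(unsigned int irq);
    void (*ack)(unsigned int irq);
    void (*end)(unsigned int irq);
};

unsigned int demo_startup(unsigned int irq) { (void)irq; return 0; }
void demo_noop(unsigned int irq) { (void)irq; }

/* Old style: positional, so the order must match the declaration exactly. */
struct demo_irq_type demo_old_style = {
    "DEMO", demo_startup, demo_noop, demo_noop, demo_noop, demo_noop, demo_noop
};

/* C99 style: members are named, their order no longer matters, and any
 * member left out (here .shutdown) is zero-initialized, i.e. NULL. */
struct demo_irq_type demo_c99_style = {
    .typename = "DEMO",
    .startup  = demo_startup,
    .enable   = demo_noop,
    .disable  = demo_noop,
    .ack      = demo_noop,
    .end      = demo_noop,
};
```

Besides readability, the named form keeps initializing the right members if fields are later added to or reordered in the structure, which is the usual argument for such conversions.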

Re: Kernel 2.6.12 + IO-APIC + uhci_hcd = Trouble

2005-07-11 Thread Michel Bouissou

On Sunday 10 July 2005 20:50, Protasevich, Natalie wrote:
>
> Michel,
> Symptoms that you describe resemble several IRQ problems with VIA
> chipset reported by others (but not quite...) Could you check on
> bugzilla #4843 please
> http://bugzilla.kernel.org/show_bug.cgi?id=4843 and see if the patch
> fixes your problem.

Hi Natalie,

Thanks for your answer and pointer. Unfortunately it doesn't help.

The patch you mention won't apply to my kernel on its own; I first need to apply
the patch from
http://www.kernel.org/git/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=c434b7a6aedfe428ad17cd61b21b125a7b7a29ce
and then your patch applies OK.

Unfortunately, it doesn't solve my issue. Booting this kernel still results in 
an interrupt issue with uhci_hcd.

After boot, "cat /proc/interrupts" shows:
           CPU0
  0:     188066    IO-APIC-edge  timer
  1:        308    IO-APIC-edge  i8042
  2:          0          XT-PIC  cascade
  4:        413    IO-APIC-edge  serial
  7:          3    IO-APIC-edge  parport0
 14:       1177    IO-APIC-edge  ide4
 15:       1186    IO-APIC-edge  ide5
 18:       1028   IO-APIC-level  eth0, eth1
 19:       8513   IO-APIC-level  ide0, ide1, ide2, ide3, ehci_hcd:usb4
 21:         10   IO-APIC-level  uhci_hcd:usb1, uhci_hcd:usb2, uhci_hcd:usb3
 22:          0   IO-APIC-level  VIA8233
NMI:          0
LOC:     187967
ERR:          0
MIS:          0

(The problem is with IRQ 21 for uhci_hcd)
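Both this listing and Ingo's recipe earlier in this digest (`grep eth1 /proc/interrupts | cut -d: -f1`) depend on reading the IRQ number off a /proc/interrupts line. A minimal C sketch of that lookup, assuming the `NN: counts type devices` layout shown above; `irq_for_device()` is a hypothetical helper, and like the grep it matches substrings, so `eth1` would also match `eth10`.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Given one line of /proc/interrupts, return its IRQ number if `dev`
 * appears after the colon, or -1 for other IRQs and for non-numeric
 * rows such as the CPU header, NMI:, LOC:, ERR: or MIS:. */
int irq_for_device(const char *line, const char *dev)
{
    char *end;
    long irq;

    assert(line != NULL && dev != NULL);
    irq = strtol(line, &end, 10);      /* skips the leading spaces */
    if (end == line || *end != ':')    /* no number, or not followed by ':' */
        return -1;
    if (strstr(end + 1, dev) == NULL)  /* plain substring match, like grep */
        return -1;
    return (int)irq;
}
```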

(Note that without those patches, I didn't see any IRQ handled by
"XT-PIC"; all were handled by the IO-APIC...)


The errors I get in /var/log/messages are:

usbcore: registered new driver usbfs
usbcore: registered new driver hub
USB Universal Host Controller Interface driver v2.2
PCI: Via IRQ fixup for 0000:00:10.0, from 10 to 5
uhci_hcd 0000:00:10.0: VIA Technologies, Inc. VT82x UHCI USB 1.1 Controller
uhci_hcd 0000:00:10.0: new USB bus registered, assigned bus number 1
uhci_hcd 0000:00:10.0: irq 21, io base 0xcc00
hub 1-0:1.0: USB hub found
hub 1-0:1.0: 2 ports detected
PCI: Via IRQ fixup for 0000:00:10.1, from 10 to 5
uhci_hcd 0000:00:10.1: VIA Technologies, Inc. VT82x UHCI USB 1.1 Controller (#2)
uhci_hcd 0000:00:10.1: new USB bus registered, assigned bus number 2
uhci_hcd 0000:00:10.1: irq 21, io base 0xd000
hub 2-0:1.0: USB hub found
hub 2-0:1.0: 2 ports detected
PCI: Via IRQ fixup for 0000:00:10.2, from 10 to 5
uhci_hcd 0000:00:10.2: VIA Technologies, Inc. VT82x UHCI USB 1.1 Controller (#3)
usb 1-1: new low speed USB device using uhci_hcd and address 2
uhci_hcd 0000:00:10.2: new USB bus registered, assigned bus number 3
uhci_hcd 0000:00:10.2: irq 21, io base 0xd400
hub 3-0:1.0: USB hub found
hub 3-0:1.0: 2 ports detected
usbcore: registered new driver hiddev
input: USB HID v1.10 Mouse [Logitech USB Receiver] on usb-0000:00:10.0-1
usbcore: registered new driver usbhid
drivers/usb/input/hid-core.c: v2.01:USB HID core driver
PCI: Via IRQ fixup for 0000:00:10.3, from 11 to 3
ehci_hcd 0000:00:10.3: VIA Technologies, Inc. USB 2.0
ehci_hcd 0000:00:10.3: new USB bus registered, assigned bus number 4
ehci_hcd 0000:00:10.3: irq 19, io mem 0xe3009000
ehci_hcd 0000:00:10.3: USB 2.0 initialized, EHCI 1.00, driver 10 Dec 2004
usb 1-1: USB disconnect, address 2
hub 4-0:1.0: USB hub found
hub 4-0:1.0: 6 ports detected
usb 1-1: new low speed USB device using uhci_hcd and address 3
input: USB HID v1.10 Mouse [Logitech USB Receiver] on usb-0000:00:10.0-1
usb 3-2: new full speed USB device using uhci_hcd and address 3

irq 21: nobody cared!
 [__report_bad_irq+42/144] __report_bad_irq+0x2a/0x90
 [handle_IRQ_event+57/112] handle_IRQ_event+0x39/0x70
 [note_interrupt+163/208] note_interrupt+0xa3/0xd0
 [__do_IRQ+320/352] __do_IRQ+0x140/0x160
 [do_IRQ+50/112] do_IRQ+0x32/0x70
 [common_interrupt+26/32] common_interrupt+0x1a/0x20
 [path_lookup+197/448] path_lookup+0xc5/0x1c0
 [do_page_fault+457/1427] do_page_fault+0x1c9/0x593
 [open_exec+50/272] open_exec+0x32/0x110
 [pg0+947004132/1069421568] usb_hcd_irq+0x44/0x90 [usbcore]
 [handle_IRQ_event+57/112] handle_IRQ_event+0x39/0x70
 [do_execve+66/592] do_execve+0x42/0x250
 [sys_execve+70/176] sys_execve+0x46/0xb0
 [sysenter_past_esp+84/117] sysenter_past_esp+0x54/0x75
handlers:
[pg0+947004064/1069421568] (usb_hcd_irq+0x0/0x90 [usbcore])
[pg0+947004064/1069421568] (usb_hcd_irq+0x0/0x90 [usbcore])
[pg0+947004064/1069421568] (usb_hcd_irq+0x0/0x90 [usbcore])
Disabling IRQ #21
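For context on the "nobody cared" message: as I understand the spurious-interrupt check in kernels of this period (note_interrupt() in kernel/irq/spurious.c), the kernel counts interrupts on each line in windows of 100,000 and, if more than 99,900 of them in a window were not claimed by any registered handler, reports the line as stuck and disables it. The following C sketch models only that counting, with the thresholds taken as an assumption; `demo_note_interrupt()` and its struct are hypothetical and do not mirror the kernel's data structures.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define DEMO_IRQ_WINDOW           100000u  /* interrupts per window (assumed) */
#define DEMO_IRQ_UNHANDLED_LIMIT   99900u  /* above this, the line is "stuck" (assumed) */

struct demo_irq_stats {
    unsigned int count;      /* interrupts seen in the current window */
    unsigned int unhandled;  /* of those, how many no handler claimed */
    bool disabled;           /* set once the line has been "disabled" */
};

/* Account for one interrupt; `handled` is true if at least one handler
 * on the (possibly shared) line claimed it. */
void demo_note_interrupt(struct demo_irq_stats *s, bool handled)
{
    assert(s != NULL);
    if (s->disabled)
        return;
    if (!handled)
        s->unhandled++;
    if (++s->count < DEMO_IRQ_WINDOW)
        return;
    s->count = 0;
    if (s->unhandled > DEMO_IRQ_UNHANDLED_LIMIT)
        s->disabled = true;  /* where the kernel reports "nobody cared" and disables the IRQ */
    s->unhandled = 0;
}
```

With a shared line such as IRQ 21 here, an interrupt counts as unhandled only when none of the registered handlers (the three usb_hcd_irq entries above) claims it.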


At boot, IRQs seem to be distributed this way:

PCI: Probing PCI hardware

reiser4 hangs on lvm on raid on Fibrechannel

2005-07-11 Thread Alexander Gran

Hi,

I have been using reiser4 happily on my workstation for several months now. Now we
wanted to try it on a server. The setup is as follows:
Fujitsu RX100S2 server, P4 3.2GHz with HT, 1GB RAM.
Two local HDDs on a Fasttrack TX4 software RAID controller.
Each disk has three RAID partitions, giving two for /, two for swap and two for LVM
(all RAID1). In the LVM (volume group local) we currently have one volume of 10GB,
mounted on /data/cax.
reiser4 on /data/cax works flawlessly.
Then the system has an FC card (QLogic Corp. QLA2300 64-bit), which is attached
to an OXYGENRAID 416F configured with 800GB. It has one LVM partition (for VG
san). There is one volume (/dev/san/test), mounted on /data/test. This is what
doesn't work:
creating reiser4, creating empty files and deleting them works, but if we do
something like
dd if=/dev/zero of=/data/test/test bs=1024 count=1024
sync
The sync command hangs. Writing directly to /dev/san/test works OK.
The kernel is 2.6.13-rc2-mm1; the same problem occurs with SuSE's 2.6.11.4-21.7-smp.
The kernel config is attached; more verbose system output follows:
cax-stor1:~ # lspci -v
0000:00:00.0 Host bridge: Intel Corporation 82875P/E7210 Memory Controller Hub (rev 02)
	Subsystem: Fujitsu Siemens Computer GmbH: Unknown device 1034
	Flags: bus master, fast devsel, latency 0
	Memory at fc00 (32-bit, prefetchable) [size=32M]
	Capabilities: [e4] #09 [3106]

0000:00:03.0 PCI bridge: Intel Corporation 82875P/E7210 Processor to PCI to CSA Bridge (rev 02) (prog-if 00 [Normal decode])
	Flags: bus master, 66Mhz, fast devsel, latency 32
	Bus: primary=00, secondary=02, subordinate=02, sec-latency=0
	I/O behind bridge: 2000-2fff
	Memory behind bridge: f810-f81f

0000:00:06.0 System peripheral: Intel Corporation 82875P/E7210 Processor to I/O Memory Interface (rev 02)
	Flags: fast devsel
	Memory at fecf (32-bit, non-prefetchable) [size=4K]

0000:00:1c.0 PCI bridge: Intel Corporation 6300ESB 64-bit PCI-X Bridge (rev 02) (prog-if 00 [Normal decode])
	Flags: bus master, 66Mhz, fast devsel, latency 32
	Bus: primary=00, secondary=03, subordinate=03, sec-latency=48
	I/O behind bridge: 3000-3fff
	Memory behind bridge: f820-f82f
	Capabilities: [50] PCI-X bridge device.

0000:00:1d.0 USB Controller: Intel Corporation 6300ESB USB Universal Host Controller (rev 02) (prog-if 00 [UHCI])
	Subsystem: Fujitsu Siemens Computer GmbH: Unknown device 1034
	Flags: bus master, medium devsel, latency 0, IRQ 20
	I/O ports at 1400 [size=32]

0000:00:1d.1 USB Controller: Intel Corporation 6300ESB USB Universal Host Controller (rev 02) (prog-if 00 [UHCI])
	Subsystem: Fujitsu Siemens Computer GmbH: Unknown device 1034
	Flags: bus master, medium devsel, latency 0, IRQ 21
	I/O ports at 1420 [size=32]

0000:00:1d.4 System peripheral: Intel Corporation 6300ESB Watchdog Timer (rev 02)
	Subsystem: Fujitsu Siemens Computer GmbH: Unknown device 1034
	Flags: medium devsel
	Memory at f800 (32-bit, non-prefetchable) [size=16]

0000:00:1d.5 PIC: Intel Corporation 6300ESB I/O Advanced Programmable Interrupt Controller (rev 02) (prog-if 20 [IO(X)-APIC])
	Subsystem: Fujitsu Siemens Computer GmbH: Unknown device 1034
	Flags: bus master, fast devsel, latency 0
	Capabilities: [50] PCI-X non-bridge device.

0000:00:1d.7 USB Controller: Intel Corporation 6300ESB USB2 Enhanced Host Controller (rev 02) (prog-if 20 [EHCI])
	Subsystem: Fujitsu Siemens Computer GmbH: Unknown device 1034
	Flags: bus master, medium devsel, latency 0, IRQ 19
	Memory at f8000400 (32-bit, non-prefetchable) [size=1K]
	Capabilities: [50] Power Management version 2
	Capabilities: [58] #0a [2080]

0000:00:1e.0 PCI bridge: Intel Corporation 82801 PCI Bridge (rev 0a) (prog-if 00 [Normal decode])
	Flags: bus master, fast devsel, latency 0
	Bus: primary=00, secondary=04, subordinate=04, sec-latency=32
	I/O behind bridge: 4000-4fff
	Memory behind bridge: f830-f9ff

0000:00:1f.0 ISA bridge: Intel Corporation 6300ESB LPC Interface Controller (rev 02)
	Flags: bus master, medium devsel, latency 0

0000:00:1f.1 IDE interface: Intel Corporation 6300ESB PATA Storage Controller (rev 02) (prog-if 8a [Master SecP PriP])
	Subsystem: Fujitsu Siemens Computer GmbH: Unknown device 1034
	Flags: bus master, medium devsel, latency 0, IRQ 16
	I/O ports at 
	I/O ports at 
	I/O ports at 
	I/O ports at 
	I/O ports at 1460 [size=16]
	Memory at 4000 (32-bit, non-prefetchable) [size=1K]

0000:00:1f.3 SMBus: Intel Corporation 6300ESB SMBus Controller (rev 02)
	Subsystem: Fujitsu Siemens Computer GmbH: Unknown device 1034
	Flags: medium devsel, IRQ 10
	I/O ports at 1440 [size=32]

0000:02:01.0 Ethernet controller: Intel Corpora

[PATCH] v850: Update checksum.h to match changed function signatures

2005-07-11 Thread Miles Bader

Signed-off-by: Miles Bader <[EMAIL PROTECTED]>

 arch/v850/lib/checksum.c|3 ++-
 include/asm-v850/checksum.h |   11 ++-
 2 files changed, 8 insertions(+), 6 deletions(-)

diff -ruN -X../cludes linux-2.6.11-uc0/arch/v850/lib/checksum.c linux-2.6.11-uc0-v850-20050711/arch/v850/lib/checksum.c
--- linux-2.6.11-uc0/arch/v850/lib/checksum.c   2005-03-04 11:31:28.747099000 +0900
+++ linux-2.6.11-uc0-v850-20050711/arch/v850/lib/checksum.c 2005-07-11 13:05:36.844263000 +0900
@@ -138,7 +138,8 @@
  * Copy from userspace and compute checksum.  If we catch an exception
  * then zero the rest of the buffer.
  */
-unsigned int csum_partial_copy_from_user (const unsigned char *src, unsigned char *dst,
+unsigned int csum_partial_copy_from_user (const unsigned char *src,
+ unsigned char *dst,
   int len, unsigned int sum,
   int *err_ptr)
 {
diff -ruN -X../cludes linux-2.6.11-uc0/include/asm-v850/checksum.h linux-2.6.11-uc0-v850-20050711/include/asm-v850/checksum.h
--- linux-2.6.11-uc0/include/asm-v850/checksum.h    2002-11-05 11:25:31.859782000 +0900
+++ linux-2.6.11-uc0-v850-20050711/include/asm-v850/checksum.h  2005-07-11 13:06:31.973753000 +0900
@@ -1,8 +1,8 @@
 /*
  * include/asm-v850/checksum.h -- Checksum ops
  *
- *  Copyright (C) 2001  NEC Corporation
- *  Copyright (C) 2001  Miles Bader <[EMAIL PROTECTED]>
+ *  Copyright (C) 2001,2005  NEC Corporation
+ *  Copyright (C) 2001,2005  Miles Bader <[EMAIL PROTECTED]>
  *
  * This file is subject to the terms and conditions of the GNU General
  * Public License.  See the file COPYING in the main directory of this
@@ -36,8 +36,8 @@
  * here even more important to align src and dst on a 32-bit (or even
  * better 64-bit) boundary
  */
-extern unsigned csum_partial_copy (const char *src, char *dst, int len,
-  unsigned sum);
+extern unsigned csum_partial_copy (const unsigned char *src,
+  unsigned char *dst, int len, unsigned sum);
 
 
 /*
@@ -46,7 +46,8 @@
  * here even more important to align src and dst on a 32-bit (or even
  * better 64-bit) boundary
  */
-extern unsigned csum_partial_copy_from_user (const char *src, char *dst,
+extern unsigned csum_partial_copy_from_user (const unsigned char *src,
+unsigned char *dst,
 int len, unsigned sum,
 int *csum_err);
 

[PATCH] v850: Update mmu.h header to match implementation changes

2005-07-11 Thread Miles Bader

Signed-off-by: Miles Bader <[EMAIL PROTECTED]>

 include/asm-v850/mmu.h |   17 +++--
 1 files changed, 3 insertions(+), 14 deletions(-)

diff -ruN -X../cludes linux-2.6.11-uc0/include/asm-v850/mmu.h linux-2.6.11-uc0-v850-20050711/include/asm-v850/mmu.h
--- linux-2.6.11-uc0/include/asm-v850/mmu.h 2002-11-05 11:25:32.169771000 +0900
+++ linux-2.6.11-uc0-v850-20050711/include/asm-v850/mmu.h   2005-04-11 13:46:11.741698000 +0900
@@ -1,22 +1,11 @@
-/* Copyright (C) 2002, David McCullough <[EMAIL PROTECTED]> */
+/* Copyright (C) 2002, 2005, David McCullough <[EMAIL PROTECTED]> */
 
 #ifndef __V850_MMU_H__
 #define __V850_MMU_H__
 
-struct mm_rblock_struct {
-   int size;
-   int refcount;
-   void    *kblock;
-};
-
-struct mm_tblock_struct {
-   struct mm_rblock_struct *rblock;
-   struct mm_tblock_struct *next;
-};
-
 typedef struct {
-   struct mm_tblock_struct tblock;
-   unsigned long   end_brk;
+   struct vm_list_struct   *vmlist;
+   unsigned long   end_brk;
 } mm_context_t;
 
 #endif /* __V850_MMU_H__ */

Re: [PATCH 62/82] remove linux/version.h from drivers/video/sis

2005-07-11 Thread Thomas Winischhofer



Olaf Hering wrote:
> changing CONFIG_LOCALVERSION rebuilds too much, for no appearent
> reason.

[...]

> drivers/video/sis/init.h  |5
> drivers/video/sis/init301.h   |5
> drivers/video/sis/sis.h   |   46 --
> drivers/video/sis/sis_accel.c |  171 -
> drivers/video/sis/sis_accel.h |   13
> drivers/video/sis/sis_main.c  |  784 --
> drivers/video/sis/sis_main.h  |   55 --
> drivers/video/sis/vgatypes.h  |3
> 8 files changed, 1 insertion(+), 1081 deletions(-)


Please do NOT apply this.

--
Thomas Winischhofer
Vienna/Austria
thomas AT winischhofer DOT net *** http://www.winischhofer.net


Re: [PATCH] eventpoll : Suppress a short lived lock from struct file

2005-07-11 Thread Eric Dumazet


Peter Zijlstra wrote:

> On Mon, 2005-07-11 at 09:18 +0200, Eric Dumazet wrote:
>
> Have you tested the impact of this change on big SMP/NUMA machines?
> I hate to see an Altix crashing to its knees :-)

I tested on a small NUMA machine (2 nodes), with an epoll-enabled
application that issues around 100 epoll_ctl calls per second.

Of course, one may write a special benchmark on a BIG SMP/NUMA machine that
defeats this patch, using thousands of epoll_ctl calls per second, but a
normal (well-written?) epoll application doesn't constantly add/remove
descriptors with epoll_ctl.

Should we waste 8 bytes per 'struct file' for a very unlikely micro-benchmark?

Eric


Re: [PATCH] [29/48] Suspend2 2.1.9.8 for 2.6.12: 606-all-settings.patch

2005-07-11 Thread Nigel Cunningham

Hi.

On Mon, 2005-07-11 at 04:03, Pavel Machek wrote:
> Hi!
> 
> > +static void suspend2_suspend_2(void)
> > +{
> > +   if (!save_image_part1()) {
> > +   suspend_power_down();
> > +
> > +   if (suspend2_powerdown_method == 3) {
> > +   int temp_result;
> > +
> > +   temp_result = read_pageset2(1);
> 
> 
> Is that just me or do I see way too many numbers? suspend2_suspend_2
> is a really funny name for a function. powerdown_method should really
> use some symbolic constants.

No, it's not just you. It's one of those hangovers from the original
code that I hadn't gotten around to cleaning up.

Symbolic constants now in place for the powerdown method too.
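A minimal sketch of what symbolic constants for the powerdown method could look like. The enum, its member names and the helper are hypothetical (not taken from the actual Suspend2 code); the values simply mirror the ACPI sleep states the raw numbers appear to denote, since the patch itself comments "S5 = off".

```c
/* Hypothetical names for the values of suspend2_powerdown_method; the
 * numbers mirror ACPI sleep states, as the raw constants in the patch do. */
enum suspend2_powerdown {
	SUSPEND2_POWERDOWN_S3 = 3,	/* ACPI S3: suspend to RAM */
	SUSPEND2_POWERDOWN_S4 = 4,	/* ACPI S4 */
	SUSPEND2_POWERDOWN_S5 = 5,	/* ACPI S5: soft off */
};

static enum suspend2_powerdown suspend2_powerdown_method = SUSPEND2_POWERDOWN_S5;

/* Hypothetical helper: replaces open-coded tests such as "== 3" or
 * "== 3 || == 4" with a named predicate. */
static int suspend2_powerdown_is_sleep_state(enum suspend2_powerdown method)
{
	return method == SUSPEND2_POWERDOWN_S3 ||
	       method == SUSPEND2_POWERDOWN_S4;
}
```

The quoted test would then read `if (suspend2_powerdown_method == SUSPEND2_POWERDOWN_S3)`, which documents itself.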

Regards,

Nigel
-- 
Evolution.
Enumerate the requirements.
Consider the interdependencies.
Calculate the probabilities.
Be amazed that people believe it happened. 


Re: [PATCH] v850: Update checksum.h to match changed function signatures

2005-07-11 Thread Frederik Deweerdt

On 11/07/05 18:24 +0900, Miles Bader wrote:
> 
> -unsigned int csum_partial_copy_from_user (const unsigned char *src, unsigned 
> char *dst,
> +unsigned int csum_partial_copy_from_user (const unsigned char *src,
> +   unsigned char *dst,
  ^^^ Alignment looks fuzzy here
>int len, unsigned int sum,
>int *err_ptr)

Regards,
Frederik Deweerdt

Re: [ltp] IBM HDAPS Someone interested? (Userspace accelerometer viewer)

2005-07-11 Thread Paul Sladen

On Sun, 3 Jul 2005, Alejandro Bonilla wrote:
> PLEASE read the following article, it has the data of a guy that made a 
> driver in IBM for Linux and he described the driver he made.
> http://www.almaden.ibm.com/cs/people/marksmith/tpaps.html

Yesterday evening, I used my time here at Debconf5 constructively!  ;-)

  http://www.paul.sladen.org/thinkpad-r31/aps/accelerometer-viewer.jpg(43kB)
  http://www.paul.sladen.org/thinkpad-r31/aps/accelerometer-lid-shut.jpg  (27kB)

The sensor gives us two 10-bit A/D values (corresponding to 0..1 volts on the
ADI chip), temperature (Celsius) and three status bits indicating:

  * lid open/closed
  * keyboard activity
  * nipple movement

On the X40 I borrowed (thanks Robert McQueen), at rest the outputs hover
around 512 (0x200).  Gravity is supposed to fall off in a sine-wave during
rotation, but I found that:

  theta = (N - 512) * 0.5

provides a surprisingly good approximation for pitch/roll values in degrees
in the range (-90..+90), so I think the sensor can do roughly +/-2.5 g.

  http://www.paul.sladen.org/thinkpad-r31/aps/accelerometer-screenshot.png (9kB)
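A worked example of the linear fit above, turning one raw 10-bit reading into a tilt angle. The rest value 512 and the 0.5 degree per count slope come from the post; the function name and the clamping to -90..+90 (the only range the fit was checked over) are assumptions for illustration.

```c
/* From the post: outputs hover around 512 (0x200) at rest, and
 * theta = (N - 512) * 0.5 gives pitch/roll in degrees. */
#define APS_REST_VALUE		512
#define APS_DEG_PER_COUNT	0.5

/* Hypothetical helper: tilt in degrees for one axis reading n (0..1023). */
static double aps_tilt_degrees(int n)
{
	double theta = (n - APS_REST_VALUE) * APS_DEG_PER_COUNT;

	/* The fit was only observed for -90..+90 degrees; readings beyond
	 * that (e.g. while the machine is being shaken) are clamped. */
	if (theta > 90.0)
		theta = 90.0;
	if (theta < -90.0)
		theta = -90.0;
	return theta;
}
```

With this fit, readings from 332 to 692 cover the full -90..+90 degree range, so a 1 g change in the sensed component spans about 180 counts.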

-Paul

PS.  Coincidentally, the name of the machine I borrowed is 'theta'...
-- 
Mostly it snows here.  Helsinki, FI


Re: [PATCH] [38/48] Suspend2 2.1.9.8 for 2.6.12: 614-plugins.patch

2005-07-11 Thread Nigel Cunningham

Hi.

On Mon, 2005-07-11 at 04:08, Pavel Machek wrote:
> Hi!
> 
> > +unsigned long suspend2_powerdown_method = 5; /* S5 = off */
> 
> Constants.
> 
> > +   if (suspend2_powerdown_method == 3 ||
> > +   suspend2_powerdown_method == 4)
> 
> Constants...
> 
> > +   if (suspend2_powerdown_method == 3 ||
> > +   suspend2_powerdown_method == 4)
> 
> Constants...

Once per file is enough :>

> > +   suspend2_prepare_status(1, 0, "Ready to reboot.");
> > +   suspend2_prepare_status(1, 0, "Seeking to enter ACPI state");
> > +   suspend2_prepare_status(1, 0, "Preparing to enter ACPI 
> > state failed. Using normal powerdown.");
> > +   suspend2_prepare_status(1, 0, "Suspending devices 
> > failed. Using normal powerdown.");
> > +   suspend2_prepare_status(1, 0, "Entering ACPI state 
> > failed. Using normal powerdown.");
> > +   suspend2_prepare_status(1, 0, "Powering down.");
> 
> Too many magical constants here... Plus I don't really like your own
> logging subsystem.

The first parameter isn't needed anymore - gone. Second one changed to
an enum DONT_CLEAR_BAR | CLEAR_BAR.
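A sketch of the reworked call as described: the first argument dropped and the remaining one typed as the enum. The enum type name is hypothetical, and the stand-in body (printf() plus a return value) exists only to make the sketch self-contained; the real function drives Suspend2's own status display.

```c
#include <stdio.h>

/* Hypothetical type name for the enum described in the reply. */
enum suspend2_bar_action {
	DONT_CLEAR_BAR,
	CLEAR_BAR,
};

/* Userspace stand-in for suspend2_prepare_status(); returns the number
 * of characters printed only so the sketch can be checked. */
static int suspend2_prepare_status(enum suspend2_bar_action action,
				   const char *msg)
{
	return printf("%s%s\n", action == CLEAR_BAR ? "[bar cleared] " : "",
		      msg);
}
```

A call such as `suspend2_prepare_status(DONT_CLEAR_BAR, "Powering down.");` then states its intent without a bare 0 or 1.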

Not liking it is alright; there's some code you've written that I
don't like either :>

Nigel


Re: [PATCH] Apple USB Touchpad driver (new)

2005-07-11 Thread Stelian Pop

On Sunday 10 July 2005 at 00:32 +0200, Peter Osterlund wrote:
> Stelian Pop <[EMAIL PROTECTED]> writes:
> 
> > +Synaptics re-detection problems:
> > +
> > +
> > +The synaptics X11 driver tries to re-open the touchpad input device file
> > +(/dev/input/eventX) each time you change from text mode back to X11. If the
> > +input device file does not exist at this precise moment, the synaptics 
> > driver
> > +will give up searching for a touchpad, permanently. You will need to 
> > restart
> > +X11 if you want to reissue a scan.
> 
> I think this particular problem is fixed by the following patch to the
> X driver:
> 
> --- synaptics.c.old   2005-07-10 00:09:02.0 +0200
> +++ synaptics.c   2005-07-10 00:09:12.0 +0200
> @@ -524,6 +524,11 @@
>  
>  local->fd = xf86OpenSerial(local->options);
>  if (local->fd == -1) {
> + xf86ReplaceStrOption(local->options, "Device", "");
> + SetDeviceAndProtocol(local);
> + local->fd = xf86OpenSerial(local->options);
> +}
> +if (local->fd == -1) {
>   xf86Msg(X_WARNING, "%s: cannot open input device\n", local->name);
>   return !Success;
>  }

It does indeed fix the problem.

I removed that section from the documentation, as I assume you will
integrate this patch in future synaptics releases (and it wasn't anyway
a big problem for users, just for developers).

> 
> > +static int atp_calculate_abs(int *xy_sensors, int nb_sensors, int fact) {
> 
> I think this CodingStyle violation is quite annoying, because it
> prevents emacs from finding the beginning of the function. It should
> be written like this:

Indeed, that one slipped through, but it didn't prevent vim from finding
the beginning of the function :)

Thanks,

Stelian.
-- 
Stelian Pop <[EMAIL PROTECTED]>


Re: [PATCH] v850: Update checksum.h to match changed function signatures

2005-07-11 Thread Miles Bader

2005/7/11, Frederik Deweerdt <[EMAIL PROTECTED]>:
> > -unsigned int csum_partial_copy_from_user (const unsigned char *src, 
> > unsigned char *dst,
> > +unsigned int csum_partial_copy_from_user (const unsigned char *src,
> > +   unsigned char *dst,
>   ^^^ Alignment looks fuzzy here

It's actually a spaces-vs-tabs issue -- the existing lines use all
spaces for indentation, but my new line uses tabs, so when viewed as
part of the diff they look unaligned; they look OK in the actual
source file though.

-Miles
-- 
Do not taunt Happy Fun Ball.

cagri otomotiv

2005-07-11 Thread Cagri Motorlu Araclar

Visit our web page
Tel : 0212 521 11 11 - 521 04 04 - 521 36 96 - 531 76 18 - 532 91 11
Based on unlimited customer satisfaction, we have been providing principled,
quality service for 30 years.
Installment payments and trade-ins accepted.
The one name in second-hand automotive.
A call to quality... A call to service... A call to trust...
Web : http://www.cagrimotors.com
E-Mail : [EMAIL PROTECTED]

Lack of Documentation about SA_RESTART...

2005-07-11 Thread Paolo Ornati

The documentation (man pages & info libc) doesn't cover the interaction
between various syscalls and the SA_RESTART flag of sigaction() very
well... I wonder why!

> MAN SIGACTION <

"SA_RESTART
Provide behaviour compatible with BSD signal semantics by
making certain system calls restartable across signals."

certain?!? :-O!


> INFO LIBC <

"   One important exception is `EINTR' (*note Interrupted Primitives::).
Many stream I/O implementations will treat it as an ordinary error,
which can be quite inconvenient.  You can avoid this hassle by
installing all signals with the `SA_RESTART' flag."


" -- Macro: int SA_RESTART
 This flag controls what happens when a signal is delivered during
 certain primitives (such as `open', `read' or `write'), and the
 signal handler returns normally.  There are two alternatives: the
 library function can resume, or it can return failure with error
 code `EINTR'."

Ok, "info libc" is a bit better.


But what I'm looking for is a list of syscalls that are automatically
restarted when SA_RESTART is set, and especially in what conditions.

For example: read(), write(), open() are obviously restarted, but even
on non-blocking fd?
And what about connect() and select() for example?

There are a lot of syscalls that can fail with "EINTR"! What's the
advantage of using SA_RESTART if one doesn't know what syscalls are
restarted?

One should always check for "EINTR" or use "TEMP_FAILURE_RETRY()" macro
as suggested in "info libc" !
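A minimal sketch of the "always check for EINTR" approach for read(). The helper name read_retry is hypothetical; glibc's TEMP_FAILURE_RETRY(expression) macro, declared in <unistd.h> when _GNU_SOURCE is defined, wraps the same loop around an arbitrary expression.

```c
#include <errno.h>
#include <unistd.h>

/* Hypothetical helper: retry read() for as long as it fails with EINTR.
 * If the signal was installed with SA_RESTART and the kernel restarts the
 * call, the loop simply never iterates; otherwise it does the restart
 * by hand. */
static ssize_t read_retry(int fd, void *buf, size_t count)
{
	ssize_t n;

	do {
		n = read(fd, buf, count);
	} while (n == -1 && errno == EINTR);

	return n;
}
```

Because the loop is harmless when SA_RESTART already applies, code written this way does not need the per-syscall list asked about here.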


Looking at the source I can easily see that a syscall is restarted when it
returns "-ERESTARTSYS" and the SA_RESTART flag is set. Should I look at the
code for every syscall / particular condition?


Example of behavior: according to source code it seems that "connect()"
(the "net/ipv4/af_inet.c : inet_stream_connect()" implementation)
returns -ERESTARTSYS if interrupted, but if the socket is in
non-blocking mode it returns -EINTR.
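connect() is a case where a plain retry loop is not the portable answer. POSIX specifies that when connect() on a blocking socket is interrupted by a caught signal it fails with EINTR, but the connection request is not aborted and completes asynchronously, so calling connect() again may fail with EALREADY. A sketch that instead waits for the socket to become writable and then reads the result from SO_ERROR; the helper name is hypothetical.

```c
#include <errno.h>
#include <poll.h>
#include <sys/socket.h>

/* Hypothetical helper: connect a blocking socket, tolerating EINTR. */
static int connect_eintr_safe(int fd, const struct sockaddr *addr,
			      socklen_t addrlen)
{
	struct pollfd pfd;
	int err = 0;
	socklen_t errlen = sizeof(err);

	if (connect(fd, addr, addrlen) == 0)
		return 0;
	if (errno != EINTR)
		return -1;

	/* The attempt continues in the background: wait until the socket
	 * becomes writable, retrying poll() itself if it is interrupted. */
	pfd.fd = fd;
	pfd.events = POLLOUT;
	for (;;) {
		int r = poll(&pfd, 1, -1);

		if (r > 0)
			break;
		if (r == -1 && errno != EINTR)
			return -1;
	}

	/* SO_ERROR holds the outcome of the asynchronous connect. */
	if (getsockopt(fd, SOL_SOCKET, SO_ERROR, &err, &errlen) == -1)
		return -1;
	if (err != 0) {
		errno = err;
		return -1;
	}
	return 0;
}
```

This does not depend on whether a given kernel restarts connect() transparently under SA_RESTART, which is exactly the detail that is hard to find documented.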


SUMMARY:

1) is there a reason for this lack of documentation?

2) what can I safely assume about syscalls restart when using SA_RESTART
flag?


Bye,

-- 
Paolo Ornati
Linux 2.6.12.2 on x86_64

Re: [PATCH] Apple USB Touchpad driver (new)

2005-07-11 Thread Stelian Pop

On Monday 11 July 2005 at 02:15 +0200, Peter Osterlund wrote:
> Vojtech Pavlik <[EMAIL PROTECTED]> writes:
> 
> > On Sun, Jul 10, 2005 at 12:48:30AM +0200, Peter Osterlund wrote:
> > > Vojtech Pavlik <[EMAIL PROTECTED]> writes:
> > > 
> > > > Btw, what I don't completely understand is why you need linear
> > > > regression, when you're not trying to detect motion or something like
> > > > that. Basic floating average, or even simpler filtering like the input
> > > > core uses for fuzz could work well enough I believe.
> > > 
> > > Indeed, this function doesn't make much sense:
> > > 
> > > +static inline int smooth_history(int x0, int x1, int x2, int x3)
> > > +{
> > > + return x0 - ( x0 * 3 + x1 - x2 - x3 * 3 ) / 10;
> > > +}
[...]

> +static inline int smooth_history(int x0, int x1, int x2, int x3)
> +{
> + return (x0 + x1 + x2 + x3) / 4;
> +}

I took Peter's approach here and changed the smoothing to use basic
floating average as above.

> > Using a function like
> > 
> > return (x_old * 3 + x) / 4;
> > 
> > eliminates the need for a FIFO, and has similar (if not better)
> > properties to floating average, because its coefficients are
> > [ .25 .18 .14 .10 ... ].
> 
> Agreed.

Except that this does not work well enough.

There are two problems I encountered in this driver:
* fuzz problems (keeping the finger at the same place makes the pointer
dance around its position). This is solved by the input core's fuzz
treatment, as I already set the fuzz to 16 in the code.

* hiccup problems (moving the finger generates non-linear points,
something like 1 1 1 3 3 3 4 4 4 instead of 1 1 1 2 2 3 3 4 4). And here
the floating average approach works better than the input core's method.
(this could probably be solved also by changing the way the absolute
coordinate is calculated from the sensor array in atp_calculate_abs, but
I haven't been able to find a better linear function).

> I took the liberty to modify the patch myself, making these changes:
> 
> * Removed the extra filtering.
> * Converted the "open" counter to an "open" flag. (It is still needed
>   by the atp_resume() function.)
> * CodingStyle fixes.
> 
> I have only compile tested this as I don't have access to the
> hardware, so I don't know how well this works in practice. It's
> possible that the "dev->h_count > 3" test in the old patch filtered
> out spikes in the input signal.

I would prefer to submit the patch myself because, as you say, you cannot
test the code, and those changes are rather sensitive. Without the
smoothing function, the driver is almost unusable when used without the
synaptics driver (as a standard mouse): positioning the pointer at
exact locations is quite difficult.

> Also, it might be a good idea to compute an ABS_PRESSURE value instead
> of hardcoding it to 100. I think the psum variable in
> atp_calculate_abs() can be used, possibly after rescaling.

I already thought about this, one problem is that the sensors do not
report the pressure but only the amount of surface touched. A person
with thick fingers will always generate higher pressures than one with
thin ones, no matter how hard they push on the touchpad.

I don't think this value is reliable enough to be reported to the
userspace as ABS_PRESSURE...

Anyway, here is the updated patch:

Signed-off-by: Stelian Pop <[EMAIL PROTECTED]>

 Documentation/input/appletouch.txt |   83 ++
 drivers/usb/input/Kconfig  |   19 +
 drivers/usb/input/Makefile |1 
 drivers/usb/input/appletouch.c |  509 +
 4 files changed, 612 insertions(+)

Index: linux-2.6.git/drivers/usb/input/Makefile
===
--- linux-2.6.git.orig/drivers/usb/input/Makefile   2005-07-11 09:46:57.0 +0200
+++ linux-2.6.git/drivers/usb/input/Makefile    2005-07-11 09:48:23.0 +0200
@@ -39,3 +39,4 @@
 obj-$(CONFIG_USB_WACOM)+= wacom.o
 obj-$(CONFIG_USB_ACECAD)   += acecad.o
 obj-$(CONFIG_USB_XPAD) += xpad.o
+obj-$(CONFIG_USB_APPLETOUCH)   += appletouch.o
Index: linux-2.6.git/drivers/usb/input/Kconfig
===
--- linux-2.6.git.orig/drivers/usb/input/Kconfig    2005-07-11 09:46:57.0 +0200
+++ linux-2.6.git/drivers/usb/input/Kconfig 2005-07-11 09:48:23.0 +0200
@@ -259,3 +259,22 @@
  To compile this driver as a module, choose M here: the module will be
  called ati_remote.

+config USB_APPLETOUCH
+   tristate "Apple USB Touchpad support"
+   depends on USB && INPUT
+   ---help---
+ Say Y here if you want to use an Apple USB Touchpad.
+
+ These are the touchpads that can be found on post-February 2005
+ Apple Powerbooks (prior models have a Synaptics touchpad connected
+ to the ADB bus).
+
+ This driver provides a basic mouse driver but can be interfaced
+ with the synaptics X11 driver

Re: [PATCH] Apple USB Touchpad driver (new)

2005-07-11 Thread Vojtech Pavlik

On Mon, Jul 11, 2005 at 02:15:49AM +0200, Peter Osterlund wrote:

> I took the liberty to modify the patch myself, making these changes:
> 
> * Removed the extra filtering.
> * Converted the "open" counter to an "open" flag. (It is still needed
>   by the atp_resume() function.)
> * CodingStyle fixes.
> 
> I have only compile tested this as I don't have access to the
> hardware, so I don't know how well this works in practice. It's
> possible that the "dev->h_count > 3" test in the old patch filtered
> out spikes in the input signal.
> 
> Also, it might be a good idea to compute an ABS_PRESSURE value instead
> of hardcoding it to 100. I think the psum variable in
> atp_calculate_abs() can be used, possibly after rescaling.

Stelian, can you check the patch, and if everything is OK, add your
Signed-off-by: line?

> Signed-off-by: Peter Osterlund <[EMAIL PROTECTED]>
> ---
> 
>  Documentation/input/appletouch.txt |  120 +
>  drivers/usb/input/Kconfig  |   19 +
>  drivers/usb/input/Makefile |1 
>  drivers/usb/input/appletouch.c |  461 
> 
>  4 files changed, 601 insertions(+), 0 deletions(-)

-- 
Vojtech Pavlik
SuSE Labs, SuSE CR

PROBLEM: fork() & setpriority()

2005-07-11 Thread Rommer


Hello,

I have trouble with fork() and setpriority() when the priority of the
child process differs from the priority of the parent process and a
SIGCHLD handler is used.
See the example.

kernel 2.6.12.1, no SMP

--
Best regards, Roman
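#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <signal.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <sys/resource.h>

struct sigaction sa_chld;
int i = 0;


/* Reap one child and report its exit code.
 * Note: printf() is not async-signal-safe. */
static void chld_handler (int signum) {
	int status = 0;
	wait (&status);
	printf ("child closed: status = %i\n", WEXITSTATUS (status));
}


/* Fork a child that resets its nice value to 0 and exits with code i. */
void test (void) {
	pid_t pid = fork ();

	if (pid == 0) {
		setpriority (PRIO_PROCESS, 0, 0);
		printf ("hello from child, i = %i\n", i);
		exit (i);
	}
}


int main () {
	memset (&sa_chld, 0, sizeof (sa_chld));
	sa_chld.sa_handler = &chld_handler;
	sigaction (SIGCHLD, &sa_chld, NULL);

	/* Run the parent at nice -1; lowering the nice value normally
	 * requires root privileges. */
	setpriority (PRIO_PROCESS, 0, -1);
	while (1) {
		i++;
		test ();
		sleep (3);
	}
	return 0;
}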

Re: [PATCH] Apple USB Touchpad driver (new)

2005-07-11 Thread Vojtech Pavlik

On Mon, Jul 11, 2005 at 12:39:31PM +0200, Stelian Pop wrote:
 
> > > Using a function like
> > > 
> > >   return (x_old * 3 + x) / 4;
> > > 
> > > eliminates the need for a FIFO, and has similar (if not better)
> > > properties to floating average, because its coefficients are
> > > [ .25 .18 .14 .10 ... ].
> > 
> > Agreed.
> 
> Except that this does not work well enough.

I guess the quick motion compensation in input bites you. The above
equation should do even more smoothing than regular 4-point floating
average.
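The claim can be checked numerically. Unrolling y = (3*y_old + x)/4 gives the k-th most recent sample a weight of 0.25*0.75^k (0.25, 0.1875, 0.140625, 0.10546875, ...; the list quoted above gives these truncated), whereas the 4-point floating average gives 0.25 to each of the last four samples and nothing to older ones. A sketch; the helper names are hypothetical and the weight computation ignores integer truncation.

```c
/* Single-pole ("exponential") smoother quoted in the thread:
 * y = (3 * y_old + x) / 4, in integer arithmetic. */
static int smooth_exp(int y_old, int x)
{
	return (y_old * 3 + x) / 4;
}

/* Weight the k-th most recent sample (k = 0 is the newest) ends up with
 * in smooth_exp(), ignoring integer truncation: 0.25 * 0.75^k. */
static double exp_weight(int k)
{
	double w = 0.25;

	while (k-- > 0)
		w *= 0.75;
	return w;
}

/* For comparison: the 4-point floating average gives each of the last
 * four samples the same weight and older samples none. */
static double avg4_weight(int k)
{
	return (k >= 0 && k < 4) ? 0.25 : 0.0;
}
```

After four samples of a unit step the exponential filter has reached only 1 - 0.75^4, about 0.68, of the new value, while the 4-point average has fully caught up; that longer memory is the sense in which it smooths more. One side effect of the integer form, not verified against the driver: truncation makes the output settle up to 3 counts below a rising target (smooth_exp(97, 100) stays at 97), while a falling target is reached exactly.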

> There are two problems I encountered in this driver:
> * fuzz problems (keeping the finger at the same place makes the pointer
> dance around its position). This is solved by the input core's fuzz
> treatment, as I already set the fuzz to 16 in the code.

OK.

> * hickup problems (moving the finger generates non linear points,
> something like 1 1 1 3 3 3 4 4 4 instead of 1 1 1 2 2 3 3 4 4). And here
> the floating average approach works better than the input core's method.
> (this could probably be solved also by changing the way the absolute
> coordinate is calculated from the sensor array in atp_calculate_abs, but
> I haven't been able to find a better linear function).

I of course won't object to the floating average in the driver if you
say it's needed, I'm just wondering what happens here, because the input
core should smooth this out as well, and if it doesn't, there may be a
problem somewhere.

> > Also, it might be a good idea to compute an ABS_PRESSURE value instead
> > of hardcoding it to 100. I think the psum variable in
> > atp_calculate_abs() can be used, possibly after rescaling.
> 
> I already thought about this, one problem is that the sensors do not
> report the pressure but only the amount of surface touched. A person
> with thick fingers will always generate higher pressures than one with
> thin ones, no matter how hard they push on the touchpad.

That's what all other touchpads do.

> I don't think this value is reliable enough to be reported to the
> userspace as ABS_PRESSURE...

I believe it'd still be more useful than a two-value (0 and 100) output.
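A sketch of the rescaling Peter suggested, so that ABS_PRESSURE carries more than two values. ATP_PSUM_FULL_SCALE is a made-up calibration constant (the thread gives no real range for psum), and the function name is hypothetical; in the driver the result would be passed to input_report_abs() with ABS_PRESSURE instead of the fixed 100.

```c
/* Output range for ABS_PRESSURE (an assumption, not from the driver). */
#define ATP_PRESSURE_MAX	255
/* Assumption: psum value that should map to full pressure; would need
 * calibrating on real hardware. */
#define ATP_PSUM_FULL_SCALE	1000

/* Hypothetical helper: map the summed sensor activity onto
 * 0..ATP_PRESSURE_MAX, saturating at the top. */
static int atp_pressure_from_psum(int psum)
{
	long p;

	if (psum <= 0)
		return 0;
	p = (long)psum * ATP_PRESSURE_MAX / ATP_PSUM_FULL_SCALE;
	return p > ATP_PRESSURE_MAX ? ATP_PRESSURE_MAX : (int)p;
}
```

As noted above, this measures touched area rather than force, but userspace drivers that threshold on pressure would still get a graded signal.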


> + /*
> +  * in the future, we could add here code to search for
> +  * a second finger...
> +  * for now, scrolling using the synaptics X driver is
> +  * much more simpler to achieve.
> +  */

This could be quite useful, too, for right and middle button taps (2 and
3 fingers) - since the Macs lack these buttons.
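One possible way to detect additional fingers, assuming (as the atp_calculate_abs(int *xy_sensors, int nb_sensors, int fact) signature suggests) that each axis is read as an array of per-sensor values: treat each separate run of sensors above a noise threshold as one finger. The function name and the threshold parameter are hypothetical, not part of the posted driver.

```c
/* Hypothetical sketch: count fingers on one axis as separate runs of
 * adjacent sensors whose value exceeds a noise threshold. */
static int atp_count_fingers(const int *sensors, int nb_sensors, int threshold)
{
	int fingers = 0;
	int in_run = 0;
	int i;

	for (i = 0; i < nb_sensors; i++) {
		if (sensors[i] > threshold) {
			if (!in_run)
				fingers++;	/* start of a new touched region */
			in_run = 1;
		} else {
			in_run = 0;
		}
	}
	return fingers;
}
```

Two fingers placed close together along one axis can merge into a single run, so a driver would probably look at both axes (for example, take the larger of the two counts).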

-- 
Vojtech Pavlik
SuSE Labs, SuSE CR

Re: Swap partition vs swap file

2005-07-11 Thread Helge Hafting


Wakko Warner wrote:

>Bernd Eckenfels wrote:
>
>>In article <[EMAIL PROTECTED]> you wrote:
>>
>>>You misunderstood entirely what I said.
>>
>>There is no portable/documented way to grow a file without having the file
>>system null its content. However, why is that a problem? You don't create
>>those files very often. Besides, it is better for the OS to be able to assume
>>that a page with zeros in it is equal to the page on fresh swap.

You don't need to zero out swapfiles. You can fill them with anything,
even /dev/urandom.  Zero-filling may be faster, though.  A swapfile
is not zero the second time you use it; by then it contains leftovers
from last time.

>So are you saying that if I create a swap partition it's best to use dd to
>zero it out before mkswap?  If not, then why would a file be different?  I
>know there's no documented way to create a file of a given size without
>writing content.  I saw Windows grow a pagefile by several megabytes in less
>than a second, so I'm sure that it doesn't zero out the space first.

Linux doesn't grow swapfiles at all.  It uses what's there at mkswap time.
You can make new ones, of course, manually.

>As far as portability goes, we're talking about Linux, so portability is
>not an issue in this case.  I myself don't use swap files (or partitions);
>however, I recall a project that would dynamically add/remove swap as
>needed.  Creating a file of 20-50 MB quickly would have been beneficial.

You can create 50 MB quickly, even if it actually has to be written.  If
you can't, don't use that device for swap.

The ability to allocate some blocks without actually writing to them is nice
for this purpose, but current Linux filesystems don't have an API for doing
that.  The necessary changes would touch all existing writable filesystems,
and that is a lot of work for very little gain.  As they say, you don't
create swapfiles all that often.  The time saved on swapfile creation might
take a long time to make up for the time spent on making, auditing and
supporting those changes.
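The point that 50 MB is quick to write can be illustrated with a sketch that does what `dd if=/dev/zero` would: write real zero blocks (Linux swapon rejects swap files that contain holes, so a sparse file will not do), after which mkswap and swapon are run on the file as usual. The function name and the 1 MiB chunk size are arbitrary choices for illustration.

```c
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: write 'mib' MiB of zeros to 'path' so every block
 * is really allocated. mkswap does not care about the content; zeros are
 * just cheap to produce. */
static int create_swapfile(const char *path, unsigned int mib)
{
	static char chunk[1024 * 1024];	/* static, so zero-initialized */
	int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);
	unsigned int i;

	if (fd == -1)
		return -1;
	for (i = 0; i < mib; i++) {
		size_t done = 0;

		/* write() may be short; loop until the whole chunk is out. */
		while (done < sizeof(chunk)) {
			ssize_t n = write(fd, chunk + done, sizeof(chunk) - done);

			if (n == -1) {
				close(fd);
				return -1;
			}
			done += (size_t)n;
		}
	}
	/* Make sure the blocks are on disk before swapon uses them. */
	if (fsync(fd) == -1) {
		close(fd);
		return -1;
	}
	return close(fd);
}
```

Mode 0600 keeps the file from being world-readable, since swapped-out memory can end up in it.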

Helge Hafting





Re: [PATCH] Apple USB Touchpad driver (new)

2005-07-11 Thread Stelian Pop

On Mon, Jul 11, 2005 at 03:52:44AM -0700, Andrew Morton wrote:
> Stelian Pop <[EMAIL PROTECTED]> wrote:
> >
> > Anyway, here is the updated patch:
> > 
> >  Signed-off-by: Stelian Pop <[EMAIL PROTECTED]>
> > 
> >   Documentation/input/appletouch.txt |   83 ++
> >   drivers/usb/input/Kconfig  |   19 +
> >   drivers/usb/input/Makefile |1 
> >   drivers/usb/input/appletouch.c |  509
> >  +
> 
> Badly wordwrapped.

Oops, bad Evolution (even if I did use insert->text file for the patch).

Going back to mutt. Sorry about this.

Changelog:
* CodingStyle fixes
* open counter replaced by a binary flag
* udev hacks are no longer necessary with a patched
  synaptics, update the documentation

Signed-off-by: Stelian Pop <[EMAIL PROTECTED]>

 Documentation/input/appletouch.txt |   83 ++
 drivers/usb/input/Kconfig  |   19 +
 drivers/usb/input/Makefile |1 
 drivers/usb/input/appletouch.c |  509 +
 4 files changed, 612 insertions(+)
Index: linux-2.6.git/drivers/usb/input/Makefile
===
--- linux-2.6.git.orig/drivers/usb/input/Makefile   2005-07-11 09:46:57.0 +0200
+++ linux-2.6.git/drivers/usb/input/Makefile    2005-07-11 09:48:23.0 +0200
@@ -39,3 +39,4 @@
 obj-$(CONFIG_USB_WACOM)+= wacom.o
 obj-$(CONFIG_USB_ACECAD)   += acecad.o
 obj-$(CONFIG_USB_XPAD) += xpad.o
+obj-$(CONFIG_USB_APPLETOUCH)   += appletouch.o
Index: linux-2.6.git/drivers/usb/input/Kconfig
===
--- linux-2.6.git.orig/drivers/usb/input/Kconfig    2005-07-11 09:46:57.0 +0200
+++ linux-2.6.git/drivers/usb/input/Kconfig 2005-07-11 09:48:23.0 +0200
@@ -259,3 +259,22 @@
  To compile this driver as a module, choose M here: the module will be
  called ati_remote.
 
+config USB_APPLETOUCH
+   tristate "Apple USB Touchpad support"
+   depends on USB && INPUT
+   ---help---
+ Say Y here if you want to use an Apple USB Touchpad.
+
+ These are the touchpads that can be found on post-February 2005
+ Apple Powerbooks (prior models have a Synaptics touchpad connected
+ to the ADB bus).
+
+ This driver provides a basic mouse driver but can be interfaced
+ with the synaptics X11 driver to provide acceleration and
+ scrolling in X11.
+
+ For further information, see
+ .
+
+ To compile this driver as a module, choose M here: the
+ module will be called appletouch.
Index: linux-2.6.git/Documentation/input/appletouch.txt
===
--- /dev/null   1970-01-01 00:00:00.0 +
+++ linux-2.6.git/Documentation/input/appletouch.txt    2005-07-11 11:57:41.0 +0200
@@ -0,0 +1,83 @@
+Apple Touchpad Driver (appletouch)
+--
+   Copyright (C) 2005 Stelian Pop <[EMAIL PROTECTED]>
+
+appletouch is a Linux kernel driver for the USB touchpad found on post
+February 2005 Apple Alu Powerbooks.
+
+This driver is derived from Johannes Berg's appletrackpad driver[1], but it has
+been improved in some areas:
+   * appletouch is a full kernel driver, no userspace program is necessary
+   * appletouch can be interfaced with the synaptics X11 driver, in order
+ to have touchpad acceleration, scrolling, etc.
+
+Credits go to Johannes Berg for reverse-engineering the touchpad protocol,
+Frank Arnold for further improvements, and Alex Harper for some additional
+information about the inner workings of the touchpad sensors.
+
+Usage:
+--
+
+In order to use the touchpad in the basic mode, compile the driver and load
+the module. A new input device will be detected and you will be able to read
+the mouse data from /dev/input/mice (using gpm, or X11).
+
+In X11, you can configure the touchpad to use the synaptics X11 driver, which
+will give additional functionality, like acceleration, scrolling, etc. In
+order to do this, make sure you're using a recent version of the synaptics
+driver (tested with 0.14.2, available from [2]), and configure a new input
+device in your X11 configuration file (take a look below for an example). For
+additional configuration, see the synaptics driver documentation.
+
+   Section "InputDevice"
+   Identifier  "Synaptics Touchpad"
+   Driver  "synaptics"
+   Option  "SendCoreEvents""true"
+   Option  "Device""/dev/input/mice"
+   Option  "Protocol"  "auto-dev"
+   Option  "LeftEdge"  "0"
+   Option  "RightEdge" "850"
+   Option  "TopEdge"   "0"
+

Re: PROBLEM: fork() & setpriority()

2005-07-11 Thread Arjan van de Ven

On Mon, 2005-07-11 at 13:58 +0300, Rommer wrote:
> Hello,
> 
> I have trouble with fork() and setpriority().
> When priority of child process != priority of parent process
> and used SIGCHLD handler.
> See example.

the example is buggy in that printf() isn't allowed in signal handlers
btw...



Re: [PATCH] Apple USB Touchpad driver (new)

2005-07-11 Thread Stelian Pop

On Monday 11 July 2005 at 13:00 +0200, Vojtech Pavlik wrote:
> On Mon, Jul 11, 2005 at 12:39:31PM +0200, Stelian Pop wrote:
>  
> > > > Using a function like
> > > > 
> > > > return (x_old * 3 + x) / 4;
> > > > 
> > > > eliminates the need for a FIFO, and has similar (if not better)
> > > > properties to floating average, because its coefficients are
> > > > [ .25 .18 .14 .10 ... ].
> > > 
> > > Agreed.
> > 
> > Except that this does not work well enough.
> 
> I guess the quick motion compensation in input bites you. The above
> equation should do even more smoothing than regular 4-point floating
> average.

Possible. The 'fuzz' parameter in the input core serves too many purposes,
imho. Let me try removing the quick motion compensation and see...

> > I already thought about this, one problem is that the sensors do not
> > report the pressure but only the amount of surface touched. A person
> > with thick fingers will always generate higher pressures than one with
> > thin ones, no matter how hard they push on the touchpad.
> 
> That's what all other touchpads do.

I thought the hardware was capable of calculating real pressure...

> > I don't think this value is reliable enough to be reported to the
> > userspace as ABS_PRESSURE...
> 
> I believe it'd still be more useful than a two-value (0 and 100) output.

Ok, I'll do it.

> > +   /*
> > +* in the future, we could add here code to search for
> > +* a second finger...
> > +* for now, scrolling using the synaptics X driver is
> > +* much more simpler to achieve.
> > +*/
> 
> This could be quite useful, too, for right and middle button taps (2 and
> 3 fingers) - since the Macs lack these buttons.

Indeed. But this can be a later improvement, let's make one finger work
for now :)

Stelian.
-- 
Stelian Pop <[EMAIL PROTECTED]>


Re: PROBLEM: fork() & setpriority()

2005-07-11 Thread Rommer


Hello,

Arjan van de Ven wrote:

On Mon, 2005-07-11 at 13:58 +0300, Rommer wrote:


Hello,

I have trouble with fork() and setpriority().
When priority of child process != priority of parent process
and used SIGCHLD handler.
See example.



the example is buggy in that printf() isn't allowed in signal handlers
btw...



ok, you can remove printf() from SIGCHLD handler.

--
Best regards, Roman

Re: reiser4 vs politics: linux misses out again

2005-07-11 Thread Ed Tomlinson

On Sunday 10 July 2005 20:01, Ed Cogburn wrote:
> Jim Crilly wrote:
> 
> > But in most of the changesets on the bkbits site you can go back over 2
> > years and not see anything from namesys people. Nearly all of the fixes 
> > committed in the past 2-3 years are from SuSE.

With Chris Mason's name attached?  Chris wrote the journaling support for R3
and worked for SUSE for a while (he may still?).   I also remember seeing quite
a few patches run through the reiser mailing list for comment...
  
> So, for the sake of argument, if IBM were to drop official support for JFS,
> we'd yank JFS out of the kernel even if there was someone else willing to
> support it?  Why does it now *matter* who supports it, as long as it's being
> maintained?  And will we now block IBM's hypothetical JFS2 from the kernel
> if IBM, from the programmers up to the CEO, doesn't swear on their momma's
> grave that they'll continue to support JFS1, even if JFS1 is being
> supported by others?  Jeez, this is why it doesn't take a kernel dev to see
> the problems here, common sense seems to be an increasingly rare ingredient
> in these arguments against R4.  If I didn't know better, I'd think you were
> making this stuff up as you went along.

Re: [PATCH] Apple USB Touchpad driver (new)

2005-07-11 Thread Vojtech Pavlik

On Mon, Jul 11, 2005 at 01:08:35PM +0200, Stelian Pop wrote:

> Possible. The 'fuzz' parameter in input core serves too many usages
> imho. Let me try removing the quick motion compensation and see...

It was designed for joysticks and works very well for them. Usefulness
for other device types may vary. And I'll gladly accept patches to
improve it.

> > > I already thought about this, one problem is that the sensors do not
> > > report the pressure but only the amount of surface touched. A person
> > > with thick fingers will always generate higher pressures then one with
> > > thin ones, no matter how hard they push on the touchpad.
> > 
> > That's what all other touchpads do.
> 
> I thought the hardware is capable of calculating real pressure...

Since the sensor is just a multi-layer PCB with a clever trace layout,
it can't.

> > > I don't think this value is reliable enough to be reported to the
> > > userspace as ABS_PRESSURE...
> > 
> > I believe it'd still be more useful than a two-value (0 and 100) output.
> 
> Ok, I'll do it.

Thanks. Should I wait for that or apply the patch you just sent?

> > This could be quite useful, too, for right and middle button taps (2 and
> > 3 fingers) - since the Macs lack these buttons.
> 
> Indeed. But this can be a later improvement, let's make one finger work
> for now :)
 
Agreed.

-- 
Vojtech Pavlik
SuSE Labs, SuSE CR
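
Since the sensor only reports how much surface is covered, any "pressure" is a proxy for contact area. A minimal userspace sketch of what Vojtech suggests, summing the per-sensor readings into a value that could be reported as ABS_PRESSURE instead of a two-value 0/100 output. The function name, the noise threshold and the 0..255 clamp are hypothetical choices, not taken from Stelian's driver; in the driver the result would be passed to input_report_abs(dev, ABS_PRESSURE, ...).

```c
#include <assert.h>

/* Hypothetical upper bound for the reported pressure value. */
#define PRESSURE_MAX 255

/*
 * Sum the sensor readings that exceed a noise threshold. A larger contact
 * area gives a larger sum, so this tracks "how much finger" is on the pad
 * rather than how hard it is pressed.
 */
static int touch_area_pressure(const int *sensors, int count, int threshold)
{
	int i, sum = 0;

	for (i = 0; i < count; i++)
		if (sensors[i] > threshold)
			sum += sensors[i] - threshold;
	return sum > PRESSURE_MAX ? PRESSURE_MAX : sum;
}
```

As Stelian notes, a thick finger will still read "harder" than a thin one; the value is only more informative than 0/100, not a true force measurement.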

Re: PREEMPT_RT and I-PIPE: the numbers, part 4

2005-07-11 Thread Karim Yaghmour

Ingo Molnar wrote:
> So why do your "ping flood" results show such difference? It really is 
> just another type of interrupt workload and has nothing special in it.
...
> are you suggesting this is not really a benchmark but a way to test how 
> well a particular system withstands extreme external load?

Look, you're basically splitting hairs. No matter how involved an explanation
you can provide, it remains that both vanilla and I-pipe were subject to the
same load. If PREEMPT_RT consistently shows the same degradation under the
same setup, and that is indeed the case, then the problem is with PREEMPT_RT,
not the tests.

> so you can see ping packet flow fluctuations in your tests? Then you 
> cannot use those results as any sort of benchmark metric.

I didn't say this. I said that if there is fluctuation, then maybe this is
something we want to see the effect of. In real world applications,
interrupts may not come in at a steady pace, as you try to achieve in your
own tests.

> and from this point on you should see zero lmbench overhead from flood 
> pinging. Can vanilla or I-PIPE do that?

Let's not get into what I-pipe can or cannot do, that's not what these
numbers are about. It's pretty darn amazing that we're even having this
conversation. The PREEMPT_RT stuff is being worked on by more than a
dozen developers spread across some of the most well-known Linux companies
out there (RedHat, MontaVista, IBM, TimeSys, etc.). Yet, despite this
massive involvement, here we have a patch developed by a single guy,
Philippe, who's doing this work outside his regular work hours, and his
patch, which does provide guaranteed deterministic behavior, is:
a) Much smaller than PREEMPT_RT
b) Less intrusive than PREEMPT_RT
c) Performs very well: as well as, and sometimes even better than, PREEMPT_RT

Splitting hairs won't erase this reality. And again, before I get the
PREEMPT_RT mob on my back again, this is just for the sake of argument,
both approaches remain valid, and are not mutually exclusive.

Like I said before, others are free to publish their own numbers showing
differently from what we've found.

Karim
-- 
Author, Speaker, Developer, Consultant
Pushing Embedded and Real-Time Linux Systems Beyond the Limits
http://www.opersys.com || [EMAIL PROTECTED] || 1-866-677-4546


opening the framebuffer device

2005-07-11 Thread subramanyam yenugonda

Hi All!

How do I open the framebuffer device when the user has
multiple monitors on a single video card?

Thanks in advance.
~YSM
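
One way to see which framebuffer belongs to which head, assuming the card's driver registers a separate /dev/fbN per output (this is driver-dependent; some drivers expose only one device), is to open each node and query it with the FBIOGET_FSCREENINFO and FBIOGET_VSCREENINFO ioctls from <linux/fb.h>. A minimal sketch; the function names fb_describe and fb_probe_all and the limit of eight nodes are illustrative assumptions.

```c
#include <assert.h>
#include <fcntl.h>
#include <linux/fb.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <unistd.h>

/*
 * Open one framebuffer node and print its driver id and current mode.
 * Returns 0 on success, -1 if the node cannot be opened or queried.
 */
static int fb_describe(const char *path)
{
	struct fb_fix_screeninfo fix;
	struct fb_var_screeninfo var;
	int fd = open(path, O_RDWR);

	if (fd < 0)
		return -1;
	if (ioctl(fd, FBIOGET_FSCREENINFO, &fix) < 0 ||
	    ioctl(fd, FBIOGET_VSCREENINFO, &var) < 0) {
		close(fd);
		return -1;
	}
	/* fix.id is a 16-byte field that need not be NUL-terminated. */
	printf("%s: id \"%.16s\", %ux%u, %u bpp\n", path, fix.id,
	       var.xres, var.yres, var.bits_per_pixel);
	close(fd);
	return 0;
}

/* Probe /dev/fb0 .. /dev/fb7 (arbitrary limit); returns how many answered. */
static int fb_probe_all(void)
{
	char path[32];
	int i, found = 0;

	for (i = 0; i < 8; i++) {
		snprintf(path, sizeof(path), "/dev/fb%d", i);
		if (fb_describe(path) == 0)
			found++;
	}
	return found;
}
```

Each head that shows up as its own node can then be opened and mapped independently.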







Re: [RFC][PATCH] i386: Per node IDT

2005-07-11 Thread Oleg Nesterov

Zwane Mwaikambo wrote:
>
> --- linux-2.6.13-rc1-mm1/arch/i386/kernel/entry.S 3 Jul 2005 13:20:43 
> -   1.1.1.1
> +++ linux-2.6.13-rc1-mm1/arch/i386/kernel/entry.S 10 Jul 2005 22:33:37 
> -
> -
> +/* Build the IRQ entry stubs */
>  vector=0
> -ENTRY(irq_entries_start)
> + .align IRQ_STUB_SIZE,0x90
> +ENTRY(interrupt)
>  .rept NR_IRQS
>   ALIGN
> -1:   pushl $vector-256
> + pushl $vector
>   jmp common_interrupt
>
>  [...snip...]
>
> --- linux-2.6.13-rc1-mm1/arch/i386/kernel/irq.c   3 Jul 2005 13:20:43 
> -   1.1.1.1
> +++ linux-2.6.13-rc1-mm1/arch/i386/kernel/irq.c   4 Jul 2005 21:39:56 
> -
> @@ -53,8 +53,7 @@ static union irq_ctx *softirq_ctx[NR_CPU
>   */
>  fastcall unsigned int do_IRQ(struct pt_regs *regs)
>  {
> - /* high bits used in ret_from_ code */
> - int irq = regs->orig_eax & 0xff;
> + int irq = regs->orig_eax;

Could you explain this change? I think it breaks do_signal/handle_signal:
they check orig_eax >= 0 to handle -ERESTARTSYS:

	/* Are we from a system call? */
	if (regs->orig_eax >= 0) {
		/* If so, check system call restarting.. */
		switch (regs->eax) {
		case -ERESTART_RESTARTBLOCK:
		case -ERESTARTNOHAND:

Oleg.
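
Oleg's concern rests on how i386 entry code tells interrupts apart from system calls via orig_eax: system call entry saves the (non-negative) syscall number there, while the old IRQ stubs push `$vector-256`, which is always negative, and do_IRQ() recovers the vector with `& 0xff`. A small userspace model of that encoding; the helper names are illustrative, not kernel functions.

```c
#include <assert.h>

/* What the old stubs push: "pushl $vector-256" (always negative). */
static long old_stub_orig_eax(int vector)
{
	return (long)vector - 256;
}

/* What the proposed stubs push: "pushl $vector" (non-negative). */
static long new_stub_orig_eax(int vector)
{
	return vector;
}

/* Old do_IRQ(): mask off the high bits that mark "not a syscall". */
static int irq_from_orig_eax(long orig_eax)
{
	return (int)(orig_eax & 0xff);
}

/* The do_signal()/handle_signal() test: "are we from a system call?" */
static int looks_like_syscall(long orig_eax)
{
	return orig_eax >= 0;
}
```

With the new stubs an interrupt frame passes the `orig_eax >= 0` test, so if the interrupted user code happened to hold one of the -ERESTART* values in eax, the restart logic could wrongly rewind its instruction pointer.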

[PATCH] wbsd version bump

2005-07-11 Thread Pierre Ossman

Version increase of the wbsd driver.

Signed-off-by: Pierre Ossman <[EMAIL PROTECTED]>

Even though the changes for the next release are minor, increasing the
version number simplifies my support issues.

Index: linux/drivers/mmc/wbsd.c
===
--- linux/drivers/mmc/wbsd.c	(revision 151)
+++ linux/drivers/mmc/wbsd.c	(working copy)
@@ -42,7 +42,7 @@
 #include "wbsd.h"
 
 #define DRIVER_NAME "wbsd"
-#define DRIVER_VERSION "1.2"
+#define DRIVER_VERSION "1.3"
 
 #ifdef CONFIG_MMC_DEBUG
 #define DBG(x...) \

Re: reiser4 plugins

2005-07-11 Thread Jaroslav Soltys

> So basically if I write a program that works in both Gnome and KDE
> I should (according to your description) implement my own VFS that
> will use the Gnome or KDE VFS that will then use the OS VFS.
> 
> Is it only me finding that a little silly?

Maybe. Advantages of the kde/gnome/other userland vfs? Authentication
when using SMB shares and transparent access to fish/webdav/ftp/..., but
only for applications that use these libraries. A patched automount
together with lufs plus some GUI for acquiring credentials from the
kde/gnome user could perhaps do the same, but the kde/gnome vfs is
portable to other unices.

Oh, by the way... mount something with smbmount and turn off the
computer you mounted the share from. You must switch to root to be
able to do umount -lf. But yes, I agree with you, it is silly, and I
would also rather have one good solution than two average ones.

> I mean, if I am to have the same functionality under neither Gnome
> nor VFS and they don't support something I need I _NEED_ a vfs so
> that my program is so totally independent on anything at all.

right, wouldn't it be nice to 'cd /mnt/webdav/' or
'/mnt/ssh/infernal.machine/' in bash ? :)

> My program calling My VFS which calls KDE/Gnome's VFS which calls the OS
> VFS will be slower than just calling the VFS immediately - I do hope you
> can see that.

Some kde/gnome vfs modules are not even filesystems at all; this is the
main advantage of a userland vfs. Imagine that a user can add a new FS
to kde/gnome: can he add one to the kernel without root access, by
compiling a new module and loading it? (Un)fortunately
not.

jaro

[PATCH] Fix whitespace in wbsd

2005-07-11 Thread Pierre Ossman

Remove lots of trailing whitespace caused by not-so-great editor.

Signed-off-by: Pierre Ossman <[EMAIL PROTECTED]>
--- linux/drivers/mmc/wbsd.c.orig	2005-07-11 14:26:40.0 +0200
+++ linux/drivers/mmc/wbsd.c	2005-07-11 14:27:02.0 +0200
@@ -93,7 +93,7 @@
 static inline void wbsd_unlock_config(struct wbsd_host* host)
 {
 	BUG_ON(host->config == 0);
-	
+
 	outb(host->unlock_code, host->config);
 	outb(host->unlock_code, host->config);
 }
@@ -101,14 +101,14 @@
 static inline void wbsd_lock_config(struct wbsd_host* host)
 {
 	BUG_ON(host->config == 0);
-	
+
 	outb(LOCK_CODE, host->config);
 }
 
 static inline void wbsd_write_config(struct wbsd_host* host, u8 reg, u8 value)
 {
 	BUG_ON(host->config == 0);
-	
+
 	outb(reg, host->config);
 	outb(value, host->config + 1);
 }
@@ -116,7 +116,7 @@
 static inline u8 wbsd_read_config(struct wbsd_host* host, u8 reg)
 {
 	BUG_ON(host->config == 0);
-	
+
 	outb(reg, host->config);
 	return inb(host->config + 1);
 }
@@ -140,21 +140,21 @@
 static void wbsd_init_device(struct wbsd_host* host)
 {
 	u8 setup, ier;
-	
+
 	/*
 	 * Reset chip (SD/MMC part) and fifo.
 	 */
 	setup = wbsd_read_index(host, WBSD_IDX_SETUP);
 	setup |= WBSD_FIFO_RESET | WBSD_SOFT_RESET;
 	wbsd_write_index(host, WBSD_IDX_SETUP, setup);
-	
+
 	/*
 	 * Set DAT3 to input
 	 */
 	setup &= ~WBSD_DAT3_H;
 	wbsd_write_index(host, WBSD_IDX_SETUP, setup);
 	host->flags &= ~WBSD_FIGNORE_DETECT;
-	
+
 	/*
 	 * Read back default clock.
 	 */
@@ -164,12 +164,12 @@
 	 * Power down port.
 	 */
 	outb(WBSD_POWER_N, host->base + WBSD_CSR);
-	
+
 	/*
 	 * Set maximum timeout.
 	 */
 	wbsd_write_index(host, WBSD_IDX_TAAC, 0x7F);
-	
+
 	/*
 	 * Test for card presence
 	 */
@@ -177,7 +177,7 @@
 		host->flags |= WBSD_FCARD_PRESENT;
 	else
 		host->flags &= ~WBSD_FCARD_PRESENT;
-	
+
 	/*
 	 * Enable interesting interrupts.
 	 */
@@ -200,9 +200,9 @@
 static void wbsd_reset(struct wbsd_host* host)
 {
 	u8 setup;
-	
+
 	printk(KERN_ERR DRIVER_NAME ": Resetting chip\n");
-	
+
 	/*
 	 * Soft reset of chip (SD/MMC part).
 	 */
@@ -214,9 +214,9 @@
 static void wbsd_request_end(struct wbsd_host* host, struct mmc_request* mrq)
 {
 	unsigned long dmaflags;
-	
+
 	DBGF("Ending request, cmd (%x)\n", mrq->cmd->opcode);
-	
+
 	if (host->dma >= 0)
 	{
 		/*
@@ -232,7 +232,7 @@
 		 */
 		wbsd_write_index(host, WBSD_IDX_DMA, 0);
 	}
-	
+
 	host->mrq = NULL;
 
 	/*
@@ -275,7 +275,7 @@
 	host->offset = 0;
 	host->remain = host->cur_sg->length;
 	  }
-	
+
 	return host->num_sg;
 }
 
@@ -297,12 +297,12 @@
 	struct scatterlist* sg;
 	char* dmabuf = host->dma_buffer;
 	char* sgbuf;
-	
+
 	size = host->size;
-	
+
 	sg = data->sg;
 	len = data->sg_len;
-	
+
 	/*
 	 * Just loop through all entries. Size might not
 	 * be the entire list though so make sure that
@@ -317,23 +317,23 @@
 			memcpy(dmabuf, sgbuf, sg[i].length);
 		kunmap_atomic(sgbuf, KM_BIO_SRC_IRQ);
 		dmabuf += sg[i].length;
-		
+
 		if (size < sg[i].length)
 			size = 0;
 		else
 			size -= sg[i].length;
-	
+
 		if (size == 0)
 			break;
 	}
-	
+
 	/*
 	 * Check that we didn't get a request to transfer
 	 * more data than can fit into the SG list.
 	 */
-	
+
 	BUG_ON(size != 0);
-	
+
 	host->size -= size;
 }
 
@@ -343,12 +343,12 @@
 	struct scatterlist* sg;
 	char* dmabuf = host->dma_buffer;
 	char* sgbuf;
-	
+
 	size = host->size;
-	
+
 	sg = data->sg;
 	len = data->sg_len;
-	
+
 	/*
 	 * Just loop through all entries. Size might not
 	 * be the entire list though so make sure that
@@ -363,30 +363,30 @@
 			memcpy(sgbuf, dmabuf, sg[i].length);
 		kunmap_atomic(sgbuf, KM_BIO_SRC_IRQ);
 		dmabuf += sg[i].length;
-		
+
 		if (size < sg[i].length)
 			size = 0;
 		else
 			size -= sg[i].length;
-		
+
 		if (size == 0)
 			break;
 	}
-	
+
 	/*
 	 * Check that we didn't get a request to transfer
 	 * more data than can fit into the SG list.
 	 */
-	
+
 	BUG_ON(size != 0);
-	
+
 	host->size -= size;
 }
 
 /*
  * Command handling
  */
- 
+
 static inline void wbsd_get_short_reply(struct wbsd_host* host,
 	struct mmc_command* cmd)
 {
@@ -398,7 +398,7 @@
 		cmd->error = MMC_ERR_INVALID;
 		return;
 	}
-	
+
 	cmd->resp[0] =
 		wbsd_read_index(host, WBSD_IDX_RESP12) << 24;
 	cmd->resp[0] |=
@@ -415,7 +415,7 @@
 	struct mmc_command* cmd)
 {
 	int i;
-	
+
 	/*
 	 * Correct response type?
 	 */
@@ -424,7 +424,7 @@
 		cmd->error = MMC_ERR_INVALID;
 		return;
 	}
-	
+
 	for (i = 0;i < 4;i++)
 	{
 		cmd->resp[i] =
@@ -442,7 +442,7 @@
 {
 	int i;
 	u8 status, isr;
-	
+
 	DBGF("Sending cmd (%x)\n", cmd->opcode);
 
 	/*
@@ -451,16 +451,16 @@
 	 * transfer.
 	 */
 	host->isr = 0;
-	
+
 	/*
 	 * Send the command (CRC calculated by host).
 	 */
 	outb(cmd->opcode, host->base + WBSD_CMDR);
 	for (i = 3;i >= 0;i--)
 		outb((cmd->arg >> (i * 8)) & 0xff, host->base + WBSD_CMDR);
-	
+
 	cmd->error = MMC_ERR_NONE;
-	
+
 	/*
 	 * Wait for the request to complete.
 	 */
@@ -477,7 +477,7 @@
 		 * Read back status.
 		 */
 		isr = host->isr;
-		
+
 		/* Card removed? */
 		if (isr & WBSD_INT_CAR

Re: I have one doubt about detail of page reclaim.

2005-07-11 Thread [EMAIL PROTECTED]


[EMAIL PROTECTED] wrote:
> I am reading the code of the function balabce_pgdat(pg_data_t *pgdat,
> int nr_pages, int order).

Sorry, that has a typo: it should be balance_pgdat().

liyu/NOW:D


Re: [PATCH] Apple USB Touchpad driver (new)

2005-07-11 Thread Stelian Pop

Le lundi 11 juillet 2005 à 13:21 +0200, Vojtech Pavlik a écrit :
> On Mon, Jul 11, 2005 at 01:08:35PM +0200, Stelian Pop wrote:
> 
> > Possible. The 'fuzz' parameter in input core serves too many usages
> > imho. Let me try removing the quick motion compensation and see...
> 
> It was designed for joysticks and works very well for them. Usefulness
> for other device types may vary. And I'll gladly accept patches to
> improve it.

Ok, I understand now what is happening, but I'm not sure how to solve
the problem. As I suspected, it is caused by 'fuzz' being a bit abused
by the input core.

The fuzz parameter in the input core is used today to say:
* any change in the -fuzz/2 / +fuzz/2 range is ignored;
* any change in the -fuzz / +fuzz range is smoothed using
  (x_old * 3 + x) / 4;
* any change in the -fuzz*2 / +fuzz*2 range is smoothed using
  (x_old + x) / 2.
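
The three rules above can be modeled as a small userspace function. It follows the description in this mail; the name defuzz_abs is illustrative and not claimed to be the input core's identifier, and this sketch uses strict bounds, so a change of exactly fuzz/2 falls into the next band.

```c
#include <assert.h>

/*
 * Model of the absolute-axis filtering described above:
 *   |x - x_old| < fuzz/2   -> ignored (old value kept)
 *   |x - x_old| < fuzz     -> heavy smoothing (3 * x_old + x) / 4
 *   |x - x_old| < 2 * fuzz -> light smoothing (x_old + x) / 2
 *   otherwise              -> new value passed through unchanged
 */
static int defuzz_abs(int value, int old_val, int fuzz)
{
	if (fuzz) {
		if (value > old_val - fuzz / 2 && value < old_val + fuzz / 2)
			return old_val;
		if (value > old_val - fuzz && value < old_val + fuzz)
			return (old_val * 3 + value) / 4;
		if (value > old_val - fuzz * 2 && value < old_val + fuzz * 2)
			return (old_val + value) / 2;
	}
	return value;
}
```

With fuzz = 16 only jitter under 8 units is dropped and anything from 32 units up passes through unsmoothed; raising fuzz to 64 to smooth larger moves would also widen the ignored band to about 32 units, which is the conflict described below.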

My driver needs to ignore changes in the -8 / +8 range (that's why I set
FUZZ to 16 in the first place), but it needs to smooth the movement when
much larger changes occur (I would need to set FUZZ to 64 for smoothing
to work correctly here).

How to make it work ? Obviously I could implement either fuzz
elimination or smoothing in the driver, and leave the other
transformation to the input core (today it is the smoothing which is in
the driver, but doing it the other way around would result in much less
code).

The other (proper ?) solution would be to change the input core and
separate fuzz and smoothing. This would however require an API addition,
and I'm not sure you want to do that. If you do, I could work on a patch
implementing an inputdev->abssmooth[] table, etc).
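
Separating the two jobs that fuzz does today might look roughly like this. The struct and its smooth field only mirror the idea of an inputdev->abssmooth[] table from this mail; none of these names exist in the input core, and the dead-zone and smoothing formulas are assumptions chosen for illustration.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical per-axis parameters, kept independent of each other. */
struct axis_filter {
	int fuzz;	/* changes with |delta| <= fuzz / 2 are dropped */
	int smooth;	/* changes with |delta| < smooth are averaged */
};

static int filter_abs(const struct axis_filter *f, int value, int old_val)
{
	int delta = abs(value - old_val);

	if (delta <= f->fuzz / 2)
		return old_val;			/* jitter: ignore */
	if (delta < f->smooth)
		return (old_val + value) / 2;	/* moderate move: smooth */
	return value;				/* large jump: pass through */
}
```

Here { fuzz = 16, smooth = 64 } gives the -8 / +8 dead zone together with a wide smoothing range, a combination a single fuzz value cannot express.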

> > I thought the hardware is capable of calculating real pressure...
> 
> Since the sensor is just a multi-layer PCB with a clever trace layout,
> it can't.
> 
> > > > I don't think this value is reliable enough to be reported to the
> > > > userspace as ABS_PRESSURE...
> > > 
> > > I believe it'd still be more useful than a two-value (0 and 100) output.
> > 
> > Ok, I'll do it.
> 
> Thanks. Should I wait for that or apply the patch you just sent?

Well, it depends on what we do with smoothing.

Stelian.
-- 
Stelian Pop <[EMAIL PROTECTED]>


Re: [patch 2.6.13-rc2] pci: restore BAR values from pci_set_power_state for D3hot->D0

2005-07-11 Thread Lennert Buytenhek

On Fri, Jul 08, 2005 at 02:34:56PM -0400, John W. Linville wrote:

> Some PCI devices lose all configuration (including BARs) when
> transitioning from D3hot->D0.  This leaves such a device in an
> inaccessible state.  The patch below causes the BARs to be restored
> when enabling such a device, so that its driver will be able to
> access it.

It might be useful to have this functionality exported to outside of
the generic PCI code.

There are a number of PCI boards that have their reset logic wired
up wrong and lose their config space info (BARs) when you reset them.
The Radisys ENP2611 PCI board is a good example -- it has its reset
logic wired in such a way that if you reset the (ARM-based) CPU on
the board, it also causes the 21555 nontransparent PCI bridge on the
board to be reset, which makes it lose all its primary config space
info (BARs, etc.). The IXP1200 CPU-based PCI cards (now obsolete)
used to suffer from the same issue.

This is currently worked around in the driver, which caches all BAR
values when the module is first loaded, and detects when the card is
reset and then writes back all BARs manually.

--L
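
The driver workaround described above (cache the BARs at load time, write them back after a reset) can be modeled in userspace against a simulated config space. In a real driver the accesses would go through pci_read_config_dword() and pci_write_config_dword(); the fake_cfg array, the helper names and the "BAR 0 reads back as zero" reset test are assumptions for illustration only.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define CFG_BAR0	0x10	/* offset of BAR 0 (PCI_BASE_ADDRESS_0) */
#define NUM_BARS	6	/* a type-0 header has six 32-bit BARs */

/* Simulated 256-byte configuration space of one device. */
static uint8_t fake_cfg[256];
static uint32_t saved_bars[NUM_BARS];

static uint32_t cfg_read32(int where)
{
	uint32_t v;

	memcpy(&v, &fake_cfg[where], sizeof(v));
	return v;
}

static void cfg_write32(int where, uint32_t v)
{
	memcpy(&fake_cfg[where], &v, sizeof(v));
}

/* At module load: remember the BAR values that were assigned. */
static void cache_bars(void)
{
	int i;

	for (i = 0; i < NUM_BARS; i++)
		saved_bars[i] = cfg_read32(CFG_BAR0 + 4 * i);
}

/*
 * After a suspected reset: if BAR 0 has been wiped (assumes it was
 * nonzero when cached), write all cached BARs back.
 * Returns 1 if a restore was done, 0 otherwise.
 */
static int restore_bars_if_reset(void)
{
	int i;

	if (cfg_read32(CFG_BAR0) != 0)
		return 0;
	for (i = 0; i < NUM_BARS; i++)
		cfg_write32(CFG_BAR0 + 4 * i, saved_bars[i]);
	return 1;
}
```

Unlike this private copy, pci_restore_bars() in drivers/pci/pci.c restores the values from the kernel's own record of the device's resources.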

Re: [RFC/PATCH 1/2] fsnotify

2005-07-11 Thread David Woodhouse

On Fri, 2005-07-08 at 18:26 -0700, Chris Wright wrote:
> Add fsnotify as infrastructure for various fs notification schemes.
> Move dnotify to fsnotify.

-   inode_dir_notify(dir, DN_CREATE);
+   fsnotify_create(dir, new_dentry->d_name.name);

+ * We don't compile any of this away in some complicated menagerie of ifdefs.
+ * Instead, we rely on the code inside to optimize away as needed.

+static inline void fsnotify_create(struct inode *inode, const char *name)
+{
+   dnotify_create(inode, name);
+}

+static inline void dnotify_create(struct inode *inode, const char *name)
+{
+   inode_dir_notify(inode, DN_CREATE);
+}

To be honest, I don't really see that this is in any way better than
what we had before. Yes, two different pieces of code actually use hooks
in similar places in the VFS code. But this 'infrastructure' just to
share those hooks is overkill as far as I can tell. It really isn't any
better than having both inotify and audit hooks side by side where we
can actually see what's going on at a glance. In fact, it's worse.

What would make sense, perhaps, would be to actually merge those hooks;
not just a cosmetic amalgamation of the calling sites. Currently, each
of inotify and the audit code does its own filtering when its hooks are
triggered, and then acts upon the event only if it affects a watched
inode. 

To actually merge that filtering code would make sense -- and then both
inotify and auditfs would just register watches on certain inodes and
receive notification as appropriate.

There are features of each which would probably be desirable in the
merged result. The inotify version grows the inode structure by quite a
lot, while the auditfs version recognises that watches will be
relatively infrequent, so uses only a bit in i_flags as a
quickly-accessible indication that an inode is 'interesting' and
maintains its actual data for that inode elsewhere. The auditfs version
is also capable of handling the "Child named '' of this inode" kind
of watch, where watches are placed on objects which don't yet exist.
Also, the auditfs code already calls back to a separate function
auditfs_attach_wdata() in the _real_ audit code, which actually reports
when the watch is triggered; it shouldn't be hard to make it call into
an inotify function instead, depending on the type of watch which is
hit.

On the other hand, the inotify triggers require a little more
information from the original hook, which audit_notify_watch() would
need to pass through if the audit code were used as the basis for a
merged 'core watch' functionality.

-- 
dwmw2
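
A rough userspace model of the merged filtering suggested above: one flag bit on the inode gives a cheap "is anyone watching?" test on the hot path, the watch data lives in a separate table, and each subsystem (inotify, audit) only registers a watch and a callback. Every name here (I_WATCHED, watch_add, notify_event, the EV_* bits) is hypothetical; this is neither the auditfs nor the inotify API.

```c
#include <assert.h>

#define I_WATCHED	0x1u	/* hypothetical "interesting inode" bit in i_flags */
#define EV_CREATE	0x1u	/* hypothetical event bits */
#define EV_DELETE	0x2u

struct inode_model {
	unsigned long i_ino;
	unsigned int i_flags;
};

typedef void (*watch_cb)(unsigned long ino, const char *name, void *priv);

struct watch {
	unsigned long ino;
	unsigned int mask;
	watch_cb cb;
	void *priv;
};

#define MAX_WATCHES 16
static struct watch watches[MAX_WATCHES];	/* watch data kept off the inode */
static int nr_watches;

/* Register interest; both inotify and audit would use this. */
static int watch_add(struct inode_model *inode, unsigned int mask,
		     watch_cb cb, void *priv)
{
	if (nr_watches == MAX_WATCHES)
		return -1;
	watches[nr_watches].ino = inode->i_ino;
	watches[nr_watches].mask = mask;
	watches[nr_watches].cb = cb;
	watches[nr_watches].priv = priv;
	nr_watches++;
	inode->i_flags |= I_WATCHED;	/* enable the slow path for this inode */
	return 0;
}

/* The single hook the VFS would call, e.g. on create in a directory. */
static void notify_event(const struct inode_model *dir, const char *name,
			 unsigned int event)
{
	int i;

	if (!(dir->i_flags & I_WATCHED))
		return;		/* fast path: one bit test, no lookup */
	for (i = 0; i < nr_watches; i++)
		if (watches[i].ino == dir->i_ino && (watches[i].mask & event))
			watches[i].cb(dir->i_ino, name, watches[i].priv);
}

/* Example callback: counts hits in the int that priv points to. */
static void count_cb(unsigned long ino, const char *name, void *priv)
{
	(void)ino;
	(void)name;
	(*(int *)priv)++;
}
```

The unwatched case stays at a single bit test, which is the property of the auditfs design contrasted above with growing every inode.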


Re: [patch 2.6.13-rc2] pci: restore BAR values from pci_set_power_state for D3hot->D0

2005-07-11 Thread John W. Linville

On Mon, Jul 11, 2005 at 02:48:44PM +0200, Lennert Buytenhek wrote:
> On Fri, Jul 08, 2005 at 02:34:56PM -0400, John W. Linville wrote:
> 
> > Some PCI devices lose all configuration (including BARs) when
> > transitioning from D3hot->D0.  This leaves such a device in an
> > inaccessible state.  The patch below causes the BARs to be restored
> > when enabling such a device, so that its driver will be able to
> > access it.
> 
> It might be useful to have this functionality exported to outside of
> the generic PCI code.

Fine by me...patch to follow...

John
-- 
John W. Linville
[EMAIL PROTECTED]

Dirty page tracking patch

2005-07-11 Thread Kimball Murray

Hello to all.

On behalf of Stratus Technologies (www.stratus.com) I'd like to
present a patch to the i386 kernel code that will allow developers to track
dirty memory pages.  Stratus uses this technique to facilitate bringing
separate cpu and memory module "nodes" into lockstep with each other, with
a minimum of OS down time. This feature could also be used to provide inputs
into memory management algorithms, or to support hot-plug memory dimm modules
for specially developed hardware.

Stratus has used this patch in kernels 2.4.2, 2.4.18, 2.6.5, and 2.6.9
with great success. Also this same technique in different forms has been
shipping in Stratus products for 25 years in many different operating systems.
Stratus would like to share this tracking capability with the community.  In
particular, we'd like to hear ideas anyone may have about other uses for it,
how to improve it technically, or even if there's a better way to do this.  In
its current state, it is pretty lightweight, but it's inevitable that
developers will find ways to make it better, and more versatile.


Thanks in advance to those who take an interest in this discussion.

- Kimball Murray (Stratus Technologies, currently on-site at Redhat)

Please CC comments to:

[EMAIL PROTECTED]


What the patch does:
---

The patch adds a new _PAGE_SOFT_DIRTY bit to the pte layout.  This
takes the place of a previously unused bit in the pte.  The hardware will set
the _PAGE_DIRTY bit when the cpu writes to memory.  The _PAGE_SOFT_DIRTY bit
will only ever be set by software that is trying to copy live memory from one
memory domain to another. Such software would clear the _PAGE_DIRTY bit while
it sets the _PAGE_SOFT_DIRTY bit.  In this way, it would know that this page
has been added to a "must copy" list already.  A few kernel macros are
modified by this patch so that when the kernel queries the _PAGE_DIRTY bit, it
also queries the _PAGE_SOFT_DIRTY bit.  Doing this allows the kernel to run
normally whether or not a page harvest is in progress.
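
The bit protocol described above can be illustrated on plain 32-bit integers. _PAGE_DIRTY is bit 6 (0x040) of an i386 pte; which unused bit _PAGE_SOFT_DIRTY takes is not stated here, so bit 9 (one of the bits i386 leaves to software) is only an assumption, and the helper names are hypothetical stand-ins for the modified kernel macros.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t pte_model_t;

#define PTE_PRESENT	0x001u	/* _PAGE_PRESENT */
#define PTE_DIRTY	0x040u	/* _PAGE_DIRTY: set by the CPU on write */
#define PTE_SOFT_DIRTY	0x200u	/* assumed bit 9 for _PAGE_SOFT_DIRTY */

/* Kernel view: dirty if either bit is set, harvest or no harvest. */
static int pte_model_dirty(pte_model_t pte)
{
	return (pte & (PTE_DIRTY | PTE_SOFT_DIRTY)) != 0;
}

/* Kernel cleaning a page (e.g. after writeback) clears both bits. */
static pte_model_t pte_model_mkclean(pte_model_t pte)
{
	return pte & ~(PTE_DIRTY | PTE_SOFT_DIRTY);
}

/*
 * Harvester: move the hardware bit into the software bit. Returns 1 if
 * the page was written since the last pass and must go on the "must
 * copy" list. (A real harvester would also update the pte atomically
 * and flush the TLB so the CPU sets _PAGE_DIRTY again on the next write.)
 */
static int harvest_pte(pte_model_t *pte)
{
	if (!(*pte & PTE_DIRTY))
		return 0;
	*pte = (*pte & ~PTE_DIRTY) | PTE_SOFT_DIRTY;
	return 1;
}
```

A page is queued once per write burst, while the kernel keeps treating it as dirty until it cleans the page itself.
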

Another addition this patch provides is the mm_track(void*) call,
which allows software to record that a particular page needs to be copied
later.  For synchronizing memory domains on a live system, a cyclic page
harvester must not only copy all newly-dirtied pages in a given pass to the
target memory domain, it must also worry about pages that were dirtied after
the last pass, but whose reference is lost before the next pass.  For this
reason, the patch adds a few calls to mm_track() in places where page
references are being retired.

Finally, tracking can be turned on and off by software at runtime.
When turned off, the impact of this patch on kernel performance is negligible.
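
The role of mm_track() in a cyclic harvest can be sketched with a per-page bitmap: pages recorded by mm_track() (for example because their references were retired between passes) are collected and cleared by the next pass, and recording can be switched off at runtime. Only the name mm_track() comes from the patch; the bitmap, the page-frame-number argument (the real call takes a void *), harvest_pass() and the tracking_enabled switch are assumptions.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

#define NR_PAGES	1024	/* small model "machine" */
#define BITS_PER_WORD	(sizeof(unsigned long) * CHAR_BIT)
#define BITMAP_WORDS	((NR_PAGES + BITS_PER_WORD - 1) / BITS_PER_WORD)

static unsigned long track_bitmap[BITMAP_WORDS];
static int tracking_enabled;	/* hypothetical runtime on/off switch */

/* Model of mm_track(): remember that page 'pfn' must be copied later. */
static void mm_track_pfn(unsigned long pfn)
{
	if (!tracking_enabled)
		return;		/* one test when tracking is off */
	track_bitmap[pfn / BITS_PER_WORD] |= 1UL << (pfn % BITS_PER_WORD);
}

/* One harvest pass: copy and clear every recorded page; returns count.
 * 'copy' (may be NULL) would send the page to the target memory domain. */
static int harvest_pass(void (*copy)(unsigned long pfn))
{
	unsigned long pfn;
	int copied = 0;

	for (pfn = 0; pfn < NR_PAGES; pfn++) {
		unsigned long *word = &track_bitmap[pfn / BITS_PER_WORD];
		unsigned long bit = 1UL << (pfn % BITS_PER_WORD);

		if (*word & bit) {
			*word &= ~bit;
			if (copy)
				copy(pfn);
			copied++;
		}
	}
	return copied;
}
```

Recording the same page twice before a pass still copies it only once, since the bitmap holds one bit per page.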


What the patch doesn't do:
-

Pages dirtied by device DMA into memory are not captured by the
tracking mechanism provided by this patch.  Stratus has constructed a special
PCI bridge which has a "snarf" mode that, when enabled, directs all DMA to
memory to each of the participating cpu/memory nodes.  Further, Stratus
hardware also performs a hardware memory check before releasing new cpu/memory
nodes into lockstep service.  This provides a means of evaluating both the
memory tracking patch, and also a particular harvest algorithm.
The page harvest routine is not in this patch.  Stratus has a goal
that this patch have a minimal kernel footprint.  Therefore, our particular
harvest routine is in a kernel module.  Many implementations of the harvest
are possible as well, and we did not want to constrain that in this patch.


Below are more details about the Stratus hardware and our harvest
algorithm.  Some of the discussion below may repeat things already covered
above.  If you just want to have a look at the patch itself, then please
fast-forward to the word "snip" in this document.


Stratus Architecture


Stratus Technologies builds highly available, fault-tolerant servers
by provided for redundancy of all system components, including processors
and DIMMs. It is possible to remove and replace these components in a way that
is transparent to the applications running on the system.

Below is my poor man's sketch of the system layout.  The customer
replaceable units (CRUs) are usually 1-U rack-mounted slices.  The PCI bridge
in the middle box is split across the backplane, but appears as a single
PCI-PCI bridge to the OS.  The top and bottom halves of the bridge communicate
using a proprietary, packet-based protocol.

[Diagram, truncated in the archive: 2 or 3 CPU CRUs, each holding CPUs and DIMMs, together forming the lockstep domain]

[patch 2.6.13-rc2] PCI: Add symbol exports for pci_restore_bars

2005-07-11 Thread John W. Linville

Globalize and add EXPORT_SYMBOL for pci_restore_bars.

Signed-off-by: John W. Linville <[EMAIL PROTECTED]>
---
Some have expressed interest in making general use of the
pci_restore_bars function.

 drivers/pci/pci.c |3 ++-
 1 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -228,7 +228,7 @@ pci_find_parent_resource(const struct pc
  * Restore the BAR values for a given device, so as to make it
  * accessible by its driver.
  */
-static void
+void
 pci_restore_bars(struct pci_dev *dev)
 {
int i, numres;
@@ -833,6 +833,7 @@ struct pci_dev *isa_bridge;
 EXPORT_SYMBOL(isa_bridge);
 #endif
 
+EXPORT_SYMBOL(pci_restore_bars);
 EXPORT_SYMBOL(pci_enable_device_bars);
 EXPORT_SYMBOL(pci_enable_device);
 EXPORT_SYMBOL(pci_disable_device);
-- 
John W. Linville
[EMAIL PROTECTED]

Re: [PATCH] i386: Selectable Frequency of the Timer Interrupt

2005-07-11 Thread Alan Cox

On Fri, 2005-07-08 at 22:59, Andrew Morton wrote:
> Chris Wedgwood <[EMAIL PROTECTED]> wrote:
> >
> > On Thu, Jun 23, 2005 at 11:28:47AM -0700, Linux Kernel Mailing List wrote:
>   ^^
> 
> It's been over two weeks and nobody has complained about anything.

Then your mail system is faulty, because I did. 1000Hz is good for
multimedia, 100Hz is good for power management/older boxes, and 250Hz
is too fast for some laptop APM to avoid clock slew, too fast for power
saving and too slow for some multimedia weenies.
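
The trade-off comes down to simple arithmetic: the tick period is 1/HZ, so a higher HZ means more timer interrupts (and wakeups) per second but finer timeout granularity. A small sketch; timeout_jiffies rounds a millisecond timeout up to whole ticks in the spirit of the kernel's msecs_to_jiffies(), but it is not the kernel's implementation, and the helper names are illustrative.

```c
#include <assert.h>

/* Length of one tick in microseconds (exact for HZ = 100, 250, 1000). */
static unsigned long tick_us(unsigned int hz)
{
	return 1000000UL / hz;
}

/* Round a timeout in milliseconds up to a whole number of ticks. */
static unsigned long timeout_jiffies(unsigned long ms, unsigned int hz)
{
	return (ms * hz + 999) / 1000;
}

/* Time actually waited, in ms, once the timeout is rounded to ticks. */
static unsigned long effective_ms(unsigned long ms, unsigned int hz)
{
	return timeout_jiffies(ms, hz) * 1000 / hz;
}
```

A 15 ms timeout thus waits 20 ms at HZ=100, 16 ms at HZ=250 and 15 ms at HZ=1000, while HZ=1000 takes ten times as many timer interrupts per second as HZ=100.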


Re: [PATCH] i386: Selectable Frequency of the Timer Interrupt

2005-07-11 Thread Alan Cox

> > Because some machines exhibit appreciable latency in entering low power
> > state via ACPI, and 1000Hz reduces their battery life.  By about half,
> > iirc.
> > 
> Then the owners of such machines can use HZ=250 and leave the default
> alone.  Why should everyone have to bear the cost?

They really need 100, it seems; 250-500 have no real effect, and on the
Dell I tried, 250 didn't stop the wild clock slew from the APM BIOS
either. I played with this a fair bit on a couple of laptops. I've not
seen a saving of more than 20%, however, so I've no idea who saw 50%
or why.

Re: Realtime Preemption, 2.6.12, Beginners Guide?

2005-07-11 Thread Paulo Marques


Ingo Molnar wrote:
> (gdb)
> (gdb) # c013ebf4, stack size:  388 bytes #
> (gdb)
> (gdb) 0xc013ebf4 is in __print_symbol (kernel/kallsyms.c:234).


The attached patch fixes this partially by reducing the stack usage by
128 bytes. Compile, boot and run tested and apparently it works fine.

I didn't want to use kmalloc's in there because this function is
probably called from very "hard" contexts (kernel OOPS, stack overflow
dumps, etc.).

The stack usage could be reduced even further (I can do a patch for this
if needed) by changing the function to receive a "prefix" and a "suffix"
string instead of a format string.

The function could then simply do:
   printk(prefix);
   printk(symbol);
   printk(address);
   if (module) printk(module name);
   printk(suffix);

This way it wouldn't need to allocate a buffer big enough for the whole
string, just for one symbol name (128 bytes).

This is a much more intrusive change however (there are ~65 callers that
would need changing), so I leave the decision to more experienced hackers :)

--
Paulo Marques - www.grupopie.com

It is a mistake to think you can solve any major problems
just with potatoes.
Douglas Adams
--- ./kernel/kallsyms.c.orig	2005-07-11 12:32:32.0 +0100
+++ ./kernel/kallsyms.c	2005-07-11 12:34:42.0 +0100
@@ -232,23 +232,21 @@ const char *kallsyms_lookup(unsigned lon
 /* Replace "%s" in format with address, or returns -errno. */
 void __print_symbol(const char *fmt, unsigned long address)
 {
-	char *modname;
+	char *modname, *bufend;
 	const char *name;
 	unsigned long offset, size;
-	char namebuf[KSYM_NAME_LEN+1];
 	char buffer[sizeof("%s+%#lx/%#lx [%s]") + KSYM_NAME_LEN +
 		2*(BITS_PER_LONG*3/10) + MODULE_NAME_LEN + 1];

-	name = kallsyms_lookup(address, &size, &offset, &modname, namebuf);
+	name = kallsyms_lookup(address, &size, &offset, &modname, buffer);

 	if (!name)
 		sprintf(buffer, "0x%lx", address);
 	else {
+		bufend = strchr(buffer, '\0');
+		bufend += sprintf(bufend, "+%#lx/%#lx", offset, size);
 		if (modname)
-			sprintf(buffer, "%s+%#lx/%#lx [%s]", name, offset,
-size, modname);
-		else
-			sprintf(buffer, "%s+%#lx/%#lx", name, offset, size);
+			sprintf(bufend, " [%s]", modname);
 	}
 	printk(fmt, buffer);
 }
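The buffer-reuse trick in the patch (let the lookup write the symbol name straight into buffer, then append "+offset/size" and the optional " [module]" after the name's terminating NUL) can be tried out in user space. This is a minimal sketch, not the kernel code: fake_lookup() is a hypothetical stand-in for kallsyms_lookup() that copies a fixed name into the caller's buffer, NAME_LEN and MOD_LEN are assumed sizes, and snprintf() replaces sprintf() so the sketch stays inside the buffer.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define NAME_LEN    127   /* stand-in for KSYM_NAME_LEN (assumption) */
#define MOD_LEN     60    /* stand-in for MODULE_NAME_LEN (assumption) */
/* Room for "name+0x.../0x... [module]" with 64-bit offsets and sizes. */
#define SYM_BUF_LEN (NAME_LEN + 2 * 18 + MOD_LEN + 8)

/* Hypothetical stand-in for kallsyms_lookup(): copies the symbol name into
 * namebuf and returns it, reporting size/offset/module through pointers. */
static const char *fake_lookup(unsigned long addr, unsigned long *size,
                               unsigned long *offset, const char **modname,
                               char *namebuf)
{
    if (addr < 0x1000 || addr >= 0x1040)
        return NULL;                  /* address not covered by a symbol */
    *size = 0x40;
    *offset = addr - 0x1000;
    *modname = "demo_mod";
    strcpy(namebuf, "demo_func");
    return namebuf;
}

/* Format "name+off/size [mod]" in a single buffer, as the patch does. */
static void format_symbol(char *buffer, size_t len, unsigned long address)
{
    unsigned long offset, size;
    const char *modname = NULL;
    const char *name;
    char *bufend;

    assert(len >= SYM_BUF_LEN);
    name = fake_lookup(address, &size, &offset, &modname, buffer);
    if (!name) {
        snprintf(buffer, len, "0x%lx", address);
        return;
    }
    /* The name already sits at the start of buffer: append after it. */
    bufend = strchr(buffer, '\0');
    bufend += snprintf(bufend, len - (size_t)(bufend - buffer),
                       "+%#lx/%#lx", offset, size);
    if (modname)
        snprintf(bufend, len - (size_t)(bufend - buffer), " [%s]", modname);
}
```

Note that %#lx prints a zero offset as a plain 0 rather than 0x0.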

Re: [RFC][PATCH] i386: Per node IDT

2005-07-11 Thread Zwane Mwaikambo

On Sun, 11 Jul 2005, Andi Kleen wrote:

> Why per node? Why not go the whole way and make it per CPU?
> 
> I would also not define it statically, but allocate it at boot time
> in node local memory.

I went per node so that it would have minimal/zero impact on the non-NUMA 
case; it would also simplify CPU hotplug, since once a CPU in a node goes 
down, we still have other participating processors capable of handling 
its devices without having to do too much migration work. I'll definitely 
incorporate the node-local allocations, however. On some i386 systems we 
might be forced to put some additional IDTs on node 0, since the IDTR 
only takes 32-bit addresses and we could end up with only highmem on 
some nodes.

Thanks for the feedback,
Zwane

Re: Dirty page tracking patch

2005-07-11 Thread Arjan van de Ven

On Mon, 2005-07-11 at 09:16 -0400, Kimball Murray wrote:
> Hello to all.
> 
>   On behalf of Stratus Technologies (www.stratus.com) I'd like to
> present a patch to the i386 kernel code that will allow developers to track
> dirty memory pages.  Stratus uses this technique to facilitate bringing
> separate cpu and memory module "nodes" into lockstep with each other, with
> a minimum of OS down time. This feature could also be used to provide inputs
> into memory management algorithms, or to support hot-plug memory dimm modules
> for specially developed hardware.
> 

is the stratus code entirely open source/GPL ? (I assume it is since you
EXPORT_SYMBOL_GPL and also use other similar stuff). If so, could you
post a URL to that? It's customary to do so when you post interface
patches for review so that the users of the interfaces can be seen, and
thus the interface better reviewed.

Also this patch is just plain weird/really corner case...

+#define mm_track(ptep)

you have to make that a do { } while (0) define, as per kernel
convention/need
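The reason behind the convention can be seen in a small user-space sketch: the do { } while (0) idiom makes a macro expand to exactly one statement wherever it is used, including an unbraced if/else. TRACK_UNSAFE(), TRACK_SAFE() and the counters are hypothetical names used only for illustration; only the mm_track() stub mirrors the patch.

```c
#include <assert.h>   /* assert() is used by the checks that exercise this */

/* Hypothetical bookkeeping, for illustration only. */
static int tracked;
static int last_seen;

/* Unsafe: expands to two statements, so an unbraced "if" only guards
 * the first one. */
#define TRACK_UNSAFE(p) tracked++; last_seen = *(p)

/* Safe: the body becomes a single statement that still needs a ';'. */
#define TRACK_SAFE(p) do { tracked++; last_seen = *(p); } while (0)

/* The empty stub written the same way behaves as one statement too. */
#define mm_track(ptep) do { } while (0)

static void reset(void)
{
    tracked = 0;
    last_seen = -1;
}

static void update_unsafe(int cond, int *p)
{
    if (cond)
        TRACK_UNSAFE(p);   /* last_seen is assigned even when cond == 0 */
}

static void update_safe(int cond, int *p)
{
    if (cond)
        TRACK_SAFE(p);
    else
        mm_track(p);       /* fits into if/else like any other statement */
}
```

Writing TRACK_UNSAFE() in the if/else of update_safe() would not even compile, because its second statement detaches the else from the if.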

Also, you now make set_pte() and co. more than trivial assignments; please
convert them to inlines so that type checking is performed and no double
evaluation of arguments is done!
(This is a real problem for code that would do set_pte(pte++, value) in
a loop or so.)
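The double-evaluation hazard is easy to reproduce outside the kernel. A minimal sketch, assuming a simplified pte_t and a hypothetical track() hook in place of mm_track(); SET_PTE_MACRO() and set_pte_inline() are illustrative names, not the i386 definitions.

```c
#include <assert.h>   /* assert() is used by the checks that exercise this */

/* Simplified, hypothetical PTE type; the real i386 pte_t differs. */
typedef struct { unsigned long pte_low; } pte_t;

static int track_calls;

/* Hypothetical hook standing in for mm_track(). */
static void track(pte_t *ptep)
{
    (void)ptep;
    track_calls++;
}

/* Macro version: the ptep argument is evaluated twice. */
#define SET_PTE_MACRO(ptep, val) \
    do { track(ptep); (ptep)->pte_low = (val); } while (0)

/* Inline version: ptep is evaluated once and its type is checked. */
static inline void set_pte_inline(pte_t *ptep, unsigned long val)
{
    track(ptep);
    ptep->pte_low = val;
}
```

Called as SET_PTE_MACRO(p++, v), the macro increments p twice and writes to the wrong entry; the inline function increments it once, and the compiler also checks that its first argument really is a pte_t pointer.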

-   if (!pte_dirty(*ptep))
-   return 0;
-   return test_and_clear_bit(_PAGE_BIT_DIRTY, &ptep->pte_low);
+   mm_track(ptep);
+   return (test_and_clear_bit(_PAGE_BIT_DIRTY, &ptep->pte_low) |
+   test_and_clear_bit(_PAGE_BIT_SOFTDIRTY, &ptep->pte_low));
 }

are you sure you're not introducing a race condition there?
and if you're sure, why do you need 2 atomic ops in sequence?
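On the second question: when both flags live in the same word, a single atomic read-modify-write can clear them together and report whether either was set. The sketch below uses C11 <stdatomic.h> in user space instead of the kernel's bitops; bit 6 is the i386 _PAGE_BIT_DIRTY position, while the soft-dirty bit number is a hypothetical placeholder, since the patch's value is not shown here.

```c
#include <assert.h>   /* assert() is used by the checks that exercise this */
#include <stdatomic.h>
#include <stdbool.h>

#define PAGE_BIT_DIRTY      6   /* i386 _PAGE_BIT_DIRTY */
#define PAGE_BIT_SOFTDIRTY  9   /* hypothetical: a software-available bit */

/* Clear both dirty bits in one atomic operation and report whether
 * either of them was set beforehand. */
static bool test_and_clear_dirty(atomic_ulong *pte_low)
{
    const unsigned long mask =
        (1UL << PAGE_BIT_DIRTY) | (1UL << PAGE_BIT_SOFTDIRTY);
    unsigned long old = atomic_fetch_and(pte_low, ~mask);

    return (old & mask) != 0;
}
```

Whether the two-operation version in the patch actually races depends on who else updates the PTE concurrently; the sketch only shows that one atomic operation can cover both bits.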


Re: [PATCH] Apple USB Touchpad driver (new)

2005-07-11 Thread Stelian Pop

On Monday 11 July 2005 at 14:47 +0200, Stelian Pop wrote:

> How to make it work ? Obviously I could implement either fuzz
> elimination or smoothing in the driver, and leave the other
> transformation to the input core (today it is the smoothing which is in
> the driver, but doing it the other way around would result in much less
> code).

And of course it cannot be done without modifying the input core to
prevent it from throwing away the samples. The only way to handle the current
situation is to implement smoothing in the driver itself (but I used the
same algorithm as the input core, thus removing the unneeded history).
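For comparison, the input core's fuzz handling for absolute axes works roughly as follows: a new value within half the fuzz of the previous one is treated as noise, values a bit further away are blended with the previous value, and large moves pass through unchanged. This is a simplified user-space approximation, not the input core's actual code; defuzz() is a hypothetical name, and it returns the old value where the input core would instead drop the event.

```c
#include <assert.h>   /* assert() is used by the checks that exercise this */

/* Simplified fuzz filter in the spirit of the input core's handling of
 * absolute axes (hypothetical name; not the kernel's exact code). */
static int defuzz(int value, int old_val, int fuzz)
{
    if (fuzz) {
        /* Within half the fuzz: treat as noise, keep the old value. */
        if (value > old_val - fuzz / 2 && value < old_val + fuzz / 2)
            return old_val;
        /* Within the fuzz: move a quarter of the way to the new value. */
        if (value > old_val - fuzz && value < old_val + fuzz)
            return (old_val * 3 + value) / 4;
        /* Within twice the fuzz: move half-way. */
        if (value > old_val - fuzz * 2 && value < old_val + fuzz * 2)
            return (old_val + value) / 2;
    }
    return value;   /* large move, or no fuzz: pass through */
}
```

With fuzz = 16, as in the driver, defuzz(105, 100, 16) returns 100, defuzz(110, 100, 16) returns 102, and defuzz(200, 100, 16) returns 200.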

> > > > > I don't think this value is reliable enough to be reported to the
> > > > > userspace as ABS_PRESSURE...
> > > > 
> > > > I believe it'd still be more useful than a two-value (0 and 100) output.
> > > 
> > > Ok, I'll do it.

Implemented too.

Here is the latest incarnation of the patch, please apply this one.

If Vojtech decides that it makes sense to modify the input core in order
to separate fuzz and smoothing, then the driver could be simplified a
bit more.

Thanks,

Stelian.

Changes:
* report ABS_PRESSURE events
* simplify smoothing by using the same technique as the input
  core eliminating the need for a FIFO.

Signed-off-by: Stelian Pop <[EMAIL PROTECTED]>

 Documentation/input/appletouch.txt |   83 ++
 drivers/usb/input/Kconfig  |   19 +
 drivers/usb/input/Makefile |1 
 drivers/usb/input/appletouch.c |  480 +

 4 files changed, 583 insertions(+)
Index: linux-2.6.git/drivers/usb/input/Makefile
===
--- linux-2.6.git.orig/drivers/usb/input/Makefile	2005-07-11 09:46:57.0 +0200
+++ linux-2.6.git/drivers/usb/input/Makefile	2005-07-11 09:48:23.0 +0200
@@ -39,3 +39,4 @@
 obj-$(CONFIG_USB_WACOM)+= wacom.o
 obj-$(CONFIG_USB_ACECAD)   += acecad.o
 obj-$(CONFIG_USB_XPAD) += xpad.o
+obj-$(CONFIG_USB_APPLETOUCH)   += appletouch.o
Index: linux-2.6.git/drivers/usb/input/Kconfig
===
--- linux-2.6.git.orig/drivers/usb/input/Kconfig	2005-07-11 09:46:57.0 +0200
+++ linux-2.6.git/drivers/usb/input/Kconfig	2005-07-11 09:48:23.0 +0200
@@ -259,3 +259,22 @@
  To compile this driver as a module, choose M here: the module will be
  called ati_remote.
 
+config USB_APPLETOUCH
+   tristate "Apple USB Touchpad support"
+   depends on USB && INPUT
+   ---help---
+ Say Y here if you want to use an Apple USB Touchpad.
+
+ These are the touchpads that can be found on post-February 2005
+ Apple Powerbooks (prior models have a Synaptics touchpad connected
+ to the ADB bus).
+
+ This driver provides a basic mouse driver but can be interfaced
+ with the synaptics X11 driver to provide acceleration and
+ scrolling in X11.
+
+ For further information, see
+ <file:Documentation/input/appletouch.txt>.
+
+ To compile this driver as a module, choose M here: the
+ module will be called appletouch.
Index: linux-2.6.git/Documentation/input/appletouch.txt
===
--- /dev/null   1970-01-01 00:00:00.0 +
+++ linux-2.6.git/Documentation/input/appletouch.txt	2005-07-11 15:02:51.0 +0200
@@ -0,0 +1,83 @@
+Apple Touchpad Driver (appletouch)
+--
+   Copyright (C) 2005 Stelian Pop <[EMAIL PROTECTED]>
+
+appletouch is a Linux kernel driver for the USB touchpad found on
+post-February 2005 Apple Alu Powerbooks.
+
+This driver is derived from Johannes Berg's appletrackpad driver[1], but it has
+been improved in some areas:
+   * appletouch is a full kernel driver, no userspace program is necessary
+   * appletouch can be interfaced with the synaptics X11 driver, in order
+ to have touchpad acceleration, scrolling, etc.
+
+Credits go to Johannes Berg for reverse-engineering the touchpad protocol,
+Frank Arnold for further improvements, and Alex Harper for some additional
+information about the inner workings of the touchpad sensors.
+
+Usage:
+--
+
+In order to use the touchpad in the basic mode, compile the driver and load
+the module. A new input device will be detected and you will be able to read
+the mouse data from /dev/input/mice (using gpm, or X11).
+
+In X11, you can configure the touchpad to use the synaptics X11 driver, which
+will give additional functionality, such as acceleration, scrolling, etc. In
+order to do this, make sure you're using a recent version of the synaptics
+driver (tested with 0.14.2, available from [2]), and configure a new input
+device in your X11 configuration file (take a look below for an example). For
+additional configuration, see the synaptics driver documentation.
+
+   S

Re: [RFC] Atmel-supplied hardware headers for AT91RM9200 SoC processor

2005-07-11 Thread Alan Cox

> No reason to use the horror it is as-is.  Being hardware descriptions they
> won't change ever except for additions, so just clean the mess up into
> something nice and submit them.  You could have done so in the time you
> spent arguing on linux-arm-kernel already.

Or written a perl script to reprocess them into something saner for that
matter. The licensing does look problematic - perhaps Atmel will be
happy to dual license them (see the many BSD bits of code that are in the
kernel and say things like "or at your option you may use the GNU General
Public License version 2 or later" and similar).



Re: [PATCH] Apple USB Touchpad driver (new)

2005-07-11 Thread Peter Osterlund

Stelian Pop <[EMAIL PROTECTED]> writes:

> On Monday 11 July 2005 at 02:15 +0200, Peter Osterlund wrote:
> > Vojtech Pavlik <[EMAIL PROTECTED]> writes:
> > 
> > > Using a function like
> > > 
> > >   return (x_old * 3 + x) / 4;
> > > 
> > > eliminates the need for a FIFO, and has similar (if not better)
> > > properties to floating average, because its coefficients are
> > > [ .25 .18 .14 .10 ... ].
> > 
> > Agreed.
> 
> Except that this does not work well enough.
> 
> There are two problems I encountered in this driver:
> * fuzz problems (keeping the finger at the same place makes the pointer
> dance around its position). This is solved by the input core's fuzz
> treatment, as I already set the fuzz to 16 in the code.
> 
> * hiccup problems (moving the finger generates non-linear points,
> something like 1 1 1 3 3 3 4 4 4 instead of 1 1 1 2 2 3 3 4 4). And here
> the floating average approach works better than the input core's method.
> (this could probably be solved also by changing the way the absolute
> coordinate is calculated from the sensor array in atp_calculate_abs, but
> I haven't been able to find a better linear function).
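Vojtech's one-liner is an exponential moving average: each output keeps 3/4 of the previous output and adds 1/4 of the new sample, so a sample that is k steps old carries weight 0.25 * 0.75^k, i.e. 0.25, 0.1875, 0.140625, ... (the coefficients quoted above, truncated to two digits). A minimal sketch with hypothetical function names, showing the integer filter, its impulse response, and its effect on a scaled version of the "hiccup" sequence:

```c
#include <assert.h>   /* assert() is used by the checks that exercise this */
#include <stddef.h>

/* Integer form used for coordinates: new = (3 * old + sample) / 4. */
static int smooth(int x_old, int x)
{
    return (x_old * 3 + x) / 4;
}

/* Impulse response of the same filter in floating point: out[k] is the
 * weight given to a sample that is k steps old. */
static void impulse_response(double *out, size_t n)
{
    double y = 0.0;

    for (size_t k = 0; k < n; k++) {
        double x = (k == 0) ? 1.0 : 0.0;
        y = (y * 3.0 + x) / 4.0;
        out[k] = y;
    }
}

/* Run the integer filter over a sequence, starting from its first value. */
static void smooth_seq(const int *in, int *out, size_t n)
{
    int y = n ? in[0] : 0;

    for (size_t k = 0; k < n; k++) {
        y = smooth(y, in[k]);
        out[k] = y;
    }
}
```

For the input 10 10 10 30 30 30 40 40 40, smooth_seq() produces 10 10 10 15 18 21 25 28 31: the jumps are spread over several samples, at the cost of some lag behind the finger.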

It would be interesting if you could generate some debug dumps using
the "sample" line:

+   dbg_dump("sample", xy_cur);

The "accumulator" dumps are not needed, the raw data should be
enough. Including timing information would be helpful though, like
this:

--- a/drivers/usb/input/appletouch.c
+++ b/drivers/usb/input/appletouch.c
@@ -121,7 +121,7 @@ struct atp {
 #define dbg_dump(msg, tab) \
if (debug > 1) {\
int i;  \
-   printk("appletouch: %s ", msg); \
+   printk("appletouch: %s %lld ", msg, (long long)jiffies);\
for (i = 0; i < ATP_XSENSORS + ATP_YSENSORS; i++)   \
printk("%02x ", tab[i]);\
printk("\n");   \

Debug dumps for the following actions would be interesting.

1. When not touching the touchpad.
2. When trying to hold a finger on the touchpad without moving it.
3. A single finger movement. (Touch, move finger, release.)
4. A single finger touch. First a light touch, then pressing harder
   and harder, to see if a reliable pressure value can be computed
   from the data.
5. A two-finger touch.

> I would prefer to submit the patch myself, because as you say you cannot
> test the code and those changes are rather sensitive.

No problem, I just needed a patch when I was playing around with StGIT
and thought I might as well use a real patch.

-- 
Peter Osterlund - [EMAIL PROTECTED]
http://web.telia.com/~u89404340

Re: [PATCH] Apple USB Touchpad driver (new)

2005-07-11 Thread Peter Osterlund

Stelian Pop <[EMAIL PROTECTED]> writes:

> +/*
> + * Smooth the data sequence by estimating the slope for the data sequence
> + * [x3, x2, x1, x0] by using linear regression to fit a line to the data and
> + * use the slope of the line. Taken from the synaptics X driver.
> + */

This comment is not correct now that the code uses floating average
instead. Maybe just remove it. The floating average calculation is
much more obvious than the linear regression stuff.

-- 
Peter Osterlund - [EMAIL PROTECTED]
http://web.telia.com/~u89404340

Re: Realtime Preemption, 2.6.12, Beginners Guide?

2005-07-11 Thread Alistair John Strachan

On Saturday 09 Jul 2005 17:04, Alistair John Strachan wrote:
> On Saturday 09 Jul 2005 16:57, Ingo Molnar wrote:
> > * Alistair John Strachan <[EMAIL PROTECTED]> wrote:
> > > Okay, I'll send you the vmlinux from -18 with a new digital photo, and
> > > config, with CONFIG_4KSTACKS enabled.
> >
> > this crash too seems to indicate trigger_softirqs()/wakeup_softirqd().
> > Somewhere we somehow corrupt the stack and e.g. in oops7.jpg we return
> > to 00c011ed. Note that it's a right-shifted address that could be one of
> > these:
> >
> >  c011ed50 t wakeup_softirqd
> >  c011ed80 t trigger_softirqs
> >
> > but it looks pretty weird. DEBUG_STACK_POISON (and the full-debug
> > .config i sent) could perhaps uncover other types of stack corruptions.
>
> You weren't kidding about the overhead from DEBUG_STACK_POISON.
> Unfortunately that config causes a triple fault randomly after boot. The
> machine doesn't crash, or oops, it just resets.
>
> This problem has gone from bad to worse :-)

Okay, maybe not. Some combination of the debug options you enabled causes this 
problem, but DEBUG_STACK_POISON itself is not the cause. Today I compiled Yet 
Another (tm) CONFIG_4KSTACKS kernel with DEBUG_STACK_POISON, 
CONFIG_DEBUG_STACKOVERFLOW and CONFIG_LATENCY_TRACE, which worked fine (from 
a random reboot perspective).

I've also upgraded GCC to 4.0.1 as Jakub highlighted elsewhere in this thread 
that it had been released.

Here's a screenshot of the oops. Notice that "stack left" is now -52. We've 
confirmed this is a stack overflow!

http://devzero.co.uk/~alistair/oops8.jpeg

I'm going to try the 8K stack kernel with the same stuff and see if I can get 
a stack trace. I hope this is the beginning of the end for this problem.

-- 
Cheers,
Alistair.

personal:   alistair()devzero!co!uk
university: s0348365()sms!ed!ac!uk
student:CS/CSim Undergraduate
contact:1F2 55 South Clerk Street,
Edinburgh. EH8 9PP.

Re: [RFC] Atmel-supplied hardware headers for AT91RM9200 SoC processor

2005-07-11 Thread Andrew Victor

hi,

> > No reason to use the horror it is as-is.  Being hardware descriptions they
> > won't change ever except for additions, so just clean the mess up into
> > something nice and submit them.  You could have done so in the time you
> > spent arguing on linux-arm-kernel already.
> 
> Or written a perl script to reprocess them into something saner for
> that matter.

The issue that everybody seems to be forgetting (or ignoring) with
changing the headers is that ALL the drivers then also need to be
converted, and re-tested.


> The licensing does look problematic - perhaps Atmel will be happy to
> dual license them (see the many BSD bits of code that are in the kernel
> and say things like "or at your option you may use the GNU General Public
> License version 2 or later" and similar).

I have asked Atmel if they're willing to dual-license the headers.  The
licensing issue is probably now with their legal department, but I don't
see them having a problem with it.


Regards,
  Andrew Victor



Re: [PATCH] eventpoll : Suppress a short lived lock from struct file

2005-07-11 Thread Davide Libenzi


On Mon, 11 Jul 2005, Eric Dumazet wrote:


Peter Zijlstra wrote:

On Mon, 2005-07-11 at 09:18 +0200, Eric Dumazet wrote:

Have you tested the impact of this change on big SMP/NUMA machines?
I'd hate to see an Altix brought to its knees :-)

I tested on a small NUMA machine (2 nodes), with an epoll-enabled application
that issues around 100 epoll_ctl calls per second.

Of course, one could write a special benchmark on a BIG SMP/NUMA machine that
defeats this patch by issuing thousands of epoll_ctl calls per second, but a
normal (well-written?) epoll application doesn't constantly add and remove
descriptors with epoll_ctl.

Should we waste 8 bytes per 'struct file' for a very unlikely micro benchmark?


Eric, I can't really say I like this one. Not, at least, until extensive 
tests have been run on top of it. You are asking to add a bottleneck to save 
8 bytes on an entity that, taken alone, is more than 120 bytes. Consider that 
when you have a "struct file" allocated, the cost to the system is not only 
the struct itself, but all the allocations associated with it. For example, 
if you consider that a case where you might feel "struct file" pressure is 
when you have hundreds of thousands of network connections, the 8 bytes 
saved, compared to all the buffers associated with those sockets, boil down 
to basically nothing.




- Davide

Re: [RFC][PATCH] i386: Per node IDT

2005-07-11 Thread Zwane Mwaikambo

On Mon, 11 Jul 2005, Arjan van de Ven wrote:

> On Mon, 2005-07-11 at 03:59 +0200, Andi Kleen wrote:
> > Why per node? Why not go the whole way and make it per CPU?
> 
> Agreed, for two reasons even
> 1) Per cpu allows for even more devices and cache locality
> 2) While few people have a NUMA system, many have an SMP system so you
> get a lot more testing.

Agreed, the first version was a per cpu one simply so that i could test it 
on a normal SMP system. Andi seems to be of the same opinion, what do you 
think of the hotplug cpu case (explained in previous email)?

Thanks Arjan,
Zwane


Re: reiserfs + quotas in kernel 2.6.11.12

2005-07-11 Thread Jan Kara

  Hi,

> How stable is reiserfs quotas in 2.6.11.12?
  They should be pretty stable. At least I don't know of any reported
bugs in that or any newer version, unless you are using 1KB blocks. With
1KB blocks there was a bug, which I think should be fixed as of
2.6.13-rc1.
Honza
-- 
Jan Kara <[EMAIL PROTECTED]>
SuSE CR Labs

I have centrino laptop with no freq/voltage tables in BIOS

2005-07-11 Thread Mariusz Gniazdowski

Hi.
I have a Centrino laptop with no built-in frequency/voltage pairs in
BIOS/ACPI. I have found this thread:

http://lkml.org/lkml/2005/7/6/101

It would be exactly what I need. My laptop is a Gericom Blockbuster
Excellent. CPU:

cpu family  : 6
model   : 13
model name  : Intel(R) Pentium(R) M processor 1.60GHz
stepping: 6

-- 
Regards
Mariusz

Re: Realtime Preemption, 2.6.12, Beginners Guide?

2005-07-11 Thread Ingo Molnar


* Ingo Molnar <[EMAIL PROTECTED]> wrote:

> might be an incorrect printout of stack_left :( The esp looks more or 
> less normal. Not sure why it printed -52.

here's the stack_left calculation:

+   printk("ds: %04x   es: %04x   ss: %04x   preempt: %08x\n",
+   regs->xds & 0xffff, regs->xes & 0xffff, ss, preempt_count());
+   printk("Process %s (pid: %d, threadinfo=%p task=%p stack_left=%ld worst_left=%ld)",
+   current->comm, current->pid, current_thread_info(), current,
+   (regs->esp & (THREAD_SIZE-1))-sizeof(struct thread_info),
+   worst_stack_left);

i cannot see anything wrong in it, but your esp is 0xc04cded0, 
THREAD_SIZE-1 is 0xfff, so the result should be:

0xed0-sizeof(struct thread_info).

which should not be -52.
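The arithmetic can be checked mechanically. A minimal sketch, assuming 4K stacks (THREAD_SIZE = 4096, so THREAD_SIZE-1 = 0xfff) and passing the size of struct thread_info in as a parameter; the 52 used in the checks is a placeholder, not a measured sizeof value.

```c
#include <assert.h>   /* assert() is used by the checks that exercise this */

/* Bytes left between esp and the thread_info at the bottom of a
 * THREAD_SIZE-aligned kernel stack (i386 layout; all values are inputs).
 * Signed arithmetic, so an overflow shows up as a negative number. */
static long stack_left(unsigned long esp, unsigned long thread_size,
                       unsigned long ti_size)
{
    return (long)(esp & (thread_size - 1)) - (long)ti_size;
}
```

With esp = 0xc04cded0 the masked value is 0xed0 (3792), so any thread_info smaller than that gives a positive result; a negative stack_left would need the masked esp itself to be smaller than sizeof(struct thread_info).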

Ingo

CONFIG_ALPHA_GENERIC problem with gcc-4.1

2005-07-11 Thread Dan Kegel


I've been doing builds of linux-2.6.11 as a sanity check
for new versions of gcc, and a problem just popped up
in arch/alpha/Makefile (see http://gcc.gnu.org/ml/gcc/2005-07/msg00397.html)
I think I can work around this myself by using CONFIG_ALPHA_EV6
instead of CONFIG_ALPHA_GENERIC, but here's my analysis
of the problem; maybe the alpha kernel maintainer can take it from here?

Take the following with a grain of salt; I don't know much
about alpha or gcc, I'm just doing a little QA.

arch/alpha/Makefile says:
  # If GENERIC, make sure to turn off any instruction set extensions that
  # the host compiler might have on by default.  Given that EV4 and EV5
  # have the same instruction set, prefer EV5 because an EV5 schedule is
  # more likely to keep an EV4 processor busy than vice-versa.
  ifeq ($(CONFIG_ALPHA_GENERIC),y)
    mcpu := ev5
    mcpu_done := y
  endif
...
# For TSUNAMI, we must have the assembler not emulate our instructions.
# The same is true for IRONGATE, POLARIS, PYXIS.
# BWX is most important, but we don't really want any emulation ever.
CFLAGS += $(cflags-y) -Wa,-mev6

Thus when you pick CONFIG_ALPHA_GENERIC, gcc is invoked with
the contradictory options -mcpu=ev5 -Wa,-mev6

This probably means that even on ev5, some ev6 instructions are used.
In particular, see include/asm-alpha/compiler.h:

#if defined(__alpha_bwx__)
#define __kernel_ldbu(mem)  (mem)
#define __kernel_ldwu(mem)  (mem)
#define __kernel_stb(val,mem)   ((mem) = (val))
#define __kernel_stw(val,mem)   ((mem) = (val))
#else
#define __kernel_ldbu(mem)  \
  ({ unsigned char __kir;   \
 __asm__("ldbu %0,%1" : "=r"(__kir) : "m"(mem));\
 __kir; })

That inline assembly is fine on ev5, but only if the assembler
is emulating the ldbu instruction with a macro -- exactly
the kind of thing arch/alpha/Makefile is trying to
inhibit when it says -Wa,-mev6.

This is an issue now because building the kernel with CONFIG_ALPHA_GENERIC
fails on the current gcc-4.1 snapshot with

> {standard input}:496: Error: macro requires $at register while noat in effect
> make[1]: *** [arch/alpha/kernel/core_cia.o] Error 1

and it looks like a kernel problem, not a gcc problem:
don't try to use ev6 instructions on ev5 or earlier processors.

That probably means conditionalizing that -Wa,-mev6 properly,
but if that's hard, maybe it means dropping support for ev4 and ev5 processors,
and mapping CONFIG_ALPHA_GENERIC to ev6.  I wouldn't know...
- Dan

--
Trying to get a job as a c++ developer?  See 
http://kegel.com/academy/getting-hired.html

Re: [ltp] IBM HDAPS Someone interested? (Userspace accelerometer viewer)

2005-07-11 Thread Alan Cox

On Llu, 2005-07-11 at 10:42, Paul Sladen wrote:
>   theta = (N - 512) * 0.5
> 
> provides a surprisingly good approximation for pitch/roll values in degrees
> in the range (-90..+90) so I think the sensor can do ~= +/-2.5G .
> 
>   http://www.paul.sladen.org/thinkpad-r31/aps/accelerometer-screenshot.png 
> (9kB)

Is the quality good enough to use it, DEC Itsy style, as an input device
for games like Marble Madness?
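Paul's approximation is a linear map from the raw reading N to degrees, with 512 as the level position and 0.5 degree per count, so readings from 332 to 692 cover -90..+90 degrees. A minimal sketch; hdaps_raw_to_degrees() is a hypothetical name and the constants are simply those of the quoted formula, not a calibration of the sensor.

```c
#include <assert.h>   /* assert() is used by the checks that exercise this */

/* Approximate pitch/roll in degrees from a raw HDAPS axis reading,
 * using theta = (N - 512) * 0.5 from the quoted message. */
static double hdaps_raw_to_degrees(int n)
{
    return (n - 512) * 0.5;
}
```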


Re: [RFC][PATCH] i386: Per node IDT

2005-07-11 Thread Oleg Nesterov

Hello Zwane,

Zwane Mwaikambo wrote:
>
> > On Mon, 11 Jul 2005, Oleg Nesterov wrote:
> >
> > Could you explain this change? I think it breaks do_signal/handle_signal,
> > they check orig_eax >= 0 to handle -ERESTARTSYS:
> >
> > /* Are we from a system call? */
> > if (regs->orig_eax >= 0) {
> > /* If so, check system call restarting.. */
> > switch (regs->eax) {
> > case -ERESTART_RESTARTBLOCK:
> > case -ERESTARTNOHAND:
>
> The change is so that we can send IRQs higher than 256 to do_IRQ. That 
> looks like it tries to check if we came in via system_call since we'd save 
> the system call number as orig_eax. Now that i think about it, doesn't 
> that path always get taken when we interrupt userspace and have pending 
> signals on return from interrupt?

As far as I can see, we always have orig_eax < 0 on interrupt, because

irq_entries_start:
pushl $vector-256   <-  orig_eax
jmp common_interrupt

and NR_IRQS < 256. So if we have pending signals on return from interrupt,
do_signal() will not accidentally corrupt userspace registers when
regs->eax == -ERESTART...

Probably it makes sense to change it to
pushl $vector - 0xffff - 1

and in do_IRQ()
int irq = regs->orig_eax & 0xffff

if you need to send IRQs higher than 256 to do_IRQ.
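The difference between the two encodings can be checked in isolation. A minimal sketch that models orig_eax as a 32-bit signed value; the function names are hypothetical, the current scheme is modelled on the pushl $vector-256 stub quoted above with the assumption that the decoder keeps only the low 8 bits, and the suggested scheme is assumed to use a 16-bit (0xffff) mask.

```c
#include <assert.h>   /* assert() is used by the checks that exercise this */
#include <stdint.h>

/* Current scheme: push vector - 256; assume the decoder keeps 8 bits. */
static int32_t encode_8bit(uint32_t vector)
{
    return (int32_t)vector - 256;
}

static uint32_t decode_8bit(int32_t orig_eax)
{
    return (uint32_t)orig_eax & 0xff;
}

/* Suggested scheme: push vector - 0xffff - 1 (i.e. vector - 0x10000)
 * and decode with a 16-bit mask. */
static int32_t encode_16bit(uint32_t vector)
{
    return (int32_t)vector - 0xffff - 1;
}

static uint32_t decode_16bit(int32_t orig_eax)
{
    return (uint32_t)orig_eax & 0xffff;
}
```

With the 8-bit scheme, vector 256 already encodes to 0, which the signal code would take for a system call; the 16-bit scheme keeps orig_eax negative for every vector up to 0xffff.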

Oleg.

Re: Swapping broken on 2.6.9? Limit Page Cache growth?

2005-07-11 Thread Douglas McNaught

Jon Florence <[EMAIL PROTECTED]> writes:

> Hi,
> I have got a box running  2.6.9-1.667smp (FC3)

That's a Red Hat kernel so you should take it up with them, not the
LKML.

-Doug

Re: [patch 5/12] lsm stacking v0.2: actual stacker module

2005-07-11 Thread Stephen Smalley

On Thu, 2005-06-30 at 14:50 -0500, [EMAIL PROTECTED] wrote:
> Adds the actual stacker LSM.

> +static int stacker_inode_getsecurity(struct inode *inode, const char *name, void *buffer, size_t size)
> +{
> +	RETURN_ERROR_IF_ANY_ERROR(inode_getsecurity,inode_getsecurity(inode,name,buffer,size));
> +}
> +
> +static int stacker_inode_setsecurity(struct inode *inode, const char *name, const void *value, size_t size, int flags)
> +{
> +	RETURN_ERROR_IF_ANY_ERROR(inode_setsecurity,inode_setsecurity(inode,name,value,size,flags));
> +}
> +
> +static int stacker_inode_listsecurity(struct inode *inode, char *buffer, size_t buffer_size)
> +{
> +	RETURN_ERROR_IF_ANY_ERROR(inode_listsecurity,inode_listsecurity(inode,buffer, buffer_size));
> +}

These hooks pose a similar problem for stacking as with the
[gs]etprocattr hooks, although [gs]etsecurity have the benefit of
already taking a distinguishing name suffix (the part after the
security. prefix).  Note also that inode_getsecurity returns the number
of bytes used/required on success.

The proposed inode_init_security hook will likewise have an issue for
stacking.

-- 
Stephen Smalley
National Security Agency


2.6.13-rc2 (git followed) unable to boot with initrd

2005-07-11 Thread Bas Vermeulen

I am currently unable to boot 2.6.13-rc2. I've got a working 2.6.13-rc1
whose .config I use to compile 2.6.13-rc2. I'm attaching the failed boot
log to this message. I'm booting with the same options as 2.6.13-rc1.

If anyone knows how to get it working again, I'd be grateful.

-- 
Bas Vermeulen <[EMAIL PROTECTED]>
[17179569.184000] Linux version 2.6.13-rc2 ([EMAIL PROTECTED]) (gcc version 3.4.3 20041212 (Red Hat 3.4.3-9.EL4)) #48 Mon Jul 11 13:30:51 CEST 2005
[17179569.184000] BIOS-provided physical RAM map:
[17179569.184000]  BIOS-e820:  - 0009fc00 (usable)
[17179569.184000]  BIOS-e820: 0009fc00 - 000a (reserved)
[17179569.184000]  BIOS-e820: 0010 - 1ffd3000 (usable)
[17179569.184000]  BIOS-e820: 1ffd3000 - 2000 (reserved)
[17179569.184000]  BIOS-e820: feda - fee0 (reserved)
[17179569.184000]  BIOS-e820: ffb8 - 0001 (reserved)
[17179569.184000] 511MB LOWMEM available.
[17179569.184000] DMI 2.3 present.
[17179569.184000] ACPI: PM-Timer IO Port: 0x808
[17179569.184000] Allocating PCI resources starting at 2000 (gap: 2000:deda)
[17179569.184000] Built 1 zonelists
[17179569.184000] Kernel command line: ro root=LABEL=/ console=ttyS0,9600n8 console=tty0
[17179569.184000] Initializing CPU#0
[17179569.184000] CPU 0 irqstacks, hard=c095c000 soft=c095b000
[17179569.184000] PID hash table entries: 2048 (order: 11, 32768 bytes)
[17179569.184000] Detected 996.916 MHz processor.
[17179569.184000] Using pmtmr for high-res timesource
[17179569.184000] Console: colour VGA+ 80x25
[17179571.52] Dentry cache hash table entries: 131072 (order: 7, 524288 bytes)
[17179571.608000] Inode-cache hash table entries: 65536 (order: 6, 262144 bytes)
[17179571.728000] Memory: 509256k/524108k available (5757k kernel code, 14200k reserved, 2562k data, 212k init, 0k highmem)
[17179571.856000] Checking if this processor honours the WP bit even in supervisor mode... Ok.
[17179572.036000] Calibrating delay using timer specific routine.. 1995.50 BogoMIPS (lpj=3991006)
[17179572.136000] Security Framework v1.0.0 initialized
[17179572.196000] Capability LSM initialized
[17179572.244000] Mount-cache hash table entries: 512
[17179572.30] CPU: L1 I cache: 16K, L1 D cache: 16K
[17179572.36] CPU: L2 cache: 512K
[17179572.40] Intel machine check architecture supported.
[17179572.468000] Intel machine check reporting enabled on CPU#0.
[17179572.536000] mtrr: v2.0 (20020519)
[17179572.58] CPU: Intel Mobile Intel(R) Pentium(R) III CPU - M  1000MHz stepping 04
[17179572.672000] Enabling fast FPU save and restore... done.
[17179572.74] Enabling unmasked SIMD FPU exception support... done.
[17179572.816000] Checking 'hlt' instruction... OK.
[17179572.888000]  tbxface-0118 [02] acpi_load_tables  : ACPI Tables success
fully acquired
[17179573.028000] Parsing all Control Methods:..


[17179573.32] Table [DSDT](id F004) - 431 Objects with 67 Devices 166 Method
s 7 Regions
[17179573.42] ACPI Namespace successfully loaded at root c09a4d40
[17179573.492000] ACPI: setting ELCR to 0200 (from 0800)
[17179573.556000] evxfevnt-0094 [03] acpi_enable   : Transition to ACPI mode successful
[17179573.656000] checking if image is initramfs... it is
[17179573.764000] Freeing initrd memory: 298k freed
[17179573.82] NET: Registered protocol family 16
[17179573.888000] PCI: PCI BIOS revision 2.10 entry at 0xfbfee, last bus=2
[17179573.964000] PCI: Using configuration type 1
[17179574.02] ACPI: Subsystem revision 20050309
[17179574.092000] evgpeblk-0979 [06] ev_create_gpe_block   : GPE 00 to 0F [_GPE] 2 regs on int 0x9
[17179574.196000] evgpeblk-0987 [06] ev_create_gpe_block   : Found 5 Wake, Enabled 0 Runtime GPEs in this block
[17179574.32] evgpeblk-0979 [06] ev_create_gpe_block   : GPE 10 to 1F [_GPE] 2 regs on int 0x9
[17179574.424000] evgpeblk-0987 [06] ev_create_gpe_block   : Found 2 Wake, Enabled 2 Runtime GPEs in this block
[17179574.548000] Completing Region/Field/Buffer/Package initialization:
...
[17179574.728000] Initialized 7/7 Regions 9/10 Fields 24/25 Buffers 23/33 Packages (440 nodes)
[17179574.832000] Executing all Device _STA and_INI methods:
...
[17179575.184000] 71 Devices found containing: 71 _STA, 3 _INI methods
[17179575.264000] ACPI: Interpreter enabled
[17179575.308000] ACPI: Using PIC for interrupt routing
[17179575.384000] ACPI: PCI Root Bridge [PCI0] (:00)
[17179575.444000] PCI: Probing PCI hardware (bus 00)
[17179575.50] PCI: Ignoring BAR0-3 of IDE controller :00:1f.1
[17179575.576000] PCI: Transparent bridge - :00:1e.0
[17179576.0

kernel guide to space

2005-07-11 Thread Michael S. Tsirkin

Hi!
I've been tasked with educating some new hires on Linux kernel coding style.
While we have Documentation/CodingStyle, it skips details that are supposed to
be learned by example.

Since I've been burned by this a couple of times myself till I learned,
I've put together a short list of rules complementing Documentation/CodingStyle.
This list is attached, below.

Please cc me directly with comments, if any.

Thanks,
MST

---

kernel guide to space AKA a boring list of rules
http://www.mellanox.com/mst/boring.txt

This text deals mostly with whitespace issues, hence the name.

Whitespace -- In computer science, a whitespace (or a whitespace character) is
any character which does not display itself but does take up space.
From Wikipedia, the free encyclopedia.

1. Read Documentation/CodingStyle. Yes, it applies to you.
   When working on a specific driver/subsystem, try to follow
   the style of the surrounding codebase.

2. The last character on a line is never a whitespace
Get a decent editor and don't leave whitespace at the end of
lines.
Documentation/CodingStyle

Whitespace issues:

3. Space rules for C

3a. Binary operators
+ - / * %
== !=  > < >= <= && || 
& | ^ << >> 
= *= /= %= += -= <<= >>= &= ^= |=

spaces around the operator
a + b

3b. Unary operators
! ~
+ - *
&

no space between operator and operand
*a

3c. * in types
Leave space between name and * in types.
Multiple * don't need additional space between them.

struct foo **bar;

3d. Conditional
?:
spaces around both ? and :
a ? b : c

3e. sizeof
space after the operator
sizeof a

3f. Braces etc
() [] -> .

no space around any of these (but see 3i)
foo(bar)

3g. Comma
,
space after comma, no space before comma
foo, bar

3h. Semicolon
;
no space before semicolon
foo;

3i. if/else/do/while/for/switch
space between if/else/do/while and following/preceding
statements/expressions, if any:

if (a) {
} else {
}

do {
} while (b);

3j. Labels
goto and case labels should have a line of their own (possibly
with a comment).
No space before colon in labels.

int foobar()
{
	...
foolabel: /* short comment */
	foo();
}

4. Indentation rules for C
Use tabs, not spaces, for indentation. Tabs should be 8 characters wide.

4a. Labels
case labels should be indented the same as the switch statement.
statements occurring after a case label are indented by one level.

switch (foo) {
case foo:
	bar();
default:
	break;
}

4b. Global scope
Functions, type definitions/declarations, defines, global
variables etc. are at global scope. Start them at the first
character in a line (indent level 0).

static struct foo *foo_bar(struct foo *first, struct bar *second,
			   struct foobar *third);

4c. Breaking long lines
Descendants are always substantially shorter than the parent
and are placed substantially to the right.
Documentation/CodingStyle

Descendant must be indented at least to the level of the innermost
compound expression in the parent. All descendants at the same level
are indented the same.
if (foobar(...) + barbar * foobar(bar +
				  foo *
				  oof)) {
}

5. Blank lines
One blank line between functions.

void foo()
{
}

/* comment */
void bar()
{
}

No more than one blank line in a row.
Last (or first) line in a file is never blank.

Non-whitespace issues:

6. A one-line statement does not need a {} block, so don't put it into one.
if (foo)
	bar;

7. Comments
Don't use C99 // comments.

8. Return codes
Functions that return success/failure status should use 0 for success and
a negative value for failure.
Error codes are in linux/errno.h.

if (do_something()) {
	handle_error();
	return -EINVAL;
}

Functions that test a condition return 1 if the condition is satisfied,
0 if it's not.

if (is_condition())
	condition_true();
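A minimal userspace sketch of the two conventions in rule 8 follows. The names `is_valid_len()`, `set_len()` and `MAX_LEN` are hypothetical, and the userspace <errno.h> stands in for linux/errno.h.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical limit, for illustration only. */
#define MAX_LEN 64

/* Predicate: returns 1 if the condition holds, 0 if it does not. */
static int is_valid_len(size_t len)
{
	return len > 0 && len <= MAX_LEN;
}

/* Action: returns 0 on success, a negative errno value on failure. */
static int set_len(size_t *dst, size_t len)
{
	if (!is_valid_len(len))
		return -EINVAL;
	*dst = len;
	return 0;
}
```

A caller then tests the result in the style shown above, e.g. `if (set_len(&len, n)) handle_error();`. A predicate reads naturally as `if (is_valid_len(n))`.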

9. Data types
Standard Linux types are in linux/types.h.
See also Linux Device Drivers, Third Edition,
Chapter 11: Data Types in the Kernel.  http://lwn.net/images/pdf/LDD3/

9a. Integer types
int is the default integer type.
Use unsigned type if you perform bit operations (<<,>>,&,|,~).
Use unsi

Re: [RFC][PATCH] i386: Per node IDT

2005-07-11 Thread Brian Gerst


Zwane Mwaikambo wrote:

On Sun, 11 Jul 2005, Andi Kleen wrote:



Why per node? Why not go the whole way and make it per CPU?

I would also not define it statically, but allocate it at boot time
in node local memory.



I went per node so that it would be minimal/zero impact for the no-node 
case, it would also simplify hotplug cpu since once a cpu in a node goes 
down, we still have other participating processors capable of handling 
its devices without having to do too much migration work. I'll definitely 
incorporate the node local allocations however, for some i386 systems we 
might be forced to stick some additional IDTs on node 0 since the IDTR 
will only take 32bit addresses and we could end up with only highmem on 
some nodes.


Doesn't the IDTR take a virtual address?  It has to or else the f00f bug 
fix wouldn't work.


--
Brian Gerst
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [RFC][PATCH] i386: Per node IDT

2005-07-11 Thread Zwane Mwaikambo

Hi Oleg,

On Mon, 11 Jul 2005, Oleg Nesterov wrote:

> > The change is so that we can send IRQs higher than 256 to do_IRQ. That 
> > looks like it tries to check if we came in via system_call since we'd save 
> > the system call number as orig_eax. Now that i think about it, doesn't 
> > that path always get taken when we interrupt userspace and have pending 
> > signals on return from interrupt?
> 
> As far as I can see, we always have orig_eax < 0 on interrupt, because
> 
> irq_entries_start:
>   pushl $vector-256   <-  orig_eax
>   jmp common_interrupt
> 
> and NR_IRQS < 256. So if we have pending signals on return from interrupt,
> do_signal() will not corrupt userspace registers when regs->eax == 
> -ERESTART...
> accidentally.
> 
> Probably it makes sense to change it to
>   pushl $vector - 0x - 1
> 
> and in do_IRQ()
>   int irq = regs->orig_eax & 0x
> 
> if you need to send IRQs higher than 256 to do_IRQ.

Good catch, thanks i'll change that!

Zwane
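Oleg's argument can be checked with a small model of the encoding. The model rests on two assumptions about the i386 code of that era. First, do_IRQ() recovered the IRQ with `regs->orig_eax & 0xff`. Second, the signal code treats a frame as a system call only when `orig_eax >= 0`. The helper names are hypothetical.

The model shows that every vector below 256 gives a negative orig_eax that decodes correctly. A vector such as 300, by contrast, would both look like a system call and decode to the wrong IRQ. The replacement encoding suggested above is not reconstructed here.

```c
#include <stdint.h>

/* Entry stub as quoted above: pushl $vector-256 stores vector - 256. */
static int32_t encode_orig_eax(int vector)
{
	return (int32_t)(vector - 256);
}

/* Recover the vector from the low byte (assumed do_IRQ() behaviour). */
static int decode_irq(int32_t orig_eax)
{
	return (int)((uint32_t)orig_eax & 0xff);
}

/* Assumed signal-restart test: orig_eax >= 0 means "came via a syscall". */
static int looks_like_syscall(int32_t orig_eax)
{
	return orig_eax >= 0;
}
```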


Re: [RFC][PATCH] i386: Per node IDT

2005-07-11 Thread Oleg Nesterov

Oleg Nesterov wrote:
> 
> Probably it makes sense to change it to
> pushl $vector - 0x - 1
> 

Please note that entry.S:BUILD_INTERRUPT() also does this trick:
pushl $nr-256;

so it should be changed as well.

Oleg.

Re: realtime-preempt-2.6.12-final-V0.7.51-11 glitches [no more]

2005-07-11 Thread Rui Nuno Capela

> * Rui Nuno Capela <[EMAIL PROTECTED]> wrote:
>
>> OTOH, I'll take this chance to show you something that is annoying me
>> for quite some time. Just look to the attached chart where I've marked
>> the spot with an arrow and a question mark. It's just one example of a
>> strange behavior/phenomenon while running the jack_test4.2 test on
>> PREEMPT_RT kernels: the CPU usage, which stays normally around 50%,
>> suddenly jumps to 60% steady, starting at random points in time but
>> always some time after the test has been started. Note that this
>> randomness surely adds to the slight differences found on the
>> above results.
>
> how long does this condition persist? Firstly, please upgrade to the
> -51-16 kernel, previous kernels had a condition where interrupt storms
> (or repeat interrupts) could occur. (Your irqs/sec values don't suggest
> such a condition, but it could still occur.)
>
> Then could you enable profiling (CONFIG_PROFILING=y and profile=1 boot
> parameter), and create a script like this to capture a kernel profile
> for a fixed amount of time:
>
>  #!/bin/bash
>
>  readprofile -r  # reset profile
>  sleep 10
>  readprofile -n -m /home/mingo/System.map | sort -n
>
> and start it manually when the anomaly triggers. Also start it during a
> 'normal' period of the test. The output should give us a rough idea of
> what is happening. This type of profiling is very low-overhead so it
> won't disturb the condition.
>
> Note that you can increase the frequency and the quality of profiling by
> enabling the NMI watchdog (LOCAL_APIC in the .config and nmi_watchdog=2
> boot option), in the -RT kernel it will automatically switch the
> profiling tick to occur from NMI context. Such tracing will also show
> overhead occurring in irqs-off functions.
>

After several trials, with CONFIG_PROFILING=y and profile=1 nmi_watchdog=2
as boot parameters, I'm almost convinced I'm doing something wrong :)

- `readprofile` always just outputs one line:

 0 total0.

- `readprofile -a` gives the whole kernel symbol list, all with zero times.

Is there anything else I can check around here?
-- 
rncbc aka Rui Nuno Capela
[EMAIL PROTECTED]


Re: [RFC][PATCH] i386: Per node IDT

2005-07-11 Thread Zwane Mwaikambo

On Mon, 11 Jul 2005, Brian Gerst wrote:

> Zwane Mwaikambo wrote:
> > On Sun, 11 Jul 2005, Andi Kleen wrote:
> > 
> > 
> > > Why per node? Why not go the whole way and make it per CPU?
> > > 
> > > I would also not define it statically, but allocate it at boot time
> > > in node local memory.
> > 
> > 
> > I went per node so that it would be minimal/zero impact for the no-node
> > case, it would also simplify hotplug cpu since once a cpu in a node goes
> > down, we still have other participating processors capable of handling its
> > devices without having to do too much migration work. I'll definitely
> > incorporate the node local allocations however, for some i386 systems we
> > might be forced to stick some additional IDTs on node 0 since the IDTR will
> > only take 32bit addresses and we could end up with only highmem on some
> > nodes.
> 
> Doesn't the IDTR take a virtual address?  It has to or else the f00f bug fix
> wouldn't work.

Yes you're right, i wasn't quite awake when i replied, thanks for 
correcting that.

Zwane


Re: [git patches] IDE update

2005-07-11 Thread Alan Cox

On Tue, 2005-07-05 at 20:14, Jens Axboe wrote:
> IDE still has much lower overhead per command than your average SCSI
> hardware. SATA with FIS even improves on this, definitely a good thing!

But SCSI overlaps them while in PATA they are dead time. That's why PATA
is so demanding of large I/O block sizes.


Re: [Paranoia] program cdparanoia not setting count and/or reply_len properly

2005-07-11 Thread Douglas Gilbert


Bill Davidsen wrote:

Aaron VonderHaar wrote:


When ripping from a scsi device (/dev/sg*) with linux kernel 2.6.11,
my kernel log is filled with messages like

=== dmesg ===
sg_write: data in/out 12/12 bytes for SCSI command 0x43--guessing data in;
  program cdparanoia not setting count and/or reply_len properly
printk: 40 messages suppressed.
sg_write: data in/out 30576/30576 bytes for SCSI command 0xbe--guessing data in;
  program cdparanoia not setting count and/or reply_len properly
printk: 128 messages suppressed.
sg_write: data in/out 30576/30576 bytes for SCSI command 0xbe--guessing data in;
  program cdparanoia not setting count and/or reply_len properly
printk: 149 messages suppressed.
sg_write: data in/out 16464/16464 bytes for SCSI command 0xbe--guessing data in;
  program cdparanoia not setting count and/or reply_len properly
printk: 153 messages suppressed.
sg_write: data in/out 30576/30576 bytes for SCSI command 0xbe--guessing data in;
  program cdparanoia not setting count and/or reply_len properly
printk: 153 messages suppressed.
=== END ===

After taking a hard look at the cdparanoia code handle_scsi_cmd(), and
the kernel sg driver, I've found the problem is this bit of code in
the kernel,

=== linux/drivers/scsi/sg.c LINE 566 ===
	/*
	 * SG_DXFER_TO_FROM_DEV is functionally equivalent to SG_DXFER_FROM_DEV,
	 * but is is possible that the app intended SG_DXFER_TO_DEV, because there
	 * is a non-zero input_size, so emit a warning.
	 */
	if (hp->dxfer_direction == SG_DXFER_TO_FROM_DEV)
		if (printk_ratelimit())
			printk(KERN_WARNING
			       "sg_write: data in/out %d/%d bytes for SCSI command 0x%x--"
			       "guessing data in;\n" KERN_WARNING "   "
			       "program %s not setting count and/or reply_len properly\n",
			       old_hdr.reply_len - (int)SZ_SG_HEADER,
			       input_size, (unsigned int) cmnd[0],
			       current->comm);
=== END ===

As I said, this is in kernel 2.6.11.  I noticed that this piece of
code is absent from 2.6.9, so is presumably a new error-checking
addition, which unfortunately breaks cdparanoia (the following comment
seems to explain why cdparanoia must set the count "incorrectly"),

=== cdparanoia-III-alpha9.8/interface/scsi_interface.c LINE 130 ===
 /* The following is one of the scariest hacks I've ever had to use.
The idea is this: We want to know if a command fails.  The
generic scsi driver (as of now) won't tell us; it hands back the
uninitialized contents of the preallocated kernel buffer.  We
force this buffer to a known value via another bug (nonzero data
length for a command that doesn't take data) such that we can
tell if the command failed.  Scared yet? */
=== END ===

With this new warning being logged for nearly every SCSI command
(hundreds of times per second), my system becomes unresponsive and
ripping is considerably slowed.

If I remove the warning code from the kernel and recompile it, ripping
seems to proceed as normal.

I'm not sure what ought to be done about this, but I thought I should
at least record my hours' worth of bewilderment for the next person
who googles this error message.



Alan Cox has sent fixes for some of the problems in sg to one of the 
maintainers, but they don't seem to be in mainline. Perhaps he could 
send them to akpm and see if he is interested in fixing problems. Once 
the problem with error reporting is fixed (and that may not be the fix 
Alan has devised) by someone then paranoia can get rid of the egregious 
hack.


Bill,
If I have received any patches from Alan Cox on this
matter, then I have forgotten/misplaced them.

The change that upset cdparanoia dates from a thread
on this list (lsml) titled:
"[PATCH] sg.c to set direction more reliably ..."
from last year proposed by Jeremy Higdon, see:
http://marc.theaimsgroup.com/?l=linux-scsi&m=109350427728262&w=2

I'm surprised that cdparanoia is still using the sg
version 2 (or 1) interface, that comment is relevant
prior to lk 2.4.0 (and that problem could be worked
around from lk 2.2.6 which dates from 1998).
Perhaps it is an old version of cdparanoia.

I've expanded the recipients list, perhaps we'll get a status on (a) if 
the fix Alan has will cause correct error reporting, and (b) when it can 
be put in mainline. Then paranoia can clean up its act.


Anyway, that "printk_ratelimit()" could be replaced
by a static so that the message is output once per
kernel lifetime. The SG_IO block layer passthrough does
something like this for commands that it doesn't understand.
Even that causes email queries to me ... "how come
the kernel reports LOG SENSE as an unknown opcode".
At least this single log warning doesn't break any
apps that I am aware of.
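Doug's suggestion can be sketched as a userspace analogue. A function-local static flag makes the warning print only once per process lifetime; in the driver it would be once per kernel lifetime. `warn_once_bad_count()` is a hypothetical name, and fprintf() stands in for printk().

```c
#include <stdio.h>

/*
 * Print the warning the first time only; returns 1 if it printed.
 * The flag is not protected against concurrent first callers, so two
 * CPUs could in principle both print once.
 */
static int warn_once_bad_count(const char *comm)
{
	static int warned;

	if (warned)
		return 0;
	warned = 1;
	fprintf(stderr, "sg_write: program %s not setting count and/or "
		"reply_len properly\n", comm);
	return 1;
}
```

For a purely diagnostic message, the unlocked flag is arguably an acceptable trade-off compared with rate-limiting every call.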

Doug Gilbert




Re: [PATCH] i386: Selectable Frequency of the Timer Interrupt

2005-07-11 Thread Alan Cox

> 3) is tricky I guess, it's designed for cases that are like "I want a
> timer 1 second from now, but it's ok to be also at 1.5 seconds if that
> suits you better". Those cases are far less rare than you might think at
> first, most watchdog kind things are of this type. This accuracy thing
> will allow the kernel to save many wakeups by grouping them I suspect.
> 
> Alan: you worked on this before, where did you end up with ?

For #1/#2 add_timer_relative is an easy wrapper around add_timer for the
moment and I did play with that a bit. Never looked at the accuracy
stuff. In theory it's a case of picking existing timeout points for low
resolution timers or tacking them onto an existing timeout that is near
enough. On the practical side there are a few problems with timer
performance on insert to consider.
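Below is a minimal sketch of the coalescing idea for case 3, under stated assumptions:

- times are plain tick counts without wraparound;
- the caller supplies the expiries already pending;
- `pick_expiry()` is a hypothetical helper, not add_timer() or any existing kernel interface.

```c
#include <stddef.h>

/*
 * Pick an expiry in [want, want + slack]. Reuse the first pending expiry
 * found in that window; otherwise round want up to a multiple of
 * granularity (> 0) if that stays within the slack; otherwise use want.
 */
static unsigned long pick_expiry(unsigned long want, unsigned long slack,
				 const unsigned long *pending, size_t n,
				 unsigned long granularity)
{
	unsigned long rounded;
	size_t i;

	for (i = 0; i < n; i++)
		if (pending[i] >= want && pending[i] - want <= slack)
			return pending[i];

	rounded = (want + granularity - 1) / granularity * granularity;
	if (rounded - want <= slack)
		return rounded;
	return want;
}
```

Take the "1 second, but 1.5 seconds is fine" case measured in milliseconds (`want = 1000`, `slack = 500`). A pending expiry at 1250 is then reused instead of arming a separate wakeup.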


Re: [RFC] Atmel-supplied hardware headers for AT91RM9200 SoC processor

2005-07-11 Thread Alan Cox

On Mon, 2005-07-11 at 14:57, Andrew Victor wrote:
> The issue that everybody seems to be forgetting (or ignoring) with
> changing the headers is that ALL the drivers then also need to be
> converted, and re-tested.

So it's a few more lines of perl

> I have asked Atmel if they're willing to dual-license the headers.  The
> licensing issue is probably now with their legal department, but I don't
> see them having a problem with it.

Great

[PATCH 1/2] NMI: Update NMI users of RCU to use new API

2005-07-11 Thread Paul E. McKenney

Uses of RCU for dynamically changeable NMI handlers need to use the
new rcu_dereference() and rcu_assign_pointer() facilities.  This change
makes it clear that these uses are safe from a memory-barrier viewpoint,
but the main purpose is to document exactly what operations are being
protected by RCU.  This has been tested on x86 and x86-64, which are
the only architectures affected by this change.
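The ordering these two primitives provide can be illustrated with a userspace analogue built on C11 atomics:

- a release store stands in for rcu_assign_pointer();
- an acquire load stands in for rcu_dereference(). This is stronger than rcu_dereference() strictly needs, since it only has to order dependent loads.

`handler_t`, `cfg_handler()` and the other names are hypothetical, loosely modelled on the nmi_callback code. This is a sketch, not the kernel implementation.

```c
#include <stdatomic.h>

typedef int (*handler_t)(int cpu);

struct handler_cfg {
	int min_cpu;		/* data the handler depends on */
};

static struct handler_cfg cfg;
static int default_calls;

static int dummy_handler(int cpu)
{
	(void)cpu;
	return 0;		/* "did nothing": fall back to the default action */
}

static int cfg_handler(int cpu)
{
	return cpu >= cfg.min_cpu;	/* reads data initialized before publication */
}

static _Atomic(handler_t) current_handler = dummy_handler;

static void set_handler(handler_t cb)
{
	/* Everything cb reads must be initialized before this store. */
	atomic_store_explicit(&current_handler, cb, memory_order_release);
}

static void unset_handler(void)
{
	atomic_store_explicit(&current_handler, dummy_handler, memory_order_release);
}

/* Returns 1 if the registered handler dealt with the event. */
static int handle_event(int cpu)
{
	handler_t h = atomic_load_explicit(&current_handler, memory_order_acquire);

	if (!h(cpu)) {
		default_calls++;
		return 0;
	}
	return 1;
}
```

The sketch does not show how an unregistering caller waits for handlers still running on other CPUs before freeing their data.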

Signed-off-by: <[EMAIL PROTECTED]>
---

 i386/kernel/traps.c |4 ++--
 x86_64/kernel/nmi.c |4 ++--
 2 files changed, 4 insertions(+), 4 deletions(-)


diff -urpN -X dontdiff linux-2.6.12-rc6/arch/i386/kernel/traps.c linux-2.6.12-rc6-NMIRCUfix/arch/i386/kernel/traps.c
--- linux-2.6.12-rc6/arch/i386/kernel/traps.c   2005-06-17 16:34:17.0 -0700
+++ linux-2.6.12-rc6-NMIRCUfix/arch/i386/kernel/traps.c 2005-07-01 15:16:24.0 -0700
@@ -626,7 +626,7 @@ fastcall void do_nmi(struct pt_regs * re
cpu = smp_processor_id();
++nmi_count(cpu);
 
-   if (!nmi_callback(regs, cpu))
+   if (!rcu_dereference(nmi_callback)(regs, cpu))
default_do_nmi(regs);
 
nmi_exit();
@@ -634,7 +634,7 @@ fastcall void do_nmi(struct pt_regs * re
 
 void set_nmi_callback(nmi_callback_t callback)
 {
-   nmi_callback = callback;
+   rcu_assign_pointer(nmi_callback, callback);
 }
 
 void unset_nmi_callback(void)
diff -urpN -X dontdiff linux-2.6.12-rc6/arch/x86_64/kernel/nmi.c linux-2.6.12-rc6-NMIRCUfix/arch/x86_64/kernel/nmi.c
--- linux-2.6.12-rc6/arch/x86_64/kernel/nmi.c   2005-06-17 16:34:24.0 -0700
+++ linux-2.6.12-rc6-NMIRCUfix/arch/x86_64/kernel/nmi.c 2005-07-01 15:15:21.0 -0700
@@ -522,14 +522,14 @@ asmlinkage void do_nmi(struct pt_regs * 
 
nmi_enter();
add_pda(__nmi_count,1);
-   if (!nmi_callback(regs, cpu))
+   if (!rcu_dereference(nmi_callback)(regs, cpu))
default_do_nmi(regs);
nmi_exit();
 }
 
 void set_nmi_callback(nmi_callback_t callback)
 {
-   nmi_callback = callback;
+   rcu_assign_pointer(nmi_callback, callback);
 }
 
 void unset_nmi_callback(void)

Re: [PATCH] eventpoll : Suppress a short lived lock from struct file

2005-07-11 Thread Eric Dumazet


Davide Libenzi wrote:

Eric, I can't really say I like this one, at least not after extensive
tests run on top of it.


fair enough :)

You are asking to add a bottleneck to save 8
bytes on an entity that, taken alone, is more than 120 bytes. Consider
that when you have a "struct file" allocated, the cost on the system is 
not only the struct itself, but all the allocations associated with it. 
For example, if you consider that a case where you might feel a "struct 
file" pressure is when you have hundreds of thousands of network 
connections, the 8 bytes saved compared to all the buffers associated 
with those sockets boils down to basically nothing.


Well, the filp_cachep slab is created with SLAB_HWCACHE_ALIGN, enforcing an
alignment of 64 bytes or even 128 bytes.

So it can be useful to let the size of struct file go from 0x84 to 0x80, because we can save 64 or 128 bytes per file (0x80 bytes actually
allocated instead of 0xc0, or even 0x100 on Pentium 4).
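The arithmetic above can be checked with a simplified model. In it, an object's allocated size is its size rounded up to the cache alignment, assumed to be a power of two. `aligned_obj_size()` is a hypothetical helper and ignores the slab allocator's other alignment rules.

```c
#include <stddef.h>

/* Round size up to align, which must be a power of two. */
static size_t aligned_obj_size(size_t size, size_t align)
{
	return (size + align - 1) & ~(align - 1);
}
```

The 128-byte results agree with the objsize column (256 before, 128 after) in the slabinfo output that follows.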


In my case, I use other patches outside the scope of eventpoll (like declaring f_security only #ifdef CONFIG_SECURITY_SELINUX), and really
gain 128 bytes of low memory per file. It reduces cache pressure for a given workload, and reduces lowmem pressure.


Before :

# grep filp /proc/slabinfo
filp   66633  66750    256   15    1 : tunables  120   60    8 : slabdata   4450   4450     60


After :

# grep filp /proc/slabinfo
filp   82712  82987    128   31    1 : tunables  120   60    8 : slabdata   2677   2677     20


It may appear to you as a penalty, but at least for me it is a noticeable gain.

Another candidate for struct file size reduction is the big struct file_ra_state that is included in all files, even sockets that don't use
it, but that's a different story :)


Eric

[PATCH] Fix JBD race in t_forget list handling

2005-07-11 Thread Jan Kara

  Hello all,

  The attached patch should close the possible race between
journal_commit_transaction() and journal_unmap_buffer() (which adds
buffers to the committing transaction's t_forget list) that could leave
some buffers on the transaction's t_forget list (hence leading to an
assertion failure later when the transaction is dropped). The patch is
against the 2.6.13-rc2 kernel.  The race was really happening to David Wilk
<[EMAIL PROTECTED]> (thanks for testing), so please apply if you find
the patch correct.
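The commit-side locking in the patch can be modelled in userspace with POSIX threads. This is a simplified analogue, not JBD code:

- `state_lock`, `list_lock` and `forget_list` stand in for j_state_lock, j_list_lock and t_forget;
- the rule that producers stop adding once `finished` is set is an assumption of the sketch.

It shows why the emptiness check must be repeated after both locks are taken in rank order. The list lock is dropped between the drain loop and that final check, so a producer can slip in.

```c
#include <pthread.h>
#include <stdlib.h>

struct item {
	struct item *next;
};

/* Lock rank: state_lock before list_lock. */
static pthread_mutex_t state_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t list_lock = PTHREAD_MUTEX_INITIALIZER;
static struct item *forget_list;
static int finished;
static int added, processed;

/* Producer: takes both locks in rank order, adds one item if still open. */
static int add_item(void)
{
	struct item *it = malloc(sizeof(*it));
	int ok = 0;

	if (!it)
		return 0;
	pthread_mutex_lock(&state_lock);
	pthread_mutex_lock(&list_lock);
	if (!finished) {
		it->next = forget_list;
		forget_list = it;
		added++;
		ok = 1;
	}
	pthread_mutex_unlock(&list_lock);
	pthread_mutex_unlock(&state_lock);
	if (!ok)
		free(it);
	return ok;
}

/* Committer: drain under list_lock, then recheck with both locks held. */
static void commit(void)
{
	struct item *it;

restart:
	pthread_mutex_lock(&list_lock);
	while ((it = forget_list) != NULL) {
		forget_list = it->next;
		pthread_mutex_unlock(&list_lock);
		free(it);		/* per-item work done without the list lock */
		processed++;
		pthread_mutex_lock(&list_lock);
	}
	pthread_mutex_unlock(&list_lock);

	/* Rank order forbids taking state_lock while holding list_lock. */
	pthread_mutex_lock(&state_lock);
	pthread_mutex_lock(&list_lock);
	if (forget_list) {		/* a producer slipped in: go round again */
		pthread_mutex_unlock(&list_lock);
		pthread_mutex_unlock(&state_lock);
		goto restart;
	}
	finished = 1;			/* empty with both locks held: close the list */
	pthread_mutex_unlock(&list_lock);
	pthread_mutex_unlock(&state_lock);
}

static void *producer(void *arg)
{
	int i, n = *(int *)arg;

	for (i = 0; i < n; i++)
		add_item();
	return NULL;
}

/* Runs one producer concurrently with commit(); returns 0 on success. */
static int run_demo(int n)
{
	pthread_t t;

	if (pthread_create(&t, NULL, producer, &n) != 0)
		return -1;
	commit();
	pthread_join(t, NULL);
	return 0;
}
```

Whatever the interleaving, every item a producer manages to add is processed before `finished` is set. So `added == processed` holds after `run_demo()` returns.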

Honza
-- 
Jan Kara <[EMAIL PROTECTED]>
SuSE CR Labs
Fix a race between journal_commit_transaction() and other places, such as
journal_unmap_buffer(), that add buffers to the transaction's t_forget
list. We have to protect against such places by holding j_list_lock even when
traversing the t_forget list. The fact that other places can only add buffers
to the list makes the locking easier. OTOH the lock ranking complicates
things...

Signed-off-by: Jan Kara <[EMAIL PROTECTED]>

diff -rup -x*.o -x.* linux-2.6.13-rc2/fs/jbd/commit.c linux-2.6.13-rc2-1-forgetfix/fs/jbd/commit.c
--- linux-2.6.13-rc2/fs/jbd/commit.cFri Jun 24 16:27:10 2005
+++ linux-2.6.13-rc2-1-forgetfix/fs/jbd/commit.cMon Jul 11 17:20:48 2005
@@ -720,11 +720,17 @@ wait_for_iobuf:
J_ASSERT(commit_transaction->t_log_list == NULL);
 
 restart_loop:
+   /*
+* As there are other places (journal_unmap_buffer()) adding buffers
+* to this list we have to be careful and hold the j_list_lock.
+*/
+   spin_lock(&journal->j_list_lock);
while (commit_transaction->t_forget) {
transaction_t *cp_transaction;
struct buffer_head *bh;
 
jh = commit_transaction->t_forget;
+   spin_unlock(&journal->j_list_lock);
bh = jh2bh(jh);
jbd_lock_bh_state(bh);
J_ASSERT_JH(jh, jh->b_transaction == commit_transaction ||
@@ -792,9 +798,25 @@ restart_loop:
journal_remove_journal_head(bh);  /* needs a brelse */
release_buffer_page(bh);
}
+   cond_resched_lock(&journal->j_list_lock);
+   }
+   spin_unlock(&journal->j_list_lock);
+   /*
+* This is a bit sleazy.  We borrow j_list_lock to protect
+* journal->j_committing_transaction in __journal_remove_checkpoint.
+* Really, __journal_remove_checkpoint should be using j_state_lock but
+* it's a bit hassle to hold that across __journal_remove_checkpoint
+*/
+   spin_lock(&journal->j_state_lock);
+   spin_lock(&journal->j_list_lock);
+   /*
+* Now recheck if some buffers did not get attached to the transaction
+* while the lock was dropped...
+*/
+   if (commit_transaction->t_forget) {
spin_unlock(&journal->j_list_lock);
-   if (cond_resched())
-   goto restart_loop;
+   spin_unlock(&journal->j_state_lock);
+   goto restart_loop;
}
 
/* Done with this transaction! */
@@ -803,14 +825,6 @@ restart_loop:
 
J_ASSERT(commit_transaction->t_state == T_COMMIT);
 
-   /*
-* This is a bit sleazy.  We borrow j_list_lock to protect
-* journal->j_committing_transaction in __journal_remove_checkpoint.
-* Really, __jornal_remove_checkpoint should be using j_state_lock but
-* it's a bit hassle to hold that across __journal_remove_checkpoint
-*/
-   spin_lock(&journal->j_state_lock);
-   spin_lock(&journal->j_list_lock);
commit_transaction->t_state = T_FINISHED;
J_ASSERT(commit_transaction == journal->j_committing_transaction);
journal->j_commit_sequence = commit_transaction->t_tid;

Re: kernel guide to space

2005-07-11 Thread Sander

Michael S. Tsirkin wrote (among other things):
>   Use tabs, not spaces, for indentation. Tabs should be 8
>   characters wide.

A tab is a tab. The editor/viewer can be configured to show 2, 3, 4, 8,
any amount of characters, right?

Sander

-- 
Humilis IT Services and Solutions
http://www.humilis.net

PAGE_BUG macro

2005-07-11 Thread Gustavo Guillermo Pérez

Hello list, I have some old code, and while updating it I see that PAGE_BUG is gone. Is it fine to
disable it with a macro to allow building on old kernels?

I saw in an old post that PAGE_BUG and friends seem to have been exterminated.

#ifdef PAGE_BUG


#endif

:|

-- 
Gustavo Guillermo Pérez
Compunauta uLinux
www.compunauta.com

Re: [PATCH] Early kmalloc/kfree

2005-07-11 Thread Alex Williamson

On Sat, 2005-07-09 at 18:06 -0700, Christoph Lameter wrote:
> On Fri, 9 Jul 2005, Andi Kleen wrote:
> 
> > I think that is a really really bad idea.   slab is already complex enough
> > and adding scary hacks like this will probably make it collapse
> > under its own weight at some point.
> 
> Seconded.
> 
> Maybe we can solve this by bringing the system up in a limited 
> configuration and then discover additional capabilities during ACPI 
> discovery and reconfigure.

   From a user perspective of the memory allocators, I liked this idea
of making the transition from bootmem to slab be transparent.  It's
currently extremely difficult to have any kind of service span the
transition when there doesn't even appear to be a programmatic way to
know which one to use. 

   The original problem Bob and I were trying to solve is simply how to
automatically deal with a system that may or may not have an IOMMU that,
if it exists, is only discoverable in ACPI namespace.  Getting ACPI
namespace available by paging_init() makes this relatively easy because
the memory zones can be set up properly for the hardware available.  If
we wait till after that point, we'll need to figure out how to
re-balance the dma and normal zones to make memory allocations
efficient.

   I agree that ACPI is potentially a slippery slope, and many pieces of
it are impractical for early use.  I think this can be controlled by
using common early setup services in the ACPI subsystem that limit what
components get initialized.  That said, I'm open to other suggestions on
how we might reconfigure the system later to accomplish this task.
Thanks,

Alex


Re: [PATCH 1/2] NMI: Update NMI users of RCU to use new API

2005-07-11 Thread Paul E. McKenney

Add documentation on how to use RCU to implement dynamically changeable
NMI handlers.

Signed-off-by: <[EMAIL PROTECTED]>
---

 NMI-RCU.txt |  112 
 1 files changed, 112 insertions(+)

diff -urpN -X dontdiff linux-2.6.12-rc6/Documentation/RCU/NMI-RCU.txt linux-2.6.12-rc6-RCUdoc/Documentation/RCU/NMI-RCU.txt
--- linux-2.6.12-rc6/Documentation/RCU/NMI-RCU.txt  1969-12-31 16:00:00.0 -0800
+++ linux-2.6.12-rc6-RCUdoc/Documentation/RCU/NMI-RCU.txt   2005-06-28 12:31:48.0 -0700
@@ -0,0 +1,112 @@
+Using RCU to Protect Dynamic NMI Handlers
+
+
+Although RCU is usually used to protect read-mostly data structures,
+it is possible to use RCU to provide dynamic non-maskable interrupt
+handlers, as well as dynamic irq handlers.  This document describes
+how to do this, drawing loosely from Zwane Mwaikambo's NMI-timer
+work in "arch/i386/oprofile/nmi_timer_int.c" and in
+"arch/i386/kernel/traps.c".
+
+The relevant pieces of code are listed below, each followed by a
+brief explanation.
+
+   static int dummy_nmi_callback(struct pt_regs *regs, int cpu)
+   {
+   return 0;
+   }
+
+The dummy_nmi_callback() function is a "dummy" NMI handler that does
+nothing, but returns zero, thus saying that it did nothing, allowing
+the NMI handler to take the default machine-specific action.
+
+   static nmi_callback_t nmi_callback = dummy_nmi_callback;
+
+This nmi_callback variable is a global function pointer to the current
+NMI handler.
+ 
+	fastcall void do_nmi(struct pt_regs * regs, long error_code)
+	{
+		int cpu;
+
+		nmi_enter();
+
+		cpu = smp_processor_id();
+		++nmi_count(cpu);
+
+		if (!rcu_dereference(nmi_callback)(regs, cpu))
+			default_do_nmi(regs);
+
+		nmi_exit();
+	}
+
+The do_nmi() function processes each NMI.  It first disables preemption
+in the same way that a hardware irq would, then increments the per-CPU
+count of NMIs.  It then invokes the NMI handler stored in the nmi_callback
+function pointer.  If this handler returns zero, do_nmi() invokes the
+default_do_nmi() function to handle a machine-specific NMI.  Finally,
+preemption is restored.
+
+Strictly speaking, rcu_dereference() is not needed, since this code runs
+only on i386, which does not need rcu_dereference() anyway.  However,
+it is a good documentation aid, particularly for anyone attempting to
+do something similar on Alpha.
+
+Quick Quiz:  Why might the rcu_dereference() be necessary on Alpha,
+given that the code referenced by the pointer is read-only?
+
+
+Back to the discussion of NMI and RCU...
+
+	void set_nmi_callback(nmi_callback_t callback)
+	{
+		rcu_assign_pointer(nmi_callback, callback);
+	}
+
+The set_nmi_callback() function registers an NMI handler.  Note that any
+data that is to be used by the callback must be initialized up -before-
+the call to set_nmi_callback().  On architectures that do not order
+writes, the rcu_assign_pointer() ensures that the NMI handler sees the
+initialized values.
+
+	void unset_nmi_callback(void)
+	{
+		rcu_assign_pointer(nmi_callback, dummy_nmi_callback);
+	}
+
+This function unregisters an NMI handler, restoring the original
+dummy_nmi_callback().  However, there may well be an NMI handler
+currently executing on some other CPU.  We therefore cannot free
+up any data structures used by the old NMI handler until execution
+of it completes on all other CPUs.
+
+One way to accomplish this is via synchronize_sched(), perhaps as
+follows:
+
+   unset_nmi_callback();
+   synchronize_sched();
+   kfree(my_nmi_data);
+
+This works because synchronize_sched() blocks until all CPUs complete
+any preemption-disabled segments of code that they were executing.
+Since NMI handlers disable preemption, synchronize_sched() is guaranteed
+not to return until all ongoing NMI handlers exit.  It is therefore safe
+to free up the handler's data as soon as synchronize_sched() returns.
+
+
+Answer to Quick Quiz
+
+   Why might the rcu_dereference() be necessary on Alpha, given
+   that the code referenced by the pointer is read-only?
+
+   Answer: The caller to set_nmi_callback() might well have
+   initialized some data that is to be used by the
+   new NMI handler.  In this case, the rcu_dereference()
+   would be needed, because otherwise a CPU that received
+   an NMI just after the new handler was set might see
+   the pointer to the new NMI handler, but the old
+   pre-initialized version of the handler's data.
+
+   More important, the rcu_dereference() makes it clear
+   to someone reading the code that the pointer is being
+   protected by RCU.
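
The publish/subscribe pattern described in NMI-RCU.txt can be modelled outside the kernel. What follows is a minimal user-space sketch, assuming C11 <stdatomic.h>: a release store stands in for rcu_assign_pointer(), and an acquire load stands in, conservatively, for rcu_dereference() (acquire is stronger than the dependency ordering rcu_dereference() relies on). All names here (struct handler_data, set_callback(), unset_callback(), dispatch(), install_threshold_handler()) are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical handler state; must be fully initialized before publication. */
struct handler_data {
	int threshold;
};

typedef int (*callback_t)(const struct handler_data *d, int value);

static struct handler_data my_data;

/* Like dummy_nmi_callback(): returns 0 so the caller takes the default action. */
static int dummy_callback(const struct handler_data *d, int value)
{
	(void)d;
	(void)value;
	return 0;
}

/* Hypothetical real handler: claims the event if value exceeds the threshold. */
static int threshold_callback(const struct handler_data *d, int value)
{
	return value > d->threshold;
}

/* Global handler pointer, analogous to nmi_callback. */
static _Atomic(callback_t) current_cb = dummy_callback;

/* Publish: the release store orders earlier writes to my_data before the
 * new pointer becomes visible, which is what rcu_assign_pointer() is for. */
static void set_callback(callback_t cb)
{
	atomic_store_explicit(&current_cb, cb, memory_order_release);
}

static void unset_callback(void)
{
	atomic_store_explicit(&current_cb, dummy_callback, memory_order_release);
}

/* Reader side, analogous to the rcu_dereference() call in do_nmi(). */
static int dispatch(int value)
{
	callback_t cb = atomic_load_explicit(&current_cb, memory_order_acquire);

	assert(cb != NULL);
	return cb(&my_data, value);
}

/* Initialize the data first, then publish the handler. */
static void install_threshold_handler(int threshold)
{
	my_data.threshold = threshold;
	set_callback(threshold_callback);
}
```

The sketch deliberately leaves out the grace period: freeing handler data after unset_callback() would still require waiting for any dispatch() already in flight, which is the job synchronize_sched() does in the kernel version.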

Re: kernel guide to space

2005-07-11 Thread Dmitry Torokhov

Hi,

On 7/11/05, Michael S. Tsirkin <[EMAIL PROTECTED]> wrote:

> 3e. sizeof
>space after the operator
>sizeof a

If parentheses are used, no space please: sizeof(struct foo)
 
> 
> 4c. Breaking long lines
>Descendants are always substantially shorter than the parent
>and are placed substantially to the right.
>Documentation/CodingStyle
> 
>Descendant must be indented at least to the level of the innermost
>compound expression in the parent. All descendants at the same level
>are indented the same.
>if (foobar(.) + barbar * foobar(bar +
>foo *
>oof)) {
>}

Ugh, that's as ugly as it can get... Something like below is much
easier to read...

if (foobar(.) +
    barbar * foobar(bar + foo * oof)) {
}
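
A compilable sketch of both points, assuming standard C only: sizeof with a parenthesized type versus sizeof with a bare expression, and a long condition broken after a top-level operator with the continuation aligned under the first operand. foobar() and style_example() are hypothetical placeholders.

```c
#include <assert.h>
#include <stddef.h>

struct foo {
	int a;
	long b;
};

/* Hypothetical helper, only so the condition below compiles. */
static int foobar(int x)
{
	return x * 2;
}

static size_t style_example(int barbar, int bar, int foo, int oof)
{
	size_t n = sizeof(struct foo);	/* type operand: parentheses, no space */
	size_t m = sizeof n;		/* expression operand: space, no parentheses */

	/* Break after a top-level operator and align the continuation. */
	if (foobar(1) +
	    barbar * foobar(bar + foo * oof))
		return n + m;
	return 0;
}
```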

-- 
Dmitry

Re: [RFC] Atmel-supplied hardware headers for AT91RM9200 SoC processor

2005-07-11 Thread Christoph Hellwig

On Mon, Jul 11, 2005 at 03:57:26PM +0200, Andrew Victor wrote:
> > Or written a perl script to reprocess them into something saner for
> > that matter.
> 
> The issue that everybody seems to be forgetting (or ignoring) with
> changing the headers is that ALL the drivers then also need to be
> converted, and re-tested.

Given your lack of taste they'll probably need a full rewrite anyway.
And given they're not in-tree yet, we don't care at all either.


Re: Realtime Preemption, 2.6.12, Beginners Guide?

2005-07-11 Thread Alistair John Strachan

On Monday 11 Jul 2005 15:43, Ingo Molnar wrote:
> * Alistair John Strachan <[EMAIL PROTECTED]> wrote:
> > It's annoying that this is so readily reproducible here, yet almost
> > impossible to debug, and clearly a side effect of 4KSTACKS... without it
> > actually being a stack overflow.
> >
> > I realise 4KSTACKS is a considerable rework of the IRQ handler, etc.
> > and probably even more heavily modified by rt-preempt, but is there
> > nothing else that can be tested before a serial console run?
>
> 4K stacks never really caused any trouble under PREEMPT_RT (or any other
> kernel i tried). It's not that complex either.
>
> one useful thing could be to give me exact instructions on how to set up
> an openvpn network similar to yours, and what kind of workload to
> generate. Maybe i can reproduce it here.

OpenVPN isn't terribly difficult to set up, but it's more than a 5 minute job. 
You'll need universal tun/tap in your kernel before you start, and openvpn 
itself installed (I've compiled from source and used Debian's 2.0.0 package, 
I'm sure Red Hat has an equivalent), then it's just a case of setting up a 
client and a server.

If you like, I can generate the "keys" used for server/client and I've 
attached the configs for the server and the client that we use here. 
Obviously for security reasons I can't attach OUR keys verbatim, but I'll 
instruct you on how to generate them.

So, on the server:

a) Install OpenVPN
b) mkdir -p /etc/openvpn/keys
c) Copy attached server.conf to /etc/openvpn
d) Modify server.conf if necessary (shouldn't be required)
e) Generate your server and client keys (see below)

This mostly repeats the moderately good documentation on 
http://openvpn.net/howto.html, but I can't expect you to read it all so I'll 
give you a bite-sized version. It saves you figuring out the same rubbish I 
had to about 6 months ago. OpenVPN will create (with my configs) a verbose 
log in /etc/openvpn/log on both machines.

1) cd /usr/share/doc/openvpn/easy-rsa

2) Edit "vars". Change line export KEY_DIR=... to:

export KEY_DIR=/etc/openvpn/keys

3) Save and exit

4) On Bash (at least) type

. ./vars

Which imports "vars" into your environment.

5) ./clean-all

6) ./build-ca (enter any old crap)

7) ./build-key-server server

Enter the common-name as "server" again. No password.

8) Finally, generate the client key (used by the client for crypto)

./build-key client1

Where "client1" is an arbitrary name. When prompted for "common-name", enter
the same string; this is important and I was head-scratching for some time
as to why it wouldn't work without this... Again no password.

9) ./build-dh (this takes a while)

With that done, /etc/openvpn/keys should contain at least..

01.pem
ca.{crt,key}
dh1024.pem
server.{crt,csr,key}
client1.{crt,csr,key}

Plus some other cruft that's probably not required. Now you should be able to 
start the openvpn server with something like..

openvpn --cd /etc/openvpn --config server.conf

Add some other flags like verbose if you want to see what's happening. 
Remember it's logging everything to /etc/openvpn/log, which you can suppress by 
commenting out the log line in the config.

It'll bring up a tun device on the server side, and wait patiently for VPN 
connections.

The client side is a piece of cake.

1) mkdir /etc/openvpn

2) Copy client1.crt, client1.key, and ca.crt from the server's /etc/openvpn/keys 
directory to the client's /etc/openvpn directory.

3) Copy the attached client.conf to the same directory.

4) Edit the config as necessary and save (should work with only the server IP 
changes).

Again, the client machine will need to have the universal tun/tap driver 
loaded. Bring up the openvpn with:

openvpn --cd /etc/openvpn --config client.conf

A connection should be established and, hopefully, you'll get a pingable route 
to 10.0.0.1. I then made this my default gateway with:

route del default wlan
route add default tun0

Then I was able to ping machines on the server side without having a local 
gateway to them. One working VPN.

I suggest you try all this on a "stable" kernel, and once you've established 
it works, just transfer a file at a reasonable data rate through the tunnel.

Ours links to a company server with a consumer grade 1Mbit ADSL connection, 
and transferring just about anything at 110K/s causes the kernel to crash 
within about 10 seconds.

I wish you the best of luck with getting this going, and I apologise in 
advance for the poor instructions.

-- 
Cheers,
Alistair.

personal:   alistair()devzero!co!uk
university: s0348365()sms!ed!ac!uk
student:CS/CSim Undergraduate
contact:1F2 55 South Clerk Street,
Edinburgh. EH8 9PP.
[client.conf]
client
remote 192.168.99.1 443

ca ca.crt
cert client1.crt
key client1.key
ns-cert-type server

dev tun
proto udp
nobind
user nobody
group nobody

persist-key
persist-tun

log /etc/openvpn/log
verb 3

[server.conf]
server 10.0.0.0 255.255.255.0
port 443

[PATCH] Make ll_rw_block() wait for buffer lock

2005-07-11 Thread Jan Kara

  Hello,

  attached patch adds an operation SWRITE to ll_rw_block(). When this
operation is specified ll_rw_block() waits for a buffer lock and doesn't
just skip the locked buffer. Under some circumstances we need to make
sure that current data are really being sent to disk and the old
ll_rw_block()'s behaviour makes this impossible to achieve (as in some
places we lock and unlock buffer without sending it to disk). The patch
also changes the one caller in buffer.c. Please apply.
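
The difference between the old WRITE behaviour and the new SWRITE can be modelled in user space. This is a minimal sketch, assuming POSIX threads, with a mutex standing in for the buffer lock; struct model_buf, model_ll_rw_block() and the MODEL_* constants are hypothetical names, not kernel API. Build with -pthread.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical user-space stand-in for a buffer_head, not the kernel struct. */
struct model_buf {
	pthread_mutex_t lock;	/* plays the role of the BH_Lock bit */
	bool dirty;		/* plays the role of BH_Dirty */
	int data;		/* current in-memory contents */
	int disk;		/* last contents "submitted" to disk */
};

enum model_rw { MODEL_WRITE, MODEL_SWRITE };

/*
 * Returns false if the buffer was skipped because it was locked.
 * MODEL_WRITE mirrors test_set_buffer_locked(): skip if busy.
 * MODEL_SWRITE mirrors lock_buffer(): wait for the lock.
 * Submission is synchronous here; in the kernel the buffer is unlocked
 * later, by the b_end_io completion handler.
 */
static bool model_ll_rw_block(enum model_rw rw, struct model_buf *b)
{
	assert(b != NULL);

	if (rw == MODEL_SWRITE)
		pthread_mutex_lock(&b->lock);
	else if (pthread_mutex_trylock(&b->lock) != 0)
		return false;

	if (b->dirty) {			/* like test_clear_buffer_dirty() */
		b->dirty = false;
		b->disk = b->data;	/* write out the contents as of now */
	}
	pthread_mutex_unlock(&b->lock);
	return true;
}
```

The single-threaded check below only exercises the skip path; with a second thread holding the lock, MODEL_SWRITE would block until the lock is released and then write whatever the buffer holds at that point.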

Honza
-- 
Jan Kara <[EMAIL PROTECTED]>
SuSE CR Labs
Introduce new ll_rw_block() operation SWRITE meaning that block layer should
wait for the buffer lock and write-out afterwards. Hence data in buffers
at the time of call are guaranteed to be submitted to the disk.

Signed-off-by: Jan Kara <[EMAIL PROTECTED]>

diff -rupX /home/jack/.kerndiffexclude linux-2.6.12-1-forgetfix/fs/buffer.c linux-2.6.12-2-ll_rw_block-fix/fs/buffer.c
--- linux-2.6.12-1-forgetfix/fs/buffer.c	2005-06-28 13:26:17.0 +0200
+++ linux-2.6.12-2-ll_rw_block-fix/fs/buffer.c	2005-07-07 07:10:34.0 +0200
@@ -933,8 +933,7 @@ static int fsync_buffers_list(spinlock_t
 * contents - it is a noop if I/O is still in
 * flight on potentially older contents.
 */
-   wait_on_buffer(bh);
-   ll_rw_block(WRITE, 1, &bh);
+   ll_rw_block(SWRITE, 1, &bh);
brelse(bh);
spin_lock(lock);
}
@@ -2805,21 +2804,22 @@ int submit_bh(int rw, struct buffer_head
 
 /**
  * ll_rw_block: low-level access to block devices (DEPRECATED)
- * @rw: whether to %READ or %WRITE or maybe %READA (readahead)
+ * @rw: whether to %READ or %WRITE or %SWRITE or maybe %READA (readahead)
  * @nr: number of &struct buffer_heads in the array
  * @bhs: array of pointers to &struct buffer_head
  *
- * ll_rw_block() takes an array of pointers to &struct buffer_heads,
- * and requests an I/O operation on them, either a %READ or a %WRITE.
- * The third %READA option is described in the documentation for
- * generic_make_request() which ll_rw_block() calls.
+ * ll_rw_block() takes an array of pointers to &struct buffer_heads, and
+ * requests an I/O operation on them, either a %READ or a %WRITE.  The third
+ * %SWRITE is like %WRITE only we make sure that the *current* data in buffers
+ * are sent to disk. The fourth %READA option is described in the documentation
+ * for generic_make_request() which ll_rw_block() calls.
  *
  * This function drops any buffer that it cannot get a lock on (with the
- * BH_Lock state bit), any buffer that appears to be clean when doing a
- * write request, and any buffer that appears to be up-to-date when doing
- * read request.  Further it marks as clean buffers that are processed for
- * writing (the buffer cache won't assume that they are actually clean until
- * the buffer gets unlocked).
+ * BH_Lock state bit) unless SWRITE is required, any buffer that appears to be
+ * clean when doing a write request, and any buffer that appears to be
+ * up-to-date when doing read request.  Further it marks as clean buffers that
+ * are processed for writing (the buffer cache won't assume that they are
+ * actually clean until the buffer gets unlocked).
  *
  * ll_rw_block sets b_end_io to simple completion handler that marks
  * the buffer up-to-date (if approriate), unlocks the buffer and wakes
@@ -2835,11 +2835,13 @@ void ll_rw_block(int rw, int nr, struct 
for (i = 0; i < nr; i++) {
struct buffer_head *bh = bhs[i];
 
-   if (test_set_buffer_locked(bh))
+   if (rw == SWRITE)
+   lock_buffer(bh);
+   else if (test_set_buffer_locked(bh))
continue;
 
get_bh(bh);
-   if (rw == WRITE) {
+   if (rw == WRITE || rw == SWRITE) {
if (test_clear_buffer_dirty(bh)) {
bh->b_end_io = end_buffer_write_sync;
submit_bh(WRITE, bh);
diff -rupX /home/jack/.kerndiffexclude linux-2.6.12-1-forgetfix/include/linux/fs.h linux-2.6.12-2-ll_rw_block-fix/include/linux/fs.h
--- linux-2.6.12-1-forgetfix/include/linux/fs.h	2005-06-28 13:26:35.0 +0200
+++ linux-2.6.12-2-ll_rw_block-fix/include/linux/fs.h	2005-07-07 07:16:39.0 +0200
@@ -69,6 +69,7 @@ extern int dir_notify_enable;
 #define READ 0
 #define WRITE 1
 #define READA 2/* read-ahead  - don't block if no resources */
+#define SWRITE 3   /* for ll_rw_block() - wait for buffer lock */
 #define SPECIAL 4  /* For non-blockdevice requests in request queue */
 #define READ_SYNC  (READ | (1 << BIO_RW_SYNC))
 #define WRITE_SYNC (WRITE | (1 << BIO

[PATCH] Change ll_rw_block() calls in Reiser

2005-07-11 Thread Jan Kara

  Hello,

  attached patch changes ll_rw_block() calls in Reiserfs to make sure
that submitted data really reach the disk (the patch relies on the
previous ll_rw_block() patch).

Honza

-- 
Jan Kara <[EMAIL PROTECTED]>
SuSE CR Labs
We need to be sure that current data in buffer are sent to disk.
Hence we need to call ll_rw_block() with SWRITE.

Signed-off-by: Jan Kara <[EMAIL PROTECTED]>

diff -rupX /home/jack/.kerndiffexclude linux-2.6.12-1-forgetfix/fs/reiserfs/journal.c linux-2.6.12-2-ll_rw_block-fix/fs/reiserfs/journal.c
--- linux-2.6.12-1-forgetfix/fs/reiserfs/journal.c	2005-06-28 13:26:20.0 +0200
+++ linux-2.6.12-2-ll_rw_block-fix/fs/reiserfs/journal.c	2005-07-07 07:20:19.0 +0200
@@ -966,7 +966,7 @@ static int flush_commit_list(struct supe
  SB_ONDISK_JOURNAL_SIZE(s);
 tbh = journal_find_get_block(s, bn) ;
 if (buffer_dirty(tbh)) /* redundant, ll_rw_block() checks */
-   ll_rw_block(WRITE, 1, &tbh) ;
+   ll_rw_block(SWRITE, 1, &tbh) ;
 put_bh(tbh) ;
   }
   atomic_dec(&journal->j_async_throttle);
@@ -1977,7 +1977,7 @@ abort_replay:
   /* flush out the real blocks */
   for (i = 0 ; i < get_desc_trans_len(desc) ; i++) {
 set_buffer_dirty(real_blocks[i]) ;
-ll_rw_block(WRITE, 1, real_blocks + i) ;
+ll_rw_block(SWRITE, 1, real_blocks + i) ;
   }
   for (i = 0 ; i < get_desc_trans_len(desc) ; i++) {
 wait_on_buffer(real_blocks[i]) ;

Re: realtime-preempt-2.6.12-final-V0.7.51-11 glitches [no more]

2005-07-11 Thread Ingo Molnar

* Rui Nuno Capela <[EMAIL PROTECTED]> wrote:

> After several trials, with CONFIG_PROFILING=y and profile=1 
> nmi_watchdog=2 as boot parameters, I'm almost convinced I'm doing 
> something wrong :)
> 
> - `readprofile` always just outputs one line:
> 
>  0 total0.
> 
> - `readprofile -a` gives the whole kernel symbol list, all with zero times.
> 
> Is there anything else I can check around here?

it means that the NMI watchdog was not activated - i.e. the 'NMI' counts 
in /proc/interrupts do not increase. Do you have LOCAL_APIC enabled in 
the .config? If yes and if nmi_watchdog=1 does not work either then it's 
probably not possible to activate the NMI watchdog on your box. In that 
case try nmi_watchdog=0, that should activate normal profiling. (unless 
i've broken it via the profile-via-NMI changes ...)

Ingo

[PATCH] Change HFS+ to not use ll_rw_block()

2005-07-11 Thread Jan Kara

  Hi,

  attached patch changes HFS+ to use sync_dirty_buffer() instead of
ll_rw_block() and wait_on_buffer().

Honza

-- 
Jan Kara <[EMAIL PROTECTED]>
SuSE CR Labs
Use block layer predefined function.

Signed-off-by: Jan Kara <[EMAIL PROTECTED]>

diff -rupX /home/jack/.kerndiffexclude linux-2.6.12-1-forgetfix/fs/hfsplus/super.c linux-2.6.12-2-ll_rw_block-fix/fs/hfsplus/super.c
--- linux-2.6.12-1-forgetfix/fs/hfsplus/super.c	2005-06-28 13:26:18.0 +0200
+++ linux-2.6.12-2-ll_rw_block-fix/fs/hfsplus/super.c	2005-07-09 01:52:10.0 +0200
@@ -217,8 +217,7 @@ static void hfsplus_put_super(struct sup
vhdr->attributes |= cpu_to_be32(HFSPLUS_VOL_UNMNT);
vhdr->attributes &= cpu_to_be32(~HFSPLUS_VOL_INCNSTNT);
mark_buffer_dirty(HFSPLUS_SB(sb).s_vhbh);
-   ll_rw_block(WRITE, 1, &HFSPLUS_SB(sb).s_vhbh);
-   wait_on_buffer(HFSPLUS_SB(sb).s_vhbh);
+   sync_dirty_buffer(HFSPLUS_SB(sb).s_vhbh);
}
 
hfs_btree_close(HFSPLUS_SB(sb).cat_tree);
@@ -415,8 +414,7 @@ static int hfsplus_fill_super(struct sup
vhdr->attributes &= cpu_to_be32(~HFSPLUS_VOL_UNMNT);
vhdr->attributes |= cpu_to_be32(HFSPLUS_VOL_INCNSTNT);
mark_buffer_dirty(HFSPLUS_SB(sb).s_vhbh);
-   ll_rw_block(WRITE, 1, &HFSPLUS_SB(sb).s_vhbh);
-   wait_on_buffer(HFSPLUS_SB(sb).s_vhbh);
+   sync_dirty_buffer(HFSPLUS_SB(sb).s_vhbh);
 
if (!HFSPLUS_SB(sb).hidden_dir) {
printk("HFS+: create hidden dir...\n");

Re: [PATCH] i386: Selectable Frequency of the Timer Interrupt

2005-07-11 Thread Chris Wedgwood

On Mon, Jul 11, 2005 at 10:05:10AM -0400, Theodore Ts'o wrote:

> The real answer here is for the tickless patches to be cleaned up to
> the point where they can be merged, and then we won't waste battery
> power entering the timer interrupt in the first place.  :-)

Whilst conceptually this is a nice idea I've yet to see any viable
code that overall has a lower cost.  Tickless is a really nice idea
for embedded devices and also paravirtualized hardware but I don't
think anyone has it working well enough yet, do they?

[PATCH] Change ll_rw_block() calls in UFS

2005-07-11 Thread Jan Kara

  Hi,

  attached patch changes UFS to use SWRITE when sending data to disk in
O_SYNC mode. Please apply.

Honza


-- 
Jan Kara <[EMAIL PROTECTED]>
SuSE CR Labs
We need to be sure that current data are sent to disk. Hence we call
ll_rw_block() with SWRITE.

Signed-off-by: Jan Kara <[EMAIL PROTECTED]>

diff -rupX /home/jack/.kerndiffexclude linux-2.6.12-1-forgetfix/fs/ufs/balloc.c linux-2.6.12-2-ll_rw_block-fix/fs/ufs/balloc.c
--- linux-2.6.12-1-forgetfix/fs/ufs/balloc.c	2005-01-05 17:19:34.0 +0100
+++ linux-2.6.12-2-ll_rw_block-fix/fs/ufs/balloc.c	2005-07-09 02:00:04.0 +0200
@@ -114,8 +114,7 @@ void ufs_free_fragments (struct inode * 
ubh_mark_buffer_dirty (USPI_UBH);
ubh_mark_buffer_dirty (UCPI_UBH);
if (sb->s_flags & MS_SYNCHRONOUS) {
-   ubh_wait_on_buffer (UCPI_UBH);
-   ubh_ll_rw_block (WRITE, 1, (struct ufs_buffer_head **)&ucpi);
+   ubh_ll_rw_block (SWRITE, 1, (struct ufs_buffer_head **)&ucpi);
ubh_wait_on_buffer (UCPI_UBH);
}
sb->s_dirt = 1;
@@ -200,8 +199,7 @@ do_more:
ubh_mark_buffer_dirty (USPI_UBH);
ubh_mark_buffer_dirty (UCPI_UBH);
if (sb->s_flags & MS_SYNCHRONOUS) {
-   ubh_wait_on_buffer (UCPI_UBH);
-   ubh_ll_rw_block (WRITE, 1, (struct ufs_buffer_head **)&ucpi);
+   ubh_ll_rw_block (SWRITE, 1, (struct ufs_buffer_head **)&ucpi);
ubh_wait_on_buffer (UCPI_UBH);
}
 
@@ -459,8 +457,7 @@ ufs_add_fragments (struct inode * inode,
ubh_mark_buffer_dirty (USPI_UBH);
ubh_mark_buffer_dirty (UCPI_UBH);
if (sb->s_flags & MS_SYNCHRONOUS) {
-   ubh_wait_on_buffer (UCPI_UBH);
-   ubh_ll_rw_block (WRITE, 1, (struct ufs_buffer_head **)&ucpi);
+   ubh_ll_rw_block (SWRITE, 1, (struct ufs_buffer_head **)&ucpi);
ubh_wait_on_buffer (UCPI_UBH);
}
sb->s_dirt = 1;
@@ -585,8 +582,7 @@ succed:
ubh_mark_buffer_dirty (USPI_UBH);
ubh_mark_buffer_dirty (UCPI_UBH);
if (sb->s_flags & MS_SYNCHRONOUS) {
-   ubh_wait_on_buffer (UCPI_UBH);
-   ubh_ll_rw_block (WRITE, 1, (struct ufs_buffer_head **)&ucpi);
+   ubh_ll_rw_block (SWRITE, 1, (struct ufs_buffer_head **)&ucpi);
ubh_wait_on_buffer (UCPI_UBH);
}
sb->s_dirt = 1;
diff -rupX /home/jack/.kerndiffexclude linux-2.6.12-1-forgetfix/fs/ufs/ialloc.c linux-2.6.12-2-ll_rw_block-fix/fs/ufs/ialloc.c
--- linux-2.6.12-1-forgetfix/fs/ufs/ialloc.c	2005-03-03 18:58:30.0 +0100
+++ linux-2.6.12-2-ll_rw_block-fix/fs/ufs/ialloc.c	2005-07-09 02:01:06.0 +0200
@@ -124,8 +124,7 @@ void ufs_free_inode (struct inode * inod
ubh_mark_buffer_dirty (USPI_UBH);
ubh_mark_buffer_dirty (UCPI_UBH);
if (sb->s_flags & MS_SYNCHRONOUS) {
-   ubh_wait_on_buffer (UCPI_UBH);
-   ubh_ll_rw_block (WRITE, 1, (struct ufs_buffer_head **) &ucpi);
+   ubh_ll_rw_block (SWRITE, 1, (struct ufs_buffer_head **) &ucpi);
ubh_wait_on_buffer (UCPI_UBH);
}

@@ -249,8 +248,7 @@ cg_found:
ubh_mark_buffer_dirty (USPI_UBH);
ubh_mark_buffer_dirty (UCPI_UBH);
if (sb->s_flags & MS_SYNCHRONOUS) {
-   ubh_wait_on_buffer (UCPI_UBH);
-   ubh_ll_rw_block (WRITE, 1, (struct ufs_buffer_head **) &ucpi);
+   ubh_ll_rw_block (SWRITE, 1, (struct ufs_buffer_head **) &ucpi);
ubh_wait_on_buffer (UCPI_UBH);
}
sb->s_dirt = 1;
diff -rupX /home/jack/.kerndiffexclude linux-2.6.12-1-forgetfix/fs/ufs/truncate.c linux-2.6.12-2-ll_rw_block-fix/fs/ufs/truncate.c
--- linux-2.6.12-1-forgetfix/fs/ufs/truncate.c	2005-03-03 18:58:30.0 +0100
+++ linux-2.6.12-2-ll_rw_block-fix/fs/ufs/truncate.c	2005-07-09 02:01:52.0 +0200
@@ -285,8 +285,7 @@ next:;
}
}
if (IS_SYNC(inode) && ind_ubh && ubh_buffer_dirty(ind_ubh)) {
-   ubh_wait_on_buffer (ind_ubh);
-   ubh_ll_rw_block (WRITE, 1, &ind_ubh);
+   ubh_ll_rw_block (SWRITE, 1, &ind_ubh);
ubh_wait_on_buffer (ind_ubh);
}
ubh_brelse (ind_ubh);
@@ -353,8 +352,7 @@ static int ufs_trunc_dindirect (struct i
}
}
if (IS_SYNC(inode) && dind_bh && ubh_buffer_dirty(dind_bh)) {
-   ubh_wait_on_buffer (dind_bh);
-   ubh_ll_rw_block (WRITE, 1, &dind_bh);
+   ubh_ll_rw_block (SWRITE, 1, &dind_bh);
ubh_wait_on_buffer (dind_bh);
}
ubh_brelse (dind_bh);
@@ -418,8 +416,7 @@ static int ufs_trunc_tindirect (struct i
}
}
if (IS_SYNC(inode) && tind_bh && ubh_buffer_dirty(tind_bh)) {
-   ubh_wait_on_buffer (tind_bh);
-   ubh_ll_

Re: [RFC/PATCH 1/2] fsnotify

2005-07-11 Thread Robert Love

On Mon, 2005-07-11 at 13:52 +0100, David Woodhouse wrote:

> To be honest, I don't really see that this is in any way better than
> what we had before. Yes, two different pieces of code actually use hooks
> in similar places in the VFS code. But this 'infrastructure' just to
> share those hooks is overkill as far as I can tell. It really isn't any
> better than having both inotify and audit hooks side by side where we
> can actually see what's going on at a glance. In fact, it's worse.

I think what makes this patch look superfluous is that Chris added a set
of wrappers for dnotify, too.

In the inotify patch, the fsnotify wrappers call directly into the
inotify and dnotify interfaces and they do consolidate code and clean
things up.  I added fsnotify at hch's request.
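
The consolidation argument can be illustrated with a toy model: one wrapper per VFS event, called from the VFS, which forwards to every notification backend. This is a minimal sketch that assumes nothing about the real fsnotify, inotify or dnotify signatures; every name below (model_fsnotify_create(), the two backend functions and their counters) is hypothetical.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical per-backend event counters, standing in for real queues. */
static int dnotify_events;
static int inotify_events;
static char last_inotify_name[64];

/* Hypothetical backend hooks. */
static void model_dnotify_event(void)
{
	dnotify_events++;
}

static void model_inotify_event(const char *name)
{
	inotify_events++;
	strncpy(last_inotify_name, name, sizeof(last_inotify_name) - 1);
	last_inotify_name[sizeof(last_inotify_name) - 1] = '\0';
}

/*
 * The VFS calls one wrapper per event; adding another consumer means
 * editing the wrapper rather than every VFS call site.
 */
static inline void model_fsnotify_create(const char *name)
{
	assert(name != NULL);
	model_dnotify_event();
	model_inotify_event(name);
}
```

The trade-off David raises is visible even here: the call site is shorter, but a reader has to open the wrapper to see which backends an event reaches.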

Now that audit is coming along, fsnotify makes even more sense.

I would like to share some more code at a lower level, though, as you
pointed out.

I planned to look at redoing dnotify entirely on top of inotify, once
inotify is in the kernel proper, for example.

Robert Love


Re: Attempted summary of "RT patch acceptance" thread, take 2

2005-07-11 Thread Daniel Walker


The PREEMPT_RT description doesn't seem correct. According to your
"hard" definition, PREEMPT_RT can provably hit a hard deadline for
interrupt response. 


Daniel


On Mon, 2005-07-11 at 07:55 -0700, Paul E. McKenney wrote:


> 
> a.Quality of service: "soft realtime", with timeframe of a few 10s
>   of microseconds for task scheduling and interrupt-handler entry.
>   System services providing I/O, networking, task creation, and
>   VM manipulation can take much longer, though some subsystems
>   (e.g., ALSA) have been reworked to obtain good latencies.
>   Since spinlocks are replaced by blocking mutexes, the performance
>   penalty can be significant (up to 40%) for some system calls,
>   but user-mode execution runs at full speed.  There is likely to
>   be some performance penalty exacted from RCU, but, with luck,
>   this penalty will be minimal.
> 
>   Kristian Benoit and Karim Yaghmour have run an impressive set of
>   benchmarks comparing CONFIG_PREEMPT_RT with CONFIG_PREEMPT(?) and
>   Ipipe, see the LKML threads starting with:
> 
>   1. http://marc.theaimsgroup.com/?l=linux-kernel&m=111846495403131&w=2
>   2. http://marc.theaimsgroup.com/?l=linux-kernel&m=111928813818151&w=2
>   3. http://marc.theaimsgroup.com/?l=linux-kernel&m=112008491422956&w=2
>   4. http://marc.theaimsgroup.com/?l=linux-kernel&m=112086443319815&w=2
> 
>   This last run put CONFIG_PREEMPT_RT at about 70 microseconds
>   interrupt-response-time latency.  The machine under test was a
>   Dell PowerEdge SC420 with a P4 2.8GHz CPU and 256MB RAM running
>   a UP build of Fedora Core 3.



[PATCH] Fix race in do_get_write_access()

2005-07-11 Thread Jan Kara

  Hello,

  attached patch should fix the following race:
 Proc 1                          Proc 2

 __flush_batch()
   ll_rw_block()
                                 do_get_write_access()
                                   lock_buffer
                                   jh is only waiting for checkpoint
                                   -> b_transaction == NULL ->
                                   do nothing
                                   unlock_buffer
     test_set_buffer_locked()
     test_clear_buffer_dirty()
                                   __journal_file_buffer()
                                   change the data
     submit_bh()

  and we have sent wrong data to disk... We now clear the buffer dirty
flag under the buffer lock in all cases, and hence we know that whenever a
buffer is starting to be journaled we either finish the pending write-out
before attaching the buffer to a transaction or we won't write the buffer
until the transaction is going to be committed... Please apply.
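
The interleaving can be replayed deterministically to see why handling the dirty bit under the buffer lock closes the window. This is a minimal single-threaded sketch, assuming the buffer is on one of the lists (BJ_Metadata and friends) for which jbd_unexpected_dirty_buffer() moves BH_Dirty to BH_JBDDirty; struct replay_buf and replay() are hypothetical and only mirror the steps in the race diagram.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the buffer state involved in the race. */
struct replay_buf {
	bool dirty;	/* BH_Dirty */
	bool jbddirty;	/* BH_JBDDirty */
	int data;	/* buffer contents */
};

/*
 * Replays the interleaving step by step. Returns the contents Proc 1
 * submits, or -1 if it submits nothing. 'fixed' selects the patched
 * do_get_write_access() behaviour.
 */
static int replay(bool fixed)
{
	struct replay_buf b = { .dirty = true, .jbddirty = false, .data = 1 };
	bool will_submit;

	/* Proc 2: lock_buffer ... unlock_buffer in do_get_write_access().
	 * Unpatched, b_transaction == NULL means nothing happens here;
	 * patched, the dirty bit is always moved while the lock is held. */
	if (fixed && b.dirty) {
		b.dirty = false;
		b.jbddirty = true;
	}
	assert(!(b.dirty && b.jbddirty));	/* never both at once */

	/* Proc 1: test_set_buffer_locked() succeeds, then
	 * test_clear_buffer_dirty() decides whether to write. */
	will_submit = b.dirty;
	b.dirty = false;

	/* Proc 2: __journal_file_buffer(), then change the data. */
	b.data = 2;

	/* Proc 1: submit_bh() sends whatever the buffer holds now. */
	return will_submit ? b.data : -1;
}
```

Unpatched, Proc 1 writes the value Proc 2 stored after filing the buffer into the transaction; with the fix, Proc 1 finds the buffer clean and writes nothing, leaving the write-out to the commit.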

Honza
-- 
Jan Kara <[EMAIL PROTECTED]>
SuSE CR Labs
The test in jbd_unexpected_dirty_buffer() is redundant - remove it.
Furthermore we have to clear the buffer dirty bit under the buffer lock
to prevent races with buffer write-out (and hence prevent returning
a buffer with IO happening).

Signed-off-by: Jan Kara <[EMAIL PROTECTED]>

diff -rupX /home/jack/.kerndiffexclude linux-2.6.12-2-ll_rw_block-fix/fs/jbd/transaction.c linux-2.6.12-3-early-writeout-fix/fs/jbd/transaction.c
--- linux-2.6.12-2-ll_rw_block-fix/fs/jbd/transaction.c	2005-06-28 13:26:18.0 +0200
+++ linux-2.6.12-3-early-writeout-fix/fs/jbd/transaction.c	2005-07-09 08:40:01.0 +0200
@@ -493,20 +493,17 @@ static void jbd_unexpected_dirty_buffer(
struct buffer_head *bh = jh2bh(jh);
int jlist;
 
-   if (buffer_dirty(bh)) {
-   /* If this buffer is one which might reasonably be dirty
-* --- ie. data, or not part of this journal --- then
-* we're OK to leave it alone, but otherwise we need to
-* move the dirty bit to the journal's own internal
-* JBDDirty bit. */
-   jlist = jh->b_jlist;
-
-   if (jlist == BJ_Metadata || jlist == BJ_Reserved || 
-   jlist == BJ_Shadow || jlist == BJ_Forget) {
-   if (test_clear_buffer_dirty(jh2bh(jh))) {
-   set_bit(BH_JBDDirty, &jh2bh(jh)->b_state);
-   }
-   }
+   /* If this buffer is one which might reasonably be dirty
+* --- ie. data, or not part of this journal --- then
+* we're OK to leave it alone, but otherwise we need to
+* move the dirty bit to the journal's own internal
+* JBDDirty bit. */
+   jlist = jh->b_jlist;
+
+   if (jlist == BJ_Metadata || jlist == BJ_Reserved || 
+   jlist == BJ_Shadow || jlist == BJ_Forget) {
+   if (test_clear_buffer_dirty(jh2bh(jh)))
+   set_bit(BH_JBDDirty, &jh2bh(jh)->b_state);
}
 }
 
@@ -574,9 +571,14 @@ repeat:
if (jh->b_next_transaction)
J_ASSERT_JH(jh, jh->b_next_transaction ==
transaction);
-   JBUFFER_TRACE(jh, "Unexpected dirty buffer");
-   jbd_unexpected_dirty_buffer(jh);
-   }
+   }
+   /*
+* In any case we need to clean the dirty flag and we must
+* do it under the buffer lock to be sure we don't race
+* with running write-out.
+*/
+   JBUFFER_TRACE(jh, "Unexpected dirty buffer");
+   jbd_unexpected_dirty_buffer(jh);
}
 
unlock_buffer(bh);

[PATCH] Change ll_rw_block() calls in JBD

2005-07-11 Thread Jan Kara

  Hi,

  attached patch changes calls of ll_rw_block() in JBD to make sure the
data really reach the disk.

Honza

-- 
Jan Kara <[EMAIL PROTECTED]>
SuSE CR Labs
We must be sure that the current data in buffer are sent to disk. Hence
we have to call ll_rw_block() with SWRITE.

Signed-off-by: Jan Kara <[EMAIL PROTECTED]>

diff -rupX /home/jack/.kerndiffexclude linux-2.6.12-1-forgetfix/fs/jbd/checkpoint.c linux-2.6.12-2-ll_rw_block-fix/fs/jbd/checkpoint.c
--- linux-2.6.12-1-forgetfix/fs/jbd/checkpoint.c	2005-06-28 13:26:18.0 +0200
+++ linux-2.6.12-2-ll_rw_block-fix/fs/jbd/checkpoint.c	2005-07-07 07:18:47.0 +0200
@@ -204,7 +204,7 @@ __flush_batch(journal_t *journal, struct
int i;
 
spin_unlock(&journal->j_list_lock);
-   ll_rw_block(WRITE, *batch_count, bhs);
+   ll_rw_block(SWRITE, *batch_count, bhs);
spin_lock(&journal->j_list_lock);
for (i = 0; i < *batch_count; i++) {
struct buffer_head *bh = bhs[i];
diff -rupX /home/jack/.kerndiffexclude linux-2.6.12-1-forgetfix/fs/jbd/commit.c linux-2.6.12-2-ll_rw_block-fix/fs/jbd/commit.c
--- linux-2.6.12-1-forgetfix/fs/jbd/commit.c	2005-07-06 01:22:13.0 +0200
+++ linux-2.6.12-2-ll_rw_block-fix/fs/jbd/commit.c	2005-07-07 07:18:20.0 +0200
@@ -358,7 +358,7 @@ write_out_data:
jbd_debug(2, "submit %d writes\n",
bufs);
spin_unlock(&journal->j_list_lock);
-   ll_rw_block(WRITE, bufs, wbuf);
+   ll_rw_block(SWRITE, bufs, wbuf);
journal_brelse_array(wbuf, bufs);
bufs = 0;
goto write_out_data;
@@ -381,7 +381,7 @@ write_out_data:
 
if (bufs) {
spin_unlock(&journal->j_list_lock);
-   ll_rw_block(WRITE, bufs, wbuf);
+   ll_rw_block(SWRITE, bufs, wbuf);
journal_brelse_array(wbuf, bufs);
spin_lock(&journal->j_list_lock);
}
diff -rupX /home/jack/.kerndiffexclude linux-2.6.12-1-forgetfix/fs/jbd/journal.c linux-2.6.12-2-ll_rw_block-fix/fs/jbd/journal.c
--- linux-2.6.12-1-forgetfix/fs/jbd/journal.c	2005-06-28 13:26:18.0 +0200
+++ linux-2.6.12-2-ll_rw_block-fix/fs/jbd/journal.c	2005-07-07 07:17:11.0 +0200
@@ -969,7 +969,7 @@ void journal_update_superblock(journal_t
if (wait)
sync_dirty_buffer(bh);
else
-   ll_rw_block(WRITE, 1, &bh);
+   ll_rw_block(SWRITE, 1, &bh);
 
 out:
/* If we have just flushed the log (by marking s_start==0), then
diff -rupX /home/jack/.kerndiffexclude linux-2.6.12-1-forgetfix/fs/jbd/revoke.c linux-2.6.12-2-ll_rw_block-fix/fs/jbd/revoke.c
--- linux-2.6.12-1-forgetfix/fs/jbd/revoke.c	2005-03-03 18:58:29.0 +0100
+++ linux-2.6.12-2-ll_rw_block-fix/fs/jbd/revoke.c	2005-07-07 07:12:34.0 +0200
@@ -613,7 +613,7 @@ static void flush_descriptor(journal_t *
 	set_buffer_jwrite(bh);
 	BUFFER_TRACE(bh, "write");
 	set_buffer_dirty(bh);
-	ll_rw_block(WRITE, 1, &bh);
+	ll_rw_block(SWRITE, 1, &bh);
 }
 #endif
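Why WRITE is not enough can be shown with a small userspace model. It assumes the ll_rw_block() semantics the patch relies on: plain WRITE skips a buffer whose lock is already held (I/O in flight), while SWRITE waits for the lock and then submits the buffer if it is still dirty. All names below (model_bh, model_ll_rw_block(), and so on) are hypothetical; this is a simplified sketch, not the kernel code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of one buffer_head: only the state bits that matter. */
struct model_bh {
	bool locked;   /* a write is in flight (buffer lock held) */
	bool dirty;    /* in-memory data newer than the last submitted write */
	int mem;       /* current in-memory contents (a version number) */
	int inflight;  /* contents carried by the in-flight write */
	int disk;      /* contents that have reached the disk */
};

enum model_rw { MODEL_WRITE, MODEL_SWRITE };

/* I/O completion: the in-flight data lands on disk, the lock is dropped. */
static void model_end_io(struct model_bh *bh)
{
	assert(bh->locked);
	bh->disk = bh->inflight;
	bh->locked = false;
}

/* Stand-in for sleeping on the buffer lock: just finish the pending I/O. */
static void model_wait(struct model_bh *bh)
{
	if (bh->locked)
		model_end_io(bh);
}

/* Simplified per-buffer logic of ll_rw_block() for the two write modes. */
static void model_ll_rw_block(enum model_rw rw, struct model_bh *bh)
{
	if (rw == MODEL_SWRITE)
		model_wait(bh);          /* SWRITE: wait until the lock is free */
	else if (bh->locked)
		return;                  /* WRITE: silently skip a busy buffer */

	bh->locked = true;
	if (bh->dirty) {
		bh->dirty = false;
		bh->inflight = bh->mem;  /* submit the *current* contents */
		return;                  /* lock released at I/O completion */
	}
	bh->locked = false;              /* clean buffer: nothing to submit */
}

/*
 * Version 1 is still being written when the buffer is redirtied with
 * version 2; JBD then submits the buffer and waits for it.
 * Returns the version that ends up on disk.
 */
static int model_scenario(enum model_rw rw)
{
	struct model_bh bh = { .locked = true, .dirty = true,
			       .mem = 2, .inflight = 1, .disk = 0 };

	model_ll_rw_block(rw, &bh);
	model_wait(&bh);
	return bh.disk;
}
```

With MODEL_WRITE the call is skipped and the wait only covers the old write, so version 1 is left on disk; with MODEL_SWRITE the new contents are submitted after the old write completes, so version 2 reaches the disk.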

Re: [RFC] RCU and CONFIG_PREEMPT_RT progress, part 2

2005-07-11 Thread Paul E. McKenney

On Mon, Jul 11, 2005 at 05:16:27PM +0200, Ingo Molnar wrote:
> 
> * Paul E. McKenney <[EMAIL PROTECTED]> wrote:
> 
> > Hello!
> > 
> > More progress on CONFIG_PREEMPT_RT-compatible RCU.
> > 
> > o   Continued prototyping Linux-kernel implementation, still
> > in the CONFIG_PREEMPT environment.
> 
> cool! With the debugging code removed it doesn't look all that complex.  
> Do you think i can attempt to plug this into the -RT tree, or should i 
> wait some more? (One observation: if you know some branch is a slowpath in 
> a common codepath then it's useful to mark the condition via unlikely().  
> That results in better code layout and is also a guidance for the casual 
> reader of the code.)

I would hold off, as it is still quite fragile.  I can make it break
quite easily, so it is not yet time to unleash it on all users of
CONFIG_PREEMPT_RT.

Good point on the unlikely(); I updated the ones checking for races.
The checks in rcu_read_lock() and rcu_read_unlock() for racing
counter flips seem to be the most important, but I also added them
to the checks for concurrent synchronize_rcu() calls, see below.
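The unlikely() annotation discussed above is a thin wrapper around the GCC/Clang builtin __builtin_expect(); the kernel's <linux/compiler.h> defines likely()/unlikely() along the lines reproduced below. The function and counter in this sketch are hypothetical: they only illustrate wrapping a rare "counter flipped underneath us" check so that the common case becomes the fall-through path.

```c
#include <assert.h>

/* Branch-prediction hints in the style of the kernel's <linux/compiler.h>. */
#define likely(x)   __builtin_expect(!!(x), 1)
#define unlikely(x) __builtin_expect(!!(x), 0)

/* Hypothetical statistic: how often the rare race path was taken. */
static unsigned long flip_races;

/*
 * Hypothetical read-side helper for a two-counter scheme: 'snapshot' is
 * the counter index sampled earlier, 'now' is the index seen now.  A
 * mismatch means a flip raced with us, which should be rare, so the
 * check is marked unlikely() and the common path falls straight through.
 */
static int pick_counter_index(int snapshot, int now)
{
	assert(snapshot == 0 || snapshot == 1);  /* two counters assumed */

	if (unlikely(snapshot != now)) {
		flip_races++;
		return now;      /* slowpath: switch to the fresh index */
	}
	return snapshot;         /* fastpath */
}
```

The hint does not change behavior, only code layout: both branches still compute the same result whether or not unlikely() is used.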

Thanx, Paul

diff -urpN -X dontdiff linux-2.6.12-rc6/arch/i386/Kconfig linux-2.6.12-rc6-ctrRCU/arch/i386/Kconfig
--- linux-2.6.12-rc6/arch/i386/Kconfig	2005-06-17 16:34:16.0 -0700
+++ linux-2.6.12-rc6-ctrRCU/arch/i386/Kconfig	2005-06-25 11:45:40.0 -0700
@@ -523,6 +523,30 @@ config PREEMPT
 	  Say Y here if you are building a kernel for a desktop, embedded
 	  or real-time system.  Say N if you are unsure.
 
+config PREEMPT_RCU
+	bool "Preemptible RCU read-side critical sections"
+	depends on PREEMPT
+	default y
+	help
+	  This option reduces the latency of the kernel when reacting to
+	  real-time or interactive events by allowing a low priority process to
+	  be preempted even if it is in an RCU read-side critical section.
+	  This allows applications to run more reliably even when the system is
+	  under load.
+
+	  Say Y here if you enjoy debugging random oopses.
+	  Say N if, for whatever reason, you want your kernel to actually work.
+
+config RCU_STATS
+	bool "/proc stats for preemptible RCU read-side critical sections"
+	depends on PREEMPT_RCU
+	default y
+	help
+	  This option provides /proc stats for debugging preemptible RCU.
+
+	  Say Y here if you want to see RCU stats in /proc.
+	  Say N if you are unsure.
+
 config PREEMPT_BKL
 	bool "Preempt The Big Kernel Lock"
 	depends on PREEMPT
diff -urpN -X dontdiff linux-2.6.12-rc6/fs/proc/proc_misc.c linux-2.6.12-rc6-ctrRCU/fs/proc/proc_misc.c
--- linux-2.6.12-rc6/fs/proc/proc_misc.c	2005-06-17 16:35:03.0 -0700
+++ linux-2.6.12-rc6-ctrRCU/fs/proc/proc_misc.c	2005-06-29 11:22:11.0 -0700
@@ -549,6 +549,48 @@ void create_seq_entry(char *name, mode_t
 	entry->proc_fops = f;
 }
 
+#ifdef CONFIG_RCU_STATS
+int rcu_read_proc(char *page, char **start, off_t off,
+		  int count, int *eof, void *data)
+{
+	int len;
+	extern int rcu_read_proc_data(char *page);
+
+	len = rcu_read_proc_data(page);
+	return proc_calc_metrics(page, start, off, count, eof, len);
+}
+
+int rcu_read_proc_gp(char *page, char **start, off_t off,
+		     int count, int *eof, void *data)
+{
+	int len;
+	extern int rcu_read_proc_gp_data(char *page);
+
+	len = rcu_read_proc_gp_data(page);
+	return proc_calc_metrics(page, start, off, count, eof, len);
+}
+
+int rcu_read_proc_wqgp(char *page, char **start, off_t off,
+		       int count, int *eof, void *data)
+{
+	int len;
+	extern int rcu_read_proc_wqgp_data(char *page);
+
+	len = rcu_read_proc_wqgp_data(page);
+	return proc_calc_metrics(page, start, off, count, eof, len);
+}
+
+int rcu_read_proc_ptrs(char *page, char **start, off_t off,
+		       int count, int *eof, void *data)
+{
+	int len;
+	extern int rcu_read_proc_ptrs_data(char *page);
+
+	len = rcu_read_proc_ptrs_data(page);
+	return proc_calc_metrics(page, start, off, count, eof, len);
+}
+#endif /* #ifdef CONFIG_RCU_STATS */
+
 void __init proc_misc_init(void)
 {
 	struct proc_dir_entry *entry;
@@ -571,6 +613,12 @@ void __init proc_misc_init(void)
 		{"cmdline",	cmdline_read_proc},
 		{"locks",	locks_read_proc},
 		{"execdomains",	execdomains_read_proc},
+#ifdef CONFIG_RCU_STATS
+		{"rcustats",	rcu_read_proc},
+		{"rcugp",	rcu_read_proc_gp},
+
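Each new read_proc callback formats its whole report into the page and then hands it to proc_calc_metrics(), which picks out the window the reader asked for. The sketch below is a userspace re-creation of that windowing logic as found in fs/proc/proc_misc.c of this era (the helper is static there); calc_metrics() is a hypothetical local name, and the copy is meant only to show how off, count, *start and *eof interact.

```c
#include <assert.h>
#include <sys/types.h>  /* off_t */

/*
 * The callback has written 'len' bytes into 'page'; the reader wants up
 * to 'count' bytes starting at offset 'off'.  Returns how many bytes to
 * copy from *start and sets *eof once the data is exhausted.
 */
static int calc_metrics(char *page, char **start, off_t off,
			int count, int *eof, int len)
{
	assert(off >= 0 && count >= 0);

	if (len <= off + count)
		*eof = 1;            /* this read reaches the end of the data */
	*start = page + off;         /* caller copies from here */
	len -= off;
	if (len > count)
		len = count;         /* never more than the reader asked for */
	if (len < 0)
		len = 0;             /* offset beyond the data: nothing left */
	return len;
}
```

For an 11-byte report, a read of 5 bytes at offset 0 returns 5 with *eof still clear, while a read of 10 bytes at offset 6 returns the remaining 5 bytes and sets *eof.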
